To clarify the phylogeny of haplogroups M, N and R in South Asia, we focused our study on the lineages with a recognized or likely origin in the Subcontinent, belonging to macrohaplogroups M (M2, M3, M4’67, M5, M6, M13’46’61, M31, M32’56, M33, M34’57, M35, M36, M39, M40, M41, M42b, M44, M48, M49, M50, M52, M53, M58, M62), R (R5, R6, R7, R8, R30 and R31) and N (N1’5). We also studied U2 (excluding U2e due to its West Eurasian origin) in a complementary analysis. We combined these with other published data from South Asia and neighbouring areas, including a total of 1478 samples (Additional file 1: Table S1).

In addition, we generated 13 new sequences (accession numbers: KY686204-KY686216) belonging to the aforementioned haplogroups from Southeast Asia: seven from Myanmar, one from Vietnam, one from Thailand and four from Indonesia.

This was part of a much wider process of Indo-European expansion, with an ultimate source in the Pontic-Caspian region, which carried closely related Y-chromosome lineages, a smaller fraction of autosomal genome-wide variation and an even smaller fraction of mitogenomes across a vast swathe of Eurasia between 5 and 3.5 ka.

Following the out-of-Africa (OOA) migration, South Asia (or the Indian Subcontinent, here comprising India, Pakistan, Bangladesh, Sri Lanka, Nepal and Bhutan) was probably one of the earliest corridors of dispersal taken by anatomically modern humans (AMH) [1,2,3]. India is a patchwork of tribal and non-tribal populations that speak many different languages from various language families. Indo-European, spoken across northern and central India, and also in Pakistan and Bangladesh, has been frequently connected to the so-called “Indo-Aryan invasions” from Central Asia ~3.5 ka and the establishment of the caste system, but the extent of immigration at this time remains extremely controversial.

