In the 1000 Genomes, there is a Punjabi dataset. Here is the description:
These cell lines and DNA samples were prepared from blood samples collected in Lahore, Pakistan. The samples are from a mix of parent-adult child trios and unrelated individuals who identified themselves and their parents as Punjabi.
A few years ago I did an analysis of the population structure in the 1000 Genomes dataset. In the Chinese data, there seemed to be some curious structure (there were two clusters of South Chinese). But the biggest issues, predictably, were in the South Asians. To give concrete examples, there were a few Brahmins in the Telugu data. A subset of Tamils and Telugus were highly ASI-shifted. The Gujarati were highly heterogeneous, and one subcluster was almost certainly Patels (the samples were collected in Houston). The ASI-shifted groups were almost certainly Scheduled Castes (Dalits), because I could see that they clustered with those samples from the Estonian Biocentre dataset.
There was something curious about the samples from Pakistan and Bangladesh. Aside from a small number of individuals who cluster with Scheduled Castes, and whose samples were collected at the same time judging by their IDs, the Bangladeshi sample didn’t have much South Asian style structure. That is, there wasn’t a cline or lots of substructure within the ethnicity.
As noted by some commenters, the Punjabi samples were very different. Like the Gujarati samples, they showed a huge variance along the ANI-ASI cline. To me, this was somewhat surprising. To make the 1000 Genomes more useful, I used PCA and divided both Gujaratis and Punjabis into groups based on their position on the ANI-ASI cline, so that ANI_1 is the subpopulation with the most ANI and ANI_4 the one with the least.
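For readers who want to try a similar split, here is a minimal sketch of one way to bin samples along a cline. It is not necessarily the procedure used for the groups in this post: the quartile cut points, the function name `assign_cline_groups`, and the toy scores are all assumptions made for illustration. It also assumes you already have one score per individual (for example a PCA coordinate) oriented so that higher values mean more ANI.

```python
import numpy as np

def assign_cline_groups(cline_scores, n_groups=4, prefix="ANI"):
    """Label samples by quantile bin along a cline (hypothetical helper).

    Assumes higher scores mean more ANI, so the top bin becomes <prefix>_1.
    """
    scores = np.asarray(cline_scores, dtype=float)
    # Interior quantile cut points, e.g. 25%, 50%, 75% for four groups.
    cuts = np.quantile(scores, np.linspace(0, 1, n_groups + 1)[1:-1])
    # np.digitize returns 0 for the lowest bin and n_groups - 1 for the highest.
    bins = np.digitize(scores, cuts)
    # Flip the numbering so the most ANI-shifted bin is group 1.
    return [f"{prefix}_{n_groups - b}" for b in bins]

# Toy scores, invented for illustration only (not real genotype data).
scores = [0.9, 0.1, 0.5, 0.7, 0.3, 0.95, 0.05, 0.6]
labels = assign_cline_groups(scores)
for score, label in zip(scores, labels):
    print(score, label)
```

Quantile bins give equal-sized groups by construction. If the clusters on the PCA plot are visibly uneven, hand-picked thresholds may reflect the structure better. Note also that the sign of a principal component is arbitrary, so the axis has to be oriented against known reference populations before binning.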
Using Treemix produced some weird results. As you can see above, Punjabi_ANI_1 looks like an Iranian population with gene flow from Punjabi_ANI_3. Punjabi_ANI_2 looks like a North Indian population with Iranian gene flow (so it is more ASI). Punjabi_ANI_3 is less ANI-shifted than Uttar Pradesh Brahmins, but more so than Uttar Pradesh Kshatriya. Finally, Punjabi_ANI_4 is actually very similar to Punjabi_ANI_2, except it has gene flow from a Dalit-like population.
With the South Asian Genotype Project I have a few Punjabi samples. All of them are within Punjabi_ANI_1.
I don’t know what’s going on here. Is this really caste-like structure in Punjab? Or are we seeing lots of admixture among people who are called “Punjabi” today? For example, the gene flow edges suggest lots of mixing between quite South Asian types of groups and an Iranian sort. Perhaps this is the absorption of Pathans into South Asian groups? Could it be Muhajir people who mixed with local Punjabis and identified as such?
I was curious to see if I could find something similar in relation to the three Jatts. As you can see with Treemix, no. Jatts are just very ANI-shifted. I added Lithuanians and Georgians, and you can see that Uttar Pradesh Brahmins get gene flow from a Lithuanian-shifted group, while South Indian Brahmins have more Georgian-like gene flow. I suspect this is just an artifact of the fact that South Indian Brahmins have a lot of admixture from non-Brahmin South Indians, who are more Georgian than Lithuanian (Iran_N as opposed to Yamnaya).
Finally, the Bengali (Bangladeshi) vs. Punjabi contrast is really interesting. If Punjab has such deep caste-like structure, it really goes to show that caste is a very, very powerful institution within South Asia: ~1,000 years of Muslim rule, and a majority-Muslim population in western Punjab, did not break it down. In contrast, in Bangladesh, there doesn’t seem to be much caste structure. I am routinely the most East Asian shifted Bengali in datasets, but my family is also from the eastern edge of eastern Bengal. Why the difference?
In The Rise of Islam and the Bengal Frontier the author posits that the Islamicization of eastern Bengal was to a great extent a function of the opening up of lands for cultivation under the supervision of Muslim elites, under the rule of Afghans and later Mughals. This would explain the lack of caste structure, because presumably caste structure would be difficult to maintain in a frontier landscape where the cultural elite does not promote or accept caste (though the elite West Asian Muslims were racially exclusive, they were also a very small minority).
In contrast, the Punjab has long been settled by Indo-Aryan peoples, and despite its long history of Islam, it was not recently a frontier society.
Anyway, that’s all I’ve got to say on that. I’m sure readers will have more insight on this pattern than I do…