SNP confusion

A single nucleotide polymorphism is a nucleotide that is variable – not everyone has the same one at that site.  To be more precise, it’s one that has significant variation – at least a few percent of people (in the population you’re studying) have something other than the most common nucleotide at that site.  Generally, the most common nucleotide is the ancestral one.

You see such a variable site every few hundred nucleotides.  Checking out the SNPs is less informative than sequencing the whole genome, but it’s much easier.

Most of these SNPs are in neutral parts of the genome and don’t do anything, one way or the other. If you’re investigating ancient population splits and such, this is useful, because neutral sites are easier to analyze: neutral means unloaded dice, adaptive evolution means dice loaded in unpredictable ways.

In order to pick a set of SNPs, you look at members of some population and see which sites in the genome are sufficiently variable.  They may still be good SNPs in another population, as  long as that population is sufficiently similar – that is, shares most of the same drift history.

On the other hand, if the two populations have been separated for a long, long time, they won’t have the same SNPs, at least not entirely: sites that are variable in population 1, the population you used to pick your SNPs, may be fixed in the second population, while sites that are essentially fixed in the first population may be SNPs in population 2.  The longer the separation, the more this happens.

Some years ago, some friends of mine looked at how ancestral (on average) SNPs were in different populations.


The average degree of ancestralness was almost exactly the same in Eurasian populations, but was noticeably higher in typical farming populations of sub-Saharan Africa such as the Yoruba, and was higher still in African hunter-gatherers such as the Pygmies and Bushmen.

We found this puzzling, and it took a while to figure it out. The root cause turned out to be  that the drift histories of Eurasians were pretty much all the same, but the drift history of the Yoruba was different enough that they had somewhat different SNPs than Eurasians, while the Bushmen & Pygmies had split off even earlier and had an even more different drift history.
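
A toy calculation makes the mechanism concrete. This is only an illustrative sketch with made-up numbers: the ten sites, the two population labels, and the 30% derived-allele frequency are all assumptions, not data. It picks the SNP panel from population 1 only, then compares the mean ancestral-allele frequency on that panel with the mean over every variable site.

```python
# Toy model of ascertainment bias. All numbers are invented for illustration.
DERIVED = 0.3  # assumed derived-allele frequency wherever a site is variable

# Derived-allele frequency per site as (population 1, population 2).
# A-C arose only in population 1 after the split, D-G are shared,
# H-J arose only in population 2.
sites = {}
for name in "ABC":
    sites[name] = (DERIVED, 0.0)
for name in "DEFG":
    sites[name] = (DERIVED, DERIVED)
for name in "HIJ":
    sites[name] = (0.0, DERIVED)


def mean_ancestral(site_names, pop_index):
    """Average ancestral-allele frequency (1 - derived) over the given sites."""
    freqs = [1.0 - sites[s][pop_index] for s in site_names]
    return sum(freqs) / len(freqs)


# The panel ("chip") contains only sites that are variable in population 1.
panel = [s for s, (p1, _) in sites.items() if 0.0 < p1 < 1.0]
all_sites = list(sites)

panel_pop1 = mean_ancestral(panel, 0)
panel_pop2 = mean_ancestral(panel, 1)
full_pop1 = mean_ancestral(all_sites, 0)
full_pop2 = mean_ancestral(all_sites, 1)

print(f"panel:     pop1 {panel_pop1:.3f}  pop2 {panel_pop2:.3f}")
print(f"all sites: pop1 {full_pop1:.3f}  pop2 {full_pop2:.3f}")
```

On the panel, population 2 looks more ancestral (about 0.83 versus 0.70) only because its own variable sites never made it onto the panel. Averaged over all ten sites, the two populations come out the same (0.79).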

This point is technical, even boring once you figure out the answer, but add some misunderstanding and ideology and the story gets hilarious.

Now this following story I mostly got second hand, so take it with a grain of salt. But I think it’s basically accurate.

You see, somewhat earlier, people working with the same datasets noticed the same phenomenon, and instead of puzzling through it, they panicked.  They thought that those higher levels of ancestrality in Africans meant that Africans were genetically closer to chimpanzees than other humans are – which is not the case.  One of them came up with the memorable phrase ‘chimp index’ for average ancestrality.

Around the same time, there was a lot of odd activity at NIH, and I strongly suspect that activity was connected to the chimp index crisis. Even at the time, before we looked at this problem, I thought that something strange was going on. They called a special meeting, where those receiving NIH grants in human genetics were told to be careful to make sure that reporters didn’t get the wrong idea about genetic differences between populations.  Marc Feldman proposed organizing ‘flying squads’ to rush out and counteract such misunderstandings – which inspired me to design a shoulder patch for those flying squads:


Eventually someone who knew some theory must have figured it out and explained it, which would have calmed things down.  But for a while, the powers that be at NIH must have been shitting in their pants.  How awful.

  1. AnonymousCoward says:

    Y-DNA hgs A and B, on the other hand, are probably not a misunderstanding, but markers of sub-Saharan-African admixture with “Archaic 3”. A hominid so touchy that there’s no good consensus on what to name it.

    One might think that this was at least as much reason for panic as the chimp-index. But I suppose the admixture is ameliorated by the reciprocal admixture of Eurasians with “cavemen”. So nobody can misunderstand that Archaic 3 admixture in SSA makes Africans less human.

    But I suspect that if we saw pictures of Neanderthals next to pictures of… whatever Archaic 3 looked like, that alone would be plenty to send some people through a spell of the vapours. They probably were plucked straight out of Lovecraft’s early nightmares.

    • dearieme says:

      From Oliver and Fage’s A Short History of Africa (4/e 1972):

      “There is no need to postulate a war to the death in which rival ‘species [of man] were extinguished. Probably the earliest example of the ‘sapient’ type of man, being more successful, multiplied faster and so were able to absorb the representatives of divergently specialised types, such as the Neandertal type.
      In the west-central forested regions [of Africa] the triumph of Homo sapiens over slightly different pre-existent strains had perhaps already produced a distinct negroid type.”

      So there’s a name for Archaic 3: “slightly different pre-existent strain”.

    • bb753 says:

      Are we talking seriously archaic like Homo Rhodesiensis or later species like H. Sapiens Omo or Idaltu?

      • Matt says:

        As far as I know, we only have the Iwo Eleru calvarium – Dated 11,800 BP (that’s roughly 9,850 BC!).

        The morphometric analysis (scroll down in the link) places IE around the “upper” end of Neanderthal variation along a shape based PCA dimension (PC1) separating at the maximum points Erectus and Anatomically Modern Humans.

        Neanderthals are generally intermediate, but closer to Erectus, like Heidelbergensis, plus their own distinctions (certain kinds of bulging of a basically Erectus like braincase, I think) that wouldn’t show up as well on that PCA, dominated in the first PC by archaic vs modern and the second PC by the largest dimensions of common intra-species variance, both within modern human populations and to a lesser extent between Andaman Islanders vs Oceanian, Khoi San, African and European Upper Paleolithic (reflecting the short, broad crania of Andamanese vs the longer and narrower crania of Upper Paleolithic people, Africans and Oceanians – “antero-posteriorly short and medio-laterally wide shapes” vs “antero-posteriorly long and medio-laterally narrow vaults”).

        How representative IE is and of who exactly, we don’t know.

  2. Whyvert says:

    Current head of the NIH, Francis Collins, says they won’t be funding CRISPR embryo gene editing. That distasteful knowledge is being left to China. Collins’s previous job, ironically, was boss of the Human Genome Project.
    I for one hail our new Chinese overlords.

  3. The Chinese now have the keys to the castle, particularly if they crack the genetics of very high intelligence from the samples sent to them from across the world. Supremacy of Ming dynasty proportions.

    • Yeah, I was smart enough to get in, but too stupid to notice that. An excellent illustration that candlepower and wisdom are not synonymous. And now that I have the sucker downloaded to my computer, I still don’t know how to make good use of it myself. That’s likely just laziness, though.

  4. Matt says:

    As an index of “Is this population’s drift history similar to Orcadian?” that graphic has some interest / strangeness – Balochi is further from Orcadian on its measure than Han, for ex, Native Americans are closer than Druze. Strange.

    • gcochran9 says:

      There’s a fair amount of sub-Saharan African ancestry in the Balochi. But I wouldn’t necessarily take tiny differences too seriously. Although the Papuans are a bit different – that could be their Denisovan component talking.

  5. TWS says:

    Don’t worry. Superior ability breeds superior ambition. I’m sure the Chinese overlords will be duking it out amongst themselves in no time. The survivors (some ninety of them) will take the first interstellar space craft and name it ‘Botany Bay’ only to be found hundreds of years later in suspended animation.

  6. TWS says:

    Can you imagine being one of those faceless NIH guys biting his nails every morning worrying about what genetics might tell us and how they would have to spin it? It must be like the guys on Hybrasil singing while the island sank.

  7. harpend says:

    We had a post about this funny meeting several years ago here:
    including my letter of invitation.

    (Incidentally does anyone know how to put proper links in blog comments?)

    We did come up with a back-of-envelope grade explanation for why the average frequency of the ancestral SNP was about 64%. In a single population of constant size over many generations under the infinite sites model (every mutation occurs at a new locus) the relative frequency of derived mutations at frequency p should be proportional to 1/p. In a sample of n chromosomes the number of singletons (p=1/n) should be twice the number of doubleton SNPs (p=2/n) and so on. Now make the simple assumption that the probability that a locus was “discovered” and put on a chip was proportional to heterozygosity at that locus, i.e. proportional to p(1-p). The product of the underlying spectrum, proportional to 1/p, and the ascertainment probability p(1-p) is simply proportional to 1-p, and the mean of this distribution is 1/3, implying the mean ancestrality should be 2/3, close to the observed 64% outside Africa. The conditions of the coalescent model are far, far from true, but it suggests that the observed 64% is not at all implausible.
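
    Spelled out, with p the derived-allele frequency and treating p as continuous on (0, 1), which is an extra simplification assumed here for illustration, the arithmetic is:

    ```latex
    \begin{aligned}
    f(p) &\propto \frac{1}{p}\cdot p(1-p) = 1-p, \qquad 0<p<1,\\
    f(p) &= 2(1-p) \quad\text{since } \int_0^1 (1-p)\,dp = \tfrac12,\\
    E[p] &= \int_0^1 2p(1-p)\,dp = 2\left(\tfrac12-\tfrac13\right) = \tfrac13,\\
    E[1-p] &= 1-\tfrac13 = \tfrac23 \approx 0.67.
    \end{aligned}
    ```

    So the expected ancestral-allele frequency at an ascertained site is 2/3 under these assumptions.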

    • j says:

      The San seem to have 72% ancestral SNPs and not 64% as the rest of Homo sapiens. Why is that? Maybe the question should not be asked in polite society. Not that any San will be reading this comment and feel offended.

      • j says:

        Why did the San drift less than others? Was it chance?

      • harpend says:

        No, not at all, but your reaction was widely shared and feared in the genetics community. Let me give a toy example of what must have happened.

        Imagine that in a sample of humanity we can find 10 SNP loci, A through J. It turns out that we make a chip by collecting DNA from a sample of Cambridge University lab techs. In that sample we find A through G but not H, I, and J, because those are unique to Bushmen and are not present in the lab tech sample. The Bushmen lack A, B, and C because these occurred in the ancestry of the lab techs after the separation of Bushman ancestors from the rest of humanity.

        We then use our new chip to survey world populations. When we use this chip on Bushman populations we find that they have fewer SNPs and it seems that they are more ancestral. But we completely missed their own unique SNPs H, I, and J because they are not on the chip! This is called ascertainment bias.

        Pretty obvious after you get it but it sure puzzled folks for a while.

        • Ilya says:

          Thanks for the explanation, Prof. Harpending!

        • epoch says:

          That is very interesting. Bushmen are a very small population. With just a slightly different history one could imagine they could have been wiped away entirely, just like the mystical Negritos of Taiwan. Makes me wonder how many genetic gems – like y-dna A00 – are hiding in extremely small populations. And how much is lost forever.

  8. austmann says:

    So basically you’re saying you focused too much on SNPs in your book, and forgot the polygenic side of the story. Don’t know shit about the mathematics, just wondering;)

  9. Toddy Cat says:

    “They thought that those higher levels of ancestrality in Africans meant that Africans were genetically closer to chimpanzees than other humans”

    Lots of lefties secretly believe that something like this is true, and they are terrified that it will be confirmed, which is part of why they hate HBD so much. They are terrified that their secret fears are true. Meanwhile, lots of HBD’ers go around bragging about how much Neanderthal ancestry they have. Pretty funny, actually.

    • MawBTS says:

      “For my own part I would as soon be descended from that heroic little monkey […] as from a savage who delights to torture his enemies, offers up bloody sacrifices, practices infanticide without remorse, treats his wives like slaves, knows no decency, and is haunted by the grossest superstitions.” – Charles Darwin

      • TWS says:

        Darwin didn’t know enough about Chimps. They’re nasty pieces of work.

        • ursiform says:

          Which helps explain the “savage who delights to torture his enemies, offers up bloody sacrifices, practices infanticide without remorse, treats his wives like slaves, [and] knows no decency”. Perhaps what defines humans is to be “haunted by the grossest superstitions”. Or do chimps have superstitions?

        • RBaldini says:

          Darwin probably wasn’t referring to chimps here (he was a better naturalist than to call a chimp a monkey!). He had quite a penchant for telling stories about how Professor So-and-so had observed a particular monkey do something unusual. Very anecdotal stuff. I suspect he was referring to one of these stories. Maybe this one: “Rengger observed an American monkey (a Cebus) carefully driving away the flies which plagued her infant.”

    • Jim says:

      It’s hard to know what people really believe about all this since they are unlikely to candidly discuss it. Sort of like visiting North Korea and asking people what they really think of the dear leader.

  10. Fourth doorman of the apocalypse says:

    You can see why they were worried when Nature prints stuff like this:

  11. Brian says:

    Ridley WSJ C1 2015.05.02

  12. Grumpy Old Man says:

    Nice that you’re not talking about the Scottish National Party.

  13. That is indeed a funny story regarding suppression of science when it sells the absolutely wrong politically correct version of what we are supposed to believe. It was a false alarm as Cochran has explained.

    One of these years I expect to read what actually happened to Bruce Lahn when he asked “Could interbreeding with Neanderthals have led to an Enhanced Human Brain?”

    An interesting question, one that would not launch the flying squads of politically correct science police to come and get you. But this little add-on in Bruce Lahn’s paper effectively ended his promising career as a lead researcher at the prestigious Howard Hughes Medical Institute. In the actual paper, behind a paywall now, he showed that a specific gene that regulates brain growth had entered the human gene pool from an archaic population approximately 37,000 years ago and had spread rapidly, presumably because of its intelligence-enhancing properties, but had not spread south of the Sahara. Meaning that everyone in the world had benefited from this gene except those folks living south of the Sahara.

    You do not propose that people around the world are different because evolution works on our brains. You especially don’t back it up with facts. Bruce Lahn now works in China and nobody is taking up where he left off nearly 10 years ago.

