Phonemes probably can't reveal the ancient origins of language after all

Last April, a linguistic study likened the spread of the sounds of language to the human gene pool, and used this comparison to suggest language arose once in Africa about 100,000 years ago. If only it were that simple.

The study was the work of University of Auckland researcher Quentin Atkinson, who argued that phonemes undergo a phenomenon similar to the founder effect in genetics. When part of an initial population breaks away to form a separate group, this smaller subset will naturally have less genetic diversity than the entire original group. Over time, each new population will have slightly less genetic diversity than the group it left, and studying the genetic diversity of different regions can pinpoint where humans likely originated - in this case, East Africa, which tallies well with all available paleontological evidence.
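To make the founder effect concrete, here is a toy simulation in Python. It is only a sketch under invented assumptions, not Atkinson's method or anything from the Science papers: the function name, the population sizes, and the number of migration steps are all hypothetical, and "variants" are just integer labels standing in for genes (or phonemes). At each step a small founder group breaks away and regrows using only the variants its members carried.

```python
import random

def serial_founder_diversity(initial_variants=50, population_size=500,
                             founders=40, steps=8, seed=0):
    """Count distinct variants after each of a series of founder events.

    All parameters are illustrative assumptions, not values from any study.
    """
    rng = random.Random(seed)
    # Source population: every individual carries one variant label.
    population = [rng.randrange(initial_variants) for _ in range(population_size)]
    diversity = [len(set(population))]
    for _ in range(steps):
        # A small group breaks away from the current population...
        founder_group = rng.sample(population, founders)
        # ...and regrows to full size using only the founders' variants.
        population = [rng.choice(founder_group) for _ in range(population_size)]
        diversity.append(len(set(population)))
    return diversity

if __name__ == "__main__":
    # One diversity count per population, ordered by distance from the origin.
    print(serial_founder_diversity())
```

Because each new population can only inherit variants its founders carried, the count in this toy model can never rise from one step to the next. Real languages can also gain sounds, which is one reason the analogy is contested.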

Atkinson argued that the phoneme diversity of 504 languages broke along similar lines to human genetic diversity, with greater phoneme diversity in Africa and less in South America or Oceania. The data seemed to reveal a fairly consistent pattern of decreased diversity the further one got from East Africa. Assuming there really was a linguistic founder effect, that strongly suggested the spread of language mirrored that of our species in general - language emerged only once about 100,000 years ago in East Africa, then spread out from there.

But now a trio of linguists from Germany's Ludwig Maximilian University and the Max Planck Institute for Psycholinguistics are arguing that this analysis just doesn't work. Part of the problem, they argue in Science, is how the data was put together from UCLA's Phonological Segment Inventory Database, or UPSID. Atkinson's phoneme estimates don't actually seem to correlate well with the estimates from UPSID itself, instead giving what the linguists argue is "unjustified weight" to vowels and tones over consonants.

There also seems to be some trouble with definitions, as the linguists argue the original study conflates two different types of "phoneme diversity", one referring to the variation in phoneme use among the individuals of a population and the other referring to the variation between languages. Finally, there's the question of why phonemes should be the main marker of the spread of language, particularly since they are capable of dropping in and out of languages in ways that could easily obscure or throw off an apparent founder effect. Analyzing language spread based on the construction of subordinate clauses or the passive voice could provide equally valid, not to mention completely contradictory, results.

When the linguists reran the original analysis using the first paper's data set, they found that the data didn't just point to an origin in Africa - it also supported an origin point in the Caucasus region straddling Europe and Asia. What's more, the data suggested the lowest variability in languages was found in New Guinea, Australia... and West Africa, which is one of the places that really should have among the highest variability for the analysis to hold much water.

You can check out the whole debate, along with some useful supporting data, over at Science. While it doesn't look at all good for the results of Atkinson's paper, it will be interesting to see whether there is anything worth salvaging in his methods. Unfortunately, I wouldn't bet on it - as a lot of other language researchers have found out, there just isn't much you can say about the birth of language 100,000 years ago when our very oldest data barely reaches back 10,000 years. This may just have to remain one of humanity's great unsolvable puzzles.

Via Science.