A heat map showing locations of previously unknown DNA variants. Red indicates a higher number of discoveries, black fewer. Credit: Harvard Medical School
A study of hundreds of new genomes from across the globe has yielded insights into modern genetic diversity and ancient population dynamics, including compelling evidence that essentially all non-Africans today descend from a single migration out of Africa.
The multinational research effort, led by Harvard Medical School geneticists and published Sept. 21 in Nature, also suggests that no single gene can explain the significant cultural and cognitive progress in human development that occurred about 50,000 years ago.
The study represents the largest data set yet of high-quality genome sequences from understudied populations, adding nearly 6 million DNA base pairs to the “canonical” human genome sequence published in 2001.
The data identify millions of previously unknown population-specific mutations that may help scientists develop precision-targeted diagnostic tests and treatments in their quest to improve the health of the world’s underserved populations.
Most genome-wide population sequencing studies to date have focused on a handful of large populations. The HMS-led study, by comparison, sequenced samples from 142 smaller populations, most of which were previously understudied.
“As humans, we are not just the people who live in industrialized countries, and we are not just the people who live in numerically large groups,” said David Reich, professor of genetics at HMS and senior author of the study. “If we want to understand who we really are, we have to realize that some of the most interesting aspects of human variation are only present in underrepresented, small populations.”
“We wanted to go out into the world and pull together as many of the ethnically, linguistically and anthropologically diverse samples as we possibly could,” said Swapan Mallick, bioinformatic systems director in the Reich lab and first author of the study.
The team’s analyses are already answering questions about various populations’ genetic origins, but, the researchers note, these insights are only a milestone on a longer journey.
“Of course, there are thousands of ethnically distinct populations in the world, and much more work needs to be done,” said Mallick.
Reich, Mallick and their international team of colleagues began by selecting two genomes each from 51 populations represented in a collection called the Human Genome Diversity Project. Next, they assembled samples from members of 91 other groups, including diverse Native American, South Asian, and African populations not previously included in genome-wide studies, and sent the DNA for sequencing. In all, the project analyzed the genomes of 300 people.
A key conclusion, that the vast majority of modern human ancestry in non-Africans derives from a single population that migrated out of Africa, is also supported by two other whole-genome sequencing studies appearing simultaneously in Nature. One, led by an Estonian group, focused on 379 whole-genome sequences; the other, led by a Danish group, analyzed 108 Australians and New Guineans.
Together, the three studies put to rest a lingering question about whether indigenous peoples of Australia, New Guinea and the Andaman Islands descend in large part from a second group that left Africa earlier and skirted the coast of the Indian Ocean. They do not, the HMS researchers say.
“Our best estimate for the proportion of ancestry from an early-exit population is zero,” said Reich, who is also an investigator of the Howard Hughes Medical Institute and associate member of the Broad Institute. “Taken together, all three studies leave wiggle room for, at most, around two percent.”
The HMS-led study further revealed that the common ancestors of modern humans began to differentiate at least 200,000 years ago, long before the out-of-Africa dispersal occurred.
“It had been unclear whether the group that expanded out of Africa represented a large subset of the populations within Africa,” said Mallick. “This really shows that there was a lot of substructure prior to the expansion.”
The additional discovery that genetics alone can’t account for the acceleration of cultural, economic and intellectual progress in the last 50,000 years runs contrary to a popular hypothesis in the field.
“There does not seem to have been one or a few enabling mutations that suddenly appeared among our ancestors and allowed them to think in profoundly different ways,” said Reich.
Instead, the researchers say, a constellation of factors, including environment, lifestyle, and possibly genes, precipitated the rapid changes that occurred.
“Geneticists often search for examples where genetics is the explanation. Here, paradoxically, genetic data are showing that there will be no clear genetic answers,” Reich said.
Mallick and colleagues overcame significant logistical hurdles posed by sharing and processing an enormous amount of data.
Often, in studies of this size, data are collected in many laboratories that use different sequencing machines and different experimental protocols. This can create so-called batch effects that make it difficult to distinguish true differences among samples. The current study minimized batch effects by sending all of the samples to a single center to be sequenced at the same time.
The team made much of the data set publicly available in 2014; multiple research groups have already used it for their studies.
In a way, the authors say, the findings reported thus far are just the tip of the iceberg.
“It’s impossible for our group to analyze even a tiny fraction of what the data represents,” said Mallick. “Our goal is to push the data out and let people use it to consider their own questions.”
Primary funding for the study, called the Simons Genome Diversity Project, was provided by the Simons Foundation (SFARI 280376) and the National Science Foundation (BCS-1032255).
Source: Harvard Medical School
- Swapan Mallick, Heng Li, Mark Lipson, Iain Mathieson, Melissa Gymrek, Fernando Racimo, Mengyao Zhao, Niru Chennagiri, Susanne Nordenfelt, Arti Tandon, Pontus Skoglund, Iosif Lazaridis, Sriram Sankararaman, Qiaomei Fu, Nadin Rohland, Gabriel Renaud, Yaniv Erlich, Thomas Willems, Carla Gallo, Jeffrey P. Spence, Yun S. Song, Giovanni Poletti, Francois Balloux, George van Driem, Peter de Knijff, Irene Gallego Romero, Aashish R. Jha, Doron M. Behar, Claudio M. Bravi, Cristian Capelli, Tor Hervig, Andres Moreno-Estrada, Olga L. Posukh, Elena Balanovska, Oleg Balanovsky, Sena Karachanak-Yankova, Hovhannes Sahakyan, Draga Toncheva, Levon Yepiskoposyan, Chris Tyler-Smith, Yali Xue, M. Syafiq Abdullah, Andres Ruiz-Linares, Cynthia M. Beall, Anna Di Rienzo, Choongwon Jeong, Elena B. Starikovskaya, Ene Metspalu, Jüri Parik, Richard Villems, Brenna M. Henn, Ugur Hodoglugil, Robert Mahley, Antti Sajantila, George Stamatoyannopoulos, Joseph T. S. Wee, Rita Khusainova, Elza Khusnutdinova, Sergey Litvinov, George Ayodo, David Comas, Michael F. Hammer, Toomas Kivisild, William Klitz, Cheryl A. Winkler, Damian Labuda, Michael Bamshad, Lynn B. Jorde, Sarah A. Tishkoff, W. Scott Watkins, Mait Metspalu, Stanislav Dryomov, Rem Sukernik, Lalji Singh, Kumarasamy Thangaraj, Svante Pääbo, Janet Kelso, Nick Patterson, David Reich. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature, 2016; DOI:10.1038/nature18964