For months, AncestryDNA has been quietly reprocessing their entire database of more than 10 million customers behind the scenes, and on Wednesday, September 12th, 2018, they rolled out a major enhancement to their ethnicity estimates. New testers will automatically be given the revamped version starting Wednesday, and existing customers will have the option to see the updated estimates when they check their accounts and click on the green button. You will be asked about your expectations, then you’ll get to see the new estimates.
Mine look like this:
The display highlights changes in percentages (e.g., Ireland and England, above), which regions have been refined (France, Sweden), which are new to AncestryDNA’s estimates (Sardinia), and which have been removed from your personal profile (Europe South, Iberian Peninsula). Only the main regions will be affected, not the subregions variously shown as Migrations or Genetic Communities (Acadian in the example). Regions previously labeled “Low Confidence” may disappear entirely.
For a limited time you’ll be able to switch back to the earlier version if you like.
What’s Different About the New Estimates?
Users will see some regions with revised names, entirely new regions, and changed percentages for existing regions. For example, there are now 15 regions within Asia, where previously there were five. In Europe, “Scandinavia” has been separated into Norway and Sweden, and “Europe West” is now differentiated into France and Germany. A table of the previous versus new regions is presented at the bottom of this post.
Those of African descent may see slightly different regional maps and ethnicity percentages, but not as drastic a change as Europeans and Asians will get. Ancestry’s scientists explained it this way: Africa is massive. It also has the highest human genetic diversity in the world because our species originated in Africa and has had hundreds of thousands of years to form distinct populations. Not many Africans have tested yet, so that immense diversity is currently poorly sampled. The good news is that AncestryDNA is launching an African Diversity Project to improve their sampling.
Changes Behind the Scenes
The updated estimates are the product of two major changes in how the percentages are calculated. The first is a five-fold expansion of their “reference panel”, the group of people with known ancestry to which the rest of us are compared. Want to know whether you’re part Melanesian or Senegalese or European Jewish? To determine that, we’d need to compare your DNA to people who are known to be predominantly from those populations.
The more genetic diversity in the source population, the larger the reference panel must be to accurately reflect the region. Coming full circle, that’s why the estimates for African ancestry are still a work in progress.
Prior to this update, the reference panel comprised 3,000 people from 26 main geographic regions divided into 363 subregions. It’s now made up of 16,000 people from 43 main regions and 380 subregions. The additions to the panel come from AncestryDNA’s own customers who consented to participate in research and whose trees documented their regional associations.
With more individuals per region, the reference panel can better capture the genetic diversity in a given population.
Think of a population like a bag of, say, 1,000 colored marbles in 12 different colors. If you reached in and grabbed six marbles, you might by chance pick six different colors, or you might get two blues, three blacks, and a yellow. In neither case would you have a good estimate of what’s really in the bag. Even if you grabbed 20 marbles, you might double up on some colors and miss others. The colors, of course, represent the genetic markers in the population, and the more marbles (individuals) you sample, the better your estimate of the whole population. And the better the estimate, the better the comparison of you to that population.
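The marble analogy is easy to try out in a few lines of code. The sketch below is only an illustration: the bag contents (12 colors in nearly equal numbers), the number of trials, and the function name `mean_sampling_error` are all assumptions made for the example and have nothing to do with AncestryDNA's actual methods.

```python
import random
from collections import Counter


def mean_sampling_error(sample_size, n_colors=12, bag_size=1000, trials=500, seed=1):
    """Average total gap between sampled and true color proportions."""
    rng = random.Random(seed)
    # Hypothetical bag: the 12 colors are represented as evenly as possible.
    bag = [i % n_colors for i in range(bag_size)]
    true_counts = Counter(bag)
    total_error = 0.0
    for _ in range(trials):
        sample = rng.sample(bag, sample_size)  # draw without replacement
        counts = Counter(sample)
        # Sum of absolute differences between sampled and true proportions.
        total_error += sum(
            abs(counts[c] / sample_size - true_counts[c] / bag_size)
            for c in range(n_colors)
        )
    return total_error / trials


for n in (6, 20, 100, 500):
    print(n, round(mean_sampling_error(n), 3))
```

The printed error shrinks as the handful of marbles grows, which is the same reason a larger reference panel should describe a population better.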
The other major change behind the scenes involves how the DNA itself is analyzed. Previously, AncestryDNA’s computers looked at each individual point (or SNP) in our DNA separately. Since the SNPs we use for genealogy are usually present in only two versions, there’s only so much “signal” one can carry. The new approach analyzes strings of SNPs as a unit (i.e., a haplotype), meaning there are more unique combinations and more precision to the estimates.
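To see why a string of SNPs can carry more signal than a single SNP, it helps to count the possible patterns. This is a rough sketch assuming each SNP has exactly two versions and every combination is possible. In real populations, neighboring SNPs tend to be inherited together, so far fewer patterns actually occur. The function name `possible_patterns` is just for illustration.

```python
def possible_patterns(n_snps, alleles_per_snp=2):
    """Upper bound on distinct allele patterns across a run of SNPs."""
    return alleles_per_snp ** n_snps


for n in (1, 2, 5, 10):
    print(f"{n} SNP(s): up to {possible_patterns(n)} patterns")
```

A single SNP can only say "version 1" or "version 2", while even a short run of ten SNPs has room for up to 1,024 distinct patterns.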
The new algorithm looks at 1,001 windows, or haplotypes, in our DNA. AncestryDNA compares about 3,475 cM and 650,000 SNPs in the genome, so each window should be roughly 3.5 cM and 650 SNPs. These numbers are very general approximations, of course.
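The per-window figures come from simple division. The snippet below assumes, purely for the sake of the arithmetic, that all 1,001 windows are the same size, which is unlikely to be true in practice.

```python
total_cm = 3475        # approximate genetic length compared, in centimorgans
total_snps = 650_000   # approximate number of SNPs compared
windows = 1001         # number of windows in the new algorithm

cm_per_window = total_cm / windows
snps_per_window = total_snps / windows

print(f"{cm_per_window:.2f} cM and about {snps_per_window:.0f} SNPs per window")
```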
A final consideration that increases the accuracy of the ethnicity estimates is that the reference panel data are phased. That means that the maternal and paternal contributions to a child’s DNA are identified as such. Phasing improves the identification of the haplotypes used in the analysis. (Only the reference panel is phased, not the customer’s data.)
AncestryDNA has issued a new White Paper on the updated methodology and eventually will publish a scientific paper as well.
How the Ethnicity Estimates Stack Up for Me
My family tree is complete, documented, and supported by DNA evidence through my 3rd great grandparents. Beyond that, some lines are traced back many more generations while others dead-end in their European countries of origin. Thus, I have a reasonable feel for what my ethnicity estimates should be, at least within the past couple of centuries.
Granted, my DNA won’t reflect the same proportions as my pedigree because of how DNA is passed on. While I inherited exactly 50% of my autosomal DNA from each parent, I didn’t inherit exactly 25% from each grandparent, 12.5% from each great grandparent, and so on. The biological process of recombination causes those proportions to vary.
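A toy simulation shows how recombination spreads the grandparent shares around 25%. Everything in it is a simplifying assumption: 22 chromosomes of equal length (150 cM each), crossovers spaced at random with an average of one per 100 cM, and a made-up function name, `grandparent_share`. Real chromosomes differ in length and crossover rates differ between mothers and fathers, so treat the numbers as a rough picture only.

```python
import random


def grandparent_share(rng, n_chromosomes=22, length_cm=150.0):
    """Fraction of the DNA one parent passes on that traces to one grandparent."""
    from_a = 0.0
    for _ in range(n_chromosomes):
        # Each transmitted chromosome starts on one grandparental copy at random.
        on_a = rng.random() < 0.5
        pos = 0.0
        while pos < length_cm:
            # Distance to the next crossover: exponential, mean 100 cM (assumed).
            step = rng.expovariate(1 / 100.0)
            end = min(pos + step, length_cm)
            if on_a:
                from_a += end - pos
            pos = end
            on_a = not on_a  # a crossover switches to the other grandparent's copy
    return from_a / (n_chromosomes * length_cm)


rng = random.Random(42)
# Each parent supplies half of the child's DNA, so halve the parental fraction.
shares = [0.5 * grandparent_share(rng) for _ in range(1000)]
mean_share = sum(shares) / len(shares)
print(f"min {min(shares):.1%}, mean {mean_share:.1%}, max {max(shares):.1%}")
```

The average lands near 25%, but individual simulated children fall noticeably above or below it, which is the variation described above.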
From a pedigree perspective, I am 38.3% French, 34.4% German, 17.2% Irish, 5.5% Spanish (mostly Canary Islander), 1.6% English, and 3.1% unknown.
[Aside: The unknown is a woman named Elizabeth Richard (1843–1881), who married Gilbert Cesaire Elmer (1838–1889). If you come across her in your tree, please contact me! Her parents are brick walls. Her surname could be either English (RIH-chard) or Acadian (REE-shard).]
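Pedigree percentages like these come from counting ancestors at a single generation and dividing by the number of slots. The sketch below assumes the counting is done at the 5th great-grandparent level (128 slots). The per-origin counts are hypothetical, chosen only because they reproduce the percentages quoted above. They are not taken from the actual tree.

```python
from fractions import Fraction

# Hypothetical number of ancestral slots per origin among 2**7 = 128
# fifth-great-grandparent slots. One unknown 3rd great-grandparent
# accounts for 4 of those 128 slots (1/32).
slots = {
    "French": 49,
    "German": 44,
    "Irish": 22,
    "Spanish": 7,
    "English": 2,
    "Unknown": 4,
}

total = sum(slots.values())
shares = {origin: Fraction(n, total) for origin, n in slots.items()}

for origin, share in shares.items():
    print(f"{origin}: {float(share) * 100:.1f}%")
```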
How well did AncestryDNA do? Here’s a side-by-side view of my estimates before and after the update:
And this table shows how they compare to what I expect from my pedigree:
|Ethnicity||Expected||Before Update||After Update|
Overall, a vast improvement! The biggest discrepancy is that all of my expected German seems to be counted as French. That’s not surprising given that my German ancestors were originally from southern and southwestern Germany (Bavaria, Baden-Württemberg, and Rhineland-Palatinate), areas that have had a long history of interchange with what is now France. The region zoned as “France” in my personalized map doesn’t quite cover those states of Germany, however.
The decreases in Irish, English, and Scandinavian are all much more in line with my known ancestry, as is the loss of Italian/Greek.
I was disappointed to see the Iberian estimate disappear. I grew up near a town called New Iberia, Louisiana, and my tree contains Dominguez, Miguez, Romero, Viator (originally Villatoro), and similar names. I’m proud of my Spanish ancestry. On the other hand, my mother’s estimate for Spain is now 8% (down from 21%; expected 11%), which is reasonable.
The Sardinian is a complete head-scratcher.
From a scientific perspective, the new estimates are a huge improvement over the previous version. Both the five-fold increase in the reference panel and the use of haplotypes instead of SNPs should give much more accurate estimates. Are they perfect? Of course not! They’re just one step forward in a rapidly advancing field. AncestryDNA’s scientists experimented with several new methods before settling on this one as the best. This is exactly what we hope to see in science: successive attempts that each build upon the lessons of the previous ones.
From my mixed-European perspective, AncestryDNA’s done a pretty good job. The discrepancies are minor and easy to explain, especially when we consider that the percentages from my pedigree date back only 200 years or so while the DNA-based estimates are meant to reflect my ancestry 500–1,000 years ago.
How do your new estimates compare to your documented pedigree? Feel free to leave comments here, both positive and negative, and also to report your thoughts to AncestryDNA. They’ve left a feedback region just for that, and they want to know what we think!
Comparison of Old and New Regions
The table below shows how the new ethnicity regions correspond to the previous ones. Regions in bold are either entirely new or more precisely circumscribed from the previous version.
|Previous Regions||New Regions||No. of Samples|
|Africa North||Northern Africa||41|
|Africa South-Central Hunter-Gatherers||Africa South-Central Hunter-Gatherers||34|
|Benin & Togo||Benin & Togo||224|
|Cameroon & Congo and Africa Southeastern Bantu||Cameroon, Congo, & Southern Bantu Peoples||579|
|Ivory Coast & Ghana||Ivory Coast & Ghana||124|
|Not covered in the previous version||Eastern Africa||82|
|Native American||Native American—North, Central, South||146|
|Native American||Native American—Andean||63|
|Asia South||Southern Asia||600|
|Asia South||Western & Central India||65|
|Asia Central||Central & Northern Asia||186|
|Asia East||Southeast Asia–Dai (Thai)||80|
|Asia East||Korea & Northern China||261|
|Asia East||Southeast Asia–Vietnam||159|
|Middle East||Middle East||271|
|Middle East and Caucasus||Iran / Persia||459|
|Middle East and Caucasus||Turkey & the Caucasus||101|
|Great Britain||England, Wales & Northwestern Europe||1519|
|Ireland/Scotland/Wales||Ireland & Scotland||500|
|Europe East||Baltic States||194|
|Europe East||Eastern Europe & Russia||1959|
|European Jewish||European Jewish||200|
|Europe West||Germanic Europe||2072|
|Europe South||Greece & the Balkans||242|