It’s that time of year, folks, when AncestryDNA does their annual update of ethnicity estimates! These are the percentages accompanied by colorful maps that indicate where your ancestors may have lived centuries ago. Like all scientific disciplines, this one evolves over time as new data become available and new methods are developed, so estimates are refined periodically. I eagerly await the yearly update.
How It Works
Ethnicity estimates reflect your ancestral origins from about 500–1000 years ago. Lacking DNA samples from that far back, the companies compare your DNA to modern-day individuals with deep roots in specific geographic regions. This so-called reference panel or reference dataset is assumed to reflect the historical genetics of the area.
All of the genealogy DNA companies base their reference panels on publicly available datasets like the 1000 Genomes Project, then supplement that with their own proprietary samples. That’s one reason you are likely to get different estimates from each company, even though you are always you.
Accuracy depends on (1) the size of the reference panel, (2) how many regions of the world are represented, (3) the number of samples from each region, and (4) how genetically distinct each population is. A company with 50,000 individuals in their reference panel is likely to give better estimates than a competitor with only 5,000. Similarly, a reference panel with people from around the globe should be more accurate than one with the same number of people who are all from the same place. And a large panel with similar numbers from each region is best of all.
AncestryDNA had the largest reference panel in the industry even before this update, and they’ve just added 12,134 people. Their panel is now nearly five times larger than 23andMe’s, the next largest.
Despite adding more than 12,000 people to their panel, AncestryDNA only has eight new regions:
- Central & Eastern China (358 individuals)
- Southwestern China (266)
- Nepal & the Himalayan Foothills (399)
- Tibetan Peoples (200)
- Nigeria—East Central (471)
- Nilotic Peoples (233)
- Hawaii (392)
- New Zealand Māori (206)
In other words, they’ve increased the average number of reference individuals per region overall. In fact, the previous average was 735 and it’s now 818. Compare that to an average of 320 for 23andMe, about 119 for MyHeritage, and 90 for FamilyTreeDNA (FTDNA).
However, the eight new populations are all well below the AncestryDNA regional average. In practice, I suspect that the estimates we get for those regions today will be good-not-great and that they’ll improve with future updates. For example, the new Nigeria—East Central group will probably show quite a bit of overlap with the earlier Nigeria group until the reference panel increases for those regions.
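For anyone who wants to check the arithmetic, here is a minimal Python sketch using only the counts listed above. The variable names are mine, and the last figure rests on an assumption: that no more than the listed samples in the new regions are newly added, which makes it a lower bound rather than an exact count.

```python
# Reference-sample counts for the eight new regions, as listed in this post.
new_regions = {
    "Central & Eastern China": 358,
    "Southwestern China": 266,
    "Nepal & the Himalayan Foothills": 399,
    "Tibetan Peoples": 200,
    "Nigeria—East Central": 471,
    "Nilotic Peoples": 233,
    "Hawaii": 392,
    "New Zealand Māori": 206,
}

OVERALL_AVERAGE = 818  # AncestryDNA's new per-region average, quoted above
TOTAL_ADDED = 12_134   # people added to the panel in this update

new_total = sum(new_regions.values())
new_average = new_total / len(new_regions)
largest_new = max(new_regions.values())

# Assumption: at most `new_total` of the added people sit in the new regions,
# so at least this many were added to regions that already existed.
min_added_to_existing = TOTAL_ADDED - new_total

print(f"Samples in new regions: {new_total}")            # 2525
print(f"Average per new region: {new_average:.1f}")      # 315.6
print(f"Largest new region:     {largest_new}")          # 471
print(f"Added to older regions: at least {min_added_to_existing}")  # 9609
```

Even the largest new region (471) falls well short of the 818 average, and the bulk of the new samples appear to have gone toward existing regions.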
Case in Point: Māori and Hawaiian Estimates
I don’t have access to DNA kits for individuals from most of the new groups. However, Kalani Mondoy reports suspect results for Hawaiian testers on his Polynesian DNA blog. For example, Kalani’s fully Hawaiian cousin is estimated at 70% Hawaiian and 30% New Zealand Māori. Conversely, I’ve seen a Māori tester who is estimated as 23% Hawaiian.
This suggests two things: Hawaiians and Māori are very similar genetically—they have similar ancestral origins, and they were treated as a single population in the previous version—and the reference samples aren’t yet big enough to distinguish them clearly.
There’s no cause for alarm, though. We’ve seen this before. There was a time when AncestryDNA didn’t differentiate French from German. The two countries share a long, historically labile border, they are genetically similar, and both are at the crossroads of European migration and gene flow.
Back in 2018, AncestryDNA estimated me at a combined 77% from France & Germany, similar to the 73% I expected from my well-documented tree. In 2020, they began trying to separate the two regions and updated me to 36% and 12%, respectively. That was a big shift in the wrong direction. Subsequent updates have ratcheted my estimates closer and closer to what I consider accurate.
The same should happen with Māori versus Hawaiian estimates over time. Consider that AncestryDNA currently has only 206 reference Māori and 392 Hawaiians, compared with 1,407 French and 2,072 Germans when they first attempted to differentiate French and German and did it poorly. They now have 5,471 and 3,382, respectively, and are doing a much better job. The larger the reference sample, the better the genetic profile they can establish for each population.
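To put those panel sizes side by side, here is a small Python sketch. It uses only the counts quoted above; the dictionary names and the choice to compare against the smaller of the two European panels are mine, purely for illustration.

```python
# Reference-panel sizes quoted in this post.
first_split = {"French": 1_407, "German": 2_072}  # when French/German were first separated
today = {"French": 5_471, "German": 3_382}        # current panel sizes
polynesian = {"New Zealand Māori": 206, "Hawaiian": 392}

# How do the new Polynesian panels compare with the smaller European panel
# at the time of that first (poorly performing) split?
smallest_at_split = min(first_split.values())
relative_size = {name: n / smallest_at_split for name, n in polynesian.items()}

# How much have the French and German panels grown since then?
growth = {name: today[name] / first_split[name] for name in first_split}

for name, share in relative_size.items():
    print(f"{name}: {share:.0%} of the size of the smaller French/German panel at first split")
for name, factor in growth.items():
    print(f"{name} panel has grown {factor:.1f}x since the first split")
```

Both Polynesian panels are a fraction of the size the French and German panels had when that split first went badly, so some muddled results are not surprising.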
How Do the Estimates Compare to My Tree?
The best way to gauge the estimates is to compare them to known ancestral backgrounds, as I summarized above for Hawaiian and Māori people. I know my own tree best, and I’m always curious how new ethnicity estimates stack up.
I know the names of all 32 of my third great-grandparents, and for all but one of them, I know their ethnic origins. The holdout is Eliza Louisa Richard (1845–1881). In Louisiana, a surname like Richard could be French (pronounced REE-shar) or English (RICH-urd). I suspect she was English, but her parents are both brick-wall ancestors.
We don’t inherit equal amounts of DNA from each ancestor, but for the purposes of this example, assume that I did. Across all 32 third great-grandparents, that would put me at roughly 38% French, 34% German, 17% Irish, 5% Spanish, and 5% English (including Eliza).
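The equal-inheritance assumption is easy to sketch in Python. Under it, each of 32 third great-grandparents contributes 100/32 = 3.125% of the total. The function name and the four-grandparent family below are hypothetical, made up to keep the example short; they are not my actual tree.

```python
from collections import defaultdict


def expected_percentages(ancestors):
    """Expected ethnicity percentages, assuming equal inheritance.

    `ancestors` is a list with one dict per ancestor in a single generation,
    mapping each origin to the fraction of that ancestor's background
    (the fractions for one ancestor should sum to 1).
    """
    share = 100 / len(ancestors)  # equal contribution from each ancestor
    totals = defaultdict(float)
    for origins in ancestors:
        for origin, fraction in origins.items():
            totals[origin] += share * fraction
    return dict(totals)


# Hypothetical family, for illustration only: four grandparents.
grandparents = [
    {"French": 1.0},
    {"French": 1.0},
    {"German": 1.0},
    {"Irish": 0.5, "English": 0.5},  # an ancestor of mixed background
]

result = expected_percentages(grandparents)
print(result)  # {'French': 50.0, 'German': 25.0, 'Irish': 12.5, 'English': 12.5}

# With 32 third great-grandparents, each one accounts for this much:
print(100 / 32)  # 3.125
```

Allowing fractional origins per ancestor matters because an ancestor can be of mixed background, which is why tree-based totals need not land on exact multiples of 3.125%.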
How did AncestryDNA do this time? Not bad. They overestimated my French (47%) a bit and underestimated my German (26%). Then again, it’s also possible that my ethnicities don’t add up to the exact percentages I expected because of the randomness of DNA inheritance. Plus, some of my German ancestors lived very close to what is now France. In any case, AncestryDNA is a lot closer than they’ve been in the past.
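Here is a quick Python sketch of that comparison, limited to the two regions with estimates quoted above. It is a plain difference in percentage points, not the E-score, and the variable names are mine.

```python
# Tree-based expectations and AncestryDNA's new estimates (percent),
# for the two regions quoted in this post.
expected = {"French": 38, "German": 34}
estimated = {"French": 47, "German": 26}

# Positive = overestimate, negative = underestimate (percentage points).
diffs = {region: estimated[region] - expected[region] for region in expected}

combined_expected = sum(expected.values())
combined_estimated = sum(estimated.values())

for region, d in diffs.items():
    print(f"{region}: {d:+d} points")           # French: +9, German: -8
print(f"French + German expected:  {combined_expected}%")   # 72%
print(f"French + German estimated: {combined_estimated}%")  # 73%
```

The two errors nearly cancel: the combined French and German estimate (73%) is within a point of the tree-based expectation (72%), which fits the idea that some of my German is simply being labeled French.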
Last year, I developed a metric to summarize how far off ethnicity estimates are from the known tree. I described it in an earlier post. With this “E-score” you can compare one testing company to another as well as one company’s estimates over time. The lower the score, the better the estimate.
At the moment, both 23andMe and AncestryDNA are doing quite well, with the former slightly edging out the latter by a score of 173 to 234. Both have improved substantially over time, despite some backsliding when AncestryDNA first tried to divvy up French and German. MyHeritage, on the other hand, hasn’t updated their estimates since 2018 and does a poor job on my ancestral origins.
(23andMe updated their estimates yesterday for some customers, but my kit is not eligible.)
Where Is My Spanish Ethnicity?
AncestryDNA now completely misses my Spanish heritage. I should be about 5% Spanish from my Canary Island ancestors, who settled in Louisiana between 1778 and 1783. 23andMe estimates roughly 10%.
I suspect this omission is due to two things. First, AncestryDNA’s Spanish reference has only 970 individuals, versus 5,471 for France. With a French profile built from so many more samples, I think my Spanish is incorrectly getting pulled into the French category.
Second, AncestryDNA is still assigning most of my chromosomes to single ethnicities in the updated chromosome painting. As I discussed in an earlier post, I think a flaw in their algorithm is preventing the estimate from switching from one ethnicity to another, even when it should. My paternal background is far too mixed, generationally speaking, for me to have entire chromosomes that are, for example, all German or all Irish. On my mother’s side, my French appears to be overriding a switch to Spanish, resulting in an overestimate of one and the elimination of the other.
Moral of the Story
Coming full circle, science progresses in fits and starts. Ethnicity estimates are no different. We don’t expect them to be perfect—or even close to perfect—for many years.
What we do expect, though, is that the companies will continue to improve their offerings over time. That’s why I’m always eager to see these regular updates.
How do your estimates stack up?