It’s happened again, folks! Another company has updated their ethnicity estimates, and people are all aflutter. They replaced my German with Sweden & Denmark! Why can’t they give me better estimates from Africa? Mine are 100% accurate! It’s all a scam!
So why are ethnicity estimates so hit-or-miss? Why do different companies give you different estimates? And which company is best?
Last Question First
I’ll address some of the science nitty gritty below, but let’s cut to the chase: Which test is best for ethnicity? To answer that, we need to know what the “right” answer is, and we need an objective way of comparing estimates from different companies.
If your tree is reasonably well documented out to 2nd or 3rd great grandparents (or further) and you know where those ancestors were from, you can easily calculate a “score” to compare your ethnicity estimates at different companies.
In my case, I noted the origins of each of my 32 great-great-great grandparents. On my father’s side, all were either immigrants themselves or the children of immigrants, so their origins were straightforward. Most of my mother’s ancestors have been in Louisiana since before the Revolutionary War and are mixes of Acadian French, other French, Spanish, and Irish.
I used this information to estimate that I am roughly 38% French (including Acadian), 34% German, 17% Irish, 6% Spanish, 2% English, and 3% unknown (probably French or English). These are my “expected” percentages.
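The tally behind those expected percentages can be sketched in a few lines of Python. This is only an illustration: the list of origins below is made up (it is not my actual tree), and the function name `expected_percentages` is my own invention. It assumes every ancestor in the same generation counts equally, so each of 32 great-great-great grandparents is worth 100/32 = 3.125%.

```python
from collections import Counter

# Hypothetical origins for 32 great-great-great grandparents.
# These counts are invented for illustration only.
origins = ["French"] * 14 + ["German"] * 10 + ["Irish"] * 6 + ["Spanish"] * 2


def expected_percentages(ancestor_origins):
    """Give each ancestor in the generation an equal share of 100%."""
    counts = Counter(ancestor_origins)
    total = len(ancestor_origins)
    return {origin: 100 * n / total for origin, n in counts.items()}


expected = expected_percentages(origins)
for origin, pct in expected.items():
    print(f"{origin}: {pct:.2f}%")
```

An ancestor of mixed origin could be handled by listing the previous generation instead, or by splitting that ancestor's share by hand.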
How does that stack up to the company estimates? Here’s a simple scoring system:
- First, treat each percentage as a regular number, so 38% becomes 38, not 0.38.
- Next, for each of your known ethnicities, subtract your expected percentage from the company estimate. For example, AncestryDNA currently estimates me at 42% French when I expected 38%. That’s a difference of 42 – 38 = 4. Some values will be negative; don’t worry about that.
- Third, square your differences. In my case, 4² is 16. I do the same for each ethnicity. (Squaring the values is a trick borrowed from curve-fitting in statistics. It increases the penalty on estimates that are further off the mark.)
- Finally, sum up all the squared values.
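If you'd rather not do the arithmetic by hand, the four steps above can be sketched in Python. The function name `ethnicity_score` is my own, and the company estimate below is hypothetical apart from the 42% French figure. I've also assumed that an ethnicity a company doesn't report at all counts as 0.

```python
def ethnicity_score(expected, estimate):
    """Sum of squared differences over the known (expected) ethnicities."""
    score = 0
    for ethnicity, expected_pct in expected.items():
        # Assumption: an ethnicity missing from the company's report counts as 0.
        company_pct = estimate.get(ethnicity, 0)
        score += (company_pct - expected_pct) ** 2
    return score


# Expected percentages for the known ethnicities, as whole numbers (38, not 0.38).
expected = {"French": 38, "German": 34, "Irish": 17, "Spanish": 6, "English": 2}

# Hypothetical company estimate; only the French value comes from a real report.
estimate = {"French": 42, "German": 30, "Irish": 20, "Spanish": 5, "English": 3}

print(ethnicity_score(expected, estimate))  # 16 + 16 + 9 + 1 + 1 = 43
```

Because the loop runs over the expected ethnicities only, regions a company reports that aren't in your tree don't enter the sum directly. They show up indirectly, as shortfalls in the ethnicities you do expect.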
This gives you a “score” for each ethnicity estimate from each company. Here are my scores for four different companies. Lower scores are better, like golf.
By this metric, 23andMe gives me the best ethnicity estimates, with a score of 173. The recent update from AncestryDNA was next, scoring 1,460. Living DNA and MyHeritage trailed quite a bit, at 5,319 and 7,104, respectively.
I’ve been tracking my ethnicity estimates since I first tested. The trends are interesting. 23andMe has gotten better for me with every update, from an initial score of 2,459 to 173 now. AncestryDNA, on the other hand, has been all over the board, starting at 5,594 and reaching a low of 70 before veering upward again. That score, from 2018, was the best ethnicity estimate I’ve had from any company. Living DNA was getting progressively better (dropping from 4,195 to 519) until this latest update, which puts me at 100% French. That is biologically impossible, given that my father is German and Irish.
What works for me might not work for you, though. I encourage you to calculate your own scores and post the results in the comments. Comparing which companies work best for which ethnicities is a valuable exercise. That information will be especially helpful for those with unknowns in their trees, like adoptees, people with unknown grandfathers, and so on.
The Science Behind Ethnicity
Our ethnicity results change with updates and vary among testing companies because ethnicity estimates are hard. Much like the Babylonians estimated the value of π (pi) as 3, and mathematicians in Egypt, China, India, and Europe refined that value over the centuries, ethnicity estimates improve (or not) at variable rates depending on the company and its methods.
Ethnicity estimates are a lot harder than π. After all, there’s only one π, but we each get our own ethnicity estimates drawn from thousands of genetically distinct ethnic groups around the world. And π doesn’t change. The true value of π was the same 1,000 years ago as it is today, while the genetic makeup of, say, someone in the Levant in 1500 CE wasn’t the same as that of a living Palestinian.
That brings up another challenge: ethnicity estimates reflect your ancestors from about 500–1000 years ago, but the companies are using modern-day people as their reference points. After all, they don’t have a TARDIS to travel back in time to ask our ancestors to spit in a tube.
Those modern-day individuals make up what’s called a reference panel or reference dataset. They are carefully chosen to have ancestors who lived in the same area for generations, under the assumption that those living individuals reflect the genetics of the area 500–1000 years ago. Your DNA is compared to that reference panel to get your ethnicity estimates.
The companies all draw from some publicly available datasets like the 1000 Genomes Project. They then supplement that public data with their own proprietary samples. Accuracy depends on how many reference individuals a company has and how many regions of the world they represent. That’s one reason you are likely to get different estimates from each company, even though you are always you.
The table below compares the reference panels for the five biggest genealogy databases, as of 2021.
To generate your ethnicity estimate, a company looks at each bit of DNA in your sample, compares it to each of their reference populations, and estimates the likelihood that your bit came from that population. They do that for each of ≈700,000 bits of DNA (called SNPs, or snips), then calculate percentages based on how much of your genome falls into each population.
23andMe will even show you a “chromosome painting” of which populations are assigned to each part of your chromosomes.
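To make the idea concrete, here is a deliberately oversimplified toy in Python. It is not any company's actual algorithm; real pipelines are considerably more sophisticated. Everything in it is hypothetical: the two populations, their allele frequencies, the four-SNP sample, and the function names. The likelihood formula assumes the two copies of each SNP are drawn independently from the population's allele frequency.

```python
from collections import Counter

# Hypothetical reference panel: frequency of allele "A" at each of four SNPs.
reference = {
    "PopX": [0.9, 0.8, 0.2, 0.7],
    "PopY": [0.1, 0.3, 0.9, 0.6],
}

# Hypothetical sample: how many copies of "A" (0, 1, or 2) the person has at each SNP.
sample = [2, 2, 2, 1]


def genotype_likelihood(a_count, freq):
    """Chance of seeing a_count copies of "A", assuming independent draws at rate freq."""
    if a_count == 2:
        return freq * freq
    if a_count == 1:
        return 2 * freq * (1 - freq)
    return (1 - freq) ** 2


def estimate_percentages(sample, reference):
    """Assign each SNP to its most likely population, then convert the tally to percentages."""
    tally = Counter()
    for i, a_count in enumerate(sample):
        best = max(reference, key=lambda pop: genotype_likelihood(a_count, reference[pop][i]))
        tally[best] += 1
    return {pop: 100 * tally[pop] / len(sample) for pop in reference}


print(estimate_percentages(sample, reference))  # {'PopX': 50.0, 'PopY': 50.0}
```

Notice the last SNP: the two likelihoods are 0.42 and 0.48, so the call is nearly a coin flip. When reference populations have similar frequencies, many SNPs look like that.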
Of course, the reference populations can be quite similar to one another. For example, France and Germany have been at the crossroads of Europe for centuries, so it’s not surprising that their populations have incorporated genetic diversity from southern, northern, and eastern Europe. That can make some areas harder to distinguish than others. And if your parents were French and German, respectively, like mine, it can be especially hard for the algorithms to tell them apart in you.
You Are Not (Quite) Who You Think You Are
You inherit exactly half of your autosomal DNA (atDNA) from each parent, but you don’t inherit equal amounts of atDNA from each grandparent, great grandparent, etc. The way DNA is divvied up during reproduction is too random for such precision.
As a result, you might genuinely have a Korean great grandmother but not 12.5% Korean DNA. That randomness is built in to the biology; there’s nothing the testing companies can do about it. It doesn’t mean the testing company got it wrong if they only estimate 7%, or that your great grandmother wasn’t fully Korean.
This variability gets worse with each generation back. There’s a small chance you won’t inherit any DNA at all from a specific 5th great grandparent, and the chances of that happening for any given ancestor increase with each generation. This is the difference between a genealogical ancestor (in your tree) and genetic ancestor (in your DNA).
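For reference, the average share you'd expect from a single ancestor halves with each generation back. The short Python sketch below (function names are my own) shows only those averages. It says nothing about the random spread around them, which is the whole point of this section.

```python
def expected_share(generations_back):
    """Average percentage of your atDNA from one ancestor this many generations back."""
    return 100 / 2 ** generations_back


def ancestors_in_generation(generations_back):
    """Number of genealogical ancestors in that generation (ignoring pedigree collapse)."""
    return 2 ** generations_back


# Parents are 1 generation back, grandparents 2, great grandparents 3, and so on.
for label, g in [("great grandparent", 3),
                 ("3rd great grandparent", 5),
                 ("5th great grandparent", 7)]:
    print(f"{label}: 1 of {ancestors_in_generation(g)}, "
          f"average share {expected_share(g):.2f}%")
```

A great grandparent averages 12.5%, while a 5th great grandparent averages under 1%, which is why a single ancestor that far back can drop out of your DNA entirely.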
As you can see, there are many reasons ethnicity estimates aren’t as exact as we’d like them to be. It’s a developing science, and I personally appreciate the ongoing efforts the companies put into it.