Major Enhancement to AncestryDNA’s Ethnicity Estimates

For months, AncestryDNA has been quietly reprocessing their entire database of more than 10 million customers behind the scenes, and on Wednesday, September 12th, 2018, they rolled out a major enhancement to their ethnicity estimates.  New testers will automatically be given the revamped version starting Wednesday, and existing customers will have the option to see the updated estimates when they check their accounts and click on the green  button. You will be asked about your expectations, then you’ll get to see the new estimates.

Mine look like this:

The display highlights changes in percentages (e.g., Ireland and England, above), which regions have been refined (France, Sweden), which are new to AncestryDNA’s estimates (Sardinia), and which have been removed from your personal profile (Europe South, Iberian Peninsula). Only the main regions will be affected, not the subregions variously shown as Migrations or Genetic Communities (Acadian in the example).  Regions previously labeled “Low Confidence” may disappear entirely.

For a limited time you’ll be able to switch back to the earlier version if you like.

 

What’s Different About the New Estimates?

Users will see some regions with revised names, entirely new regions, and changed percentages for existing regions.  For example, there are now 15 regions within Asia, where previously there were five.  In Europe, “Scandinavia” has been separated into Norway and Sweden, and “Europe West” is now differentiated into France and Germany. A table of the previous versus new regions is presented at the bottom of this post.

Those of African descent may see slightly different regional maps and ethnicity percentages, but not as drastic a change as Europeans and Asians will get. Ancestry’s scientists explained it this way:  Africa is massive.  It also has the highest human genetic diversity in the world because our species originated in Africa and has had hundreds of thousands of years to form distinct populations.  Not many Africans have tested yet, so that immense diversity is currently poorly sampled.  The good news is that AncestryDNA is launching an African Diversity Project to improve their sampling.

Map adapted from “Africa Dwarfs China, Europe and the U.S.” by Mark Fischetti. Scientific American, July 1, 2015.

 

Changes Behind the Scenes

The updated estimates are the product of two major changes in how the percentages are calculated.  The first is a five-fold expansion of their “reference panel”, the group of people with known ancestry to which the rest of us are compared.  Want to know whether you’re part Melanesian or Senegalese or European Jewish?  To determine that, we’d need to compare your DNA to people who are known to be predominantly from those populations.

The more genetic diversity in the source population, the larger the reference panel must be to accurately reflect the region. Coming full circle, that’s why the estimates for African ancestry are still a work in progress.

Prior to this update, the reference panel comprised 3,000 people from 26 main geographic regions divided into 363 subregions.  It’s now made up of 16,000 people from 43 main regions and 380 subregions.  The additions to the panel come from AncestryDNA’s own customers who consented to participate in research and whose trees documented their regional associations.

With more individuals per region, the reference panel can better capture the genetic diversity in a given population.

Think of a population like a bag of, say, 1,000 marbles in 12 different colors. If you reach in and grab six marbles, you might by chance pick six different colors, or you might get two blues, three blacks, and a yellow. In neither case would you have a good estimate of what’s really in the bag. Even if you grabbed 20 marbles, you might double up on some colors and miss others. The colors, of course, represent the genetic markers in the population, and the more marbles (individuals) you sample, the better your estimate of the whole population. And the better the estimate, the better the comparison of you to that population.
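To see the marble analogy in action, here is a toy simulation. The bag’s contents, the uneven color weights, and the function names are all made up for illustration. This is not how AncestryDNA builds its panel, only a sketch of why bigger samples track the whole bag more closely.

```python
import random
from collections import Counter

def make_bag(n_marbles=1000, n_colors=12, seed=1):
    """Fill a hypothetical bag with n_marbles spread unevenly over n_colors."""
    rng = random.Random(seed)
    colors = [f"color{i}" for i in range(n_colors)]
    # Uneven weights (an assumption) so some colors are common and others rare.
    weights = [rng.random() for _ in colors]
    return rng.choices(colors, weights=weights, k=n_marbles)

def sampling_error(bag, sample_size, rng):
    """Total absolute gap between color proportions in one grab and in the bag."""
    truth = Counter(bag)
    drawn = Counter(rng.sample(bag, sample_size))  # draw without replacement
    return sum(abs(drawn[c] / sample_size - truth[c] / len(bag)) for c in truth)

def mean_error(bag, sample_size, trials=200, seed=2):
    """Average sampling_error over many independent grabs from the bag."""
    rng = random.Random(seed)
    return sum(sampling_error(bag, sample_size, rng) for _ in range(trials)) / trials

bag = make_bag()
for size in (6, 20, 100, 500):
    print(f"{size:>4} marbles: mean error {mean_error(bag, size):.3f}")
```

The error shrinks steadily as the grab gets bigger, and it drops to zero only when you empty the whole bag, which is exactly why a five-fold larger reference panel matters.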

The other major change behind the scenes involves how the DNA itself is analyzed. Previously, AncestryDNA’s computers looked at each individual point (or SNP) in our DNA separately. Since the SNPs we use for genealogy usually come in only two versions, there’s only so much “signal” any single SNP can carry. The new approach analyzes strings of SNPs as a unit (i.e., a haplotype), which allows many more unique combinations and therefore more precise estimates.
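A toy illustration of why haplotypes carry more signal, assuming every SNP comes in just two versions (labeled A and B here). The two populations are hypothetical: at each SNP they have identical frequencies, so a SNP-by-SNP comparison can’t tell them apart, yet the combinations they carry are completely different.

```python
from collections import Counter
from itertools import product

def window_patterns(n_snps, versions_per_snp=2):
    """Number of distinct haplotypes a window of n_snps could, in principle, carry."""
    return versions_per_snp ** n_snps

def snp_frequencies(haplotypes):
    """Per-SNP frequency of version 'A': the only view a SNP-by-SNP analysis gets."""
    n = len(haplotypes)
    return [sum(h[i] == "A" for h in haplotypes) / n for i in range(len(haplotypes[0]))]

def haplotype_frequencies(haplotypes):
    """Frequency of each whole string of SNPs, treated as a single unit."""
    n = len(haplotypes)
    return {h: count / n for h, count in Counter(haplotypes).items()}

# Two hypothetical populations, each 100 copies of a two-SNP window.
pop_x = ["AA", "BB"] * 50
pop_y = ["AB", "BA"] * 50

print(window_patterns(1), window_patterns(10))          # 2 1024
print(["".join(p) for p in product("AB", repeat=2)])    # ['AA', 'AB', 'BA', 'BB']
print(snp_frequencies(pop_x), snp_frequencies(pop_y))   # [0.5, 0.5] [0.5, 0.5]
print(haplotype_frequencies(pop_x))                     # {'AA': 0.5, 'BB': 0.5}
print(haplotype_frequencies(pop_y))                     # {'AB': 0.5, 'BA': 0.5}
```

Real populations carry only a tiny fraction of the possible patterns, but those that do occur are what make the haplotype comparison more informative.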

The new algorithm looks at 1,001 windows, or haplotypes, in our DNA.  AncestryDNA compares about 3,475 cM and 650,000 SNPs in the genome, so each window should be roughly 3.5 cM and 650 SNPs.  These numbers are very general approximations, of course.
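The window arithmetic, written out. This assumes the windows are all the same size, which is a simplification; the real algorithm need not divide the genome evenly.

```python
# Figures quoted in the post; all three are approximations.
TOTAL_CM = 3475
TOTAL_SNPS = 650_000
N_WINDOWS = 1001

cm_per_window = TOTAL_CM / N_WINDOWS      # about 3.47 cM
snps_per_window = TOTAL_SNPS / N_WINDOWS  # about 649 SNPs

print(f"{cm_per_window:.2f} cM and {snps_per_window:.0f} SNPs per window")
# 3.47 cM and 649 SNPs per window
```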

A final consideration that increases the accuracy of the ethnicity estimates is that the reference panel data are phased.  That means that the maternal and paternal contributions to a child’s DNA are identified as such.  Phasing improves the identification of the haplotypes used in the analysis.  (Only the reference panel is phased, not the customer’s data.)
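Why phasing matters, as a small sketch: with unphased data you know which two versions sit at each SNP but not which parent each came from, so every heterozygous SNP doubles the number of possible haplotype pairs (2^(h-1) pairs for h heterozygous SNPs). The genotype and the function below are hypothetical examples, not AncestryDNA’s code.

```python
from itertools import product

def phase_resolutions(genotype):
    """All haplotype pairs consistent with an unphased genotype.

    genotype is a list of (allele, allele) tuples, one per SNP, with no record
    of which allele came from which parent.
    """
    het_sites = [i for i, (a, b) in enumerate(genotype) if a != b]
    pairs = set()
    # At each heterozygous SNP, either allele could sit on the first haplotype.
    for flips in product((False, True), repeat=len(het_sites)):
        hap1 = [a for a, _ in genotype]
        hap2 = [b for _, b in genotype]
        for site, flip in zip(het_sites, flips):
            if flip:
                hap1[site], hap2[site] = hap2[site], hap1[site]
        # The two haplotypes form an unordered pair, so sort to drop mirror images.
        pairs.add(tuple(sorted(("".join(hap1), "".join(hap2)))))
    return pairs

unphased = [("A", "B"), ("A", "A"), ("A", "B"), ("B", "A")]
print(len(phase_resolutions(unphased)))  # 4
for pair in sorted(phase_resolutions(unphased)):
    print(pair)
```

A phased reference sample skips this guessing step: its two haplotypes are already known, which is why phasing sharpens the haplotypes used in the comparison.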

AncestryDNA has issued a new White Paper on the updated methodology and eventually will publish a scientific paper as well.

 

How the Ethnicity Estimates Stack Up for Me

My family tree is complete, documented, and supported by DNA evidence through my 3rd great grandparents.  Beyond that, some lines are traced back many more generations while others dead-end in their European countries of origin.  Thus, I have a reasonable feel for what my ethnicity estimates should be, at least within the past couple of centuries.

Granted, my DNA won’t reflect the exact proportions as my pedigree because of how DNA is passed on.  While I inherited exactly 50% of my autosomal DNA from each parent, I didn’t inherit exactly 25% from each grandparent, 12.5% from each great grandparent, and so on.  The biological process of recombination causes those proportions to vary.
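A rough simulation of that variation. The model is deliberately crude and entirely an assumption: 22 chromosomes of equal length, zero to three crossovers per chromosome at random positions, and each grandparent equally likely to contribute the start of a chromosome. Real recombination maps are far more detailed, but even this toy version shows the grandparental shares scattering around 25% rather than landing on it.

```python
import random

def share_from_grandparent(rng, n_chromosomes=22, max_crossovers=3):
    """Fraction of one parent's transmitted DNA that came from that parent's father.

    Toy model (an assumption): equal-length chromosomes, 0 to max_crossovers
    crossovers at uniformly random positions, either grandparent equally likely
    to start each chromosome.
    """
    total = 0.0
    for _ in range(n_chromosomes):
        cuts = sorted(rng.random() for _ in range(rng.randint(0, max_crossovers)))
        bounds = [0.0] + cuts + [1.0]
        starts_with_grandfather = rng.random() < 0.5
        for k in range(len(bounds) - 1):
            # Segments alternate between the two grandparental copies.
            if (k % 2 == 0) == starts_with_grandfather:
                total += bounds[k + 1] - bounds[k]
    return total / n_chromosomes

rng = random.Random(42)
for child in range(5):
    share = share_from_grandparent(rng)
    # The child gets exactly half of its DNA from this parent, split unevenly
    # between the grandfather and grandmother on that side.
    print(f"child {child}: grandfather {50 * share:.1f}%, "
          f"grandmother {50 * (1 - share):.1f}%")
```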

From a pedigree perspective, I am 38.3% French, 34.4% German, 17.2% Irish, 5.5% Spanish (mostly Canary Islander), 1.6% English, and 3.1% unknown.
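Pedigree percentages come from simple halving: each ancestor n generations back contributes an expected 1/2^n. The figures above happen to be consistent with whole numbers of 128ths (1/128 being the share of one ancestor seven generations back), so the counts below are back-calculated from the rounded percentages as an illustration, not copied from the actual tree. On that reading, the 3.1% unknown is exactly 1/32, the expected share of a single 3rd great-grandparent.

```python
from fractions import Fraction

# Hypothetical reconstruction: whole numbers of 128ths back-calculated from the
# rounded percentages in the post, not read off the actual family tree.
shares_in_128ths = {
    "French": 49,
    "German": 44,
    "Irish": 22,
    "Spanish": 7,
    "English": 2,
    "Unknown": 4,
}

def ancestor_share(generations_back):
    """Expected pedigree share of one ancestor: halved once per generation."""
    return Fraction(1, 2 ** generations_back)

total = sum(shares_in_128ths.values())
for origin, count in shares_in_128ths.items():
    pct = 100 * Fraction(count, 128)
    print(f"{origin:8} {float(pct):5.1f}%")
print(total, ancestor_share(7), ancestor_share(5))  # 128 1/128 1/32
```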

[Aside: The unknown is a woman named Elizabeth Richard (1843–1881), who married Gilbert Cesaire Elmer (1838–1889). If you come across her in your tree, please contact me! Her parents are brick walls. Her surname could be either English (RIH-chard) or Acadian (REE-shard).]

How well did AncestryDNA do? Here’s a side-by-side view of my estimates before and after the update:

And this table shows how they compare to what I expect from my pedigree:

Ethnicity | Expected | Before Update | After Update
French | 38.3% | 6.0% (Europe West, French and German combined) | 77.0%
German | 34.4% | (included in Europe West above) | 0.0%
Ireland | 17.2% | 27.0% | 15.0%
Iberian Peninsula | 5.5% | 14.0% | 0.0%
England | 1.6% | 23.0% | 4.0%
Europe South | 0.0% | 21.0% | 0.0%
Scandinavia | 0.0% | 9.0% | 3.0%
Sardinia | 0.0% | 0.0% | 1.0%
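One crude way to put a number on “how well” is to add up the percentage-point gaps between each estimate and the pedigree. The score below is an illustrative yardstick, not AncestryDNA’s accuracy measure. French and German are merged because the old version reported them together as Europe West, and the 3.1% unknown is left out.

```python
# Percentages from the table above. French and German are merged into one row
# because the pre-update estimate reported them together as "Europe West".
# The 3.1% "unknown" share of the pedigree is left out.
expected = {"French + German": 72.7, "Ireland": 17.2, "Iberian Peninsula": 5.5,
            "England": 1.6, "Europe South": 0.0, "Scandinavia": 0.0, "Sardinia": 0.0}
before = {"French + German": 6.0, "Ireland": 27.0, "Iberian Peninsula": 14.0,
          "England": 23.0, "Europe South": 21.0, "Scandinavia": 9.0, "Sardinia": 0.0}
after = {"French + German": 77.0, "Ireland": 15.0, "Iberian Peninsula": 0.0,
         "England": 4.0, "Europe South": 0.0, "Scandinavia": 3.0, "Sardinia": 1.0}

def total_abs_deviation(estimate, reference):
    """Sum of percentage-point gaps between an estimate and the pedigree."""
    return sum(abs(estimate[region] - reference[region]) for region in reference)

for label, estimate in (("before", before), ("after", after)):
    print(f"{label}: {total_abs_deviation(estimate, expected):.1f} points off")
# before: 136.4 points off
# after: 18.4 points off
```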

 

Overall, a vast improvement! The biggest discrepancy is that all of my expected German seems to be counted as French. That’s not surprising given that my German ancestors were originally from southern and southwestern Germany (Bavaria, Baden-Württemberg, and Rhineland-Palatinate), areas that have had a long history of interchange with what is now France. The region zoned as “France” in my personalized map doesn’t quite cover those states of Germany, however.

The decreases in Irish, English, and Scandinavian are all much more in line with my known ancestry, as is the loss of Italian/Greek.

I was disappointed to see the Iberian estimate disappear.  I grew up near a town called New Iberia, Louisiana, and my tree contains Dominguez, Miguez, Romero, Viator (originally Villatoro), and similar names.  I’m proud of my Spanish ancestry.  On the other hand, my mother’s estimate for Spain is now 8% (down from 21%; expected 11%), which is reasonable.

The Sardinian is a complete head-scratcher.

 

Summary

From a scientific perspective, the new estimates are a huge improvement over the previous version. Both the five-fold increase in the reference panel and the use of haplotypes instead of individual SNPs should give much more accurate estimates. Are they perfect? Of course not! They’re just one step forward in a rapidly advancing field. AncestryDNA’s scientists experimented with several new methods before settling on this one as the best. This is exactly what we hope to see in science: successive attempts that each build upon the lessons of the previous ones.

From my mixed-European perspective, AncestryDNA’s done a pretty good job.  The discrepancies are minor and easy to explain, especially when we consider that the percentages from my pedigree date back only 200 years or so while the DNA-based estimates are meant to reflect my ancestry 500–1,000 years ago.

How do your new estimates compare to your documented pedigree? Feel free to leave comments here, both positive and negative, and also to report your thoughts to AncestryDNA. They’ve set up a feedback section just for that, and they want to know what we think!

 

Comparison of Old and New Regions

The table below shows how the new ethnicity regions correspond to the previous ones.  Regions in bold are either entirely new or more precisely circumscribed from the previous version.

Previous Regions | New Regions | No. of Samples
Africa North | Northern Africa | 41
Africa South-Central Hunter-Gatherers | Africa South-Central Hunter-Gatherers | 34
Benin & Togo | Benin & Togo | 224
Cameroon & Congo and Africa Southeastern Bantu | Cameroon, Congo, & Southern Bantu Peoples | 579
Ivory Coast & Ghana | Ivory Coast & Ghana | 124
Not covered in the previous version | Eastern Africa | 82
Mali | Mali | 169
Nigeria | Nigeria | 111
Senegal | Senegal | 31
Native American | Native American—North, Central, South | 146
Native American | Native American—Andean | 63
Asia South | Southern Asia | 600
Asia South | Balochistan | 53
Asia South | Burusho | 23
Asia South | Western & Central India | 65
Asia Central | Central & Northern Asia | 186
Asia East | China | 620
Asia East | Southeast Asia–Dai (Thai) | 80
Asia East | Japan | 592
Asia East | Korea & Northern China | 261
Asia East | Philippines | 538
Asia East | Southeast Asia–Vietnam | 159
Middle East | Middle East | 271
Middle East and Caucasus | Iran / Persia | 459
Middle East and Caucasus | Turkey & the Caucasus | 101
Great Britain | England, Wales & Northwestern Europe | 1519
Ireland/Scotland/Wales | Ireland & Scotland | 500
Europe East | Baltic States | 194
Europe East | Eastern Europe & Russia | 1959
Iberian Peninsula | Basque | 22
Iberian Peninsula | Spain | 270
Iberian Peninsula | Portugal | 404
European Jewish | European Jewish | 200
Europe West | France | 1407
Europe West | Germanic Europe | 2072
Europe South | Greece & the Balkans | 242
Europe South | Italy | 1000
Europe South | Sardinia | 30
Scandinavia | Norway | 367
Scandinavia | Sweden | 372
Finland/Northwest Russia | Finland | 361
Melanesia | Melanesia | 49
Polynesia | Polynesia | 58
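As a sanity check on the figures quoted earlier (43 main regions and roughly 16,000 reference samples, up from about 3,000), the table can be tallied in a few lines. The counts are copied from the table; the variable names are just for illustration.

```python
# New regions and their reference-sample counts, copied from the table above.
NEW_REGION_SAMPLES = {
    "Northern Africa": 41,
    "Africa South-Central Hunter-Gatherers": 34,
    "Benin & Togo": 224,
    "Cameroon, Congo, & Southern Bantu Peoples": 579,
    "Ivory Coast & Ghana": 124,
    "Eastern Africa": 82,
    "Mali": 169,
    "Nigeria": 111,
    "Senegal": 31,
    "Native American—North, Central, South": 146,
    "Native American—Andean": 63,
    "Southern Asia": 600,
    "Balochistan": 53,
    "Burusho": 23,
    "Western & Central India": 65,
    "Central & Northern Asia": 186,
    "China": 620,
    "Southeast Asia–Dai (Thai)": 80,
    "Japan": 592,
    "Korea & Northern China": 261,
    "Philippines": 538,
    "Southeast Asia–Vietnam": 159,
    "Middle East": 271,
    "Iran / Persia": 459,
    "Turkey & the Caucasus": 101,
    "England, Wales & Northwestern Europe": 1519,
    "Ireland & Scotland": 500,
    "Baltic States": 194,
    "Eastern Europe & Russia": 1959,
    "Basque": 22,
    "Spain": 270,
    "Portugal": 404,
    "European Jewish": 200,
    "France": 1407,
    "Germanic Europe": 2072,
    "Greece & the Balkans": 242,
    "Italy": 1000,
    "Sardinia": 30,
    "Norway": 367,
    "Sweden": 372,
    "Finland": 361,
    "Melanesia": 49,
    "Polynesia": 58,
}
OLD_PANEL_SIZE = 3_000  # size of the previous panel, as quoted in the post

total = sum(NEW_REGION_SAMPLES.values())
largest = max(NEW_REGION_SAMPLES, key=NEW_REGION_SAMPLES.get)
smallest = min(NEW_REGION_SAMPLES, key=NEW_REGION_SAMPLES.get)

print(f"{len(NEW_REGION_SAMPLES)} regions, {total:,} samples")  # 43 regions, 16,638 samples
print(f"{total / OLD_PANEL_SIZE:.1f} times the old panel")     # 5.5 times the old panel
print(f"largest: {largest}; smallest: {smallest}")             # largest: Germanic Europe; smallest: Basque
```

The spread from 22 Basque samples to over 2,000 Germanic Europe samples is a reminder that some regions are still far more thinly sampled than others.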

38 thoughts on “Major Enhancement to AncestryDNA’s Ethnicity Estimates”

  1. Get ready for a bazillion posts about why someone’s ethnicity changed to something else from what they had been told before. I can hear it now.

    1. Haha! I brewed an extra pot of coffee for just that eventuality. It’s good to get people thinking about how the science evolves.

  2. Thank you for the heads-up on the new ethnicities. I might not have known that for many months, otherwise.

    They took my 25% Scandinavian and broke it into 6% Sweden and put the rest with the GB category.

    They also gave me 3% German which is reasonable since each of my grandfathers were German. And, yes, I know the ethnicities are considered “deep ancestry.”

    1. There’s still a lot of room for improvement with their “Germanic Europe” category, in my opinion. If both of your grandfathers were German, you should have a lot more than 3%. The country of Germany also falls under their “England, Wales, & Northwestern Europe” and their “Eastern Europe & Russia” categories on their map. I suspect a lot of southwestern Germany will fall into “France”, as well. Does that fit what you’re seeing?

      1. To be exact:

        England, Wales, NW Europe 70%
        Increased by 41%

        Ireland & Scotland 20%
        Decreased by 10%

        Sweden 7%
        Refined from Scandinavia 25%

        Germanic Europe 3%
        Refined from Europe West 3%

      2. …and there is still a lot of room for improvement with their “England, Wales & NW Europe” category, which overlaps their “Ireland/Scotland” category. They took 20% of my Irish, which exists in my DNA plus some, to increase my “England, Wales & NW Europe” percentage from TWELVE to SEVENTY-FOUR. Obviously, either my original Irish/Scottish estimate was way off, or my current one is. By the way, I have a Celtic haplotype, which did not help me to retain my Irish descent in their analyses.

        1. Do you have any ancestry from northwest continental Europe? That might also explain your bump in that category.

  3. 25% of my ancestry is Acadian and Basque. 85-90% of my matches are people with Acadian surnames. I’ve gone from 33% Europe West to 7% French and 1% Basque. Makes no sense. And my Ireland/Scotland/Wales has increased from 40% to 50%. Should be about 25% based on the paper.

    1. You’re right, that doesn’t make much sense. My French increased a lot (6% Western Europe to 70% French) and I’m hearing similar reports from other Acadians, Cajuns, and French Canadians. Are you pretty mixed with different ethnicities?

  4. I just looked at my updated origins and they match my documented pedigree information very closely, but then again, the pre-update map was very close to what I was expecting as well. The update has just clarified and redefined the broad Western Europe category that I had before. This updated map has taken some of that previously somewhat vague Western Europe data and moved it to the England, Wales and Northwestern Europe category. This is in line with what I would expect from the pedigree research. My paternal lineage is almost exclusively from the Great Britain area and so is half of my mother’s lineage. My mother’s German ancestry at the 3rd generation point comes from western Germany in areas bordering France, Belgium and Luxembourg. This would put them into the England, Wales and Northwestern Europe area on the updated map and explains why my percentage went up from 51% Great Britain to 70% England, Wales and Northwestern Europe.

    1. I’d say that the new Ancestry update is a vast improvement. Previously, I had less than 1% British and 1% Western European. Now, I have 74% England, Wales and Northwestern Europe. My Iberian, Southern Europe and Eastern Europe all went away (they should never have been there in the first place). And my Scandinavian dropped from 63% to 11%. I’m not thrilled that my Dutch and German is now lumped with my British… Still, I’d rate Ancestry just behind 23andMe and well ahead of FTDNA and MyHeritage.

      1. I do not have results from 23andMe but have to agree on the ranking in regards to FTDNA and MyHeritage! The MyHeritage information was so limited it basically just says 95% North and West Europe, 5% Scandinavia. The FTDNA report does say British Isles but puts it at only 37% with another 32% going to West and Central Europe and 13% going to Scandinavia.

        I do wish they could narrow down the Dutch/German portion a bit more but I think Ancestry’s information is a very close approximation.

  5. How did you get such a specific breakdown country by country? Mine came back 100% European Jewish (same as before) with a further break down to Germany and Poland/Slovakia/Hungary/Romania. Germany was considered a very likely connection, the other a possible connection, but there were no further breakdowns by percentage or country. In fact I know that my paternal relatives were almost all from Germany except one line that I traced to Amsterdam but probably also came from Germany. My maternal relatives came from Poland and Romania, so far as I know. So why so heavily tilted to Germany and why no breakdowns beyond what I’ve described?

    1. The percentages are based on a comparison of your DNA to specific reference populations. The reference populations are determined not by country borders but by genetic similarity. It’s only after the reference populations are determined that they go in and label them based on some identifiable feature. That they often label the populations with country names is unfortunate, because the population boundaries rarely correlate perfectly with geopolitical boundaries (which, after all, weren’t the same when the genetic signatures came into being).

      The genetic similarity among Ashkenazim results not from country boundaries but from the practice of marrying within the religion. You can see that in the map for European Jewish: http://thednageek.com/wp-content/uploads/2018/09/Screen-Shot-2018-09-13-at-9.41.13-AM.png

      What’s more confusing is that these maps are showing two very different types of analysis: ethnicity estimates (which give you percentages and represent genetic signal 500–1000 years ago) and Genetic Communities (which don’t give percentages and reflect connections within the past 200 years or so). The blob with the solid boundary in the European Jewish map is the region for the ethnicity estimate while the ones with dashed boundaries are Genetic Communities. (If you zoom in on a GC, you’ll see that each of those is further divided into three smaller communities.) GCs are defined by both genetic similarity and trees, so it may be that your stronger connection to the German Jews has more to do with the availability of genealogical records than anything else.

      1. Thanks, Leah—that makes sense to me. As I often have explained to people who are totally new to DNA testing, these ethnicity estimates are not based on the idea that there is a Jewish gene or a French gene or an Irish gene, but on statistical analysis of the test takers and their own self-identified backgrounds.

        Also, I looked again and realized that I was wrong about the countries listed in the Poland group—it does NOT include Romania. Romania is in a separate group, and Ancestry says it found no connection to that group. That makes me wonder: 25% of my DNA (more or less) should be from my maternal grandfather, who was born in Romania. I’ve never determined, however, where his parents or earlier ancestors came from. Is it crazy to think that the absence of a connection to Romania means his ancestors came from Poland or Germany and thus his DNA (as represented in my DNA) is being correlated to one of those places? Or is this just another example of what you said above—that these are not real borders, just estimates of where ancestors MIGHT have lived.

        1. Great question, and I don’t know the answer. I wouldn’t take it as proof that his parents were from Poland or Germany, more a possibility that’s worth more digging.

  6. Vast improvement and aligns more specifically with my documentary record. Ireland and Scotland, with an emphasis on Central Scotland and Ulster at 38% (check); England, Wales and NW Europe at 34% (having maternal family in the UK and paternal family in northern France and movement between, this makes sense); France 3% (the rest of the French, check); and, in the best new sorting of ethnicity estimates for me… Southeast Asia – Vietnam 14%; Philippines 7% (my 2x GGM was Indonesian); and China 2%. Scandinavia is now 2% Norway and this still baffles me a teeny bit (my Dad has a bunch of Scandinavian matches on MyHeritage especially that I haven’t started to tackle). Good stuff, and helpful post!

  7. Great post and very informative.
    We couldn’t dream of writing up a post to match yours so we have just linked to you from Evidentree and have posted this to our Facebook page and group too.

    Now, off to check my new ethnicity results. Lucky we have you to tell us about this!

    1. Thank you for the kind words. I’d love to hear how your new results compare to your tree. Northern Europeans seem to be quite happy with the new estimates; southern Europeans less so.

      1. I have just had a chance to assess my new results. I am happy with the refinement and see that they come more into line with what I know from tree research and also other genetic studies.

        Besides having also tested with FTDNA and 23andme, because we have a deaf son (and to see if we could find a genetic cause of his deafness) my wife, son and I – along with our parents – have all had in-depth genetic studies made through medical laboratories where I am based in Switzerland.
        Other users who are less happy with the new results should keep in mind that this is an iterative process and those new refinements will continue to be made for many years to come. Only as more people test and more samples become available can we expect to see improved analysis.
        In the case of our deafness mystery, all currently known genetic causes have been ruled out. But we now wait – on the advice of the genetic counselors – for more deaf families to test so that the sample can grow and new trends can be found.

        The same applies to ethnicity calculations.

  8. I’m anxious to get my DNA redone after seeing the changes in yours. I hope I don’t lose my Spanish ancestry and get more French from the Western Europe.

  9. Today’s update significantly improved the accuracy of Ancestry’s estimates for me. Also, Ancestry’s region labels now make much more sense to me. Well done.

  10. My Italian estimate is way off… my grandfather’s parents were from Sant’Angelo dei Lombardi, Avellino, Campania Italy. I have documented in the Italian records ALL of my 1st, 2nd, 3rd, and 4th great grandparents. I also have almost all of my 5th ggp’s listed and I have 3 couples of my 6th ggp’s. All of those ancestors were from Sant’Angelo dei Lombardi. Additionally, my DNA lists are loaded with DNA cousins whose family lines and names are all from the same commune of Sant’Angelo dei Lombardi.
    Despite my documented tree and the multitude of DNA matches from Sant’Angelo, AncestryDNA says I am 1% Italian.

      1. My father’s family is from northern Italy for many generations. I show up as 48% Italian. But my 2 children have no Italian DNA (0%), and their Italian DNA appears to show up as French and Northwestern European DNA. DNA Geek: Do you have an explanation?

        1. Thanks for your response. My guess (and it is a guess) is that the DNA of Italians may be influenced by the DNA of non-Italian parents. For example, my mother is Lebanese, so the Middle Eastern DNA may skew the Italian result to the south, resulting in Italian DNA. In contrast, my son’s mother is 93% British/Scottish, so that may skew the result to the north, resulting in French DNA.

  11. My first problem with the results is that when you redraw boundaries and then show something like English is up 30%, it’s kinda silly. English is not up 30%; you’re comparing apples and oranges. It’s a new boundary and a completely new result set. I think it is a misleading representation. Second, the updated results for me were such a big change that if I were on the fence about believing the science, this change would actually hurt, not help. It basically says Ancestry was way off before, so why believe them now? My takeaway is to use it for matches but not much else.

    1. The boundaries are determined by the genetic populations themselves, not by geopolitical borders, so we expect them to change as the science improves. Science is a process of discovery, so we’ll continue to see changes as the collective knowledge and reference populations improve.

  12. Thanks for the info. Do you think people from England could be in the Germanic Europe group? My dad’s family are all from England and one line from Scotland. My mum’s is from Romania, Hungary and going back to Germany. My mum has tested and she has only 16% Germanic Europe but mine has been refined to 52%.

    1. The population maps do not show the “Germanic Europe” region extending to England. There’s historically been a lot of interchange between those regions, though, so perhaps that accounts for what you’re seeing. Has your father tested for comparison? Or one of his close relatives?

      1. My father hasn’t tested and the closest relative is a 3rd cousin. That person still has the old ethnicities up. Most people seem to have not updated yet. That’s a good clue though, thanks for thinking of it. I will keep checking to see when they have their new ethnicities listed.
