AncestryDNA’s 2020 Matching White Paper

This post has been updated

On 15 July, 2020, AncestryDNA updated their “Matching White Paper”, which is a detailed document describing how they use our DNA data to match us to our genetic relatives.  The previous Matching White Paper was released in 2016.

On the surface, it seems like finding identical DNA segments shared by two people would be simple.  It’s anything but.  For biological, technical, economic, and computational reasons, all of the genetic genealogy companies have to use sophisticated algorithms to achieve the goal.  In fact, differences in how each company approaches the problem are why you can match the exact same person at different sites and appear to share more or less DNA.

How AncestryDNA Matches Users

As a very broad overview, DNA matching at AncestryDNA involves four main steps:

  1. We have two copies of each autosomal chromosome, but the laboratory technique used by the companies (called a microarray) doesn’t analyze each one individually.  Instead, your two copies of chromosome 1 are analyzed as a unit, your two copies of chromosome 2 are analyzed together, etcetera.  After the fact, the computer algorithm has to determine which data came from your maternal copy of chromosome 1 versus your paternal copy, and so on for each chromosome pair.  This step is called phasing.
  2. After the raw data is phased, people in the database are compared to one another to determine whether they share matching DNA sequences as a result of recent common ancestry. Such segments are called “identical by descent” or IBD to distinguish them from DNA that might appear to be identical by chance or because the matching DNA dates back dozens of generations.  This matching is complicated by the sheer size of AncestryDNA’s database.  There are hundreds of trillions of comparisons to be made, and the database is growing all the while.
  3. Segments of DNA can be shared for reasons other than recent common ancestry. For example, there is a cluster of three genes on chromosome 4 around position 38,800,000 that appear to give resistance to the plague.  Two people could share this segment of DNA not because they are recent cousins but because both came from populations that survived the Black Death nearly 700 years ago.  AncestryDNA applies an algorithm called Timber to adjust for population-level segments, sometimes called pile-ups.
  4. Once AncestryDNA has determined how much DNA two people share, the final step is relationship estimation.  It’s all well and good to say that cousins Tyneka and William share 192 cM of DNA, but what does that mean for how they’re related to one another?  Here, AncestryDNA sorts our matches into broad categories of relationship meant to be a starting point for us to look for the connection in our family trees.  To use our example, Tyneka and William would appear in the 3rd Cousin category in one another’s lists, but they might well be 2nd cousins instead.  (AncestryDNA tends to err on the side of underestimating relationships, so you’re far more likely to see a true 2nd cousin estimated as a 3rd than vice versa.)

So What’s New?

That’s the general overview of how AncestryDNA provides us matches.  What’s changed in this new White Paper?  There are three key updates that you should be aware of.  They go into effect in early August.  Ultimately, they’ll position AncestryDNA to start incorporating NextGen sequence data into their database.

 

The Number of Shared Segments Will Be More Accurate

First, the number of unique segments shared by two people will be more accurate.  Because of a strict matching algorithm, AncestryDNA sometimes reports a single long segment as two separate segments.  This occurs when one of the people being compared has a random error in their data, making it appear that they don’t match at a single spot within the segment when they really do.

The effect is most obvious when comparing a parent and child.  For example, AncestryDNA currently says that my mother and I share 3,475 cM of DNA across 44 segments.

That’s impossible.  I only have 22 pairs of autosomal chromosomes, and I match her all the way across each one.  In reality, she and I share 22 segments, each of which is an entire chromosome.  There must be 22 errors in either my data or hers causing some of our chromosomes to match in discrete chunks rather than across the entire thing.  (Given that AncestryDNA analyzes more than 600,000 markers in our DNA, it’s remarkable that there are only 22 such errors!)

The update will allow such “single SNP mismatches” nested within otherwise matching regions to be ignored, so the segment correctly appears as continuous rather than broken in two.  By August, my mom and I should be reported to share 3,475 cM of DNA across 22 segments rather than the current 44.
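
To make the idea concrete, here is a minimal Python sketch of stitching together two matching stretches that are separated by a single mismatching marker. It is only an illustration: the `(start, end)` marker-index representation, the function name `merge_single_snp_gaps`, and the one-marker gap rule are assumptions for this example, not AncestryDNA’s actual algorithm.

```python
def merge_single_snp_gaps(segments, max_gap_markers=1):
    """Merge matching segments on one chromosome that are separated by
    at most `max_gap_markers` non-matching markers.

    `segments` is a list of (start_index, end_index) marker positions,
    both inclusive. This representation is hypothetical.
    """
    merged = []
    for start, end in sorted(segments):
        if merged:
            prev_start, prev_end = merged[-1]
            gap = start - prev_end - 1  # markers strictly between the two segments
            if gap <= max_gap_markers:
                # Treat the tiny gap as a data error and extend the previous segment.
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        merged.append((start, end))
    return merged


# Hypothetical chromosome: markers 0-999 match except for one mismatch at marker 400.
broken = [(0, 399), (401, 999)]
print(merge_single_snp_gaps(broken))  # [(0, 999)]
```

A gap of many markers is left alone, so two segments that are truly separate are still reported as two.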

This is good news but will only affect the subset of genetic genealogists who use the number of matching segments in their work.  I calculate average segment size when working with endogamous populations, so I am very pleased to see this update.  More accurate averages are always better.
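
As a quick worked example of why the segment count matters for averages, the sketch below uses the numbers from my match with my mother. The helper name `average_segment_cm` is just for illustration.

```python
def average_segment_cm(total_cm, n_segments):
    """Average segment size: total shared cM divided by the number of segments."""
    return total_cm / n_segments


total = 3475  # cM shared with my mother, as reported

print(round(average_segment_cm(total, 44), 1))  # 79.0  (current segment count)
print(round(average_segment_cm(total, 22), 1))  # 158.0 (count expected after the update)
```

Halving the segment count doubles the average, which is why an inflated count can make a close match look more fragmented than it really is.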

 

AncestryDNA Will Report the Length of Longest Shared Segment

Thus far, AncestryDNA has only shown us the total amount of shared DNA and the number of shared segments.  With the pending update, they will also report how long the longest shared segment is.  For many users, this won’t make a difference in how they work with their matches, but for those of us from endogamous populations, this will be a huge benefit.

Endogamy occurs when people marry within the same group for many generations.  This eventually causes complicated webs of relationship, in which individuals are cousins many times over.

Endogamous matches often share far more DNA than would be expected given their closest relationship, because those more distant connections are also adding DNA to the total.  Those additional genetic contributions, though, tend to be very small segments.  That’s why knowing the size of the largest segment is so important.  A match who shares 70 cM with a largest segment of 35 cM is far more likely to be a recent cousin than a match of 70 cM whose largest segment is 12 cM.
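
One rough way to compare the two matches in that example is to ask what share of the total sits in the longest segment. The sketch below is my own illustration, not a rule AncestryDNA publishes, and the function name `longest_segment_share` is hypothetical.

```python
def longest_segment_share(total_cm, longest_cm):
    """Fraction of the total shared DNA that lies in the single longest segment."""
    if total_cm <= 0:
        raise ValueError("total_cm must be positive")
    return longest_cm / total_cm


print(longest_segment_share(70, 35))            # 0.5
print(round(longest_segment_share(70, 12), 2))  # 0.17
```

In the first match, half of the shared DNA is one long segment; in the second, the total is spread across many small pieces, which is the pattern endogamy tends to produce.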

 

Minimum Match Raised from 6 cM to 8 cM

(UPDATE: This change has been delayed until early September.)

Currently, our match lists at AncestryDNA include people who share as little as 6 cM of DNA with us.  That minimum will be raised to 8 cM in the new update, meaning many of us will “lose” matches.

I have about 50,000 matches at AncestryDNA, of which roughly 21,000 are below 8 cM.  While it might seem alarming to lose 42% of my matches, in practice it’s not such a bad thing.  First, when a child and both parents have tested, studies show that about 40% of the child’s matches in that range don’t match either parent, meaning they’re false positives. There’s no easy way to tell which matches are false, meaning many of us are being misled by them.  With these matches, we’re not just chasing ghosts, we’re chasing someone else’s ghosts!
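
The arithmetic behind those figures is simple enough to check. The sketch below assumes, for illustration only, that the roughly 40% false-positive rate applies evenly to all of my matches below 8 cM.

```python
total_matches = 50_000
below_8_cm = 21_000
false_positive_rate = 0.40  # "about 40%" from the parent-child studies; applied evenly as an assumption

share_lost = below_8_cm / total_matches
likely_false = round(below_8_cm * false_positive_rate)
likely_real = below_8_cm - likely_false

print(f"{share_lost:.0%}")  # 42%
print(likely_false)         # 8400
print(likely_real)          # 12600
```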

Second, the tiny matches that are valid may represent genetic connections dozens of generations back, ones I’ll never be able to document.  I’ve only managed to connect a small fraction of my closer matches to my tree in the years since I first tested, and more matches are rolling in all the time, so I’ll never be able to systematically analyze those extremely distant matches.

Finally, even though I’ll miss out on some valid matches that might be traceable, I recognize that this compromise will accommodate the ever-growing database.  And I’d much rather AncestryDNA invest in growing their database than divert resources to matches I’ll probably never look at.

That said, there are some distant matches I’d like to keep in my list — and I can.  Any match that I’ve messaged, added a note to, starred, or included in a custom group (the color dots) will be retained after the update goes into effect.  Better yet, if I act to retain a tiny match in my list, I will still show in their list after the update, even if they don’t tag me.  In other words, tiny matches will be symmetrical.
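
The retention rule described above can be written out as a small Python sketch. The `Match` class, its field names, and the `survives_update` function are hypothetical and only model the rule as I understand it; they say nothing about how AncestryDNA stores or processes matches.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """One person's view of a match (hypothetical structure)."""
    shared_cm: float
    messaged: bool = False
    has_note: bool = False
    starred: bool = False
    in_custom_group: bool = False


def is_tagged(view: Match) -> bool:
    """True if this person has acted on the match in any of the four ways."""
    return view.messaged or view.has_note or view.starred or view.in_custom_group


def survives_update(my_view: Match, their_view: Match, threshold_cm: float = 8.0) -> bool:
    """A match stays if it meets the threshold or EITHER side has tagged it."""
    return (
        my_view.shared_cm >= threshold_cm
        or is_tagged(my_view)
        or is_tagged(their_view)
    )


# I starred a 6.4 cM match; the other person did nothing.
mine = Match(shared_cm=6.4, starred=True)
theirs = Match(shared_cm=6.4)
print(survives_update(mine, theirs))  # True
print(survives_update(theirs, mine))  # True (retention is symmetrical)
```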

 

What Should You Do?

For the first two updates—improved number of segments and longest segment size—you needn’t do anything.  However, if you want to retain your very distant matches, you’ll need to take some extra steps.

Here’s what I’m doing:  First, I’m triaging.  I’m not even trying to preserve every single match below 8 cM. Most of them are either false positives or too distantly related to ever sort out.  There are some, though, that I’d really like to keep.

 

Preserving ThruLines

There’s one particular question I’ve been working on lately: the parentage of my 4th great grandmother Marianne Dykes.  Without documentation, I added a likely Dykes couple to my tree to see whether ThruLines were generated, and they were!  Some of those matches, though, are 8 cM or smaller.

Here, it’s important to know that AncestryDNA rounds the numbers they show us.  A match that’s labeled as 8 cM might be 8.2 cM, above the new threshold, but could just as easily be 7.5 cM and scheduled to disappear.  For that reason, I’m triaging all matches of 8 cM or below.
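
Here is a small sketch of that reasoning in Python. It assumes the displayed value is the true value rounded to the nearest whole number, which matches what we see on screen but is my assumption about the exact rounding rule. The function name `might_disappear` is hypothetical.

```python
def might_disappear(displayed_cm: int, threshold_cm: float = 8.0) -> bool:
    """Return True if a match shown as `displayed_cm` could really be below the threshold.

    Assumes a displayed whole number covers true values from
    displayed_cm - 0.5 up to (but not including) displayed_cm + 0.5.
    """
    lowest_possible = displayed_cm - 0.5
    return lowest_possible < threshold_cm


for shown in (6, 7, 8, 9):
    print(shown, might_disappear(shown))
# 6 True
# 7 True
# 8 True
# 9 False
```

A match shown as 9 cM is at least 8.5 cM under this assumption, so it is safe; anything shown as 8 cM or less is worth triaging.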

To keep them, first I created a custom group for them.  Next, I clicked on the ThruLines for William Dykes and for his wife Phoebe Singleton and viewed the matches in list format.

Then I opened each match of 8 cM or less in a new tab and added them to the Dykes–Singleton custom group.  (Quick tip:  Put an exclamation mark before the group name so it sorts at the top of the list of groups.)

 

Importantly, I did all of this in my mom’s match list and my uncle’s rather than my own.  They’re one generation closer to Marianne Dykes than I am, so their matches to the Dykes–Singleton family are more important than mine.  It doesn’t matter if I lose them from my own match list, as long as they’re preserved in my mother’s and uncle’s.

Remember:  triage.

 

Keeping Surnames of Interest

I’m also using the custom filters in the main match list to quickly find and flag potential Dykes–Singleton matches.  I first filtered the centimorgan range to between 6 and 8 cM. (Remember: the ones above 8 cM are not in danger of disappearing soon.)

Then I searched for trees with the surnames Dykes or Singleton and added those matches to the Dykes–Singleton group using the “Add to group” feature. No need to open a tab for each match this time, but be sure to scroll all the way down so you don’t miss anyone.

It took about 15 minutes total to preserve 110 distant matches in my mom’s list and another 107 in my uncle’s.  Of course, they may turn out to be false leads, but I can decide that later.
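
For readers who think in code, the filtering logic above amounts to something like the following sketch. AncestryDNA’s filters live in the website, not in anything we can script, so the dictionary layout, the sample names, and the `needs_tagging` function are all hypothetical.

```python
def needs_tagging(match, surnames, low_cm=6, high_cm=8):
    """True if a match is in the at-risk cM range and its tree has a surname of interest."""
    in_range = low_cm <= match["displayed_cm"] <= high_cm
    tree = {name.lower() for name in match["tree_surnames"]}
    return in_range and any(name.lower() in tree for name in surnames)


# Made-up matches for illustration only.
matches = [
    {"name": "Match A", "displayed_cm": 7, "tree_surnames": ["Dykes", "Lee"]},
    {"name": "Match B", "displayed_cm": 7, "tree_surnames": ["Jones"]},
    {"name": "Match C", "displayed_cm": 12, "tree_surnames": ["Singleton"]},
    {"name": "Match D", "displayed_cm": 8, "tree_surnames": ["singleton"]},
]

to_tag = [m["name"] for m in matches if needs_tagging(m, ["Dykes", "Singleton"])]
print(to_tag)  # ['Match A', 'Match D']
```

Match C has a surname of interest but is well above the threshold, so it can wait.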

 

Matches with Common Ancestors

The third group of matches I’d like to preserve are those for whom AncestryDNA has identified common ancestors.  Because I don’t have time to sort these by which branch of my tree they’re on, I created a new custom group called “Common Ancestor”.  Then, using the “Common ancestors” filter and my custom DNA range, I’ve started labeling these matches, too.

I may not get through all of these before they disappear, so I’m starting with the largest ones (8 cM) and working my way down.

Updates to This Post

  • 17 Jul 2020:  Added the approximate position of the Toll-like receptor (TLR) genes that appear to confer resistance to the plague.
  • 18 Jul 2020:  Explained that AncestryDNA rounds cM values so matches of 8 cM and below should be triaged; added new clarification from AncestryDNA that starred matches will be retained, and retention is symmetrical.
  • 23 Jul 2020:  The elimination of 6–8 cM matches has been delayed until early September.

38 thoughts on “AncestryDNA’s 2020 Matching White Paper”

  1. I have 150 saved 6cMs matches at Ancestry with CA found in their trees.

    One of the 6 cMs has 3 segments. Duh! What is that, 2 cM, 2 cMs and 2 cMs.

    And, Leah, I think I posted this once before: Many of my matches are 2 generations younger than I. If I put that on a level playing field, I “might have” shared 12 cMs with their parent (depending upon recombination)………

    So, for me 6 cMs can possibly be gold.

    1. A match with a total of 6 cM across three segments means you have pile-ups. Each of those segments was originally 6 cM or larger and has been down-weighted by Timber because they are overrepresented in your matches. Pile-ups (called excess IBD in the scientific literature) are not reliable indicators of recent relationship. I’d be very careful drawing conclusions from those segments.

  2. Thank you for a splendid summary of the changes and for including the Ancestry DNA White Paper for our reading pleasure. I found lots of interesting and useful nuggets while reading the White paper, things that will enhance my understanding of both Ancestry’s process and how to interpret the results.

  3. I have Dykes in my Tree, on my mother’s side out of Suffolk, England – I have traced some who emigrated to South Africa, and a Baldry line that emigrated to the USA (also from Suffolk).

    1. How cool! I don’t know much about this Dykes family, just that they might be the key to my mysterious Marianne. They lived on the border between Mississippi and Louisiana north of New Orleans (had property on both sides of the state line) and may have come from South Carolina before that.

  4. Thanks for the info! Curious about one thing, though… So let’s say I keep John Smith, a 6cM match. Mark him with the dot and he stays once the changes are made to Ancestry. Two months down the line, I contact John about our match and to ask him about a possible connection on his tree. If he didn’t use a color dot for me, then I’m not going to show up as a match on his end – which may deter him from getting back in touch; or replying to say I’m crazy because we don’t match, after all.

    I just wonder how that’s all going to work out in the end….

    1. Good news! If you tag a tiny match, Ancestry will preserve it on the other person’s side, as well. I’ll share more information as it becomes available.

  5. You’ve created a monster!! I’m trying to replicate your search on 6-8cm matches and I keep getting this error message:

    “Our backend services are overtaxed at the moment and we are unable to retrieve all your matches. We apologize for the inconvenience, please try again later.”

    I’m sure it will recover – I’ll try again later! Just thought it was amusing that 5 minutes after I saw your email, 🙂

  6. Regarding adding a group for common ancestors in the 6 to 8 cm range. Giving them a group name will preserve their appearance on the match list. What are your thoughts as to whether that hint itself will be preserved?

  7. ” a random error in their data i the matching segment making it appear that they don’t”

    I think there is a typo in the text?

    Maybe it is an American thing?

  8. This explanation will take some time for me to understand. It’s soooo interesting, thanks for writing the article. I am learning about our dna all the time, and it’s extremely fascinating. In Ancestry I have a half-great aunt to which I am matched, 997cM / 41 segments. I thought that was surely too many segments. Anyway, thanks again!

  9. I was disappointed that the new white paper did not give any additional information about the distributions that you are now using for WATO. They just included the same picture as from the earlier white paper.

    1. Excellent observation! You’re absolutely right. They used the same old graph, although they’ve updated the distributions they’re using.

  10. The note which has just appeared on my Ancestry account says
    “As a result, you’ll no longer see matches (or be matched to people) that share less than 8 cM with you – unless you have added a note about them, added them to a custom group or have messaged them.” I read that as only matches at 6 and 7 cM will disappear, yet you have included 8. Does this mean that all matches at 8 will also disappear?

    1. AncestryDNA rounds the number they show us on the screen. Some matches that show as “8 cM” are safe because they’re 8.0 or higher, but some are, for example, 7.6 cM and scheduled to disappear. Since we can’t see where the list transitions from 8.000 to 7.999, I’m preserving everything that shows as 8 cM. I’ve added a paragraph to the post explaining why I’m using 8.

  11. As expected … You’re the best! Thank You!!
    Concise explanations, great advice on the best courses of action that are well thought out.
    .. Time for me to get busy. I’ve already done the low hanging fruit of CA’s and was about to do surnames, now I’m off to do ThruLines as well.

  12. There’s one other useful thing that Ancestry could do, short of simply making a chromosome browser available. They could report shared cM and longest segment on the X chromosome. As it is, they include the X chromosome in our DNA files, but don’t otherwise appear to make much use of it.

    To illustrate how useful this would be for some of us, I know through 23andMe that I only share 20 cM of my X chromosome with my maternal grandmother. The remainder is from my maternal grandfather.

    Since I’m male, *any* significant sharing on the X chromosome immediately tells me that the segment came from my mother. And since my parents are not related — at least, not within a genealogical time frame — I can also reasonably infer that any autosomal sharing is from the same side.

    But I can actually take it further than that. Through 23andMe, FTDNA, and GEDmatch, I know that only 20 cM of my X chromosome came from my maternal grandmother; all the rest is from my maternal grandfather.

    So even without knowing where a particular shared segment on the X chromosome might be located, if I know it’s much greater than 20 cM I also know that it’s from my maternal grandfather’s side.

    If Ancestry, which tells me I share 465 cM across 15 segments with a predicted 2nd cousin, were also to tell me I share 42.8 cM across 2 segments on the X chromosome, I would immediately know that this match is on my mother’s paternal grandmother’s side.

    In this particular case, it turns out that my mother’s father and my match’s mother’s father were brothers — so we are indeed 2nd cousins.

  13. Wonderful news that without so many 6-7.499 cM matches they will have more room on their servers.
    So they can really help people by extending Shared Matches down to 15cM.
    (They extended it by accident one day around a year ago and I found several matches that are invisible by any other method.)
    But I bet they don’t.

  14. I may be over reacting, but I’m not pleased with Ancestry dropping matches below 8cM. I would rather fish in a pond that I know has fish (even if they are not the type I am looking for), than to have the water taken away from me. This is what I sent to Ancestry…
    ———
    I would like to ask Ancestry to re-think their new DNA matching policy of eliminating DNA matches below 8cM. I’ve looked for proper feedback on the Ancestry page and it is unclear how to make input. I can provide ample examples of family line matches that will be lost in my ThruLine pages due to this new policy. Many of these matches have multiple branches, where DNA matching to one member exceeds 8cM while matching to the second match is less than 8cM. Clearly the two matches in this line have DNA cM much exceeding the 8cM threshold, even though when traced out to my DNA they are in the boundary area. That might not be clear so let me give another example I have seen. I have a Parent/Child DNA match, the parent exceeds 8cM while the child is less than 8cM. With the new matching rules, I will lose the child matching while keeping the parent matching. Why would Ancestry take my family away from me? I have worked so hard to find them.

    I can accept this if you were adding a “Pro” level to the Ancestry payment plan, where the remote DNA matches are maintained, and an increased set of DNA tools offered. I would be satisfied knowing that the information could be made available to me, even if I did not subscribe. Your white paper indicates that a reason for this change is due to database requirements. In fact, this is listed as a first reason. It makes me feel even worse knowing that I am losing many potential DNA matches due to a lack of server infrastructure! Again, charge me a higher tier to keep the same level of service, I am OK with that. Better yet, make it an incentive for people to deeply build out their trees, where after 1000 family tree identities, you get upgraded service and DNA matches of less than 8cM. This solves 2 problems, it incentives people to build their trees deeply and removes the vast majority of your server demands.

    Please Ancestry, re-think this decision. From your white paper:

    “The cutoff of 8 cM was chosen after considering several factors. The first factor is data storage. Since the number of matching segments grows exponentially with decreasing length, we dramatically reduce the storage requirements of our matching database by increasing the cutoff. A second, and more critical, factor is that the accuracy of IBD detection drops rapidly with decreasing IBD length—that is, the shorter the length of the detected IBD segment (expressed in genetic distance), the less likely it is that the detected chromosome segment is truly inherited from a common ancestor.”

    1. I assume you’re sending similar emails to the other databases asking them to lower their thresholds, as they’re all higher than Ancestry’s.

      1. Thanks for the message. First, I would say that you can’t miss what you never had… But, Ancestry’s is most useful for me due to the quality of the trees and use of ThruLines. I’m not complaining about their service, just sad to be losing what I see as valuable information. As an example, for my DNA I have about 50 matches that are connected to ThruLines that are impacted, for my Brother it is 69, and for my Mother it is 29. I am counting 8cM or less, thanks for your pointing out that there is a rounding issue to consider. My Mothers numbers are so low relative to my Brothers and Mine because I have not yet gotten around to try and fuss out the family DNA connections in the pre-1850 census lines. This is the boundary where I see the Ancestry change having the most impact for me… and if I lose a significant portion of the database to work in then it will reduce the options I have to shore up my tree via these remote DNA linkages.

        Ancestry has given me the opportunity to protect the lines I have already formed, and I am thankful for that… but it is the loss of possible future connections that make me concerned.

        Comments?

  15. Thank you for the great article and suggestions. I’m new to DNA so have a question. Wouldn’t any ThruLines matches also show up under the Common Ancestor filter? Or do I need to look at both Common Ancestors and ThruLines to capture all the 6-8 cM folks? Thank you.

    1. Yes, ThruLines matches will also show up under Common Ancestors filter. I did my Dykes–Singleton ThruLines first because I wasn’t sure I’d have time to get to all of my matches with Common Ancestors. Triage! If you do Common Ancestors first, there’s no need to do ThruLines.

  16. I’m saddened by what I’m seeing at AncestryDNA.

    I work on matches on my son’s paternal side. The heritage on that side is mostly African American, and for various reasons it’s difficult to find matches with trees going back multiple generations and are interested in working together.

    I’ve been working with one match for a number of weeks. We’ve made great progress. We know we share a Winston line, the hurdle is connecting the farthest back generation, but we know we’re on the right track per the number of shared matches etc.

    Today the match shared his list with me. On my side, his match has the note Winston. He still appears on my match list, 2 segments totalling 21 cM. This is the primary match, but he had confirmed other relatives managed by me were on his list. Today – nothing. Not the match I see on my side or any of the others he had confirmed. Why would Ancestry have taken them away?

    1. Ancestry isn’t taking away matches that share 21 cM. If this match has disappeared, it may be a glitch. Check back in a day or so.

  17. You mention, “Finally, even though I’ll miss out on some valid matches that might be traceable, I recognize that this compromise will accommodate the ever growing database. And I’d much rather AncestryDNA invest in growing their database than divert resources to matches I’ll probably never look at.”

    Has AncestryDNA told you the reason for the “update” is to reduce resource consumption? How will Ancestry “invest” in growing their database? by more advertising?

    What is the range of storage required for processing the matches of each new kit?

    It would appear there are no economies of scale in a growing DNA segment matching database. Will higher matching cutoffs be needed in the future?

    1. Yes, AncestryDNA’s Matching White Paper says that one reason for the change is computation/storage. From page 14:
      “An important feature of our method is that we do not keep track of all matching segments; in
      step 5, we filter out a candidate match if its genetic distance is less than 8 cM. The cutoff of 8
      cM was chosen after considering several factors. The first factor is data storage. Since the
      number of matching segments grows exponentially with decreasing length, we dramatically
      reduce the storage requirements of our matching database by increasing the cutoff. A second,
      and more critical, factor is that the accuracy of IBD detection drops rapidly with decreasing IBD
      length—that is, the shorter the length of the detected IBD segment (expressed in genetic
      distance), the less likely it is that the detected chromosome segment is truly inherited from a
      common ancestor.”
      https://www.ancestrycdn.com/support/us/2020/08/matchingwhitepaper.pdf
