AncestryDNA’s 2020 Matching White Paper

This post has been updated

On 15 July 2020, AncestryDNA updated their “Matching White Paper”, which is a detailed document describing how they use our DNA data to match us to our genetic relatives.  The previous Matching White Paper was released in 2016.

On the surface, it seems like finding identical DNA segments shared by two people would be simple.  It’s anything but.  For biological, technical, economic, and computational reasons, all of the genetic genealogy companies have to use sophisticated algorithms to achieve the goal.  In fact, differences in how each company approaches the problem are why you can match the exact same person at different sites and appear to share more or less DNA.

How AncestryDNA Matches Users

As a very broad overview, DNA matching at AncestryDNA involves four main steps:

  1. We have two copies of each autosomal chromosome, but the laboratory technique used by the companies (called a microarray) doesn’t analyze each one individually.  Instead, your two copies of chromosome 1 are analyzed as a unit, your two copies of chromosome 2 are analyzed together, etcetera.  After the fact, the computer algorithm has to determine which data came from your maternal copy of chromosome 1 versus your paternal copy, and so on for each chromosome pair.  This step is called phasing.
  2. After the raw data is phased, people in the database are compared to one another to determine whether they share matching DNA sequences as a result of recent common ancestry. Such segments are called “identical by descent” or IBD to distinguish them from DNA that might appear to be identical by chance or because the matching DNA dates back dozens of generations.  This matching is complicated by the sheer size of AncestryDNA’s database.  There are hundreds of trillions of comparisons to be made, and the database is growing all the while.
  3. Segments of DNA can be shared for reasons other than recent common ancestry. For example, there is a cluster of three genes on chromosome 4 around position 38,800,000 that appear to give resistance to the plague.  Two people could share this segment of DNA not because they are recent cousins but because both came from populations that survived the Black Death nearly 700 years ago.  AncestryDNA applies an algorithm called Timber to adjust for population-level segments, sometimes called pile-ups.
  4. Once AncestryDNA has determined how much DNA two people share, the final step is relationship estimation.  It’s all well and good to say that cousins Tyneka and William share 192 cM of DNA, but what does that mean for how they’re related to one another?  Here, AncestryDNA sorts our matches into broad categories of relationship meant to be a starting point for us to look for the connection in our family trees.  To use our example, Tyneka and William would appear in the 3rd Cousin category in one another’s lists, but they might well be 2nd cousins instead.  (AncestryDNA tends to err on the side of underestimating relationships, so you’re far more likely to see a true 2nd cousin estimated as a 3rd than vice versa.)

So What’s New?

That’s the general overview of how AncestryDNA provides us matches.  What’s changed in this new White Paper?  There are three key updates that you should be aware of.  They go into effect in early August.  Ultimately, they’ll position AncestryDNA to start incorporating NextGen sequence data into their database.

 

The Number of Shared Segments Will Be More Accurate

First, the number of unique segments shared by two people will be more accurate.  Because of a strict matching algorithm, AncestryDNA sometimes reports a single long segment as two separate segments.  This occurs when one of the people being compared has a random error in their data, making it appear that they don’t match at a single spot within the segment when they really do.

The effect is most obvious when comparing a parent and child.  For example, AncestryDNA currently says that my mother and I share 3,475 cM of DNA across 44 segments.

That’s impossible.  I only have 22 autosomal chromosomes, and I match her all the way across each one.  In reality, she and I share 22 segments, each of which is an entire chromosome.  There must be 22 errors in either my data or hers causing some of our chromosomes to match in discrete chunks rather than across the entire thing.  (Given that AncestryDNA analyzes more than 600,000 markers in our DNA, it’s remarkable that there are only 22 such errors!)

The update will allow such “single SNP mismatches” nested within otherwise matching regions to be ignored, so the segment correctly appears as continuous rather than broken in two.  By August, my mom and I should be reported to share 3,475 cM of DNA across 22 segments rather than the current 44.
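To make the idea concrete, here is a minimal Python sketch of stitching together segments that are separated by a single mismatching marker. It is only an illustration: the Segment type, the marker-index representation, and the max_gap_markers parameter are my own assumptions, not AncestryDNA's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    chromosome: int
    start_marker: int  # index of the first matching SNP (hypothetical representation)
    end_marker: int    # index of the last matching SNP, inclusive


def merge_across_single_mismatches(segments, max_gap_markers=1):
    """Join segments on the same chromosome that are separated by
    at most max_gap_markers non-matching SNPs."""
    merged = []
    for seg in sorted(segments, key=lambda s: (s.chromosome, s.start_marker)):
        previous = merged[-1] if merged else None
        if previous is not None and previous.chromosome == seg.chromosome:
            # Number of markers lying strictly between the two segments.
            gap = seg.start_marker - previous.end_marker - 1
            if gap <= max_gap_markers:
                previous.end_marker = max(previous.end_marker, seg.end_marker)
                continue
        # Store a copy so the caller's input objects are never modified.
        merged.append(Segment(seg.chromosome, seg.start_marker, seg.end_marker))
    return merged


example = [
    Segment(1, 0, 499), Segment(1, 501, 999),  # split by one mismatching marker
    Segment(2, 0, 300), Segment(2, 350, 800),  # genuinely separate segments
]
print(len(merge_across_single_mismatches(example)))  # prints 3
```

In this toy example, the two pieces on chromosome 1 are reported as one segment, while the two pieces on chromosome 2 stay separate because the gap between them is far larger than a single marker.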

This is good news but will only affect the subset of genetic genealogists who use the number of matching segments in their work.  I calculate average segment size when working with endogamous populations, so I am very pleased to see this update.  More accurate averages are always better.

 

AncestryDNA Will Report the Length of Longest Shared Segment

Thus far, AncestryDNA has only shown us the total amount of shared DNA and the number of shared segments.  With the pending update, they will also report how long the longest shared segment is.  For many users, this won’t make a difference in how they work with their matches, but for those of us from endogamous populations, this will be a huge benefit.

Endogamy occurs when people marry within the same group for many generations.  This eventually causes complicated webs of relationship, in which individuals are cousins many times over.

Endogamous matches often share far more DNA than would be expected given their closest relationship, because those more distant connections are also adding DNA to the total.  Those additional genetic contributions, though, tend to be very small segments.  That’s why knowing the size of the largest segment is so important.  A match who shares 70 cM with a largest segment of 35 cM is far more likely to be a recent cousin than a match of 70 cM whose largest segment is 12 cM.
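One rough way to put that comparison into numbers is to ask what fraction of the total sits in the longest segment. The sketch below is only a heuristic for ranking matches; the function name is hypothetical, and AncestryDNA does not publish any such rule.

```python
def longest_segment_fraction(total_cm, longest_cm):
    """Return the share of the total shared DNA that lies in the longest segment."""
    if total_cm <= 0:
        raise ValueError("total_cm must be positive")
    # Cap at 1.0 in case the reported total is smaller than the longest segment.
    return min(longest_cm / total_cm, 1.0)


# The two 70 cM matches from the example above.
print(longest_segment_fraction(70, 35))  # prints 0.5
print(round(longest_segment_fraction(70, 12), 2))  # prints 0.17
```

A match with half of its shared DNA in one segment looks like a single recent connection; a match whose longest segment is a small slice of the total looks more like many distant connections added together.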

 

Minimum Match Raised from 6 cM to 8 cM

(UPDATE: This change has been delayed until early September.)

Currently, our match lists at AncestryDNA include people who share as little as 6 cM of DNA with us.  That minimum will be raised to 8 cM in the new update, meaning many of us will “lose” matches.

I have about 50,000 matches at AncestryDNA, of which roughly 21,000 are below 8 cM.  While it might seem alarming to lose 42% of my matches, in practice it’s not such a bad thing.  First, when a child and both parents have tested, studies show that about 40% of the child’s matches in that range don’t match either parent, meaning they’re false positives. There’s no easy way to tell which matches are false, meaning many of us are being misled by them.  With these matches, we’re not just chasing ghosts, we’re chasing someone else’s ghosts!

Second, the tiny matches that are valid may represent genetic connections dozens of generations back, ones I’ll never be able to document.  I’ve only managed to connect a small fraction of my closer matches to my tree in the years since I first tested, and more matches are rolling in all the time, so I’ll never be able to systematically analyze those extremely distant matches.

Finally, even though I’ll miss out on some valid matches that might be traceable, I recognize that this compromise will accommodate the ever growing database.  And I’d much rather AncestryDNA invest in growing their database than divert resources to matches I’ll probably never look at.

That said, there are some distant matches I’d like to keep in my list — and I can.  Any match that I’ve messaged, added a note to, starred, or included in a custom group (the color dots) will be retained after the update goes into effect.  Better yet, if I act to retain a tiny match in my list, I will still show in their list after the update, even if they don’t tag me.  In other words, tiny matches will be symmetrical.

 

What Should You Do?

For the first two updates—improved number of segments and longest segment size—you needn’t do anything.  However, if you want to retain your very distant matches, you’ll need to take some extra steps.

Here’s what I’m doing:  First, I’m triaging.  I’m not even trying to preserve every single match below 8 cM. Most of them are either false positives or too distantly related to ever sort out.  There are some, though, that I’d really like to keep.

 

Preserving ThruLines

There’s one particular question I’ve been working on lately: the parentage of my 4th great grandmother Marianne Dykes.  Without documentation, I added a likely Dykes couple to my tree to see whether ThruLines were generated, and they were!  Some of those matches, though, are 8 cM or smaller.

Here, it’s important to know that AncestryDNA rounds the numbers they show us.  A match that’s labeled as 8 cM might be 8.2 cM, above the new threshold, but could just as easily be 7.5 cM and scheduled to disappear.  For that reason, I’m triaging all matches of 8 cM or below.
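The rounding logic can be sketched in a few lines of Python. This assumes the number on screen is the true value rounded to the nearest whole centimorgan, which is my assumption rather than a documented rule; the function name and the 0.5 cM margin are likewise hypothetical.

```python
def might_disappear(displayed_cm, threshold_cm=8.0, rounding_margin_cm=0.5):
    """Return True if a match shown as displayed_cm could truly be below the threshold."""
    # Smallest true value that would still round to the displayed number.
    lowest_possible = displayed_cm - rounding_margin_cm
    return lowest_possible < threshold_cm


for shown in (6, 7, 8, 9):
    print(shown, might_disappear(shown))
```

Under that assumption, matches displayed as 6, 7, or 8 cM are all worth triaging, while a match displayed as 9 cM is safely above the cutoff.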

To keep them, first I created a custom group for them.  Next, I clicked on the ThruLines for William Dykes and for his wife Phoebe Singleton and viewed the matches in list format.

Then I opened each match of 8 cM or less in a new tab and added them to the Dykes–Singleton custom group.  (Quick tip:  Put an exclamation mark before the group name so it sorts at the top of the list of groups.)

 

Importantly, I did all of this in my mom’s match list and my uncle’s rather than my own.  They’re one generation closer to Marianne Dykes than I am, so their matches to the Dykes–Singleton family are more important than mine.  It doesn’t matter if I lose them from my own match list, as long as they’re preserved in my mother’s and uncle’s.

Remember:  triage.

 

Keeping Surnames of Interest

I’m also using the custom filters in the main match list to quickly find and flag potential Dykes–Singleton matches.  I first filtered the centimorgan range to between 6 and 8 cM. (Remember: the ones above 8 cM are not in danger of disappearing soon.)

Then I searched for trees with the surnames Dykes or Singleton and added those matches to the Dykes–Singleton group using the “Add to group” feature. No need to open a tab for each match this time, but be sure to scroll all the way down so you don’t miss anyone.

It took about 15 minutes total to preserve 110 distant matches in my mom’s list and another 107 in my uncle’s.  Of course, they may turn out to be false leads, but I can decide that later.

 

Matches with Common Ancestors

The third group of matches I’d like to preserve are those for whom AncestryDNA has identified common ancestors.  Because I don’t have time to sort these by which branch of my tree they’re on, I created a new custom group called “Common Ancestor”.  Then, using the “Common ancestors” filter and my custom DNA range, I’ve started labeling these matches, too.

I may not get through all of these before they disappear, so I’m starting with the largest ones (8 cM) and working my way down.

Updates to This Post

  • 17 Jul 2020:  Added the approximate position of the Toll-like receptor (TLR) genes that appear to confer resistance to the plague
  • 18 Jul 2020:  Explained that AncestryDNA rounds cM values so matches of 8 cM and below should be triaged; added new clarification from AncestryDNA that starred matches will be retained, and retention is symmetrical
  • 23 Jul 2020:  The elimination of 6–8 cM matches has been delayed until early September.

78 thoughts on “AncestryDNA’s 2020 Matching White Paper”

  1. I have 150 saved 6cMs matches at Ancestry with CA found in their trees.

    One of the 6 cMs has 3 segments. Duh! What is that, 2 cM, 2 cMs and 2 cMs.

    And, Leah, I think I posted this once before: Many of my matches are 2 generations younger than I. If I put that on a level playing field, I “might have” shared 12 cMs with their parent (depending upon recombination)………

    So, for me 6 cMs can possibly be gold.

    1. A match with a total of 6 cM across three segments means you have pile-ups. Each of those segments was originally 6 cM or larger and has been down-weighted by Timber because they are overrepresented in your matches. Pile-ups (called excess IBD in the scientific literature) are not reliable indicators of recent relationship. I’d be very careful drawing conclusions from those segments.

  2. Thank you for a splendid summary of the changes and for including the Ancestry DNA White Paper for our reading pleasure. I found lots of interesting and useful nuggets while reading the White paper, things that will enhance my understanding of both Ancestry’s process and how to interpret the results.

  3. I have Dykes in my Tree, on my mother’s side out of Suffolk, England – I have traced some who emigrated to South Africa, and a Baldry line that emigrated to the USA (also from Suffolk).

    1. How cool! I don’t know much about this Dykes family, just that they might be the key to my mysterious Marianne. They lived on the border between Mississippi and Louisiana north of New Orleans (had property on both sides of the state line) and may have come from South Carolina before that.

  4. Thanks for the info! Curious about one thing, though… So let’s say I keep John Smith, a 6cM match. Mark him with the dot and he stays once the changes are made to Ancestry. Two months down the line, I contact John about our match and to ask him about a possible connection on his tree. If he didn’t use a color dot for me, then I’m not going to show up as a match on his end – which may deter him from getting back in touch; or replying to say I’m crazy because we don’t match, after all.

    I just wonder how that’s all going to work out in the end….

    1. Good news! If you tag a tiny match, Ancestry will preserve it on the other person’s side, as well. I’ll share more information as it becomes available.

  5. You’ve created a monster!! I’m trying to replicate your search on 6-8cm matches and I keep getting this error message:

    “Our backend services are overtaxed at the moment and we are unable to retrieve all your matches. We apologize for the inconvenience, please try again later.”

    I’m sure it will recover – I’ll try again later! Just thought it was amusing that 5 minutes after I saw your email, 🙂

  6. Regarding adding a group for common ancestors in the 6 to 8 cm range. Giving them a group name will preserve their appearance on the match list. What are your thoughts as to whether that hint itself will be preserved?

  7. ” a random error in their data i the matching segment making it appear that they don’t”

    I think there is a typo in the text?

    Maybe it is an American thing?

  8. This explanation will take some time for me to understand. It’s soooo interesting, thanks for writing the article. I am learning about our dna all the time, and it’s extremely fascinating. In Ancestry I have a half-great aunt to which I am matched, 997cM / 41 segments. I thought that was surely too many segments. Anyway, thanks again!

  9. Why does AncestryDNA not provide info on which segment in which chromosome you share with a DNA cousin?

  10. I was disappointed that the new white paper did not give any additional information about the distributions that you are now using for WATO. They just included the same picture as from the earlier white paper.

    1. Excellent observation! You’re absolutely right. They used the same old graph, although they’ve updated the distributions they’re using.

  11. The note which has just appeared on my Ancestry account says
    “As a result, you’ll no longer see matches (or be matched to people) that share less than 8 cM with you – unless you have added a note about them, added them to a custom group or have messaged them.” I read that as only matches at 6 and 7 cM will disappear, yet you have included 8. Does this mean that all matches at 8 will also disappear?

    1. AncestryDNA rounds the number they show us on the screen. Some matches that show as “8 cM” are safe because they’re 8.0 or higher, but some are, for example, 7.6 cM and scheduled to disappear. Since we can’t see where the list transitions from 8.000 to 7.999, I’m preserving everything that shows as 8 cM. I’ve added a paragraph to the post explaining why I’m using 8.

  12. As expected … You’re the best! Thank You!!
    Concise explanations, great advice on the best courses of action that are well thought out.
    .. Time for me to get busy. I’ve already done the low hanging fruit of CA’s and was about to do surnames, now I’m off to do ThruLines as well.

  13. There’s one other useful thing that Ancestry could do, short of simply making a chromosome browser available. They could report shared cM and longest segment on the X chromosome. As it is, they include the X chromosome in our DNA files, but don’t otherwise appear to make much use of it.

    To illustrate how useful this would be for some of us, I know through 23andMe that I only share 20 cM of my X chromosome with my maternal grandmother. The remainder is from my maternal grandfather.

    Since I’m male, *any* significant sharing on the X chromosome immediately tells me that the segment came from my mother. And since my parents are not related — at least, not within a genealogical time frame — I can also reasonably infer that any autosomal sharing is from the same side.

    But I can actually take it further than that. Through 23andMe, FTDNA, and GEDmatch, I know that only 20 cM of my X chromosome came from my maternal grandmother; all the rest is from my maternal grandfather.

    So even without knowing where a particular shared segment on the X chromosome might be located, if I know it’s much greater than 20 cM I also know that it’s from my maternal grandfather’s side.

    If Ancestry, which tells me I share 465 cM across 15 segments with a predicted 2nd cousin, were also to tell me I share 42.8 cM across 2 segments on the X chromosome, I would immediately know that this match is on my mother’s paternal grandmother’s side.

    In this particular case, it turns out that my mother’s father and my match’s mother’s father were brothers — so we are indeed 2nd cousins.

  14. Wonderful news that without so many 6-7.499 cM matches they will have more room on their servers.
    So they can really help people by extending Shared Matches down to 15cM.
    (They extended it by accident one day around a year ago and I found several matches that are invisible by any other method.)
    But I bet they don’t.

    1. Shared Matches down to 8cM would make them 1) competitive with MyHeritage and 2) provide complete details on shared matches without needing access to other people’s tests.

  15. I may be over reacting, but I’m not pleased with Ancestry dropping matches below 8cM. I would rather fish in a pond that I know has fish (even if they are not the type I am looking for), than to have the water taken away from me. This is what I sent to Ancestry…
    ———
    I would like to ask Ancestry to re-think their new DNA matching policy of eliminating DNA matches below 8cM. I’ve looked for proper feedback on the Ancestry page and it is unclear how to make input. I can provide ample examples of family line matches that will be lost in my ThruLine pages due to this new policy. Many of these matches have multiple branches, where DNA matching to one member exceeds 8cM while matching to the second match is less than 8cM. Clearly the two matches in this line have DNA cM much exceeding the 8cM threshold, even though when traced out to my DNA they are in the boundary area. That might not be clear so let me give another example I have seen. I have a Parent/Child DNA match, the parent exceeds 8cM while the child is less than 8cM. With the new matching rules, I will lose the child matching while keeping the parent matching. Why would Ancestry take my family away from me? I have worked so hard to find them.

    I can accept this if you were adding a “Pro” level to the Ancestry payment plan, where the remote DNA matches are maintained, and an increased set of DNA tools offered. I would be satisfied knowing that the information could be made available to me, even if I did not subscribe. Your white paper indicates that a reason for this change is due to database requirements. In fact, this is listed as a first reason. It makes me feel even worse knowing that I am losing many potential DNA matches due to a lack of server infrastructure! Again, charge me a higher tier to keep the same level of service, I am OK with that. Better yet, make it an incentive for people to deeply build out their trees, where after 1000 family tree identities, you get upgraded service and DNA matches of less than 8cM. This solves 2 problems, it incentivizes people to build their trees deeply and removes the vast majority of your server demands.

    Please Ancestry, re-think this decision. From your white paper:

    “The cutoff of 8 cM was chosen after considering several factors. The first factor is data storage. Since the number of matching segments grows exponentially with decreasing length, we dramatically reduce the storage requirements of our matching database by increasing the cutoff. A second, and more critical, factor is that the accuracy of IBD detection drops rapidly with decreasing IBD length—that is, the shorter the length of the detected IBD segment (expressed in genetic distance), the less likely it is that the detected chromosome segment is truly inherited from a common ancestor.”

    1. I assume you’re sending similar emails to the other databases asking them to lower their thresholds, as they’re all higher than Ancestry’s.

      1. Thanks for the message. First, I would say that you can’t miss what you never had… But, Ancestry’s is most useful for me due to the quality of the trees and use of ThruLines. I’m not complaining about their service, just sad to be losing what I see as valuable information. As an example, for my DNA I have about 50 matches that are connected to ThruLines that are impacted, for my Brother it is 69, and for my Mother it is 29. I am counting 8cM or less, thanks for your pointing out that there is a rounding issue to consider. My Mothers numbers are so low relative to my Brothers and Mine because I have not yet gotten around to try and fuss out the family DNA connections in the pre-1850 census lines. This is the boundary where I see the Ancestry change having the most impact for me… and if I lose a significant portion of the database to work in then it will reduce the options I have to shore up my tree via these remote DNA linkages.

        Ancestry has given me the opportunity to protect the lines I have already formed, and I am thankful for that… but it is the loss of possible future connections that make me concerned.

        Comments?

    2. I agree. From a 6cM match I broke through the ‘brick wall’ (and my reason for joining ancestry 20 years ago). I was able to confirm that the family I suspected I was descended from was correct. All from a 6cM match at the beginning of 2020. I now have many matches with people descended from people born generations before which gives me 100% certainty of the accuracy. Over the moon with joy!

      1. Unfortunately, even if the relationship is correct, you can’t consider the 6-cM match evidence for it until you’ve ruled out all other possible transmission paths between you and the other person. That means building out and confirming every line on both trees several generations past the MRCA you’re trying to prove.

  16. Thank you for the great article and suggestions. I’m new to DNA so have a question. Wouldn’t any ThruLines matches also show up under the Common Ancestor filter? Or do I need to look at both Common Ancestors and ThruLines to capture all the 6-8 cM folks? Thank you.

    1. Yes, ThruLines matches will also show up under Common Ancestors filter. I did my Dykes–Singleton ThruLines first because I wasn’t sure I’d have time to get to all of my matches with Common Ancestors. Triage! If you do Common Ancestors first, there’s no need to do ThruLines.

  17. I’m saddened by what I’m seeing at AncestryDNA.

    I work on matches on my son’s paternal side. The heritage on that side is mostly African American, and for various reasons it’s difficult to find matches who have trees going back multiple generations and are interested in working together.

    I’ve been working with one match for a number of weeks. We’ve made great progress. We know we share a Winston line, the hurdle is connecting the farthest back generation, but we know we’re on the right track per the number of shared matches etc.

    Today the match shared his list with me. On my side, his match has the note Winston. He still appears on my match list, 2 segments totalling 21 cM. This is the primary match, but he had confirmed other relatives managed by me were on his list. Today – nothing. Not the match I see on my side or any of the others he had confirmed. Why would Ancestry have taken them away?

    1. Ancestry isn’t taking away matches that share 21 cM. If this match has disappeared, it may be a glitch. Check back in a day or so.

  18. You mention, “Finally, even though I’ll miss out on some valid matches that might be traceable, I recognize that this compromise will accommodate the ever growing database. And I’d much rather AncestryDNA invest in growing their database than divert resources to matches I’ll probably never look at.”

    Has AncestryDNA told you the reason for the “update” is to reduce resource consumption? How will Ancestry “invest” in growing their database? by more advertising?

    What is the range of storage required for processing the matches of each new kit?

    It would appear there are no economies of scale in a growing DNA segment matching database. Will higher matching cutoffs be needed in the future.

    1. Yes, AncestryDNA’s Matching White Paper says that one reason for the change is computation/storage. From page 14:
      “An important feature of our method is that we do not keep track of all matching segments; in step 5, we filter out a candidate match if its genetic distance is less than 8 cM. The cutoff of 8 cM was chosen after considering several factors. The first factor is data storage. Since the number of matching segments grows exponentially with decreasing length, we dramatically reduce the storage requirements of our matching database by increasing the cutoff. A second, and more critical, factor is that the accuracy of IBD detection drops rapidly with decreasing IBD length—that is, the shorter the length of the detected IBD segment (expressed in genetic distance), the less likely it is that the detected chromosome segment is truly inherited from a common ancestor.”
      https://www.ancestrycdn.com/support/us/2020/08/matchingwhitepaper.pdf

  19. Ok, I have 2 Questions, my biological aunt and I have 1,436 now across 57 segments, as opposed to 82 segments. Is 57 segments still too many?

    Secondly, my 2nd highest match(I haven’t found how this match is exactly related to me, though my theory is this is on my mom’s side because it’s not a related match to my aunt who’s my dad’s sister) by centimorgans is 263 centimorgans across 7 segments(it was 15 before), does the decrease in segment amount mean that this is a more accurate match? I just got my results this June so I’m learning how to interpret all this.

    1. 57 is about right for an aunt, perhaps a little on the high end but not suspiciously so. The change in segments for the 263-cM match doesn’t really make a difference. The centimorgan amount is the more important number.

  20. I like the premise of the paper, in reality it is not 100% foolproof when discussing the 6-8 cm matches or any matches. My recently passed on 1 month shy of 90 who drove his own ancestry test to the post office 4 years ago (after doing a paternal test with his son on familytree dna without any single paternal surname match outside of their own (Lewis in colonial VA)) showed some promising matches on ancestry DNA. Being an IT professional with no DNA training, I can clearly see that he inherits DNA in matching families randomly, i.e. from one of the daughters of grand-grandparents, 19cm is strongest, but jump up 2 generations to common grandmother and matches in 110 cm+.

    All those children are 100% legitimate. My DNA’s sample son has a very similar look to an uncle of his father’s paternal great-mother maternal side (Dills), that his father does not share. That is the family side they share off the charts. Another side, which is Texas 300, is hit or miss. With this ancestry “downgrade” we are losing 10 or so cousins on the Pentecost / Pharr side who are 3rd x1 or x2 removed, because neither of my Lewis’ inherited that DNA. But we have random DNA spikes to generations way back on that family side far exceeding the recent matches. And all the cousins who will be lost have dozens of matches in “shared matches” 4-6 cousin match.

    How can you quantify the number of missed connections in 6-8? DNA won’t show us the bottom? Nor am I convinced they are all ghosts, fake matches, etc. It seems to me (IT training et al) that ancestry’s motive in reducing the matches is to alleviate drag on their resources that even us paid customers are seeing every day with “try again” messages. I posted on a public ancestry board that maybe they should just remove people who do not have trees from our matches (no matter what DNA level) but someone on ancestry decided to just wipe out my message (have proof).

    Being a DNA geek, please tell me what this “longest segment” tells me. IMHO. 0. It is useless. Now if I have this 19 or 18 cm in Australia or Canada, or that cousin that shares African ancestry (and we have many), if when I hit shared matches I can see each and every person that matches, (not just 20 cm and above) maybe I can triangulate and find the matching family?

    That said. My 90 year old dearly missed DNA participants 4-6 cousins will be in early 1700s. We gained 1-2 generations on our Lewis side. With most families here well before 1776, with this downgrade, now what? I say lose people who have no trees and no active DNA no matter what the match level. Keep 6-8 with trees. IMHO.

    1. Empirical studies show that roughly half of segments in the 6–8 cM range are false positives. That happens across the board, at all of the testing companies. And the scientific paper published by Speed and Balding in 2015 estimated that, of the fraction of small segments that really are IBD, more than half date back more than 20 generations and more than 80% date back more than 10 generations. In other words, a matching segment of 7 cM has less than a 10% chance of being both valid and traceable. You cannot use such a segment as evidence for a relationship unless you can rule out it being a false positive and also that it couldn’t possibly have been inherited via any other path. How many genetic genealogists have proven trees that date back 20 generations on every single line?
      You can read more about the science of small segments here: https://isogg.org/wiki/Identical_by_descent#False_positive_matches

      Knowing the longest segment size helps with a different problem: endogamy. Those of us from endogamous backgrounds often share far more DNA with our matches than we should given the closest relationship because we’re related in so many different ways. A match who shares 100 cM with a longest of 40 cM is much more likely to be a 3rd cousin than a match who shares 100 cM with a longest of 15 cM.

  21. Ancestry is now showing the longest segment. For many matches, the longest segment length is longer than the total segment lengths of all segments. For example, the match might be 8.1cM and the longest segment is listed as 12cM. Is this an error, or do they use “softer” criteria for determining that longest segment?

    1. I’ve seen examples like you describe that adjusted themselves after a day or two, so it may just be the algorithm settling in. Alternatively, it may be that the longest segment they’re showing is before Timber and the total is post-Timber. I’ll try to find out for you.

  22. I asked Ancestry the same question.
    Here is the reply:
    “When the length of the longest shared segment is longer than the total shared it is because we adjust the length of shared DNA to reflect DNA that is most likely shared from a recent ancestor.”

    So I’m none the wiser as to whether the longest segment is real or something Ancestry has made up.

    1. They’re adjusting the total using an algorithm called Timber. The longest segment is real, but part of it is in a region where you have a ton of matches who all match you just in that one spot. Those so-called “pile-ups” happen when the segment was widespread in the ancestral population, so people descended from the population share the segment even though they descend from different people in the ancestral population. Timber corrects for pile-ups. If, for example, you had a match who shared a single segment, with a total of 20 and a longest of 22, that means Timber down-weighted the segment by 2 cM because there’s a pile-up in there somewhere. I know it’s confusing, but neither figure is made up.
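
      As a rough way to read the two numbers together, the sketch below treats the gap between the pre-Timber and post-Timber figures as the amount that was down-weighted. It is only subtraction on the values Ancestry reports, not Timber itself, and the function name is hypothetical. For a single-segment match, the longest segment stands in for the pre-Timber total.

```python
def timber_reduction(pre_timber_cm, post_timber_cm):
    """Return how many cM were down-weighted, given the reported figures.

    pre_timber_cm:  unweighted amount (for a one-segment match, the
                    "longest segment" value)
    post_timber_cm: the total shown as shared DNA after adjustment
    """
    if post_timber_cm > pre_timber_cm:
        raise ValueError("post-Timber total should not exceed the pre-Timber amount")
    return pre_timber_cm - post_timber_cm


# The single-segment example above: longest segment 22 cM, total 20 cM.
removed = timber_reduction(22, 20)
print(f"Down-weighted by {removed} cM ({removed / 22:.0%} of the segment)")
```

      With more than one segment, the longest segment alone understates the pre-Timber total, so the subtraction only makes sense against the full unweighted amount.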

      1. I have one match with whom Ancestry says I share 12 cM, but the longest segment is 37!
        This match has uploaded to GEDmatch, as I have too. And that shows that we have matches over 3 segments, all close together on the same chromosome. They are 7.8, 14.6 and 11.9, totalling 34.3 (longest segment 14.6). This is with the threshold set at 6 cM. Setting it to 4 reveals a few more smaller segments, which are probably irrelevant.
        This match’s brother has also tested. My match is identical to the brother both on Ancestry and GEDmatch in terms of number of cM. One segment is very slightly different in end position, but only in the last three digits out of 8.

        1. It’s almost impossible that you would share three segments that are all back-to-back on the same chromosome. It’s more likely that it’s a single segment that GEDmatch is misinterpreting as three.

      2. I don’t fully understand that answer.

        I have a match, for example, where the overall cM is 74 but the longest segment is 82 cM and it’s only across one segment.

        What I’m asking is: are matches like this Identical by Descent or Identical by State?

      3. Thanks for the explanations regarding the longest segment. It does make sense. Early on when testing at FTDNA (where you can see the individual matching segments) I went down a rabbit hole chasing matches where we shared one of these “pile-up” areas on chromosome 5.

        1. Hi Terry

          Are there known pile ups on Chr 5? I have quite a few matches on that Chr but I also have documented matches which make a lot of sense.

          Oliver

  23. I have a question from our DNA SIG. Some people are marking their AncestryDNA matches of 8 cM or fewer, and they are finding statements like “Shared DNA: 7.7cM across 1 Segments, Longest Segment 20cM”. More than one person had statements like that, and more than once. Why are these longest segments larger than the shared DNA match? It is similar to Terry’s question. (What is TIMBER?)

    1. In a case like that, the segment measures 20 cM but is in what’s called a pile-up region, where hundreds or thousands of other people all match in that one spot and only in that one spot. Pile-ups are not indicative of recent ancestry but rather that you and the other matches all come from an ancestral population where that segment was extremely widespread. Ancestry’s Timber algorithm down-weights those segments because they can be misleading for genealogy.

      1. How does that work with a single segment though? I have a 47 cM single-segment match that Timber cuts to a 12 cM single segment, and I don’t understand how that works.

        1. Timber down-weights segments proportionally to how large the pileup is. The 47-cM segment must occur in a fairly large pile-up region for you.

      2. Ok, because I asked you something similar: if I understand you correctly, with smaller cM matches, if the longest segment is longer than the overall cM of the match, it doesn’t mean it’s a true recent match; it’s more of an endogamous match. In other words, it’s more that we mostly share the genetic community and it’s definitely not a recent relationship.

        That’s why I asked about the 74 cM match with the 82 cM longest segment across one segment, because I saw that this was happening on matches at 90 cM and below. But I get it now. Those types of matches matter with smaller matches, when discerning true ancestral matches from matches that are more regional. I think I get it now.

        1. Pileups (what Timber tries to address) aren’t exclusive to endogamous populations, but they’re definitely more common in them. The goal of Timber is to down-weight those segments to the point that they are better gauges of the true relationship.

  24. Hi
    Very interesting article. Thanks.

    I have one match where the shared match is a single segment of 12 cM but the longest segment is a single segment of 47 cM. I have documented the match and it’s by descent. Why is it reduced by Timber when it’s a single matching segment? Can there be pileups on part of a single matched segment?

    1. Yes, it was reduced by Timber, and yes, pileups can occur in larger IBD segments. Ancestry will be adding a feature to show us the pre- and post-Timber totals, so that should help people to understand better the impact of Timber on their matches.

  25. Thanks for your reply. Appreciate it. Not sure that I do have a lot of matches on that Chromosome. Not too many on the other sites anyway. The match is to a fifth cousin.
    If the match was reduced to zero I’d find it easier to understand as then there would be no match.

    Moving it further back is puzzling. I have other matches in the same line that are documented. And I have other matches to the same family line that aren’t changed at all by Timber.

    Can’t figure it. I can understand the reasoning but I can’t figure the application.

    Thanks

  26. I have an Ancestry DNA match with a 2nd cousin but we only share 32 cM across 3 segments, which I understand is extremely low for this relationship, and almost outside its parameters. 

    My son doesn’t even show up as a match which puzzles me as I would have expected a 2nd cousin 1x removed to share an average of about 1.5% DNA which equates to between 40 and 180 cM.

    Could my son’s shared DNA fall outside Ancestry’s new cut off point of 8 cM, or is there any other explanation?

    1. With a match that low, it’s not unusual for a child to not match the cousin. Your son may simply have not inherited those three segments. The bigger issue is how little you share with a 2nd cousin. That’s outside the known range for the relationship. I’d want to confirm on both ends that the trees are correct.

      1. I can understand that it could be possible that my son may not have inherited the three segments I share with my 2nd cousin (who is female). I know her extremely well and can virtually guarantee the veracity of our tree. Her maternal grandmother and my paternal grandfather were full siblings. I did wonder if matching my all-male line with her all-female line could be a reason why the match is so extremely low.

        1. No, the genders of the lines connecting you didn’t affect the match. Is it possible your grandparents were half siblings?

  27. Thank you for your help. I was convinced that our grandparents were full siblings, but working on the assumption that the DNA doesn’t lie, then I have to accept the possibility that they might have been half siblings. Researching this could prove very interesting!

    1. Another possibility is that your great grandparents raised a grandchild as their own. That would add a generation, making you 2C1R rather than 2C.

    2. I would also want to look at what the unweighted sharing is. Potentially, it could be as high as 89 cM and still not be safe from “adjustment” by Timber.

      I have three matches with unweighted sharing of 89 cM. They were “adjusted” to 56 cM, 42 cM, and 38 cM.

      The 42 cM match — which is reportedly across 5 segments — is to a known 3rd cousin, and the “longest segment” by itself is greater than this total (49 cM). Obviously, with 5 segments the adjustment to “longest segment” actually had to have been much more than just 7 cM.

      But the really funny part is that Ancestry reports my daughter’s sharing with this same person as 62 cM across four segments. The “unweighted cM” is 85 — just 4 cM less than my “unweighted cM”; and the “longest segment” is 48 cM.

      So, did Timber just “miss” more “excess sharing” with my daughter than with me? Or is Ancestry’s reliance on Timber simply misplaced? So far, this seems to be my experience — time and time again.

  28. I’ve been told that the Ancestry Timber algorithm excludes certain common DNA elements that other companies include and therefore often shows lower match results. I tested with Ancestry and also uploaded my DNA results to MyHeritage. My highest match at MyHeritage is 72 cM whereas the same person appears in my Ancestry matches as only 34 cM. How can there be such a difference?

    Also, if I then use DNA Painter to check for possible relationship, should I use 72 or 34 cM as my input value?

    1. Short answer: I use the Ancestry centimorgan amount when I have a choice.

      Longer answer: Yes, the Timber algorithm down-weights some segments when they are over-represented in your matches. When that happens, it’s usually because you and your matches all descend from an ancestral population where that segment was particularly widespread. It’s a known phenomenon; not something AncestryDNA made up. Ancestry calls those segments “pile-ups” while population geneticists call them “excess IBD”. Those widespread segments trace back hundreds of years, so they’re not indicative of a traceable genealogical relationship.

      MyHeritage, on the other hand, uses a process called imputation that can artificially inflate segment sizes. They do it so that they can take uploads from different companies that used different testing platforms (called chips) that aren’t completely compatible with one another. As a result, they often over-estimate the amount of shared DNA.

  29. Actually, it turns out that Timber can be **very** inconsistent in handling so-called “excess sharing” from one generation to the next. One problem, of course, is that the determination of what constitutes “excess sharing” is purely statistical in the first place.

    This is one reason for the 90.0 cM limitation in the first place. Ancestry knows that without such a limitation, even parents and their offspring would often show extremely large amounts of “excess sharing”, even though parents and offspring share across the entire genome. The truth is, if particular regions are deemed to have “oversharing”, that will be true regardless of how close the actual relationship is. They would still be discounted, if not for the 90.0 cM restriction.

    I have a total of 80 “protected relationships” at Ancestry thanks to the 90.0 cM restriction. In the vast majority of these cases, I know exactly what the relationship is. But though there is no adjustment to the amount of sharing in these cases, in many instances there is a significant adjustment in just the next generation.

    I’ve looked at the sharing results for my daughter and 79 of the 80 (79 because one of the 80 is herself). Her total for shared cM has been reduced by Timber for at least 43 of these matches — possibly 44. This actually represents *all* of those cases for which it was allowed (that is, where she herself didn’t share at least 90.0 cM).

    For only 11 of these matches was the reduction by only 10 cM or less. For 11, the reduction was between 11-20 cM. 10 matches had sharing reduced by between 21-30. 6 were reduced between 31-40, and 4 matches were reduced by 41 cM or more.

    A match who shares 136 cM with me in 6 segments, with a longest shared segment of 47, reportedly only shares 35 cM with my daughter. In fact, their unweighted sharing is actually 83 cM, so Timber labeled 48 cM of their match as “excess sharing”.

    In fact, this match and I are actually 3rd cousins. I also share with his mother — my 2nd cousin once removed — to the tune of 193 cM in 7 segments, with a longest shared segment of 54 cM. My daughter shares with her as well, for 140 cM across 5 segments, with a longest shared segment of the same length as mine.

    Yet in the next generation, Timber has suddenly found 48 cM of “excess sharing”, even though both of these offspring inherited every bit of the unweighted amount from their respective parents. There has been no magical introduction of “excess sharing” here.

    I will also note that in 4 cases, matches whose sharing with me was protected by the 90.0 cM restriction were reduced to less than 20.0 cM in the case of my daughter, meaning she would not even appear as a shared match. I only discovered this fact going through her matches one-by-one.

    This includes a match with another of my 2nd cousins once removed, which is 134 cM in 6 segments for me but only 19 cM in 3 segments for my daughter. For both of us, the longest shared segment is 48 cM!

    I have not yet completed the same investigation for my wife and daughter, but I already know that there are some similar examples.

    1. Thanks for taking the time to share your analysis! I agree that Timber yields inconsistent results above and below the 90-cM threshold.
      The real question we should be asking, though, is whether the pre- or post-Timber values better predict the correct relationship, especially when it isn’t known a priori. That can only be gauged with large-scale data, because the predictive ability of matches below 90 cM is weak to begin with.

      1. I’d much rather have a match come to my attention in the first place and be allowed to assess whether it *might* be misleading me as to the relationship, than have Ancestry make that judgment for me. The latter approach simply means I might never see the match at all.

        Even well under the 90.0 cM restriction placed on Timber, I’ve seen countless examples of matches in which neither my daughter’s mother nor I show up in her shared match list.

        Fortunately, I have almost always been able to find which of us is the connecting parent by doing a direct comparison. It turns out that Timber has typically just adjusted reported sharing to below the 20.0 cM threshold.

        For example, my daughter has a match with a post-Timber sharing amount of 34 cM (in three segments); pre-Timber sharing is shown as 49 cM, and the longest segment is 31 cM.

        Neither of her parents shows up as a shared match, but on direct comparison my post-Timber sharing with this person is reported as 16 cM (in two segments). My pre-Timber sharing is shown as 60 cM, with a longest shared segment of 49 cM.

        This match is actually my 3rd cousin once removed — 4th cousin to my daughter. I also share with her father. My sharing with him is reported as 42 cM (post-Timber) and 89 cM (pre-Timber) — so basically it just missed “safety”. This is in five segments, with a longest shared segment of 49 cM.

        (Interestingly, on Ancestry’s v1 chip the total sharing was reported as 92 cM.)

        At the same time, my daughter’s sharing with this match is reported as 62 cM (post-Timber) and 85 cM (pre-Timber), also in 5 segments. In this case, the longest shared segment is reported as 48 cM, so very little different from my longest shared segment.

        I also share with three sisters of this last match of mine, for 31 cM/47 cM/29 cM (post-Timber/pre-Timber/longest shared segment); 35 cM/63 cM/24 cM; and 107 cM. Obviously, there’s no Timber adjustment for the 3rd sister, and the longest segment I share with her is 53 cM.

        For good measure, I have a match who tested at both Ancestry and 23andMe who is related to both of us. She’s a 3rd cousin once removed to me, and a 2nd cousin once removed to the siblings of the preceding paragraph (and their brother). The ancestors for all of us are my Y-line 2nd great grandfather and his wife; the siblings are all descended from one of this couple’s daughters.

        Anyway, Ancestry reports my sharing with this final match as 28 cM in two segments. That’s of course post-Timber. Unweighted sharing is 58 cM, with a longest shared match of 45 cM.

        At 23andMe, total sharing is the same as Ancestry’s unweighted sharing, 58 cM. But at 23andMe I can see that both segments are on the same chromosome, not terribly far apart. One is 12 cM in length, while the other is 45 cM.

        But … at 23andMe I can also compare my late father to this match. My father’s sharing totals 70 cM in three segments, which includes the same two segments I share, plus one additional. He and the match are 2nd cousins twice removed; that is, he and the match’s grandfather would be 2nd cousins.

        I actually have some more DNA matches with other members of this same family — that is, other descendants of my 2nd great grandparents. But I’ll leave off for now.
