Low Matches Lie

June 3, 2024 thednageek 38 Comments

Once you get below 20 cM, a match is more likely to be a 10th cousin than a 4th cousin.

Sounds nuts, right? How is that possible when most of our distant cousins don’t share any DNA at all? Let me explain.

It’s absolutely true that beyond 2nd cousins, some of our biological relatives will not share measurable autosomal DNA with us. There’s about a 7% chance that a true 3rd cousin will not match. For 4th cousins, it’s about 50%. Only about 16% of our 5th cousins will match, and the odds of matching obviously go down from there.

On the flip side, we have a lot more 5th cousins than 4th, a lot more 6th cousins than 5th, and so on. Using a very simple assumption of 2.5 children per couple over the generations, we’d expect to have about 938 4th cousins, roughly 586,000 8th cousins, and more than 14.6 million 10th cousins. It turns out that even if only a tiny fraction of our 10th cousins share DNA with us, that’s still more than half of 938.
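
The cousin counts can be reproduced with a short calculation. The sketch below assumes one particular reading of the 2.5-children model, which the post does not spell out: for nth cousins there are 2^n ancestral couples, each had 1.5 children other than your own ancestor, and each of those lines grows by a factor of 2.5 per generation for n generations. The function name is hypothetical.

```python
def expected_cousins(degree: int, children_per_couple: float = 2.5) -> float:
    """Expected number of nth cousins under a constant family size (assumed model)."""
    ancestral_couples = 2 ** degree            # couples you share with nth cousins
    other_children = children_per_couple - 1   # siblings of your own ancestor
    descendants_per_child = children_per_couple ** degree
    return ancestral_couples * other_children * descendants_per_child


for degree in (4, 8, 10):
    print(f"{degree}th cousins: {expected_cousins(degree):,.0f}")
```

Under these assumptions the formula simplifies to 1.5 × 5^n, which gives about 938 4th cousins, about 586,000 8th cousins, and roughly 14.6 million 10th cousins.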

This concept can be difficult to intuit, so I performed some computer simulations to give you a better feel for the numbers.

Ped-sim Simulations

Ped-sim (short for “pedigree simulator”) is an open-source computer program released by the laboratory of Dr Amy Williams at Cornell University. (Dr Williams is now a senior scientist at 23andMe.) Ped-sim does not have a graphical user interface, so it’s not for the layperson. However, it has features that are useful to us:

It’s lightning fast.
It can simulate shared DNA amounts for any pedigree.
It can account for the fact that the crossover rate is higher in women than men. Crossing over is how DNA segments break down from one generation to the next.
It can account for a phenomenon called crossover interference, which limits how closely two crossovers can happen to one another in a given generation.

On the other hand, the default genetic map for Ped-sim is smaller than the maps the genealogy companies use (Bhérer et al., 2017). For example, at AncestryDNA, a parent–child match is about 3,470 cM, whereas with Ped-sim’s map, it would be 3,346 cM.

Because you can’t simply scale up a centimorgan any more than you can scale up a mile, I created a new genomic map modeled on the Ped-sim one but with the AncestryDNA centimorgan total. This map would be useless for biomedical research, but it should be fine for our purposes. All we care about are the sizes of the DNA segments, not precisely where they start and stop.

For each relationship, I simulated 50,000 matches using my custom map with sex-specific crossover rates, crossover interference, and a 7-cM minimum segment size. I simulated relationships from grandparent–grandchild down to 10th cousin, then summarized the data in this spreadsheet.
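
To make the summarizing step concrete, here is a minimal sketch of how simulated pairs could be tallied. It is not Ped-sim’s output format or the actual spreadsheet logic: the input layout (one list of segment lengths per simulated pair), the function name, and the toy numbers are all assumptions for illustration.

```python
def summarize(pairs, min_segment_cm=7.0, low_match_cm=20.0):
    """Return (share of pairs that match, share that match below low_match_cm)."""
    matched = 0
    low = 0
    for segments in pairs:
        # Keep only segments long enough to count toward a match.
        total = sum(cm for cm in segments if cm >= min_segment_cm)
        if total > 0:
            matched += 1
            if total < low_match_cm:
                low += 1
    n = len(pairs)
    return matched / n, low / n


# Hypothetical toy input: one list of segment lengths (cM) per simulated pair.
toy_pairs = [[], [5.2], [8.1], [12.4, 9.0], [31.7], []]
match_rate, low_rate = summarize(toy_pairs)
print(f"match rate: {match_rate:.2f}, 7-20 cM rate: {low_rate:.2f}")
```

With 50,000 real replicates per relationship instead of six toy pairs, the two rates would play the role of the match probabilities used in the spreadsheet.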

See for Yourself

Let’s return to my statement about low matches, those who share less than 20 cM. Focus on spreadsheet rows 20–22, which are highlighted in lavender. These calculations assume 2.5 children per couple in each generation. For each cousin level, row 20 shows the average number of cousins, row 21 shows the average number who will share measurable DNA, and row 22 shows the average number who will share between 7 and 20 cM.

On average, we have 938 4th cousins, of which about 471 will match and 262 will share between 7 and 20 cM. (Some will share more than 20 cM.) In that same centimorgan range, we’ll have approximately 900 6th cousins, 1,242 8th cousins, and 2,344 10th cousins. In other words, our sub-20 matches contain about nine times as many 10th cousins as 4th cousins.
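
The comparison can be checked with a few lines of arithmetic. The sketch below uses only the four 7–20 cM counts quoted above, plus two figures from elsewhere: the 8-in-50,000 match rate for 10th cousins mentioned in the comment thread, and a 10th-cousin total of 1.5 × 5^10, which is an assumed reading of the 2.5-children model. Because the odd cousin levels and anything beyond 10th cousins are left out, the shares are relative to these four groups only, not true probabilities.

```python
# Expected number of cousins sharing 7-20 cM, as quoted in the post.
low_matches = {4: 262, 6: 900, 8: 1242, 10: 2344}

ratio = low_matches[10] / low_matches[4]
total = sum(low_matches.values())
shares = {degree: count / total for degree, count in low_matches.items()}

print(f"10th cousins per 4th cousin: {ratio:.1f}")
for degree, share in shares.items():
    print(f"{degree}th cousins: {share:.0%} of these four groups")

# Cross-check: cousin count times match probability (both are assumptions here).
tenth_cousins = 1.5 * 5 ** 10        # assumed 2.5-children model
match_probability = 8 / 50_000       # matches per simulated 10th-cousin pair
expected_10c_matches = tenth_cousins * match_probability
print(f"expected 10th-cousin matches: {expected_10c_matches:,.0f}")
```

The cross-check lands on about 2,344, which is how a match probability of less than two hundredths of a percent still outnumbers the 4th cousins in this range.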

Be Skeptical of Small Segments

Good genealogy is all about evidence, not just the easy stuff that falls in our laps, but what we find after reasonably exhaustive research. You can’t assume that a funeral record for “Mary Smith” in a city full of Marys and Smiths is proof about your 3rd great grandmother until you’ve vetted it: year, age, race, neighborhood, religious denomination, maiden or married surname, next of kin, and so on. You need to demonstrate that the record is not for some other Mary Smith before you can claim it for your Mary Smith.

The same goes for DNA. A shared segment is not evidence for any given relationship until you’ve shown that you could not have inherited it through any other relationship path.

This is true even if traditional genealogy has led you to a common ancestor with your match. You might truly be a 4th cousin to that match through Mary Smith, but one or both of you could have inherited a shared DNA segment via different ancestors. In other words, the segment is not evidence for the relationship until you can show that you both inherited it from Mary and not some other ancestor. That’s especially challenging when you consider that a segment less than 20 cM could have come from a 10th great grandparent on a completely different line for either of you.

The truth is, most of us stop looking for connections to our matches once we find one, and we’re more likely to find a 4th cousin connection than a 10th cousin one, even though the 10th cousin relationship is statistically more likely. (I’m guilty of this too!) But finding a genealogical connection isn’t the same thing as proving that the DNA match came via that connection. If you use the paper trail to argue that the segment came from Mary Smith, and the segment to validate the paper trail, you are guilty of circular logic.

To understand why this is important, we need to think about the lives of our ancestors. Travel was challenging. Social norms restricted associations. Even immigrants tended to move in extended family units. That is, our ancestors from any given place were probably all related to one another. In fact, in 2018, scientists from MyHeritage reported in a scientific paper that married couples between 1650 and 1850 tended to be 4th cousins to one another (Kaplanis et al., 2018).

In other words, Mary Smith was probably related to her husband, and her children were probably related to their spouses, as were her parents and siblings and cousins and so on. A shared segment with another descendant of Mary has many alternate paths to follow.

What Good Are Small Matches?

Small matches can point you to a geographic region or possibly even an extended family. But they can’t tell you who, exactly, your ancestor was. For that, you’ll need traditional research. Even if the documentation pans out, you still can’t assume that the segments in question came from that ancestor.

For example, I have a 4th great grandmother named Marianne Dykes of unknown parentage. My family also has very distant DNA matches (7–32 cM) who are descended from a couple named William Dykes and Phoebe Singleton from eastern Louisiana. I can find no record that they had a daughter named Marianne, but Phoebe’s father was in the process of moving to Cajun Louisiana when he died, so I have a plausible connection between the two regions. Does that prove who Marianne’s parents were? No, I’ll need records for that. Those small matches did draw my interest to eastern Louisiana, though, and that’s a clue I never would have found otherwise.

The simulations provide other valuable insights, which I’ll explore in future posts.

38 thoughts on “Low Matches Lie”

Valorie A. Zimmerman says:

June 3, 2024 at 1:30 pm

Thanks for the post! The further back my MRCAs are with the match I’m researching, the more focus on following the spouse lines too, for this very reason. I’ve made a tree tag on Ancestry “MultipleRelationships” as soon as I find a second one, and often there are more which are easily found – if I look. And these days, I always look. And then look again.

Ellen Basler says:

June 3, 2024 at 1:44 pm

For once it is useful that my paternal grandparents were born thousands of miles away from each other. Both, ultimately Rhine Valley, though.

It helps to have first cousins testing. The largest red flag is actually IF a match has DNA in common with all cousins.

It helps to know segments and to plot them on DNAPainter. If the cousins share the same pieces and a couple of matches share the same pieces, it suggests I am on the right track. Three cousins only have UK ethnicity on their other side. I have to be cautious there.

I tend to think Occam’s razor has a place.

1. thednageek says:
  
  June 3, 2024 at 7:26 pm
  
  If by Occam’s razor you mean that the MRCA is the source of the segment, I fear that can get you into trouble. Sorting matches by the four grandparent lines is pretty straightforward if you don’t have endogamy. That narrows the possibilities for alternate inheritance paths a great deal.
  
Dewi Jones says:

June 3, 2024 at 3:31 pm

Thank you for a very interesting post. I note that the model is very sensitive to the assumed number of children each couple has. On what basis has 2.5 children been calculated?
It’s not the number of children born to each couple that is important, but the number of children born who have descendants to my generation.
I would have guessed a higher figure than 2.5, but it would just be a guess.

1. thednageek says:
  
  June 3, 2024 at 7:30 pm
  
  Fabulous insight! Yes, what matters is the number of children that leave descendants of their own. I used 2.5 children/couple because I’m pretty sure that’s what AncestryDNA used for their probabilities, and it’s a fairly standard population growth rate. (Long story: I tried for years to get a straight answer out of them, and that’s the closest I got.)
  
  A more realistic model would be to have larger families in, say, the 1800s and smaller ones today. I’d like to do a post where I play around with those numbers to see how it affects the matching estimates.
  
  1. Dewi Jones says:
    
    June 9, 2024 at 1:06 pm
    
    Thanks. As an average 2.5 seems to be as good as any other number.
    One other factor to consider: we probably have a smaller percentage of our 10th cousins living today than the percentage of 4th cousins. Some (many?) of our 10th cousins were born many years ago, or haven’t yet been born, and so they will not have been tested. A higher percentage of fourth cousins will have done a DNA test.
    Is there a measure of the spread of the ages for cousins?
    Dewi
    
    1. thednageek says:
      
      June 9, 2024 at 1:43 pm
      
      You make an interesting point about the offset in birth years for different generations. I suspect as far as our matches go that it’s offset by genetically equivalent removed relationships. For example, we might not overlap lifespans with all of our 10C, but that category also includes 9C2R, 8C4R, etc.
Scott says:

June 3, 2024 at 4:36 pm

> The same goes for DNA. A shared segment is not evidence for any given relationship until you’ve shown that you could not have inherited it through any other relationship path.

I think you’re missing something important, maybe two somethings.

* At present, most test labs look for SNPs, not full DNA sequences. So, you could have one SNP from one source and another from another source. When you get down to the lower cMs, you’re getting into statistically uncertain territory.

* At present, most test labs don’t identify which DNA strand any particular SNP came from, so one SNP could come from one parent and the next from the other parent, so that it appears you have a match.

1. thednageek says:
  
  June 3, 2024 at 7:36 pm
  
  False matches (caused by the phasing issues you describe) are a separate and important issue but they don’t affect this analysis. There are no false matches in the simulated data.
  
Dan says:

June 3, 2024 at 4:54 pm

Thank you for doing this. I find DNA the most complicated aspect of genealogy. I have a long way to go to understand, but this kind of information will lead the way to dealing better with DNA results.

1. thednageek says:
  
  June 3, 2024 at 7:37 pm
  
  You’re welcome!
  
David Norton says:

June 3, 2024 at 6:02 pm

Except…I have begun a study of siblings and the variation of cM amounts within matches to each. This began when I was looking for my grandmother’s father and I began comparing the DNA matches of two of her children, my Aunt and Uncle. Their largest delta match was 110 cM vs 23 cM for the same person. Since there were some large swings in matches, I compared my brother and me. Our largest was 149 cM vs 17 cM. I have sent two more kits to siblings of other cousins so I can compare those also. Had I not compared our matches, I would have clearly dismissed relevant relationships, depending on who I looked at first.

1. thednageek says:
  
  June 3, 2024 at 7:40 pm
  
  You are describing issues with the matching algorithms at the various sites rather than the segments themselves. That’s not a factor for simulated data.
  
  I’m curious which site gives you the larger matches and which the smaller. Which site do you think is most accurate and why?
  
Veronica Williams says:

June 4, 2024 at 10:06 am

Couldn’t agree more about what you say, however using segment data and chromosome analysis techniques such as visual phasing and walking back the segments through the generations can confirm, or at least increase the likelihood, that the segments and the genealogy are correct even with smaller segments <20cMs. I totally agree that people jump too quickly to conclusions with one genealogical match, without confirming that it is also a genetic match on that line. Shared matches can be very misleading the further back you go, but shared segments on the other hand help to increase the chance that the hypothesis about the genealogy is correct.

1. thednageek says:
  
  June 4, 2024 at 11:22 am
  
  Thank you so much for saying this! Chromosome mapping and especially the “walking back the segment” of Jim Bartlett can go a long way to ruling out the alternate paths of inheritance. Once you’ve done that, those small segments can be a valuable part of our work.
  
Roseann Hogan says:

June 4, 2024 at 1:03 pm

Thanks for bringing this article to my attention, but….. the Kaplanis et al., 2018 article is seriously flawed. There are two main issues. First, while the statistical methodology used might be appropriate for the authors’ aims, the base problem is that assumptions are made which a science-based genealogist knowing the data in these pedigrees would not make. Other ‘tests’ they use for validation are biased to the point of being of no utility. The Vermont vital records database, for example, is just silly to use as a representation of, I suppose, class bias in genealogy trees. And one NPE per family? No, that’s clearly not supported by historic demographic data. Despite all those big words, the old axiom applies: junk in, junk out.

1. thednageek says:
  
  June 4, 2024 at 3:47 pm
  
  I assume you’re referring to this sentence in Kaplanis et al.: “Using a prior of no more than a single non-paternity event per lineage, we estimated a non-maternity rate of 0.3% per meiosis and non-paternity rate of 1.9% per meiosis.”
  
  They aren’t saying that there was one NPE per family. A prior is a statistical parameter in Bayesian analysis. Put simply, once a misattributed parentage event disassociates the haplogroup from the paper lineage, it’s difficult to tell whether there was one MPE, two, or several. In the analysis, Kaplanis et al. assumed no more than one (i.e., between zero and 1 inclusive). If anything, they underestimated the MPE rate, because there will be lineages with more than one.
  
  In any case, that part of the paper is not relevant to this blog post.
  
John Thompson says:

June 5, 2024 at 10:12 am

Thanks for the great post.

This analysis (https://www.biorxiv.org/content/10.1101/352732v1) is a bit dated (done in 2017/8) so I am not sure if the data are still completely accurate, but I suspect that even if the exact SNPs examined have changed over time, the companies’ philosophies are likely the same. I think genealogy companies tune their SNP selection to optimize geographical information rather than inter-individual matching. The two goals are intertwined but not identical, and inter-individual matches are completely independent of the geographical databases maintained by each company. 23andMe was an outlier in SNP choice in that they favored many more rare SNPs with a higher ancestral bias than either Ancestry or MyHeritage. This should also have an impact in that agreement on the rare SNPs should be more meaningful than matching of the more common SNPs. I don’t know whether their analysis was skewed such that those SNPs were more important or not. It would be useful to know which SNPs are shared among supposedly related individuals. Sharing common SNPs is much less meaningful than sharing rare SNPs.

1. thednageek says:
  
  June 5, 2024 at 5:25 pm
  
  Thanks for the link. You make some interesting points. Fortunately, SNP selection is not a factor in the simulations, so we can draw conclusions about the longevity of small segments regardless of the SNPs used.
  
Arlene Freeman says:

June 5, 2024 at 4:21 pm

Wow, what a great tool, totally improved my understanding of the role of segment length!

Geoff King says:

June 8, 2024 at 2:31 am

Just a comment about your spreadsheet. I include in a separate column, the ahnentafel of the match to the CA as well as mine. It certainly shows which way the removed generation(s) are. Also when I have endogamy because my grandparents were cousins, for instance, I show ahnentafels for both paths, e.g. 54/56 which immediately shows why cMs are higher than expected. Thank you for your interesting post.

Dewi Jones says:

June 9, 2024 at 1:23 pm

Your paragraphs about Mary Smith describe endogamy. Has any research been done to find a measure for endogamy?
E.g. on Ancestry, the percentage of shared matches, compared to total matches. Or the percentage of total matches which are over 20 cM.

1. thednageek says:
  
  June 9, 2024 at 1:44 pm
  
  I use average segment size as a gauge of endogamy.
  
The Search Cooperative says:

June 9, 2024 at 5:53 pm

“There’s about a 7% chance that a true 3rd cousin will not match. For 4th cousins, it’s about 50%. Only about 16% of our 5th cousins will match, and the odds of matching obviously go down from there.” 7% of 3rd cousins won’t match and then 50% of 4th cousins won’t match? There is a difference between ‘not matching’ and not sharing DNA though, right? If I paraphrase, are you saying that 50% of 4th cousins will share less than 7 cM, resulting in the testing company not matching the two true 4th cousin testers as related? There are only 512 ways (paths) for someone to be your 4th cousin. 4th cousins share an average of 0.20% of their DNA, which coincidentally is 1/512 of their DNA. Since Tester 1 can be either a man or a woman, the total ways that two people can be 4th cousins are 1024, and 169 of 1024 happen to be on paths that can share centimorgans on the X chromosome, which won’t be included in the testers’ total shared cM by most testing companies except for 23andMe. Ancestry has the lowest 100% total cM of all the testing companies, and the low end of 4th cousin is 9.8 cM. Testers who shared 12 cM total with all of it on the X chromosome would be 4th cousins at 23andMe but would not even match at Ancestry. About 16% of 4th cousins have the potential to have autosomal amounts dip below the 7 cM reporting threshold. It is just a different way to look at the same issue: a reason why some 4th cousins are not matching has nothing to do with the amount of DNA they share; it has to do with the amount of shared DNA not being reported by the testing companies. At 23andMe the reverse happens on a few paths. Some paths share more cM on X than the expected high end of the 4th cousin range, so they might have cM amounts consistent with 1/2 3rd cousins.

1. thednageek says:
  
  June 16, 2024 at 5:26 pm
  
  Correct: 7% of 3rd cousins, 50% of 4th cousins, and 84% of 5th cousins won’t match one another in the genealogy databases, assuming a 7-cM match threshold. The X chromosome is not included in those calculations.
  
Christopher Schuetz says:

June 10, 2024 at 2:56 pm

Wonderful work.
I try to persuade people to start from the big matches and work downwards. Someone always pipes up “but I had a great 6 cM match”, ignoring all similar matches that were rubbish. So it’s really good to have some statistics to back me up.
In my own research I am very aware that there may be more than one possible path of connection between me and the match. And some may be on the other parental side. I have to accurately assign each individual match segment to a line to be certain.

1. thednageek says:
  
  June 10, 2024 at 8:33 pm
  
  Your careful approach is the model we should all aspire to.
  
Vincenzo says:

December 27, 2024 at 11:20 am

[Translated from Italian] Hi, I’m Kevin. I have a group of matches on MyHeritage, and we have a triangulation; they have Roma genetic groups. The triangulated segment is sometimes 9 cM when I compare myself against two of them, then 8 cM when I compare myself against another two, and so on. But when I compare myself with 7 of them, the triangulated segment becomes 7.1 cM. What does that mean?

1. thednageek says:
  
  December 27, 2024 at 12:28 pm
  
  I’m sorry, I don’t read Italian.
  
  1. Scott McNay says:
    
    December 27, 2024 at 4:31 pm
    
    This appears to be a good translation:
    
    Hello, my name is Kevin and I am part of a group on MyHeritage where we performed a triangulation. Currently, I have triangulated segments between me and two of them ranging from 9cM to 8cM, but when I include a seventh segment, the value drops to 7.1cM.
    
    I would like to understand the meaning of this variation.
    
Scott McNay says:

December 27, 2024 at 4:19 pm

This might be of interest; fertility rate over time in the US.

An interesting article might be trying to improve accuracy for your own family by tying the historical rates to the actual rate for branches of your own family. For this, though, would need to examine a lot of high-accuracy trees to see how long an apparent difference lasts. For example, lots of kids this generation and last generation doesn’t necessarily mean there were a lot the prior generation.

In addition, should look at the average actual number of children per woman. Three kids per each of 12 women is a lot different genealogy-wise from 12 kids per each of 3 women (and the other 9 women have none).

https://www.statista.com/statistics/1033027/fertility-rate-us-1800-2020/

1. thednageek says:
  
  December 28, 2024 at 12:11 pm
  
  You make excellent points! I did a first pass at addressing the changes in family size in the “variable model” rows of the spreadsheet, which it appears I haven’t copied over to the public version. Oops!
  
  1. Scott McNay says:
    
    December 31, 2024 at 8:58 pm
    
    This doesn’t seem to apply here, but others might be interested: generations (parent to child) are generally considered to be about 25 years apart, but this doesn’t match the actual averages, which are about 28 for women and 31 for men:
    
    https://isogg.org/wiki/How_long_is_a_generation%3F_Science_provides_an_answer
    
    https://pmc.ncbi.nlm.nih.gov/articles/PMC9821931/
    
    https://vitabrevis.americanancestors.org/2018/09/how-long-is-a-generation
    
    https://www.yourdnaguide.com/ydgblog/how-long-is-a-generation
    
Marian Hill says:

September 4, 2025 at 8:37 am

Looking at 10C in the spreadsheet, AG4 and AG9 indicate that the average cM shared was 10.6, and AG5 and AG10 indicate that the longest segment was 19.9 cM. However, AG29 shows 0.0% sharing 10 cM or more. How is AG29 calculated? Thank you.

1. thednageek says:
  
  October 26, 2025 at 10:22 am
  
  If you look at AG19 and AG20, you’ll see that the vast majority of 10C don’t match at all. Out of 50,000 simulations, only 8 matched. The numbers in AG4, AG5, AG9, and AG10 only represent those eight replicates. Those are the ones we would see in our match lists.
  
  Given how many 10C we’re likely to have (AG21), even a tiny match probability will result in thousands of 10Cs in our match lists (AG22 and AG23).
  
Marian Hill says:

October 26, 2025 at 12:05 pm

Thanks. I realized later that it was an issue of rounding, but couldn’t figure out a way to delete the post.

Scott McNay says:

October 27, 2025 at 10:54 am

> You can’t assume that a funeral record for “Mary Smith” in a city full of Marys and Smiths is proof about your 3rd great grandmother

Some time ago at my job, I discovered a kid who had the same name and same parent names as another kid, but at a different address. So, even matching names of relatives that live with you, should not be assumed sufficient for a positive match.

And I know from my own experience (but oddly I don’t remember ever seeing it mentioned), having the same name as someone else living at the same location (parent, child, simple wild coincidence) gets you rather thoroughly intertwined: conflation fallacy.
