The one thing we genealogists probably want most from our autosomal DNA matches is something they can’t give us: an exact relationship prediction based on shared DNA alone. Unfortunately, with the exception of identical-twin, parent–child, and full-sibling matches, that’s simply not possible.
Why not? One reason is that multiple different relationships can give the same patterns of shared DNA. For example, a woman who shares 1750 cM with you could be your grandmother, granddaughter, aunt, or half sister. Those relationships are indistinguishable based solely on the amount of shared DNA. (In this case, you can narrow the possibilities using age.) Someone sharing 950 cM with you could be a great-grandparent/grandchild, first cousin, great-uncle/aunt/nephew/niece, or half-uncle/aunt/nephew/niece.
The DNA Detectives Facebook team has designed a nifty chart that categorizes relationships into groups based on the expected amounts of shared DNA. In the two examples above, grandparent/grandchild, aunt/uncle/niece/nephew, and half sibling would be Group B, and great-grandparent/grandchild, first cousin, great-uncle/aunt/nephew/niece, and half-uncle/aunt/nephew/niece would be Group C. I will use the DNA Detectives group names in the rest of this post for ease of reference.
To complicate matters, each group is defined not so much by an average or “expected” amount of shared DNA but by a range. That is, someone in Group B might share 1750 cM with you, but they could also share as little as 1300 cM or as much as 2300 cM, according to the DNA Detectives chart. Group C can range from 575 cM to 1330 cM.
Notice another problem? The low end of the Group B range overlaps the high end of the Group C range. Put another way, someone who shares 1315 cM with you could be in either group (and remember that each group includes multiple possible relationships). Worse, the more distantly related the group, the broader the range of shared centimorgans relative to the average and the more overlap there is with other groups. A close match like the 1315 cM example can fall into only two groups, B or C, but someone who shares 100 cM could belong to Group E, F, or G, according to the DNA Detectives chart.
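Checking a match against overlapping ranges is easy to mechanize. The Python sketch below is a minimal illustration that assumes inclusive range bounds and includes only the Group B and Group C ranges quoted above from the DNA Detectives chart; the names `GROUP_RANGES` and `candidate_groups` are hypothetical.

```python
# Ranges (in cM) quoted above from the DNA Detectives chart. Only Groups B
# and C are included, because those are the only ranges given in this post.
GROUP_RANGES = {
    "B": (1300, 2300),
    "C": (575, 1330),
}

def candidate_groups(shared_cm, ranges=GROUP_RANGES):
    """Return every group whose inclusive (low, high) range contains shared_cm."""
    return sorted(g for g, (low, high) in ranges.items() if low <= shared_cm <= high)

for cm in (1000, 1315, 1750):
    print(cm, candidate_groups(cm))
# 1000 ['C']
# 1315 ['B', 'C']
# 1750 ['B']
```

Adding the remaining groups from the chart would only mean adding entries to the dictionary; an empty result simply means the value lies outside every range you supplied.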
When you have a match in an overlap zone, the best approach is to consider the most likely group first. AncestryDNA’s Matching White Paper (31 March 2016) presents an informative graph (their Figure 5.2) that shows the likelihood of each group (the x axis) given the amount of shared DNA (the y axis). Their graph is based on simulated data, rather than empirical (real) data, but as long as the model they used to do the simulations is reasonable, the data should be reliable.
Unfortunately, they used a logarithmic scale, which is a great space saver but is intuitive to precisely no one. They also misuse the word “meioses”, confusing people who aren’t familiar with the term as well as those who are. To make the information easier to understand, I edited the image labels to use the groups from the DNA Detectives chart. Here’s what the modified figure looks like.
The figure gives you a visual sense of how broad the ranges are for each relationship group and how much overlap there is. It also shows us which centimorgan values represent only one possible group; those are the zones along the vertical y axis that only have one colored line crossing them. Between about 2400 cM and 3200 cM, the only line is the medium blue one for Group A, and between about 1550 cM and 2000 cM, the only line is the forest green one for Group B. There’s a short interval around 1000 cM that can only be Group C, but for all other centimorgan values, more than one group of relationships could apply.
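Finding the stretches crossed by only one line can also be done numerically. The AncestryDNA curves aren’t available here as numbers, so this Python sketch applies the same idea to the two DNA Detectives ranges quoted earlier: split the cM axis at every range endpoint and keep the pieces covered by exactly one group. The function name `single_group_zones` is hypothetical.

```python
def single_group_zones(ranges):
    """Split the cM axis at every range endpoint and return the pieces
    covered by exactly one group, as (group, start_cm, end_cm) tuples.
    Endpoints are shared with neighbouring pieces, so treat them as approximate."""
    points = sorted({p for low, high in ranges.values() for p in (low, high)})
    zones = []
    for start, end in zip(points, points[1:]):
        mid = (start + end) / 2  # any interior point identifies the piece
        covering = [g for g, (low, high) in ranges.items() if low <= mid <= high]
        if len(covering) == 1:
            zones.append((covering[0], start, end))
    return zones

ranges = {"B": (1300, 2300), "C": (575, 1330)}
print(single_group_zones(ranges))  # [('C', 575, 1300), ('B', 1330, 2300)]
```

The piece from 1300 to 1330 cM is dropped because both groups cover it, which is exactly the overlap zone described above.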
Because of the log scale, the graph is hard to interpret if you’re interested in a specific centimorgan amount. To get around the problem, I approximated x and y values for each curve using an online plot digitizer. Geek power!
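Digitizing a curve produces a list of (cM, probability) points. To estimate the probability at a cM value that falls between two digitized points, one reasonable approach is to interpolate linearly in log(cM), matching the plot’s log-scaled axis. This is an assumption for illustration, not a step described in the AncestryDNA white paper, and the points in the example below are made up rather than read from their figure; `interpolate_log_cm` is a hypothetical name.

```python
import bisect
import math

def interpolate_log_cm(points, shared_cm):
    """Estimate a curve's probability at shared_cm from digitized
    (cM, probability) points sorted by cM, interpolating linearly in
    log10(cM) to match a log-scaled axis."""
    cms = [cm for cm, _ in points]
    if not cms[0] <= shared_cm <= cms[-1]:
        raise ValueError("shared_cm is outside the digitized range")
    i = bisect.bisect_left(cms, shared_cm)
    if cms[i] == shared_cm:
        return points[i][1]  # exact hit on a digitized point
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    t = (math.log10(shared_cm) - math.log10(x0)) / (math.log10(x1) - math.log10(x0))
    return y0 + t * (y1 - y0)

# Hypothetical digitized points, for illustration only.
curve = [(100, 0.0), (1000, 1.0)]
print(round(interpolate_log_cm(curve, 316.23), 3))  # 0.5
```

Interpolating in raw cM instead would put 316 cM only about 24% of the way from 100 to 1000 cM, which shows why the axis scale matters when reading values off the graph.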
What does this tell us? It gives us an indication of which group of relationships is most likely to apply to a match who shares a specific amount of DNA. For example, a match sharing 750 cM with you is in an overlap zone, but they are far more likely to be in Group C (probability p = 0.85, or 85% chance) than in the overlapping Group D (p = 0.15, or 15% chance). Of course, the numbers don’t guarantee that the match is in Group C, but that’s where I’d start looking for the connection.
The probabilities can be more complicated. Consider a match who shares 110 cM with you. That person could belong to Group E (p = 0.08, 8% chance), Group F (p = 0.39, 39% chance), Group G (p = 0.30, 30% chance), Group H (p = 0.20, 20% chance), or Group I (p = 0.06, 6% chance). Again, the best approach would be to look for a shared ancestor in the most likely relationship range first, so Group F > Group G > Group H > Group E > Group I.
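The search order in these two examples amounts to sorting the groups by probability. This Python sketch uses the values quoted above; `search_order` is a hypothetical helper name. The ratio 0.85/0.15 puts a number on “far more likely”: at 750 cM, Group C comes out roughly 5.7 times as likely as Group D.

```python
# Probabilities read from the digitized AncestryDNA curves, as quoted above.
at_750_cm = {"C": 0.85, "D": 0.15}
at_110_cm = {"E": 0.08, "F": 0.39, "G": 0.30, "H": 0.20, "I": 0.06}

def search_order(probs):
    """Return group names from most to least likely."""
    return sorted(probs, key=probs.get, reverse=True)

print(search_order(at_110_cm))                    # ['F', 'G', 'H', 'E', 'I']
print(round(at_750_cm["C"] / at_750_cm["D"], 2))  # 5.67
print(round(sum(at_110_cm.values()), 2))          # 1.03
```

The 110 cM values sum to 1.03 rather than exactly 1, presumably because they were read approximately from a digitized plot, so they are best treated as rough guides to the ranking rather than exact probabilities.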
You may also be familiar with the Shared cM Project by Blaine Bettinger. This project compiles self-reported data from the genetic genealogy community for different relationships. As a result, it gives us both the extremes (maximum and minimum values) and histograms (bar graphs showing how common given centimorgan values are for each relationship). The histograms are comparable to the colored lines on the AncestryDNA graph.
For comparison, I’ve aligned the ranges from the three datasets below. For the Shared cM Project, I’ve combined data for relationships that belong to the same group (e.g., first cousins twice removed and second cousins both belong to Group E, so their data were pooled).
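For the extremes, one simple way to combine relationships from the same group is to keep the lowest minimum and the highest maximum. That merge rule is an assumption for illustration (the post doesn’t spell out exactly how the data were combined), the cM values below are placeholders rather than actual Shared cM Project figures, and `combine_by_group` is a hypothetical name.

```python
def combine_by_group(relationship_ranges, group_of):
    """Merge (min_cm, max_cm) ranges of relationships in the same group
    into one range per group: the lowest minimum and the highest maximum."""
    combined = {}
    for rel, (low, high) in relationship_ranges.items():
        g = group_of[rel]
        if g in combined:
            old_low, old_high = combined[g]
            combined[g] = (min(old_low, low), max(old_high, high))
        else:
            combined[g] = (low, high)
    return combined

group_of = {"second cousin": "E", "first cousin twice removed": "E"}
# Placeholder ranges, NOT actual Shared cM Project figures.
ranges = {"second cousin": (50, 300), "first cousin twice removed": (30, 280)}
print(combine_by_group(ranges, group_of))  # {'E': (30, 300)}
```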
The ranges given by the DNA Detectives are consistently narrower than those from the other two sources. That is mainly because the DNA Detectives chart intentionally omits extreme outliers, which are especially challenging to deal with in the unknown-parentage searches for which the chart was created. Their dataset is also the smallest, although it has the advantage that each datapoint has been carefully vetted by an expert. The Shared cM Project ranges are similar to those of AncestryDNA, but not exactly the same. Differences between the two could result from errors in the Shared cM Project’s self-reported data, the relative sizes of the datasets (the simulated dataset is almost certainly much larger than the empirical one), or assumptions made by AncestryDNA’s scientists in designing the simulations. Whichever source you prefer to use in your own genealogical work, it’s wise to keep the strengths and weaknesses of each dataset in mind.
Note: The probabilities and cM ranges discussed in this post assume little or no endogamy. Endogamy is the practice of members of a population marrying within the same group over multiple generations. If it is practiced for long enough, present-day members of the population will all be related to one another in multiple ways, so they tend to share more DNA than their closest genealogical relationship alone would predict.
Acknowledgements: Thanks to Dr. Tracy Vogler for alerting me to the online plot digitizer. CeCe Moore and Christa Stalcup kindly agreed to let me reproduce the DNA Detectives chart here.