Two of the most useful aids for figuring out how you are related to your autosomal DNA matches are at the DNA Painter website.
The Shared cM Project Tool lets you enter a centimorgan amount for a DNA match and reports the most likely relationship groups, in order, between yourself and that person. What Are the Odds? (WATO) helps you evaluate multiple DNA matches at once in tree-based format. Each hypothesis is given a score based on how it compares to the other hypotheses. Hypotheses with higher scores are more likely to be true and are worth investigating further.
(The DNA Painter site has other nifty features, too. I recommend checking it out if you haven’t already.)
Both tools are based on a set of probabilities that were derived from a figure in the 2016 Ancestry DNA Matching White Paper. The chart is shown here in a slightly modified form for ease of use. These data were obtained by starting with real DNA profiles and then simulating generations of descendants from those individuals.
With the exceptions of the parent–child and full sibling groups, each colored line represents a category of relationships that share basically the same amount of DNA. For example, the “half sib” group also includes aunt/uncle–niece/nephew and grandparent–grandchild.
For any given amount of DNA shared by two people (the x axis in the graph above), the most likely relationship is represented by the colored line that is highest on the y axis at that point. Other lines that cross above that centimorgan amount represent other relationships that are less likely but still possible. (For more on the graph and what it means, see this post.)
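If it helps to see that "highest line wins" rule spelled out, here is a minimal Python sketch. The function name `rank_relationships` and the probabilities are my own placeholders for illustration; they are not values from Ancestry's chart or from DNA Painter, and this is not how either site is actually coded.

```python
# Hypothetical probabilities for a single centimorgan amount.
# These numbers are made up for illustration; they do NOT come from
# the Ancestry white paper or from DNA Painter.
example_probs = {
    "2C": 0.55,
    "2C1R": 0.30,
    "1C1R": 0.15,
    "3C": 0.0,  # a line that does not reach this cM amount at all
}


def rank_relationships(probs):
    """Return (group, probability) pairs, most likely first.

    Groups with zero probability are dropped, because a line that does not
    cross the chosen cM amount is not a possible relationship.
    """
    possible = [(group, p) for group, p in probs.items() if p > 0]
    return sorted(possible, key=lambda pair: pair[1], reverse=True)


ranked = rank_relationships(example_probs)
most_likely = ranked[0][0]  # the group whose line is highest at this point

for group, p in ranked:
    print(f"{group}: {p:.0%}")
```

The first entry plays the role of the highest colored line, and the remaining entries are the "less likely but still possible" relationships.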
As you can see, the graph stops at 40 cM, which imposed an effective lower limit on the WATO tool. You can plug match amounts below 40 cM into WATO, but you can’t have much confidence in the results. (For more on how the WATO tool works, I recommend this series of posts.)
The Silent Update
A little over a year ago, AncestryDNA updated their DNA Matches feature and added something new: clicking on the amount of shared DNA between yourself and a match pulled up a probability table for the possible relationships.
What they never openly stated, though, was that those probabilities were new! And they go all the way down to 6 cM.
With help from the genealogy community, I compiled this new dataset into a table in hopes of improving WATO and extending its utility down below 40 cM. And then I waited for Ancestry to update their white paper to describe how they arrived at the new numbers … and waited, and waited, and waited.
Y’all, I give up! It’s time for these newer probabilities to be made available in an updated version of WATO.
New Versus Old
The graph below compares the new probability distributions (solid lines) with the old ones (dashed). In the discussion below, remember that each designated relationship is actually a shorthand for a group of relationships that all share roughly equivalent amounts of DNA. For example, the 3C category includes 2C2R, half 2C1R, and other relationships. (The abbreviations are “C” for cousin and “R” for removed, so 2C1R means a second cousin once removed.)
There are three main differences between the datasets.
The New Data Go Below 40 cM
The most obvious difference is that the old dataset (dashed lines) stopped at 40 cM, whereas the new numbers go below 10 cM. (This graph is cut off at 10 cM for convenience.)
Below 40 cM, multiple relationship groups are possible, and they all have similar probabilities, between about 5% and 35%. This means a distant DNA match doesn’t give you a lot of information about what the relationship is, but it can tell you what the relationship isn’t. Someone who shares 20 cM with you could be anything from a 2C to a 5C or greater, but they can’t be a 1C1R or closer based on these data.
Those of you with a bit of experience with WATO will note that you have always been able to plug cM values below 40 into it. What you may not know is that the probabilities below 40 were just educated guesses.
Here’s how the guesses (dashed lines to the right of the red line) compare to the new simulated probabilities (solid lines).
Compare the pairs of green, black, and teal lines (the dashed teal line is hidden behind the dashed lilac one). The guesses weren’t that far off. However, I underestimated the probability of the 4C category by about 5%, and the old probabilities lumped all relationships more distant than 4C1R into one category (dashed royal blue line), whereas the new statistics from Ancestry distinguish 4C1R and 5C (orange) groups.
What does the new dataset mean for WATO? Small centimorgan matches are not likely to have a significant effect on how hypotheses are ranked in WATO, but having probabilities below 40 cM that are based on scientific simulations rather than educated guesses means we can use small matches with more confidence.
The Peaks Are Shifted Downward
Another difference between the two datasets is that the peaks for each relationship group are shifted to lower centimorgan amounts. The effect is quite subtle for closer relationships, but becomes more noticeable for 2C (yellow) and more distant relationships.
Look at the 2C1R category (light green). The old peak for 2C1R (dashed) was at about 150 cM, whereas the new peak (solid) is at about 130 cM.
What does the new dataset mean for WATO? The shift has the effect of making any given DNA match a bit more likely to be a generation closer than before. Consider a match who shares 185 cM with you. Under the old stats, the most likely relationship group was 2C1R (50% chance), whereas the new stats give that match a 60% chance of being in the 2C group.
Time will tell whether this new dataset helps us arrive at answers faster.
The Distributions Are Broader
Astute readers will notice that, in addition to being shifted, the new distributions are broader than the old ones. The effect is best exemplified by the 1C1R (navy) relationship. The new distribution lies outside the old, on both the upper and the lower slopes of the curve.
What does the new dataset mean for WATO? Broader distributions mean that some hypothesized relationships that were considered impossible using the old stats will be just barely possible under the new ones. A match who shares 40 cM was previously ruled out entirely as a 2C but has a roughly 0.5% chance of being a second cousin now.
The effect on WATO is profound. Because hypotheses are scored relative to one another (a score of 100 is ten times better than a score of 10 and one hundred times better than a score of 1), having hypotheses that are ever-so-just-barely possible serves to inflate the scores of better hypotheses into the millions or even billions.
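To see why a barely possible hypothesis inflates everything else, here is a rough Python sketch. It assumes that a hypothesis's raw odds are the product of the probabilities for each DNA match, and that scores are then scaled so the weakest still-possible hypothesis gets a score of 1. That is my simplified reading of relative scoring, not WATO's actual code. The function `relative_scores`, the hypothesis names, and all the probabilities are invented for illustration.

```python
from math import prod


def relative_scores(hypotheses):
    """Scale each hypothesis so the weakest still-possible one scores 1.

    `hypotheses` maps a hypothesis name to a list of probabilities, one per
    DNA match, for the relationship that hypothesis implies.
    (Assumed scoring scheme; a simplification, not WATO's real code.)
    """
    combined = {name: prod(ps) for name, ps in hypotheses.items()}
    nonzero = [value for value in combined.values() if value > 0]
    if not nonzero:
        return {name: 0.0 for name in combined}
    baseline = min(nonzero)
    return {name: value / baseline for name, value in combined.items()}


# Placeholder probabilities, invented for illustration only.
# Under the "old" numbers, H3 is impossible for the first match.
old_stats = {"H1": [0.50, 0.40], "H2": [0.10, 0.20], "H3": [0.0, 0.30]}
# Under the "new" numbers, H3 becomes just barely possible.
new_stats = {"H1": [0.50, 0.40], "H2": [0.10, 0.20], "H3": [0.00001, 0.30]}

old_scores = relative_scores(old_stats)
new_scores = relative_scores(new_stats)

for name in old_stats:
    print(f"{name}: old {old_scores[name]:,.0f}  new {new_scores[name]:,.0f}")
```

In this toy example nothing about H1 or H2 changed, yet their scores jump by orders of magnitude, simply because the baseline they are measured against is now a hypothesis that is almost (but not quite) impossible.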
In fact, the hyper-inflation of scores is a key reason WATO has kept the new probabilities under wraps so long. It’s confusing and distracting—especially to beginning users of the tool—and doesn’t necessarily get you closer to an answer.
Fortunately, we’ve come up with a workable solution, and the new statistics described here will be available in a major update to the WATO tool very soon. Stay tuned!
- “What Are the Odds?” An online tool that can help solve DNA puzzles video by Jonny Perl
- Introduction to What Are the Odds? (WATO) video by The DNA Geek
- What Are the Odds? A tool for fitting a DNA match into a family tree video by Andrew Millard
- The Limits of Predicting Relationships Using DNA blog
- Science the Heck Out of Your DNA—Part 1 blog
- Science the Heck Out of Your DNA—Part 2 blog
- Science the Heck Out of Your DNA—Part 3 blog
- Science the Heck Out of Your DNA—Part 4 blog
- Science the Heck Out of Your DNA—Part 5 blog
- Science the Heck Out of Your DNA—Part 6 blog
- Science the Heck Out of Your DNA—Part 7 blog