Improving the Odds

Two of the most useful aids for figuring out how you are related to your autosomal DNA matches are at the DNA Painter website.

The Shared cM Project Tool lets you enter a centimorgan amount for a DNA match and reports the most likely relationship groups, in order, between yourself and that person.  What Are the Odds? (WATO) helps you evaluate multiple DNA matches at once in tree-based format. Each hypothesis is given a score based on how it compares to the other hypotheses.  Hypotheses with higher scores are more likely to be true and are worth investigating further.

(The DNA Painter site has other nifty features, too.  I recommend checking it out if you haven’t already.)

Both tools are based on a set of probabilities that were derived from a figure in the 2016 Ancestry DNA Matching White Paper.  The chart is shown here in a slightly modified form for ease of use.  These data were obtained by starting with real DNA profiles and then simulating generations of descent from those individuals.

 

With the exceptions of the parent–child and full sibling groups, each colored line represents a category of relationships that share basically the same amount of DNA.  For example, the “half sib” group also includes aunt/uncle–niece/nephew and grandparent–grandchild.

For any given amount of DNA shared by two people (the x axis in the graph above), the most likely relationship is represented by the colored line that is highest on the y axis at that point.  Other lines that cross above that centimorgan amount represent other relationships that are less likely but still possible. (For more on the graph and what it means, see this post.)
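If it helps to think of it in code, reading the graph at one centimorgan amount is just a lookup followed by a sort. Here is a minimal Python sketch of that idea. The probability values and the names (`slice_at_some_cm`, `rank_relationships`) are made up for illustration; they are not Ancestry's actual numbers and this is not how the DNA Painter tools are necessarily implemented.

```python
# Hypothetical probabilities for one centimorgan amount (one vertical slice
# of the graph). These values are illustrative, NOT Ancestry's real numbers.
slice_at_some_cm = {
    "1C": 0.02,
    "1C1R": 0.25,
    "2C": 0.55,
    "2C1R": 0.15,
    "3C": 0.03,
    "4C": 0.0,  # a zero means that line does not cross above this cM amount
}


def rank_relationships(probabilities):
    """Return (group, probability) pairs, most likely first.

    Groups with zero probability are dropped, because their line does not
    cross above this centimorgan amount.
    """
    possible = [(group, p) for group, p in probabilities.items() if p > 0]
    return sorted(possible, key=lambda pair: pair[1], reverse=True)


ranked = rank_relationships(slice_at_some_cm)
most_likely_group = ranked[0][0]

print(f"Most likely: {most_likely_group}")
for group, p in ranked[1:]:
    print(f"Less likely but still possible: {group} ({p:.0%})")
```

The highest line on the graph corresponds to the first entry in `ranked`, and every other entry is a line that sits lower at that same point.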

As you can see, the graph stops at 40 cM, which imposed an effective lower limit on the WATO tool.  You can plug match amounts below 40 cM into WATO, but you can’t have much confidence in the results. (For more on how the WATO tool works, I recommend this series of posts.)

 

The Silent Update

A little over a year ago, AncestryDNA updated their DNA Matches feature and added something new: clicking on the amount of shared DNA between yourself and a match pulled up a probability table for the possible relationships.

 

What they never openly stated, though, was that those probabilities were new!  And they go all the way down to 6 cM.

With help from the genealogy community, I compiled this new dataset into a table in hopes of improving WATO and extending its utility down below 40 cM.  And then I waited for Ancestry to update their white paper to describe how they arrived at the new numbers … and waited, and waited, and waited.

Y’all, I give up!  It’s time for these newer probabilities to be made available in an updated version of WATO.

 

New Versus Old

The graph below compares the new probability distributions (solid lines) with the old ones (dashed).  In the discussion below, remember that each designated relationship is actually a shorthand for a group of relationships that all share roughly equivalent amounts of DNA.  For example, the 3C category includes 2C2R, half 2C1R, and other relationships.  (The abbreviations are “C” for cousin and “R” for removed, so 2C1R means a second cousin once removed.)

There are three main differences between the datasets.

 

The New Data Go Below 40 cM

The most obvious difference is that the old dataset (dashed lines) stopped at 40 cM, whereas the new numbers go below 10 cM.  (This graph is cut off at 10 cM for convenience.)

Below 40 cM, multiple relationship groups are possible, and they all have similar probabilities, between about 5% and 35%. This means a distant DNA match doesn’t give you a lot of information about what the relationship is, but it can tell you what the relationship isn’t.  Someone who shares 20 cM with you could be anything from a 2C to a 5C or greater, but they can’t be a 1C1R or closer based on these data.

Those of you with a bit of experience with WATO will note that you have always been able to plug cM values below 40 into it.  What you may not know is that the probabilities below 40 were just educated guesses.

Here’s how the guesses (dashed lines to the right of the red line) compare to the new simulated probabilities (solid lines).

 

Compare the pairs of green, black, and teal lines (the dashed teal line is hidden behind the dashed lilac one).  The guesses weren’t that far off.  However, I underestimated the probability of the 4C category by about 5%, and the old probabilities lumped all relationships more distant than 4C1R into one category (dashed royal blue line), whereas the new statistics from Ancestry distinguish 4C1R and 5C (orange) groups.

What does the new dataset mean for WATO?  Small centimorgan matches are not likely to have a significant effect on how hypotheses are ranked in WATO, but having probabilities below 40 cM that are based on scientific simulations rather than educated guesses means we can use small matches with more confidence.

 

The Peaks Are Shifted Downward

Another difference between the two datasets is that the peaks for each relationship group are shifted to lower centimorgan amounts. The effect is quite subtle for closer relationships, but becomes more noticeable for 2C (yellow) and more distant relationships.

Look at the 2C1R category (light green).  The old peak for 2C1R (dashed) was at about 150 cM, whereas the new peak (solid) is at about 130 cM.

What does the new dataset mean for WATO? The shift has the effect of making any given DNA match a bit more likely to be a generation closer than before. Consider a match who shares 185 cM with you. Under the old stats, the most likely relationship group was 2C1R (50% chance), whereas the new stats give that match a 60% chance of being in the 2C group.

Time will tell whether this new dataset helps us arrive at answers faster.

 

Broader Distributions

Astute readers will notice that, in addition to being shifted, the new distributions are broader than the old ones.  The effect is best exemplified by the 1C1R (navy) relationship.  The new distribution lies outside the old, on both the upper and the lower slopes of the curve.

 

What does the new dataset mean for WATO? Broader distributions mean that some hypothesized relationships that were considered impossible using the old stats will be just barely possible under the new ones.  A match who shares 40 cM was previously ruled out entirely as a 2C but has a roughly 0.5% chance of being a second cousin now.

The effect on WATO is profound.  Because hypotheses are scored relative to one another (a score of 100 is ten times better than a score of 10 and one hundred times better than a score of 1), having hypotheses that are ever-so-just-barely possible serves to inflate the scores of better hypotheses into the millions or even billions.
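To see why one barely-possible hypothesis inflates everything else, here is a rough Python sketch. It rests on two assumptions of mine that this post does not spell out: that a hypothesis's combined odds are the product of its per-match probabilities, and that the weakest possible hypothesis is scaled to a score of 1. The hypothesis names and probability values are invented for illustration.

```python
from math import prod

# Hypothetical per-match probabilities for each hypothesis (illustrative only).
hypotheses = {
    "Hypothesis 1": [0.60, 0.35, 0.20],
    "Hypothesis 2": [0.30, 0.35, 0.10],
    "Hypothesis 3": [0.60, 0.35, 0.00000001],  # one match is just barely possible
}


def relative_scores(hyps):
    """Score each hypothesis relative to the weakest possible one.

    ASSUMPTIONS (not confirmed by the post): combined odds are the product
    of the per-match probabilities, and the lowest non-zero product is
    scaled to a score of 1. Impossible hypotheses (product of 0) score 0.
    """
    combined = {name: prod(probs) for name, probs in hyps.items()}
    baseline = min(value for value in combined.values() if value > 0)
    return {
        name: (value / baseline if value > 0 else 0.0)
        for name, value in combined.items()
    }


# With the barely-possible hypothesis included, it becomes the baseline.
scores_with = relative_scores(hypotheses)

# If that hypothesis were ruled out entirely, it would drop out of the comparison.
scores_without = relative_scores(
    {name: probs for name, probs in hypotheses.items() if name != "Hypothesis 3"}
)

for name in ("Hypothesis 1", "Hypothesis 2"):
    print(f"{name}: {scores_without[name]:,.0f} -> {scores_with[name]:,.0f}")
```

In this toy example, Hypothesis 1 is only about four times better than Hypothesis 2 either way, but once Hypothesis 3 sets the baseline, the score for Hypothesis 1 jumps from about 4 to about 20 million. The ranking has not changed; only the scale has.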

In fact, the hyper-inflation of scores is a key reason WATO has kept the new probabilities under wraps so long.  It’s confusing and distracting—especially to beginning users of the tool—and doesn’t necessarily get you closer to an answer.

Fortunately, we’ve come up with a workable solution, and the new statistics described here will be available in a major update to the WATO tool very soon.  Stay tuned!

 


23 thoughts on “Improving the Odds”

  1. Thank you for all the work you have done on this, Leah … Both when creating WATO in the first place and now towards its ongoing improvement. It is very much appreciated!

  2. Thank you and awesome work!

    I have a WATO that gives me disturbing results due to a 59cM match causing an hypothesis to go red when I suspect it really should be green. Can’t wait to see if the new stats can give me some more clarity on that particular match.

    1. Are you sure the 59-cM match is in the right place? And that they’re a full relationship rather than half? If they’re communicative, ask them how much they share with the other people in the tree and create a WATO tree to test whether they’re a half cousin instead of full.

      1. WATO is giving me a red hypothesis for 59cM being a half-1C1R. I believe everyone is in their right place in the tree. While I do not have access to their other results, they have assured me that they are correctly represented in the tree.

        1. A 59-cM match will have a probability of about 1.5% with the new stats. Hope that helps!

  3. Excellent! Looking forward to the upgrade.
    Question: Will existing WATO trees reflect the new probabilities automatically as they are opened?

    1. The plan is to have the new and old versions of WATO available in parallel. You’ll be able to switch back and forth.

  4. Thank you for your rapid reply.
    An additional question if I may…are you still looking into the possibility of expanding the capability of the WATO tool to include two distinct sources of ancestors rather than limiting the origination point to a single couple?

    1. That’s a programming issue so outside my purview. There’s a workaround to include matches from both lines, though.

  5. Really looking forward to this, Leah! You did a test run using my WATO about two weeks ago, with several lower matches, and it didn’t make a huge difference…but I’m remaining hopeful that the update may give me some clearer indications. Thanks so much for what you do!

  6. How does the new version (and/or old) handle 0cM? E.g., in a case where I am twinning and my brother matches someone, but I don’t. Do you incorporate the probability of detecting different cousin orders to include the 0cM cases, or are they just exclusionary when out of range?

    1. Great question! Both old and new versions gave you odds for different cousin levels for 0-cM matches, and in both cases the probabilities are extrapolated rather than taken directly from Ancestry. The main difference is that in the old version, everything below 40 cM was extrapolated so there was more room for error. With the new stats, we’ve only had to extrapolate between 0 cM and 6 cM.

  7. Have you considered using the Shared cM Project data, especially V4, as the source of your probabilities?

    1. No, the SCP data isn’t applicable. The SCP data shows the probability of a cM amount given a relationship, whereas the probabilities used by WATO are the probability of a relationship given the cM amount.
