A Major Update to “What Are the Odds?”

Since its release nearly 2 years ago, What Are the Odds? (WATO) has become an essential tool for solving family mysteries with autosomal DNA (the kind tested by AncestryDNA, 23andMe, MyHeritage, Living DNA, and the “Family Finder” test at FamilyTreeDNA).

Briefly, someone with an unidentified parent, grandparent, or even great grandparent may have a set of DNA matches who are all from the same extended family.  That is, how they all descend from a most recent common ancestor or couple (MRCA) is known, but how the original tester—or target—fits in is not.  WATO lets you “try out” the target in different places and tells you which of those hypotheses are most likely, based on how much DNA the target shares with each of the people in the tree.

 

A Simple Example

Consider Julia, whose father is unknown.  Julia has DNA matches to Dan and Jessie, who are first cousins through their grandmother Mabel and Mabel’s husband.  She shares 1800 cM with Dan and 800 cM with Jessie.  She also shares 225 cM with their second cousin, Michelle.  (All names and centimorgan amounts in this example are fabricated.)

The relationships among these matches tell me that Julia must be descended from Mabel’s parents, and based on how much DNA she shares with Dan and Jessie, she’s probably descended from Mabel herself.  But how?

To use WATO, I can build a simple descendant tree from Mabel’s parents, assign centimorgan amounts to the people who have tested (Dan, Jessie, and Michelle), and then place hypotheses where I think Julia might fit.  WATO automatically figures out which hypotheses don’t work (e.g., Julia can’t be Dan’s granddaughter), and which are most probable.  In this case, WATO is telling us that it’s 111 times more likely that Julia is Dan’s half sister than his niece.
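For readers who like to see the arithmetic, here is a minimal Python sketch of the scoring idea.  It is not WATO’s actual code: the function name relative_scores is hypothetical, and the probabilities are invented placeholders (the real tool looks its values up from relationship probability tables).  The sketch assumes that each hypothesis implies a relationship to every tested match, that the probabilities of the observed cM amounts are multiplied together, and that the results are scaled so the least likely non-zero hypothesis scores 1.

```python
from math import prod


def relative_scores(match_probs):
    """Multiply per-match probabilities for each hypothesis, then scale so
    that the least likely non-zero hypothesis has a score of 1."""
    combined = {name: prod(probs) for name, probs in match_probs.items()}
    nonzero = [value for value in combined.values() if value > 0]
    if not nonzero:
        # Every hypothesis is impossible, so every score is 0.
        return {name: 0.0 for name in combined}
    baseline = min(nonzero)
    return {name: value / baseline for name, value in combined.items()}


# Invented placeholder probabilities, one per match (Dan, Jessie, Michelle).
# These are NOT real values from any probability table.
hypothetical_probs = {
    "half sister of Dan": [0.5, 0.4, 0.2],
    "niece of Dan": [0.5, 0.1, 0.1],
    "granddaughter of Dan": [0.0, 0.3, 0.2],  # impossible, so it scores 0
}

scores = relative_scores(hypothetical_probs)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.1f}")
```

With these placeholder numbers, the half-sister hypothesis comes out roughly 8 times more likely than the niece hypothesis, and the granddaughter hypothesis is ruled out because one of its probabilities is zero.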

 

WHAT’S NEW IN WATO?

The new beta version of WATO has several major improvements.  Some will be readily apparent to those who already use the tool, and some are less obvious.  I’ll describe each of them below.

Bear in mind that this version of WATO has been tested by a group of about 15 people, but there are almost certainly bugs we didn’t catch or scenarios we didn’t consider.  If you think you’ve found a bug, or simply want help using the tool, please join our Facebook group.

 

Gedcom Import

When you first access the new version of WATO, you’ll be greeted by this pop-up.

 

Note the very exciting “IMPORT A GEDCOM” button at the bottom.  We no longer need to manually build out the descendant tree of the MRCA!  (A gedcom file is a text version of your family tree.  You can export one from the software you use to manage your tree.)

If you dismiss the pop-up, you can also upload a gedcom using the LOAD button in WATO’s toolbar.

 

Either route will take you to this pop-up.  You can either drop a gedcom file into the window or click to browse for one on your storage device.

 

Once you’ve loaded your gedcom file, WATO will ask you to select the MRCA of the DNA matches.

 

You can make a choice about how living people will be displayed in the tree, then click “IMPORT DESCENDANTS”.
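If you’re curious what an import like this involves, here is a rough Python sketch.  It is only an illustration under stated assumptions, not how WATO does it: the sample gedcom, the names in it, and the functions parse_gedcom and descendants are all made up for this example, and the parser handles only the handful of tags it needs (NAME, HUSB, WIFE, CHIL).

```python
from collections import defaultdict

# A tiny, invented gedcom fragment. Real files also have a header,
# dates, places, and many other tags that this sketch ignores.
SAMPLE_GEDCOM = """\
0 @I1@ INDI
1 NAME Mabel /Example/
0 @I2@ INDI
1 NAME Child /One/
0 @I3@ INDI
1 NAME Grandchild /One/
0 @I4@ INDI
1 NAME Unrelated /Person/
0 @F1@ FAM
1 WIFE @I1@
1 CHIL @I2@
0 @F2@ FAM
1 HUSB @I2@
1 CHIL @I3@
"""


def parse_gedcom(text):
    """Return (names, children_of) from the INDI and FAM records."""
    names = {}
    families = defaultdict(lambda: {"parents": [], "children": []})
    current_id = None
    for raw in text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, second = parts[0], parts[1]
        rest = parts[2] if len(parts) == 3 else ""
        if level == "0":
            # Record headers look like "0 @I1@ INDI"; other level-0
            # lines (such as HEAD or TRLR) carry no record id.
            current_id = second if second.startswith("@") else None
        elif level == "1" and current_id is not None:
            if second == "NAME":
                names[current_id] = rest.replace("/", "").strip()
            elif second in ("HUSB", "WIFE"):
                families[current_id]["parents"].append(rest)
            elif second == "CHIL":
                families[current_id]["children"].append(rest)
    children_of = defaultdict(list)
    for family in families.values():
        for parent in family["parents"]:
            children_of[parent].extend(family["children"])
    return names, children_of


def descendants(root_id, children_of):
    """Breadth-first walk collecting every descendant of root_id."""
    found, queue, seen = [], [root_id], {root_id}
    while queue:
        person = queue.pop(0)
        for child in children_of.get(person, []):
            if child not in seen:
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


names, children_of = parse_gedcom(SAMPLE_GEDCOM)
tree = [names[person] for person in descendants("@I1@", children_of)]
print(tree)
```

Choosing the MRCA corresponds to picking the root_id here.  Everyone who doesn’t descend from that person (like “Unrelated Person” in the sample) is left out.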

My descendant tree for James Weicks had more than 100 people.  Here’s what WATO imported in just moments.  I had to split the image to display it all in one view.  Can you imagine how long it would take to build this out manually?

 

Once you’ve imported a tree, you can manually enter the centimorgan amounts to each DNA match by hovering over a name, selecting “Enter Match cM”, and typing in the number.  Do this for each match whose placement in the tree is known.  You can use shared DNA amounts from any of the testing companies.

 

Suggest Hypotheses

The next step in using WATO is to designate places you think the target person might fit into the tree.  These are your hypotheses.  WATO can now suggest them for you!

First, though, you need to tell WATO when your target person was born.  That’s because WATO considers whether a potential parent was of age to have a child and whether they died before the child was conceived.  Simply enter the birth year in the field at the top, then click the “SUGGEST HYPOTHESES” button.
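The kind of date check described above can be sketched in Python as follows.  This is a guess at the general logic, not WATO’s implementation: the function name plausible_parent and the age limits of 13 and 70 are assumptions I’ve picked for illustration, and working in whole years is deliberately coarse.

```python
from typing import Optional


def plausible_parent(
    parent_birth: int,
    parent_death: Optional[int],
    child_birth: int,
    min_age: int = 13,  # assumed lower age limit, not WATO's value
    max_age: int = 70,  # assumed upper age limit, not WATO's value
) -> bool:
    """Return True if the dates allow this person to be the child's parent."""
    age_at_birth = child_birth - parent_birth
    if age_at_birth < min_age or age_at_birth > max_age:
        return False
    # Conception happened in the birth year or the year before it, so a
    # parent who died earlier than that cannot be the parent.
    earliest_conception = child_birth - 1
    if parent_death is not None and parent_death < earliest_conception:
        return False
    return True


print(plausible_parent(1900, 1970, 1930))  # of age and alive
print(plausible_parent(1925, None, 1930))  # only 5 years old
print(plausible_parent(1900, 1920, 1930))  # died a decade too early
```

A hypothesis that fails a check like this never needs to be scored at all, which helps keep the suggested list manageable.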

 

Here’s a simplified version of the tree I imported, before and after the hypotheses are added.  It’s nigh upon impossible to read at this resolution, but you can see that WATO has placed 42 hypotheses throughout the tree.  It’s left out many other places where the score would be zero.

 

When we zoom in to a portion of the tree, we see that WATO has added full and half siblings as needed to generate hypotheses.  That’s because the tree you import might not be filled in on all branches, and there may be family members you don’t know about.

 

You can still add or delete hypotheses manually wherever you like.  If you’re absolutely certain, for example, that Edward only had one child, you could remove the “Unknown sibling” and “Unknown half-sib” individuals to simplify your tree.  You can also remove all suggested hypotheses with a single mouse click.

 

Updated Probabilities

The newest version of WATO also implements updated relationship probabilities from AncestryDNA.   The key features of the new dataset are that the probabilities now extend down to 6 cM (previously 40 cM), they are shifted slightly, and the distributions are broader.  You can learn more here.

A consequence of having broader cM distributions for each relationship is that hypotheses that were ruled out in the earlier version of WATO will be ever-so-slightly possible in this one.  Because the scores in WATO are all relative to the least likely hypothesis, this has the effect of inflating the scores of better hypotheses to astronomical levels.

To correct for these misleadingly large values, this version of WATO now filters out extremely low scores.  Any hypothesis that is a million times (or more) worse than the best hypothesis is ruled out.  (As a reminder, this is a beta version of the tool, and the threshold for score filtering may be adjusted in the future.  Feedback is appreciated.)
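Here is a small Python sketch of why the filter matters.  The function name filter_and_rescale and the likelihood values are invented for illustration, and the code is only my reading of the rule described above, not the tool’s source.  It drops any hypothesis that is a million or more times worse than the best one, then scores the survivors relative to the least likely of them.

```python
def filter_and_rescale(likelihoods, max_ratio=1_000_000):
    """Rule out hypotheses that are max_ratio (or more) times worse than
    the best one, then score the rest relative to the weakest survivor."""
    best = max(likelihoods.values(), default=0.0)
    if best <= 0:
        return {name: 0.0 for name in likelihoods}
    cutoff = best / max_ratio
    kept = {name: value for name, value in likelihoods.items() if value > cutoff}
    baseline = min(kept.values())
    return {
        name: (value / baseline if name in kept else 0.0)
        for name, value in likelihoods.items()
    }


# Invented combined likelihoods for three hypotheses.
hypothetical_likelihoods = {"A": 1e-3, "B": 1e-5, "C": 1e-12}

# Without the filter, everything is relative to C and A looks astronomical.
unfiltered_a = hypothetical_likelihoods["A"] / hypothetical_likelihoods["C"]
filtered = filter_and_rescale(hypothetical_likelihoods)

print(f"A without filtering: {unfiltered_a:.3g}")
print({name: round(score, 3) for name, score in filtered.items()})
```

In this made-up case, hypothesis C is barely possible, and scoring everything against it makes A look about a billion times better.  Once C is ruled out, A is scored against B instead and comes out about 100 times better, which is a far more useful comparison.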

 

Spouse Names

A hidden gem in the new version of WATO is the presence of spouse names.  To see them, hover your cursor over a person in the tree.  Spouses will be listed below the birth–death dates.  To edit them, click the ADD/EDIT DETAILS button.

 

For now, spouse names are just for bookkeeping, but there are some interesting future applications that can be built off of them.

 

Score Overlay

The green and red score flags associated with hypotheses in the tree have some new tweaks.  Hover over a red flag (score = 0) and you’ll see this overlay:

 

It’s a great reminder to check your work.  You wouldn’t want to rule out a valid hypothesis because of an error.  If the tree checks out, you know that hypothesis is really not possible.

You can do the same with the green flags.  The overlay calculates how much more likely the best hypothesis is than the next best one, and for weaker hypotheses, it tells you that the score is not significantly better than others.

And finally, if you click on the flag, WATO will take you to the Ranking of Hypotheses table below the tree, which ranks the hypotheses by score and summarizes them.

 

Feedback Is Appreciated

WATO v2 is still a beta version, meaning it’s been put through its paces by a small group of users (the alpha testing group), but there may well be programming bugs we didn’t find or user scenarios we didn’t consider.  If you have problems with the tool, or just want a community to help you figure out how to use it, please join the DNA Painter: What Are the Odds? (WATO) group on Facebook.  Tell them The DNA Geek sent you!

 


42 thoughts on “A Major Update to “What Are the Odds?””

  1. Have been looking forward to this for a few weeks! Lots of great new features, I especially like that the match amounts go lower than 40cM. I logged out of Jonny’s webinar too quickly, so didn’t get a chance to ask – is there any way to import a previously-created WATO tree into the new beta version? I don’t have a GEDCOM to import since this line is new and I don’t know where it fits in.

    Thanks for all of the hard work!

  2. I just listened to Jonny Perl introduce the new Beta version and am excited to play with it.

  3. Thank you – I did figure that out after I encountered my other problem that I posted about on Facebook 🙂

  4. Nice improvements! Might it be an idea to color the top 5 hypotheses green and the remainder orange (and red if the probability is 0). Also, I am not sure if you still have these high scores but it might also be easier to interpret the scores when you use ranks in the visualization.

    1. Glad you like it! We experimented with color scales for the hypotheses before v1 came out but decided against it. I don’t recall why. By ranks, do you mean in the tree itself? The hypotheses have always been ranked down below the tree.

  5. I’m so excited to try this updated version.

    I just saw a post on Facebook regarding twinning and WATO. I’m having difficulty finding any videos or articles on how this works.

    I’m hoping that this will be a useful tool to use in an unknown parentage case where I manage the dna of the target and a cousin.

  6. I really do appreciate your tool. But I am still not sure I have the correct answers I desperately seek. I am trying to find my mother’s biological father before she dies. She is 88. I have sent out many DNA kits. And my newest is a lady who is 85. Her results are 875 cM to my Mom. Can’t tell if this is an Aunt or a Cousin. I have read everything I can. I have hired LegacyTree at a large fee. They used your tool but the results are still inconclusive. Since I have this new result, they are again meeting with me for a fee. Very frustrated.

  7. If the GEDCOM family tree used in a WATO V2 analysis starts with a husband-wife pair who are first cousins to each other (both born about 1810), and the point of our WATO analysis is to discover which of their sons or grandsons was the most likely participant in an NPE circa 1880, does their close DNA relationship invalidate the WATO results, or is the impact of this close DNA relationship on the WATO probabilities lost in the next 3-4 generations?

    We have 37 DNA data points ranging from 10 cM to 180 cM with which to work. The living Hypothesis person (the product from the NPE was her grandmother) is either a 3rd-cousin or half 2nd cousin to my wife, and the DNA match levels are from the Hypothesis person to multiple relatives in my wife’s extended family.

    1. If the MRCA couple were 1Cs to one another, then everyone in the analysis will be affected equally, which is good. Even so, the WATO probabilities might be off somewhat. The higher matches should be more reliable.

  8. I have been mapping out a tree of my DNA matches that relate to an NPE of my Gt-grandfather circa 1850. I now have 32 in total whose tree linkage is mapped, but many are small cM.
    If I use WATO v2 I get a clear winner in the hypotheses, while if using v1 odds, there is no clear winner. If I take the top twelve matches, from 123cM to 30cM, it is the same pattern- v2 has a clear winner, 8000 times the next best, v1 has scores in 1s, 2s and 3s.
    How much weight should I give to the WATO 2 result being the better indicator?

    1. To be honest, since that post was written, I’ve come to distrust the v2 probabilities and use the original ones myself. Are you able to apply Y-DNA to this question?

      1. I tried – I have done a y-dna test on Family Tree Dna (it is down my male line), but had no significant matches at 37 markers. One exact match at 12 markers did not help at all.
        Your answer helps me, in a way. It was quite difficult to credit the V2 “solution” – having a middle child being adopted out did not seem probable. With V1 saying “there is some relation with this family at this generation” I know I have to cast the net broadly.
        Thanks.

    2. Lyle, looks like we are running parallel analyses looking for which ancestor took part in an NPE. My recollection is that V1 of WATO had a 40 cM cutoff and that the computational algorithms were not changed in V2, so what you are seeing is most likely the result of multiplying in the probabilities of the lower-cM matches that V2 allows. If you also remove the 30-40 cM matches, do the two versions give you nearly the same results for the same inputs?

      V2 also implemented “updated relationship probabilities from AncestryDNA”, so this could impact the results somewhat between V2 and V1. But I wouldn’t expect a major (i.e., 2 orders of magnitude) shift in the WATO results between versions.

      1. The “official” v1 probabilities from AncestryDNA stop at 40 cM. You can use smaller matches in v1 WATO, but those probabilities are just educated guesses. The “official” v2 probabilities go down to 8 cM, and I was initially very excited about them, but in practice they yield some inexplicable results.

      2. That is a good suggestion. Doing this I am left with 7 DNA matches from 41 to 123 cM.
        With V1, my non-zero hypotheses are scored 1, 2, 2, 2, 47, 228. So now there is a better indicator to a “best hypothesis” than when the scores were all 1s 2s and 3s. The wide probability dispersions of distant relationships were smooshing out the signal.
        With V2, I get 1, 0, 0, 1, 1035, 757609. Same outcome but much more peaky result.

        If we could find a private channel I could share the links for these WATO trees.

  9. My conclusions from trialing different lower cutoffs for my matches tree-
    1. For WATO v1, 40cM is the best cutoff to get a strong result. By 25cM, the hypotheses look inconclusive.
    2. For WATO v2, you can go down to 25cM, at which point the result (i.e., the best : 2nd best ratio) reaches its maximum. Going below 25cM does not add value.
    3. In general, v2 gives a more focussed (higher) result. Whether that is indicating a strong likelihood of the hypothesis, or giving misplaced optimism is the question.
    Thanks

  10. WATO V1 seems to use the Shared CM Project 4.0 Tool Version 4 algorithms for its relationship probabilities as the results are the same. WATO V2 uses a completely different set of relationship probabilities as like-for-like cM levels produce widely differing probabilities. The description of the “updated probabilities” above says they are from Ancestry. This change appears to be the root cause for the wide differences in results between WATO V2 and WATO V1.

    What is the rationale behind switching to Ancestry? Are we confident that their methodology for obtaining these values is better than that of the CM Project Tool V4? Alternately, was a coding error possibly made in implementing the Ancestry equations in WATO V2?

    1. All of the probabilities are from Ancestry. WATO v1 and the SCP tool use data I extracted from Figure 5.2 in the AncestryDNA Matching White Paper. See: https://thednageek.com/the-limits-of-predicting-relationships-using-dna/

      The WATO v2 probabilities are collated from the percentages they give for the cM amounts in our match lists.

      The rationale was to try out the newer set to see if they improved matters. The original probabilities stop at 40 cM, and the values below that are just educated guesses. The beta probabilities go down to 8 cM, but they give unrealistic results at higher centimorgan amounts. I was initially optimistic about the beta probabilities, but they haven’t lived up to expectations.

      There’s no coding error. The probabilities are simple look-up tables. WATO v1 uses the original set and WATO v2 uses the beta set. Otherwise, the coding for that part of the tools is the same.

    2. Rik, I’m not a FB user.
      But if you try my 2 names, using a dot, on Googles mail.

      I think there is a case for keeping v2, if the researcher needs to reach down below 40cM to get enough matches.

      1. The primary issue as I see it is that WATO V2 uses a completely different (or updated?) set of relationship probabilities than either the Shared cM Tool V4, or the WATO Tool V1. These latter two tools both use the same relationship probabilities data base.

        It would be nice at this juncture to pick a database and immediately issue both tools (i.e., Shared cM Tool V5 and WATO V 2.1 or 3) with it.

        Failing that, I’d like the DNA Geek to issue a WATO V2- (Minus) variant with the WATO V1 relationship probabilities data base in it. That way V1 can be retired and we can utilize the other useful new features of V2 with the widely used V4 version of the Shared cM Tool until all the kinks with the V2 Ancestry database are understood and confirmed.

  11. There are some substantial differences in the resultant match probabilities between V2 and V1 – on the order of 0.12 (avg) to 0.22 (max) for the case of 20 data points > 40 cM I was running. While the curve shapes are not dissimilar, the V2 relationship probability curves vs. cM values are shifted to the right by about 30-50 cM. Example: For a given hypothesis (e.g., #3) with a 2C1R relationship, V1 reaches a 0.5 Prob at 137 cM while V2 requires only ~105 cM.

    Do you know why the Ancestry data set changed/shifted so much? I can understand some 2-5 point shifts as more data is acquired. I do not understand a 20+ point shift in probabilities for a given cM level. Is this new Ancestry data set posted somewhere?

    Also, having two different data sets operating in the Shared CM Project 4.0 tool v4 and the WATO V2 tool is confusing to all. A release of a V5 version of the former tool with the updated Ancestry data set is required if it is the better data set.

    Alternately, there are many improvements in WATO V2 outside the data set. Could a WATO V2- (Minus) tool be created that utilizes the WATO V1 data set until both models release simultaneous upgrades in the future?

    1. The probabilities are based on simulations, not empirical data. The original probabilities assume a population growth rate of 2.5 children per generation. See this post to understand why the population growth rate matters: https://thednageek.com/you-cant-get-there-from-here/

      I have not been able to find out what they did differently for the v2 probabilities, not for lack of trying. The v2 data is posted in your match list in the info that pops up when you click on the shared DNA amount.

      Both sets of probabilities are available in both the WATO and SCP tools. The beta WATO site is used to test out features before they’re ported to the main tool.

  12. Re: “Both versions of WATO are identical except for the probabilities.”

    While I agree they are likely identical in a computational logic sense, they are not identical in their features (e.g., GEDCOM Import, Suggested Hypotheses, Spouse Names, Score Overlays,…. described above). Having a variation of WATO with these added features AND the V4 edition of the Shared cM Tool probabilities would keep Shared cM and WATO analyses in sync for near-term analyses and provide the user with more amenities.

  13. So V1 has been updated with the additional features of V2? Was this announced and I missed it? If so, I am sorry to keep pulling on this string.

  14. Couldn’t this situation be resolved by adding the two levels above to the tree, so that the first cousin relationship is taken into account? Or is this something that WATO can’t do for some reason? (Just asking out of curiosity; I understand that the results will be fine in this case apparently without changing something.)

    1. WATO is based on a set of probabilities that assumes everyone is related in only one way. For that reason, it cannot handle double-cousin relationships.
