Scroll down for links to other posts in this series.
I presented a talk on this method at the i4GG conference in December 2017. The video is available for purchase here, either individually or as part of the all-conference package.
Last year, I wrote a post entitled “The Limits of Predicting Relationships Using DNA” in which I shared a statistical approach for deciding which relationship to an unknown DNA relative is most likely. For example, if I have a new match who shares 260 cM with me, the probability table in that post would tell me that there’s a 62% chance that they’re a 2nd cousin (or equivalent), a 23% chance that they’re a 2nd cousin once removed (or equivalent), and a 15% chance that they’re a 1st cousin once removed (or equivalent).
I can use this information to guide where to start looking in my family tree for the connection.
Unknown parentage cases have a different challenge. There may be several DNA matches who all descend from a certain couple, but we don’t know where the searcher belongs in that family tree. Consider the example of Ruth, who has an unknown birth father:
Based on how much DNA Ruth shares with Katie (140 cM) and Marie (136 cM), she could fit into the tree in several different places. We need to test other descendants to narrow down the possibilities. Because the costs of DNA testing add up quickly and because each test can take weeks or months to process, we want our decisions to be guided by the best evidence available. How can we use the existing DNA matches to decide which scenario is most likely and, therefore, which branch of the tree to focus on first?
I am very excited about a new tool that I’ve been working on with Dr Andrew Millard and Jonny Perl to address that exact question. I had the inchoate idea of applying probabilities to situations more complex than one-to-one comparisons; Andrew created a working model that calculates the odds for specific scenarios; and Jonny has been creating an online tool to make the process easily accessible. I am going to describe that tool in a series of posts, beginning with this one.
What’s With the Title?
I’m sure you’re wondering why I titled this series “Science the Heck Out of Your DNA”. Short answer: It’s a (sorta) quote from the movie The Martian (sorta because he doesn’t say “heck”). I love the movie because the main character is an unrepentant science geek who uses his botany powers to survive on Mars. He also does a lot of math. And while botany powers aren’t likely to solve any genealogical brick walls, scientific thinking and math just might.
First, Some Basics
The probability table in my earlier post can tell me the likelihood that someone who shares a given amount of autosomal DNA with me is a 2nd cousin or 3rd cousin or whatever. But what if I have two DNA matches? Or three? Or more? Each match shares a different amount of DNA and could be related to me multiple ways. How do I figure out which scenario is most likely?
Here, we can turn to a basic rule of statistics: If you want to know the combined probability of two independent events (meaning, the chance that both events will happen), you simply multiply the probability of Thing 1 by the probability of Thing 2 (with apologies to Dr Seuss).
And if you want to know the probability of three, or four, or six, or more independent things all happening, you multiply the probability of the first by that of the second by that of the third … and so on. The product is called a “compound probability”. (We’ll come back to what “independent” means later.)
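For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of the multiplication rule. The function name `compound_probability` is my own invention for illustration, and the numbers are coin flips rather than real DNA probabilities; this is not the code behind the tool described in this series.

```python
from math import prod


def compound_probability(probabilities):
    """Return the chance that ALL of the given independent events happen.

    Hypothetical helper for illustration: it simply multiplies the
    individual probabilities together, which is only valid when the
    events are independent of one another.
    """
    values = list(probabilities)
    for p in values:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return prod(values)


# Illustrative values only: a fair coin lands heads with probability 0.5.
two_heads = compound_probability([0.5, 0.5])          # Thing 1 and Thing 2
three_heads = compound_probability([0.5, 0.5, 0.5])   # add a third event

print(two_heads)
print(three_heads)
```

Notice that every extra event multiplies in another number no larger than 1, so a compound probability can only stay the same or shrink as more events are added.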
In my next post, I’ll describe how we can apply these ideas to a simple genealogical problem. Subsequent posts will expand to more complex cases.
Other posts in this series can be found here:
- Part 1 — Basic Probability (you’re here!)
- Part 2 — Testing Hypotheses
- Part 3 — DNA Painter Look-up Tool
- Part 4 — GIs in Germany: Which Brother?
- Part 5 — Ruth: Using Probability to Guide Future Testing
- Part 6 — Ted, or When Close Relatives Aren’t Available
- Part 7 — The “What Are the Odds?” Tool