Scroll down for links to other posts in this series.
Ruth: Using Probability to Guide Future Testing
The case of the GIs in Germany (Part 4) demonstrated how matches in the 1C–2C range could be used to show which of two brothers had fathered a child during World War I. Key concepts were that weak support for an hypothesis should not be taken as proof and that testing additional family members can add confidence to a conclusion. In the German case, the child’s grandparents had already been identified, and there were only two hypotheses for who the father was.
This post will explore a more challenging unknown parentage case. The initial DNA matches weren’t as close, meaning that there were more possible hypotheses for how the searcher, Ruth, fits into their family. I will take you through a few phases in Ruth’s search to show you how multiple hypotheses are formed, how they are interpreted to guide future testing, and how they are adjusted as new information comes in.
Ruth’s case has not been solved yet. This example is still a work in progress.
Ruth was born in 1963 and placed for adoption. Her birth mother (b. 1942) has been identified, but her biological father is still unknown. We assume he was born around 1940.
Ruth enlisted the help of Carol Rolnick, a well-known search expert, to figure out who he was. Using autosomal DNA testing, Carol initially identified two paternal matches to Ruth. Marie (136 cM shared with Ruth) and Katie (140 cM) are 2C1R to one another through a couple named Pal (b. 1834) and Petronella (b. 1829). Given the shared DNA amounts, it’s reasonable to assume that Ruth is also descended from Pal and Petronella. But how?
Is she a 1C2R to Katie via Matilda? Is she a great grandchild or 2-great grandchild of Johanna I and thus a 2C or 2C1R to Marie? Or is her connection to this family through another child of Pal and Petronella?
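The cousin labels in those questions follow a standard counting rule: if two people descend from the same ancestor through different children, and sit g1 and g2 generations below that ancestor, they are (min(g1, g2) - 1)th cousins, removed |g1 - g2| times. Here’s a small Python sketch of that rule. The function name `cousin_label` is my own, and the generation depths in the examples are inferred from the relationships stated above (for instance, Marie four generations below Pal and Petronella and Katie three), not read off the family chart.

```python
def cousin_label(g1: int, g2: int) -> str:
    """Label the relationship between two full-blood descendants of the same
    ancestor (or ancestral couple) who descend through *different* children.

    g1, g2: generations below the shared ancestor (child = 1, grandchild = 2, ...).
    """
    if min(g1, g2) < 2:
        # A child of the ancestor is a sibling, aunt/uncle, etc., not a cousin.
        raise ValueError("cousins must both be at least two generations down")
    degree = min(g1, g2) - 1   # 1 = first cousin, 2 = second cousin, ...
    removed = abs(g1 - g2)     # generational offset between the two people
    return f"{degree}C{removed}R" if removed else f"{degree}C"


# Illustrative depths (inferred from the text, not taken from the chart):
print(cousin_label(4, 3))  # Marie vs. Katie, below Pal and Petronella -> 2C1R
print(cousin_label(3, 3))  # Ruth as great grandchild of Johanna I vs. Marie -> 2C
print(cousin_label(4, 3))  # Ruth as 2-great grandchild of Johanna I vs. Marie -> 2C1R
print(cousin_label(4, 2))  # Ruth vs. Katie, both below Matilda -> 1C2R
```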
Pal and Petronella
The first phase in identifying Ruth’s biological father revolved around two questions: (1) which child of Pal and Petronella was Ruth’s ancestor? and (2) how many generations removed is Ruth from Pal and Petronella?
I initially defined five explicit hypotheses for relationships. I like to visually map them out, as shown below, with the DNA matches highlighted in yellow and the hypotheses in pink. Ruth doesn’t share enough DNA with either Marie or Katie to be a 1C1R to them, so there’s no need to propose hypotheses for those relationships. Pal and Petronella had three other children besides Johanna I and Matilda: Carolina, Sophia, and Charles. Because we don’t have data for descendants of those three siblings yet, we can treat “descent from Carolina, Sophia, or Charles” as a single hypothesis for now. That may change as we collect more DNA matches.
For each hypothesis, we must determine the relationships to Marie and Katie, then run them through Jonny Perl’s probability calculator. The input is shown:
I’m sure you can intuitively guess that the results would not be very conclusive, and you’d be right. In order of likelihood, the odds are: H3 (8.5 odds), H2 (7.4 odds), H1 (6.5 odds), H4 (1.06 odds), and H5 (1.0 odds).
Recall that the higher the odds, the more likely a given hypothesis is, but an hypothesis must have odds that are at least 10 and preferably 20 times higher than the others to be considered statistically supported. None of these hypotheses is sufficiently better than any of the others to conclude how Ruth fits into this family. However, the odds can be used to guide additional testing. Based on these numbers, I would try to test other descendants of Johanna I, Carolina, Sophia, or Charles. The descendants of Matilda can be considered low priority for now.
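The 10-fold (preferably 20-fold) rule is easy to apply mechanically. This minimal Python sketch, assuming the inputs are the relative odds reported by the calculator, ranks the hypotheses, divides the leader’s odds by the runner-up’s, and checks the result against a threshold. The function name and the default threshold of 10 are my own choices for illustration.

```python
def leading_hypothesis(odds: dict[str, float],
                       threshold: float = 10.0) -> tuple[str, float, bool]:
    """Return (leader, fold advantage over the runner-up, clears threshold?)."""
    if len(odds) < 2:
        raise ValueError("need at least two hypotheses to compare")
    ranked = sorted(odds.items(), key=lambda item: item[1], reverse=True)
    (leader, best), (_, runner_up) = ranked[0], ranked[1]
    fold = best / runner_up
    return leader, fold, fold >= threshold


# Phase I odds from the two initial matches (Marie and Katie).
PHASE_1 = {"H1": 6.5, "H2": 7.4, "H3": 8.5, "H4": 1.06, "H5": 1.0}

leader, fold, supported = leading_hypothesis(PHASE_1)
print(f"{leader} leads by {fold:.2f}x; supported: {supported}")
# H3 leads by 1.15x; supported: False
```

Even against the looser 10-fold bar, H3’s roughly 1.15-fold edge is nowhere close, which is why the next step was more testing rather than a conclusion.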
Carol used traditional genealogy skills to trace additional descendants of Pal and Petronella. Armed with this information, she identified six other members of the family who had already tested at one or another of the DNA companies. I added them to a simplified version of the graphic.
Now that we have data from Charles’ line (from three siblings who are his great grandchildren), we can consider one more hypothesis, that Ruth is descended from him. A preliminary assessment tells us that Ruth can’t be a 2C or closer to the three tested siblings, because she shares no DNA with one of them. (There are no known cases of 2C who share no DNA.) Perhaps she’s a 2C1R to them. My gut tells me this is unlikely, given how little DNA she shares with them. The nice thing about this hypothesis approach, though, is that we don’t need to rely on gut instinct. We can simply add another hypothesis (H6) and test it.
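That preliminary exclusion can be written down as a filter. In this Python sketch, the expected fraction of shared DNA for two full-blood descendants of one couple, 2 × (1/2)^(g1 + g2), is used only to order relationships by closeness; the cutoff itself rests on the empirical observation above that no 2C pair is known to share zero DNA. The generation depths and the candidate placements of Ruth below Charles are my assumptions for illustration.

```python
from fractions import Fraction


def expected_share(g1: int, g2: int) -> Fraction:
    """Expected fraction of autosomal DNA shared by two full-blood descendants
    of one ancestral couple, g1 and g2 generations below it (different children)."""
    return 2 * Fraction(1, 2) ** (g1 + g2)


# Observation from the text: no known 2C pair shares zero DNA, so treat any
# relationship whose expected sharing is at least that of 2C as incompatible
# with a 0 cM match. (Using 1/32 as the cutoff is an assumption of this sketch.)
TWO_C_SHARE = expected_share(3, 3)  # Fraction(1, 32)


def survives_zero_match(g_ruth: int, g_match: int) -> bool:
    """True if Ruth could plausibly share 0 cM with this relative."""
    return expected_share(g_ruth, g_match) < TWO_C_SHARE


SIBLINGS_GEN = 3  # Charles' great grandchildren
# Hypothetical placements of Ruth below Charles, keyed by her relationship to the siblings.
CANDIDATES = {"1C1R": 2, "2C": 3, "2C1R": 4, "2C2R": 5}

for label, g in CANDIDATES.items():
    share = expected_share(g, SIBLINGS_GEN)
    verdict = "keep" if survives_zero_match(g, SIBLINGS_GEN) else "exclude"
    print(f"{label:5} expected share {share}: {verdict}")
```

Only the 2C1R and more distant placements survive the filter, which is consistent with framing H6 as a 2C1R relationship.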
The entry table for the calculator looks like this:
The recalculated odds are: H2 (415,037), H3 (141,298), H1 (125,612), H4 (9,729), H5 (9,179), and H6 (1). My gut was right! H6 is more than 9,000 times less likely than the next hypothesis (H5). In fact, the difference in odds is so stark, we can safely eliminate H6 as a possibility. We can also probably rule out both H4 and H5, as they’re each more than 40 times less likely than H2. Note that when we only had two matches to consider, H3 was favored slightly, and we couldn’t rule out any hypotheses. Now, with more data, H2 is the most likely, and some hypotheses are no longer viable.
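Applying the same bar in reverse shows which hypotheses can be dropped. The sketch below, again assuming the calculator’s relative odds as input, divides the leading hypothesis’s odds by each of the others and flags any that trail by at least a chosen factor. Using 20, the stricter of the two thresholds mentioned earlier, as an elimination cutoff is my own adaptation of that rule, not a setting of the calculator.

```python
def trailing_factors(odds: dict[str, float]) -> dict[str, float]:
    """How many times less likely each hypothesis is than the current leader."""
    top = max(odds.values())
    return {h: top / value for h, value in odds.items()}


def eliminated(odds: dict[str, float], cutoff: float = 20.0) -> list[str]:
    """Hypotheses trailing the leader by at least `cutoff`-fold (sketch only)."""
    return sorted(h for h, factor in trailing_factors(odds).items() if factor >= cutoff)


# Phase II odds, after adding six more tested relatives.
PHASE_2 = {"H1": 125_612, "H2": 415_037, "H3": 141_298,
           "H4": 9_729, "H5": 9_179, "H6": 1}

# Step-by-step ratios between neighbouring hypotheses, best first.
ranked = sorted(PHASE_2, key=PHASE_2.get, reverse=True)
for better, worse in zip(ranked, ranked[1:]):
    print(f"{better} / {worse} = {PHASE_2[better] / PHASE_2[worse]:,.1f}")

print("Eliminate:", eliminated(PHASE_2))
# Eliminate: ['H4', 'H5', 'H6']
```

H1 and H3 trail H2 by only about 3-fold, so under this cutoff they stay in play.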
The most likely hypothesis points toward Johanna I as Ruth’s 2-great grandmother. It’s also possible but less likely that Johanna I, Carolina, or Sophia was Ruth’s great grandmother. Johanna I had 13 children, ten of whom had children of their own. To further the search, Ruth and Carol did targeted testing of a few more of Johanna I’s descendants. The family chart gets a bit busy at this point. (Note that I’ve omitted H6, but I’ve kept H4 and H5 for teaching purposes.)
The revised odds are: H2 (354,218), H3 (64,952), H4 (159), H5 (150), and H1 (1.0). We now have a much better idea about which hypotheses are viable. We already knew that H4 and H5 were probably incorrect, and these new numbers confirm that; the odds for both are at least 400 times less than those for H3. What’s interesting is that H1, which was ranked third previously, is now the worst one. Those new matches show how incredibly unlikely Ruth is to be a great grandchild of Johanna I. The difference in odds between H2 and H3 is still only about 5.5-fold, so additional testing is needed to confirm where Ruth fits into the picture.
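Tracking each hypothesis’s rank between rounds makes shifts like H1’s easy to spot. This Python sketch ranks the Phase II and Phase III odds (leaving out H6, which was dropped before Phase III) and reports the rank changes and the fold differences discussed above. The helper name `ranks` is hypothetical.

```python
def ranks(odds: dict[str, float]) -> dict[str, int]:
    """Rank hypotheses from most (1) to least likely."""
    ordered = sorted(odds, key=odds.get, reverse=True)
    return {h: position for position, h in enumerate(ordered, start=1)}


PHASE_2 = {"H1": 125_612, "H2": 415_037, "H3": 141_298, "H4": 9_729, "H5": 9_179}
PHASE_3 = {"H1": 1.0, "H2": 354_218, "H3": 64_952, "H4": 159, "H5": 150}

before, after = ranks(PHASE_2), ranks(PHASE_3)
for h in sorted(PHASE_3):
    print(f"{h}: rank {before[h]} -> {after[h]}")  # e.g. H1: rank 3 -> 5

# Fold differences between neighbouring hypotheses in Phase III.
print(f"H2 vs H3: {PHASE_3['H2'] / PHASE_3['H3']:.1f}-fold")
print(f"H3 vs H4: {PHASE_3['H3'] / PHASE_3['H4']:.0f}-fold")
```

The roughly 5.5-fold gap between H2 and H3 is well short of the 10-fold bar, which is why the search continues.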
Time to Retrench
Johanna I had 13 children. Descendants of seven have been tested. The DNA amounts that Ruth shares with those descendants completely rule out four of those children as Ruth’s great grandparents. Three more had no children of their own, and another married too late to have had a grandson born around 1940, when Ruth’s birth father was probably born. (That is, this child of Johanna I would have fit H1 but not H2, and we’ve ruled out H1 statistically.)
This table summarizes the status of Johanna I’s children with respect to Ruth.
Carol and Ruth are currently focusing on descendants of Alma and Otto for testing. Additional data should inform whether one of them was Ruth’s great grandparent; whether they should reconsider Matilda, Anne, or Marie; or whether the focus should shift to Carolina or Sophia (H3).
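The bookkeeping for Johanna I’s children is simple arithmetic, but writing it down guards against double counting. This sketch subtracts the exclusions described above from her 13 children. The candidate list is my reading of the names mentioned in the paragraph above (Alma and Otto, plus Matilda, Anne, and Marie), so treat it as an assumption.

```python
TOTAL_CHILDREN = 13

# Exclusions as described in the text (counts only).
EXCLUDED = {
    "ruled out by cM shared with tested descendants": 4,
    "had no children": 3,
    "married too late for a grandson born c. 1940": 1,
}

# Assumption: the children still in play are the five named in the text.
CANDIDATES = ["Alma", "Otto", "Matilda", "Anne", "Marie"]


def still_possible(total: int, excluded: dict[str, int]) -> int:
    """Number of children not yet excluded as Ruth's great grandparent."""
    remaining = total - sum(excluded.values())
    if remaining < 0:
        raise ValueError("more exclusions than children")
    return remaining


remaining = still_possible(TOTAL_CHILDREN, EXCLUDED)
print(remaining)  # 5
if remaining != len(CANDIDATES):
    print("Tally and candidate list disagree; recheck the table.")
```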
Follow the Data
Ruth’s example reinforced two lessons we learned previously: (1) weak support is not proof and (2) additional testing can increase our confidence in an hypothesis.
Her case also introduced some new ideas.
- First, we can test several hypotheses at once. In theory, there’s no upper bound, but in practice you’ll want to keep the number manageable.
- A second, related, point is that we can add or eliminate hypotheses on the fly as more data come in. In Phase II of Ruth’s case study, I added a new hypothesis (H6), tested it, then eliminated it when it proved to be unsupported. In Phase III, I could have omitted H4 and H5 from the analysis, because they’d been excluded; I chose to keep them to show how much the relative odds of H1 dropped with the latest batch of DNA matches.
- Finally, even when the statistics don’t give you “the answer”, they’re still quite valuable. DNA testing can get expensive, and lab processing takes weeks or months. Perhaps the biggest promise of this method is as a way of deciding where in the tree to invest your resources. For Ruth’s search, the focus should be on the descendants of Johanna I’s children Alma and Otto.
Other posts in this series can be found here:
- Part 1 — Basic Probability
- Part 2 — Testing Hypotheses
- Part 3 — DNA Painter Look-up Tool
- Part 4 — GIs in Germany
- Part 5 — Ruth: Using Probability to Guide Future Testing (you’re here!)
- Part 6 — Ted, or When Close Relatives Aren’t Available
- Part 7 — The “What Are the Odds?” Tool
10 thoughts on “Science the Heck Out of Your DNA — Part 5”
I remain confused, so please bear with me. In part 4, you showed a lot of extra input on the probability calculator (matches to matches as well as primary to matches). In part 5, you showed only primary tester to matches.
How does the calculator know who the secondaries are (matches to matches) and how does including them enhance the search?
In Part 4, the “secondary” was also a direct descendant of the unknown grandfather, so she wasn’t really secondary.
Many thanks for this series – it’s hugely interesting, even though I’m finding it a challenge to apply the lessons to my own mysteries! Forgive me if I’ve missed something obvious in the Ruth example … why is it assumed the father is roughly the same age as the mother? Couldn’t he be a much older man?
Thanks again, Fern
Yes, it could be a much older man, and I could have added another set of hypotheses with Ruth being one generation higher up than in the hypotheses I used. The cM shares don’t work for that arrangement, though.