I presented a talk on this method at the i4GG conference in December 2017. The video is available for purchase, either individually or as part of the all-conference package.
GIs in Germany: Which Brother?
In late 1919 or early 1920, an American GI had an encounter with an adolescent girl from a rural village in Germany. Nine months later, a son was born. The young woman grew into an old one and passed away without ever telling her son, E, or her grandchildren who the GI was.
J, a grandchild of this liaison, turned to DNA testing to identify his unknown grandfather. By the time I consulted on this case, his grandfather had been narrowed down to one of two American brothers, LD and PD. At the time of conception, PD was 19 and stationed in the same village as the girl, while 27-year-old LD’s unit was only 15 km away. Proximity would favor PD as the father, but LD could easily have visited the village to see his kid brother and met the young lady there.
The autosomal data pointing to the brothers consisted of five DNA matches to members of the same American family. For a scientific approach, we express the two possibilities as hypotheses. The two hypotheses are labeled in red in the McGuire chart below: Hypothesis 1 (H1) is that PD was J’s grandfather and Hypothesis 2 (H2) is that LD was.
At first glance, the DNA results are problematic, because the match to CD2 (493 cM) suggests a half 1st cousin relationship, which would support H1, while the match to GD (855 cM) fits a half uncle relationship, supporting H2. The match to DH (318 cM) is right in the middle of the expected values for a half 1st cousin (H2) and a 2nd cousin (H1), so that’s not much help either.
So which brother was it?
Tobias Kemper, a German genealogist, asked for feedback on this case in the Facebook group Genetic Genealogy Tips & Techniques. It was a perfect opportunity for the probability approach. For each hypothesis, I determined the relationships between J and his DNA matches, manually looked up each probability, then multiplied them to get the compound probability. (Note that the relationships to CF and LE are the same under both hypotheses.)
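The lookup-and-multiply step can be sketched in a few lines of Python. This is a minimal illustration, not the author’s worksheet or the tool’s code: the function name `compound_probability` is hypothetical, and the probabilities below are made-up placeholders rather than the values looked up for J’s matches.

```python
from math import prod

def compound_probability(match_probabilities):
    """Multiply the looked-up probabilities for every match under one hypothesis.

    Multiplying them treats each comparison as independent evidence.
    """
    return prod(match_probabilities)

# Made-up placeholder values, one per match in the order CD2, CF, DH, GD, LE.
# These are NOT the probabilities looked up for J's case.
h1 = [0.4, 0.9, 0.3, 0.2, 0.6]
h2 = [0.1, 0.9, 0.4, 0.7, 0.6]

print(compound_probability(h1))
print(compound_probability(h2))
```

Because CF and LE have the same relationship to J under both hypotheses, their factors (0.9 and 0.6 in this sketch) appear in both products and cancel out of any comparison between H1 and H2; only the matches whose relationship changes, here CD2, DH, and GD, actually help distinguish between the brothers.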
A few observations: First, all those lookups are exceedingly boring, and each one has the potential for a transcription error that would mess with downstream calculations. Second, the compound probabilities get really small—and hard to read—really fast. Third, 0.00027 vs 0.00042 doesn’t mean much to the average person; we need a metric that’s easier to understand.
Jonny Perl’s Probability Test Tool
The solution to all three problems can be found in another fabulous resource from Jonny Perl: the Probability Test Tool, which converts the lookup table and calculations we’ve already been using into a handy online calculator. It looks something like this when you first open it:
For J’s case, we click “Add new relationship hypothesis” to incorporate Hypothesis 2 and then click “Add new match” five times, once for each DNA match. Then, we fill in the comparisons, cM values, and relationships from the table above. The relationship fields have a handy pull-down menu that lets you select the correct designation.
When we click “Calculate Probabilities”, each individual probability will appear below the respective relationship.
Even better, the tool will calculate the compound probabilities, rank them from most to least likely, and also compute the odds of each one. This information is presented above the data input table. Odds are far more intuitive than compound probabilities; the larger the odds, the more likely the hypothesis is.
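The ranking and odds step can be sketched the same way. This sketch assumes that each hypothesis’s odds are its compound probability divided by that of the least likely hypothesis, which is consistent with the tool reporting 1.0 for the weaker hypothesis in this case; the function name `rank_with_odds` is hypothetical and this is not the tool’s actual code.

```python
def rank_with_odds(compound):
    """Rank hypotheses from most to least likely and express each one as odds
    relative to the least likely hypothesis, which therefore gets odds of 1.0.

    `compound` maps a hypothesis label to its compound probability.
    """
    weakest = min(compound.values())
    if weakest <= 0:
        raise ValueError("every hypothesis needs a nonzero compound probability")
    ranked = sorted(compound.items(), key=lambda item: item[1], reverse=True)
    return [(label, p / weakest) for label, p in ranked]

# The rounded compound probabilities quoted earlier, assuming the larger
# value belongs to H2 (the hypothesis favored by J's data alone).
for label, odds in rank_with_odds({"H1": 0.00027, "H2": 0.00042}):
    print(f"{label}: {odds:.2f}")
```

This prints `H2: 1.56` followed by `H1: 1.00`, in line with the odds of about 1.5 to 1.0 reported for J’s data.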
Here’s what that output looks like for the two GIs in Germany. (You can ignore those massive decimals in parentheses; focus on the odds.)
We see that the odds are about 1.5 for H2 (that LD was J’s grandfather) and 1.0 for H1 (that PD was J’s grandfather).
O Brother, Which Art Thou?
The difference in odds between H1 (1.0) and H2 (1.5) isn’t nearly enough to draw a conclusion about which brother was J’s grandfather. Ideally, we’d like one hypothesis to have odds at least 10 or 20 times greater than the other hypotheses.
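The 10-to-20 rule of thumb can be expressed as a small check that compares the most likely hypothesis with the runner-up. The function `is_conclusive` and its default threshold of 10 (the low end of that range) are illustrative choices, not features of Jonny Perl’s tool; the odds in the example are the ones reported in this post.

```python
def is_conclusive(odds, threshold=10.0):
    """Return True if the most likely hypothesis has at least `threshold` times
    the odds of the runner-up. `odds` maps hypothesis labels to odds."""
    if len(odds) < 2:
        raise ValueError("need at least two hypotheses to compare")
    best, runner_up = sorted(odds.values(), reverse=True)[:2]
    # Multiply instead of dividing so a zero runner-up cannot raise an error.
    return best >= threshold * runner_up

print(is_conclusive({"H1": 1.0, "H2": 1.5}))    # J's data alone: False
print(is_conclusive({"H1": 124.0, "H2": 1.0}))  # after adding C: True
```

With J’s data alone the check fails, because 1.5 is well short of 10 × 1.0; with C’s results added it passes easily.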
We needed more data.
Fortunately, J’s sister C was willing to test. The McGuire chart below shows C’s cM shares to CD2, DH, GD, and LE. (She and CF tested at different sites and could not be compared to one another. This doesn’t affect our hypothesis testing at all, because CF is a 2C1R to C either way and can’t help distinguish between H1 and H2.)
Next, we add these four additional comparisons to the ones we already had in Jonny Perl’s probability tool.
What does this tell us? Let’s look.
Well! Well! Well! That certainly changes things. Not only is H1 now favored instead of H2, but H1 is favored strongly: 124-to-1 odds. Recall that we want one hypothesis to have odds at least 10 to 20 times greater than the other. We’re well beyond that threshold!
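Because the compound probability for each hypothesis is a product of per-comparison probabilities, adding C’s four comparisons multiplies the H1-to-H2 ratio by a single factor, labeled $F_C$ here purely for illustration. Working backward from the reported odds gives its rough size; the result is approximate because the first figure was only “about 1.5”.

$$
\underbrace{\frac{124}{1}}_{\text{J and C}}
= \underbrace{\frac{1.0}{1.5}}_{\text{J alone}} \times F_C
\quad\Longrightarrow\quad
F_C \approx 124 \times 1.5 = 186
$$

In other words, C’s comparisons favor H1 by a factor of roughly 186, far more than enough to overturn the weak 1.5-to-1 lean toward H2.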
We can conclude that PD was J’s grandfather. This agrees with the circumstantial evidence that he was stationed in the young lady’s village and that his age was closer to hers.
More Data Is Better Data
If the motto of this story is Probabilities Rule!, the moral is that more data is better. Recall that when we first ran the numbers with just J’s data, the LD hypothesis (H2) was supported, but only weakly. Because the odds were only 1.5-to-1.0, we couldn’t draw a firm conclusion. However, when we added J’s sister C, not only was the other hypothesis (H1, that PD was the grandfather) favored, but the odds strongly supported it. This is a very important point! Weak support for a hypothesis can be used to guide future testing, but it shouldn’t be used to draw conclusions.
Just a few days ago, Tobias messaged me with new information. After we’d identified PD as J and C’s grandfather, DH’s brother, BH, decided to test. He shares 335 cM with J and 222 cM with C. Factoring these two new comparisons into the overall calculations shifted the odds from 124-to-1 to 791-to-1 in favor of the PD hypothesis.
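Since each comparison’s probability is multiplied into the compound probabilities, the size of BH’s contribution can be read off the change in odds. Labeling it $F_{BH}$ (a name used only for this illustration):

$$
F_{BH} = \frac{791}{124} \approx 6.4
$$

So BH’s two comparisons together favor PD over LD by a factor of roughly 6.4.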