Scroll down for links to other posts in this series.
Ted, or When Close Relatives Aren’t Available
About 3 years ago, DNA delivered an unexpected shock to KC: her father’s father was not his father. Ted, her father, tested at Family Tree DNA but shared only 895 cM with his sister’s daughter, well outside the range for a full niece. That niece matched first and second cousins related through the man named on Ted’s birth certificate, but Ted did not. KC had a misattributed paternity event (MPE) on her hands. Complicating matters, Ted’s closest paternal DNA match shared only 59.9 cM, and Ted passed away unexpectedly before he could test elsewhere.
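As a rough check on that 895 cM figure, here is a minimal sketch comparing it with the average sharing expected for a full niece and a half niece. The 7,000 cM total and the function names are assumptions for illustration only; actual totals differ by testing company, and real matches scatter widely around these averages.

```python
# Rough expected sharing for an uncle/niece pair (illustrative sketch only).
# ASSUMPTION: about 7,000 cM of comparable DNA in total; the true figure
# depends on the testing company.
TOTAL_CM = 7000

# Average fraction of DNA shared, based on the pedigree relationship.
EXPECTED_FRACTION = {
    "full niece": 0.25,   # child of a full sibling
    "half niece": 0.125,  # child of a half sibling
}

def expected_cm(relationship):
    """Average shared cM for the given relationship."""
    return EXPECTED_FRACTION[relationship] * TOTAL_CM

def closest_relationship(observed_cm):
    """Relationship whose average is nearest to the observed amount."""
    return min(EXPECTED_FRACTION, key=lambda r: abs(expected_cm(r) - observed_cm))

observed = 895
for rel in EXPECTED_FRACTION:
    print(f"{rel}: expected about {expected_cm(rel):.0f} cM")
print("Closest to observed:", closest_relationship(observed))
```

Under these assumptions, 895 cM sits close to the half-niece average and far below the full-niece average, which fits the conclusion that Ted and his sister had different fathers.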
KC tested herself and two of her paternal half-siblings at AncestryDNA and transferred those kits to Family Tree DNA, GEDmatch, and MyHeritage to increase her chances of finding people who were related through her father’s biological father. She also tested herself (but not her siblings) at 23andMe.
With a lot of work and even more perseverance, KC was able to home in on a couple who were probably Ted’s 2-great grandparents: Nicholas Hansgen (1834–1899) & Mary Eva Daniel (1838–1908). They were both native Germans who settled and married in Ohio, USA, in the mid 1800s. There, they raised a large family, with 10 children surviving to adulthood. One of those children was Ted’s great grandparent (or possibly grandparent), but which one?
Ted was born in 1944 in Washington, D.C., when his mother was 24. The Hansgen family lived in Ohio. It seemed likely that KC was looking for a man who was born between 1915 and 1920 and who found himself in D.C. during World War II, probably through military service.
One of the fun (by which I mean: extremely frustrating) complications of this case was the lack of DNA matches who were close enough to give a definitive conclusion. Another was the fact that most of the Hansgen descendants had tested only at AncestryDNA, but Ted’s data was at Family Tree DNA. In practice, this meant that only one of the probabilities in our calculation was for a comparison between Ted and a Hansgen cousin, while all of the others were for comparisons involving his children. The more distant the connection, the lower the predictive value of the shared DNA. Ideally, all of the comparisons would have been to Ted himself, but that wasn’t possible.
The Hansgen Family
Let’s look at the family of Nicholas Hansgen & Mary Eva Daniel. DNA matches are shown in yellow and possible placements for Ted and his children are shown in pink. The numbers along the bottom four lines show the shared amounts to Ted and his children KC, T.C., and C.C.
The first thing to notice is that there was a 24-year spread between the oldest Hansgen child (Mary b. 1858) and the youngest (Emma b. 1882). That made it harder to determine which generation Ted was in. That is, was he a 2-great grandchild of Nicholas and Mary Eva through one of their older children or a great grandchild through one of their younger ones?
We can work through the children of Nicholas and Mary Eva one by one to determine where Ted could fit into the tree:
- If Mary were Ted’s great grandmother, Ted’s children would be 2C1R to J.M., yet KC and C.C. share no DNA with J.M. Although it’s theoretically possible for 2C1Rs to share no DNA, the chance that two pairs of 2C1R share no DNA and that a third pair shares only 13 cM is practically nil, so we can exclude Mary as Ted’s direct ancestor.
- John’s grandchild, G.H., shares 60 cM with Ted, which rules out a 1C1R or 1C2R relationship, so we can exclude this line. Other tested descendants of John include three siblings, L1, L2, and L3, who share little to no DNA with Ted’s children.
- Frederick had a grandson, W.P., who was born in 1923 and who served in World War II. W.P. is a candidate for Ted’s father; this is Hypothesis 1. Of note, W.P.’s father was the great uncle of the L siblings, meaning that if W.P. were Ted’s father, Ted’s children would be 2C1R to L1, L2, and L3.
- Michael had children but no grandchildren, and his only son died in 1912 so could not have been Ted’s father.
- Anna had no children.
- Joseph fathered a single child, a daughter who had two sons, born in 1916 and 1918 respectively. The younger son, H.C., served in World War II. He is Hypothesis 2.
- Nicholas had five children. His grandchild D.H. shares only 16 cM with KC at 23andMe. (Comparisons to T.C. and C.C. were not available.) Nicholas’ line is ruled out, because 16 cM is not enough shared DNA for D.H. and KC to be 1C1R or 1C2R.
- Andrew’s grandchild D.K. shares enough DNA to be a possible 1C2R to Ted’s children, but Andrew did not have any grandsons who could have been Ted’s father.
- Herman had a grandson, H.H., who was the right age to be Ted’s father and who served in World War II. H.H.’s children were not open to communication, but fortunately his first cousin, N.H., tested. N.H. shared enough DNA to be 1C2R to Ted’s children. H.H. is Hypothesis 3.
- Emma was the youngest of the Hansgen children. The only one of her descendants who fits the bill was her son P.E., who was born in 1914 and served in World War II. Hypothesis 4 assumes that Ted is one generation older than in the other three hypotheses.
Normally, I would look for DNA matches through the in-laws of Frederick, Joseph, Herman, and Emma to try to bolster one hypothesis over the others (because if Ted is descended from Frederick, for example, he is probably also descended from Frederick’s wife). However, there simply weren’t many. The few that could be found were connected to Joseph’s wife Edna Laura Potter (1870–1958), but they were all quite distant, generally a single shared segment of 20 cM or less, and certainly not enough to conclude safely that Joseph was Ted’s great grandfather. Shared segments that small could theoretically be due to shared populations rather than recent common ancestry.
Do the Math
Running the numbers for this case requires extra care, for a few reasons.
First, Ted can be compared directly only to G.H., whereas the other comparisons are to Ted’s children. His children must be ignored in the comparison to G.H., because the amount they share with G.H. is not independent of the amount Ted shares with G.H. (Similarly, a child of G.H. and a child of D.K. have tested, but they are ignored in the calculations because their results are not independent of their parents’ results.)
Second, three of Ted’s children tested, so for each “outside” match, we need to consider three comparisons. For example, the match to J.M. requires one statistical comparison for KC, another for T.C., and a third for C.C.
Another tricky aspect is the secondary connection between W.P. (Hypothesis 1) and the L siblings through W.P.’s father. Thus, if W.P. were Ted’s father, Ted’s children would be 4C to the L siblings through the Hansgen line and also 2C1R to the L siblings through the alternate line. In this case, I used the closer relationship (2C1R) in the calculations. (We do not currently have the statistical data to analyze the double relationship of both 2C1R and 4C.)
Finally, D.H. cannot be compared to T.C. and C.C. because D.H. tested at 23andMe while T.C. and C.C. did not. D.H. could not be encouraged to transfer to GEDmatch to compare there.
There are a total of 20 comparisons in this analysis. The input table becomes quite large.
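The count of 20 can be reproduced from the rules just described. This is a minimal sketch that assumes the outside matches are exactly the eight named in the text; the variable names are mine and not part of any tool.

```python
# Tally the independent comparisons described above (illustrative sketch).
# ASSUMPTION: the outside matches are exactly those named in the text.
children = ["KC", "T.C.", "C.C."]
outside_matches = ["J.M.", "G.H.", "L1", "L2", "L3", "D.H.", "D.K.", "N.H."]

comparisons = []
for match in outside_matches:
    if match == "G.H.":
        # Ted himself can be compared to G.H., so his children are left out:
        # their sharing with G.H. is not independent of his.
        comparisons.append(("Ted", match))
    elif match == "D.H.":
        # D.H. tested only at 23andMe, where KC is the only child who tested.
        comparisons.append(("KC", match))
    else:
        # Every other match is compared to each of the three children.
        comparisons.extend((child, match) for child in children)

print(len(comparisons))  # 20
```

Each of these pairs contributes one probability per hypothesis to the input table.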
The resulting odds are:
H1 odds = 1
H2 odds = 4,138,441,600,129
H3 odds = 47,926,453,381
H4 odds = 0
Yeah, okay, I’ll be the first to admit those numbers are kind of crazy!
The huge disparity between H2 and H3 on one hand and H1 and H4 on the other is obvious, and the latter two can be discarded as possibilities without further consideration.
Recall that the “winning” hypothesis should have odds that are at least 10–20 times higher than those of the next best hypothesis. Do we have that here? H2 has odds of 4.1 trillion, while H3 is “only” 47.9 billion. (Yes, it’s okay to laugh. You just can’t make this stuff up!) The odds for H2 are more than 80 times greater than the odds for H3, well beyond our threshold of 10–20 fold. We can conclude that H2 is the correct hypothesis and that Joseph Hansgen was Ted’s great grandfather.
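The arithmetic behind that comparison can be checked directly. This sketch uses the odds reported above; normalizing them into shares of the total is my own illustrative step and assumes these four hypotheses are the only candidates.

```python
# Compare the reported odds for the four hypotheses (illustrative sketch).
odds = {
    "H1": 1,
    "H2": 4_138_441_600_129,
    "H3": 47_926_453_381,
    "H4": 0,
}

# Rank hypotheses from most to least likely.
ranked = sorted(odds, key=odds.get, reverse=True)
best, runner_up = ranked[0], ranked[1]
ratio = odds[best] / odds[runner_up]
print(f"{best} is favored over {runner_up} by a factor of about {ratio:.0f}")

# ASSUMPTION: if these four were the only possibilities, each hypothesis's
# share of the total odds would be its relative probability.
total = sum(odds.values())
for name in ranked:
    print(f"{name}: {odds[name] / total:.4%}")
```

The ratio between the top two hypotheses is what gets compared against the 10–20 fold threshold.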
The Potter Link
Recall that there were distant connections to the family of Joseph Hansgen’s wife, Edna Potter. Those small amounts of shared DNA make sense in light of the statistical analysis, although they were too low to be considered conclusive evidence on their own.
In late 2017, KC used the free autosomal transfer at MyHeritage to get her and her father’s DNA results into that database. There, Ted matched D.W., a great grandchild of Edna Potter’s sister Flora, making D.W. a potential 3rd cousin to Ted. This match seemed like a slam dunk for our hypothesis!
But there was one problem: at the time, the matching system at MyHeritage was unreliable, sometimes reporting shared DNA amounts that were much higher than at the other sites. (Growing pains were to be expected for the newest player in the DNA matching market.) Although by this point I was quite certain that Joseph Hansgen was Ted’s great grandfather, I suggested to KC that we wait for the expected overhaul of MyHeritage’s matching system. When the much-anticipated update arrived, Ted and D.W. shared 126 cM, well in range for a 3rd cousin relationship.
Joseph and Edna had a single daughter, Cara Mae Hansgen (1892–1960). She, in turn, had two sons with her husband, Joseph Martin Carroll (1879–1972). One of those sons was Ted’s father. We may never know which one, though, because neither of them had known children who could be tested. The older son, J.C., seems to have remained in Ohio, while the younger one, H.C., served in World War II, so H.C. is the most likely candidate.
KC did an incredible job over 3 years piecing together DNA results and family trees to identify Nicholas Hansgen and Mary Eva Daniel as her ancestors. By the time we started working together, she had already developed some clear hypotheses and lacked only a reliable way to rank them.
A huge challenge to her search was the fact that there were so few DNA matches on the non-Hansgen sides of her biological grandfather’s family. Joseph Carroll (Cara Mae Hansgen’s husband) had two siblings, but neither of them had children. Joseph’s parents were from Ireland, and their lines have been untraceable thus far. And until the match to D.W. showed up at MyHeritage, there was little evidence to support the idea that Edna Potter was a direct ancestor, either. As a result, targeted testing (that is, of someone who would be a close relative under a given hypothesis) simply wasn’t an option.
Normally, that’s how the probability tool should be used: to guide future testing in the most direct and cost-effective manner, not necessarily to reach a conclusion based solely on 2nd to 4th cousin matches. For Ted and KC, though, targeted testing wasn’t an option. Testing multiple family members and calculating probabilities were the only ways to confirm who Ted’s father was before D.W. showed up.
Other posts in this series can be found here:
- Part 1 — Basic Probability
- Part 2 — Testing Hypotheses
- Part 3 — DNA Painter Look-up Tool
- Part 4 — GIs in Germany: Which Brother?
- Part 5 — Ruth: Using Probability to Guide Targeted Testing
- Part 6 — Ted, or When Close Relatives Aren’t Available (you’re here!)
- Part 7 — The “What Are the Odds?” Tool
14 thoughts on “Science the Heck Out of Your DNA — Part 6”
Very deep and instructive example – thank you!
Love the demonstrative use of 0 cM “matches” in the calculation – at first this didn’t make sense to me, but then I realized this too is a bona fide measurement, just like all the other non-zero matches are.
Another thought – wouldn’t the distribution for a combined (aka double) relationship be the convolution of the two individual distributions? I.e. it seems to me one could calculate the probability vs. cM distribution for a 2C1R+4C relationship from the individual distributions for 2C1R and 4C? Why would we need statistical data to do this – the basis of the method you outline here uses Ancestry’s simulated data, not statistical measurements like Blaine’s cM Project.
This is a fantastic series!!!! A comprehensive tutorial that is easy to read. Thanks for posting these!
Anxiously awaiting the bounty of Part 7!
(Does anyone remember that old Kohl’s commercial with the woman pressed up against the glass door chanting, “Open! Open!”? That’s me.)
Teaser: You’re going to *love* part 7! Jonny Perl has a fabulous upgrade in the works that’s currently in beta testing.
In my current analysis I am comparing hypotheses about the connections between two family groups 3 to 5 generations ago. I have match data for multiple branches from an originating ancestor for each family group. Because the hypotheses are about how the older generations are connected, I need to use the table based tool instead of the graphical WATO tool. I have a series of pairwise relationships of people from family group A & family group B.
My question is – in setting up the table, should I only enter lines where the relationship differs across hypotheses, or should I also enter lines for relationships that do not change between hypotheses? In other words, the people in family group A are related to each other the same way regardless of the hypothesis and likewise for the folks within family group B. The hypotheses are all about how family groups A & B are related to each other.
If the relationship is the same no matter the hypothesis, then you don’t need to enter it. Including those people will affect the overall probability but not the odds ratio, which is what we’re using to compare hypotheses.