We may not know exactly how we’re related to our autosomal DNA matches, but we know we are related to them somehow. Is the reverse true? If we don’t match someone, does that mean we’re not related?
That is, is the lack of a DNA match negative evidence? Elizabeth Shown Mills defines negative evidence as “conclusions or implications that can be drawn from the absence of a situation that should exist given the circumstances.”
That all depends. It depends on how close the expected relationship is. First cousins always share measurable autosomal DNA, so if Joyce and Sean are presumed to be 1st cousins but don’t match one another, we can safely conclude that they are not biological 1st cousins using negative evidence. Eighth cousins, on the other hand, rarely share atDNA; if two suspected 8th cousins don’t match one another, we can’t conclude anything about their shared ancestry.
It also depends on how many relatives we are talking about. There’s a roughly 50% chance that you wouldn’t match a single 4th cousin, but only a 25% chance that you wouldn’t match two such cousins via the same ancestral couple. The chance declines with each cousin added to the mix.
Not matching just one 4th cousin doesn’t mean anything; not matching ten is evidence—negative evidence—that you are not related the way you thought you were.
This looks like a job for stats!
The non-match probabilities for various cousin levels are presented in a set of colorful charts at HAPI-DNA, a website created by Cornell University professor Amy Williams and colleagues.
The information is presented separately for full relationships and for half relationships. It even shows the expected number of segments!
The HAPI-DNA charts give the complement of what we need for negative evidence, though. They show the percentage of the time that two relatives of a given degree do match, whereas we want to know how often they don’t match, expressed as a probability rather than a percentage.
Fortunately, that’s all easy to calculate: just subtract the HAPI-DNA percent from 100%, then divide by 100 to convert to probability. For example, according to HAPI-DNA, 4th cousins share measurable atDNA 48.5% of the time, meaning that 51.5% (= 100% – 48.5%) don’t share atDNA. That’s a probability of 0.515 and pretty close to the 50% approximation I used above.
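For anyone who would rather script the conversion than do it by hand, here is a minimal Python sketch. The function name is my own invention for illustration, and the 48.5% figure is simply the 4th-cousin value quoted above.

```python
def no_match_probability(match_percent: float) -> float:
    """Convert a 'percent of pairs who match' figure into a no-match probability."""
    if not 0.0 <= match_percent <= 100.0:
        raise ValueError("match_percent must be between 0 and 100")
    # Subtract from 100% to get the percent who do NOT match,
    # then divide by 100 to turn the percent into a probability.
    return (100.0 - match_percent) / 100.0


# 4th cousins: 48.5% share measurable atDNA (the figure quoted in the text)
print(round(no_match_probability(48.5), 3))  # 0.515
```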
The data are summarized here:
Once we have the individual probabilities, we can calculate the likelihood that we wouldn’t match multiple cousins of a given level. The math is simple: it’s the probability that we wouldn’t match one cousin times the probability that we wouldn’t match another cousin (and another and another).
The chance we wouldn’t match two 4th cousins using HAPI-DNA numbers is 0.515 x 0.515 (or 0.515²) = 0.265 = 26.5%. The chance we wouldn’t match three 4th cousins is 0.515³ = 0.137 = 13.7%. And the chance we wouldn’t match ten 4th cousins is 0.515¹⁰ = 0.0013 = 0.13%.
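The same arithmetic is easy to reproduce for any number of cousins at the same level. This is only a sketch: the function name is hypothetical, and raising the single-cousin probability to a power assumes each non-match is an independent event.

```python
def no_match_all(p_single: float, n_cousins: int) -> float:
    """Chance of matching none of n cousins who share one no-match probability.

    Assumes the non-matches are independent of one another.
    """
    return p_single ** n_cousins


P_4TH_COUSIN = 0.515  # no-match probability for one 4th cousin, from the text

for n in (2, 3, 10):
    # Prints roughly 0.2652, 0.1366, and 0.0013
    print(n, round(no_match_all(P_4TH_COUSIN, n), 4))
```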
We can even mix-and-match cousins. The probability that we wouldn’t match a 3rd cousin and a 4th cousin and a 5th cousin is 0.082 x 0.515 x 0.841 = 0.036, or 3.6%.
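When the cousins are at different levels, the calculation is a plain product of their individual no-match probabilities. Here is a small sketch using the three values from the example; the function name is made up for illustration, and multiplying like this treats the non-matches as independent.

```python
import math


def joint_no_match(probabilities: list[float]) -> float:
    """Multiply individual no-match probabilities (assumes independence)."""
    return math.prod(probabilities)


# 3rd cousin, 4th cousin, 5th cousin: values taken from the example in the text
mixed = [0.082, 0.515, 0.841]
print(round(joint_no_match(mixed), 3))  # 0.036
```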
If you prefer the odds ratio used in the WATO tool, you can divide 1 by your probability. (1 is the probability of no match if you’re not related.) For the mix-and-match scenario, that would be an odds ratio of 1/0.036 = 28, which is considered strong evidence against the hypothesis. For ten 4th cousins, the odds ratio is 1/0.0013 = 762 (calculated from the unrounded probability). This is very strong evidence against the proposed relationship.
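Converting a no-match probability into this kind of odds ratio is a one-liner. The sketch below uses a hypothetical function name and reproduces the two figures above, which only round to 28 and 762.

```python
def odds_ratio(p_no_match_if_related: float) -> float:
    """Odds against the hypothesized relationship.

    The numerator is 1 because two unrelated people are taken to have a
    no-match probability of 1.
    """
    if p_no_match_if_related <= 0:
        raise ValueError("probability must be greater than 0")
    return 1.0 / p_no_match_if_related


print(round(odds_ratio(0.036)))        # 28  (3rd + 4th + 5th cousin example)
print(round(odds_ratio(0.515 ** 10)))  # 762 (ten 4th cousins, unrounded probability)
```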
Negative Evidence in Practice: The Case of Vincent
(This example is used with permission. Some details were changed for privacy.)
Vincent’s father John was born in late 1916 in Essex, England. His mother, Lizzie, was married to a German national at the time. After the sinking of the RMS Lusitania in May 1915, the British government placed all non-naturalized Germans in internment camps, including Lizzie’s husband. He could not possibly have been John’s biological father. But who was?
Vincent took a Y-DNA test that matched him to two men with the surname HENRY. Unfortunately, he does not have particularly promising atDNA results.
At the time of John’s birth in 1916, Lizzie gave her address as 12 Woodham Terrace. In the 1911 census of England, there are two brothers, William and Walter HENRY, living at 13 and 15 Woodham Terrace, respectively. They were the sons of William HENRY the elder and his wife Elizabeth BARTON.
This was a hallelujah moment! Or was it?
Vincent had only very distant atDNA matches to the HENRY surname, and his Y-DNA matches had no information. An outreach campaign based on family trees unearthed five BARTON descendants who said they had done the AncestryDNA test. None of them match Vincent.
What does this tell us? If one of the HENRY brothers were John’s biological father, BARTON 5 would be a 3rd cousin once removed, with a no-match probability of 0.273. The lack of shared DNA here is not a big deal.
But BARTON 1 and BARTON 2 would both be half 1st cousins once removed if our scenario were true. Their no-match probabilities are 0.0003 each. Throw in BARTON 3 and BARTON 4 (hypothesized half 2C) at 0.012 each, and our overall probability is 0.0003 x 0.0003 x 0.012 x 0.012 x 0.273 = 3.5 x 10⁻¹².
The odds ratio is 282,639,171,528. In other words, we appear to be barking up the wrong tree. (See what I did there?)
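If you want to check the arithmetic for Vincent’s case yourself, here is a short sketch. The variable names are mine, the probabilities are the ones given above, and the multiplication treats the five non-matches as independent.

```python
import math

# No-match probabilities under the HENRY hypothesis, as given in the text
barton_no_match = {
    "BARTON 1": 0.0003,  # half 1st cousin once removed
    "BARTON 2": 0.0003,  # half 1st cousin once removed
    "BARTON 3": 0.012,   # half 2nd cousin
    "BARTON 4": 0.012,   # half 2nd cousin
    "BARTON 5": 0.273,   # 3rd cousin once removed
}

# Chance that none of the five would match Vincent if the hypothesis were true
p_none_match = math.prod(barton_no_match.values())

# Odds ratio: 1 (no-match probability if unrelated) divided by that chance
odds_against = 1.0 / p_none_match

print(f"{p_none_match:.2e}")   # about 3.54e-12
print(f"{odds_against:,.0f}")  # on the order of 283 billion
```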
I’m not gonna lie: Giving up on this hypothesis is hard. Very hard. The HENRYs were next door to Lizzie! That’s a mighty huge coincidence.
The scientific approach requires that we always consider “What else could explain what we’re seeing?” Well, it turns out that William HENRY the younger was born 6 years before the next sibling. That gap suggests he might be the child of a first marriage for his father, the elder William HENRY. If so, the BARTON descendants would only be related to the younger William by marriage and wouldn’t be expected to share atDNA with his descendants.
So far, we haven’t documented a first marriage for the elder William, and we don’t have a birth record for the younger William naming his mother. Elizabeth BARTON’s grandchildren from her second marriage say that her first husband is a mystery to them. But at least we know what to look for. And if we can’t find it, we’ll have yet more negative evidence.
Meanwhile, the brick wall stands.
Updates to This Post
14 November 2021 — Corrected a typo in the table
9 thoughts on “DNA as Negative Evidence”
Or William Henry the elder could possibly be the father. It happens! Though starting with the eldest son makes the most sense.
William the elder died in 1893 (conception was in 1916), but it could have been one of his brothers or nephews. Unfortunately, we haven’t been able to track his parents, and Elizabeth’s descendants from her second marriage say he’s always been a mystery to them. I never give up, though!
Excellent article!! Sooo timely – thanks, Leah!
You’re welcome! Stay tuned for a couple of follow-up posts.
Thank you for sharing this! Very helpful.
Wouldn’t simply multiplying the odds only be valid if the individual odds were truly independent?
The connections in the paths from Vincent to Barton 1-4 share four parent-child links with each other. The parent of Barton 1 & 2 might share specific segments of DNA with Vincent, and the odds that Barton 1 gets a given segment is about 50%, that Barton 2 gets it also 50%. For Barton 3 & 4, 25% odds of getting any segment. Those 50% and 25% figures are independent.
If Vincent shared more or less DNA with his half-great-aunt/uncle, the odds of sharing with any of Barton 1-4 would all go up or down together.
I don’t know any obvious way to calculate the correct composite odds. If we use the figure for the amount of DNA shared with a half-great-aunt/uncle as 6%, then multiplied by the odds of the kids or grandkids NOT getting a segment (50% * 50% * 75% * 75%) we’d get about 0.8%, about 1 in 120 odds. Of course, that’s multiplying apples and oranges, giving us fruit salad rather than “real” numbers. More to the point, though, is that the amount shared with the half-great-uncle/aunt could vary from about 2.5% to 9%, and that the odds of sharing with each descendant would rise and fall together, rather than independently.
There might be a way to run a simulation using conditional probabilities that could produce estimated values, but the simulation would have to take into account each specific scenario. Maybe Johnny Perl has some clever solution…
You’re absolutely right that Bartons 1–4 are not truly independent, and I’ve drafted a follow-up post explaining (a) why it’s a concern and (b) why I did it anyway. Then I got distracted. Squirrel!
To your last point, there is a program in development that can calculate probabilities based on bespoke simulations. Stay tuned!