Negative evidence is the absence of something you would expect to see under the circumstances. We use negative evidence in daily life without even being aware of it. For example, if your neighbor always parks in her driveway and her car isn’t there, you would naturally assume she isn’t home.
We can also use negative evidence consciously and deliberately to draw conclusions about genealogy. An earlier post described Vincent, who was expected to share DNA with a group of five people. When he didn’t, we concluded that he was not related to them the way we assumed.
We can make inferences about our DNA matches even if we don’t have direct access to their DNA kits. Consider Cousin Bob, who shares 650 cM. This DNA amount is well in range for a first cousin, but it’s just as likely (in fact, slightly more likely) to be a half first cousin. Given the right DNA matches, you can use negative evidence to figure out which is true.
How? If a senior 1C1R (first cousin once removed in the older generation) has tested on each side, you can use shared matches to see whether Bob matches both of them.
If he’s your full first cousin, he should. But, if Bob matches the 1C1R on your grandmother’s side and not the one on your grandfather’s side, then he’s a half cousin rather than full. That’s because 1C1Rs always share DNA.
Shared matches allow us to draw conclusions about our matches even if we can’t see their match lists ourselves. All of the main DNA databases have some version of this. It’s called “Relatives in Common” at 23andMe and “ICW” (in common with) at FamilyTreeDNA, but they do basically the same thing: for any given DNA match (like Bob), the feature shows you which of your other matches also share DNA with that person.
There’s one exception, though. AncestryDNA’s version of this feature has a threshold. It only shows shared matches who share at least 20 cM with both you and the relative of interest. If two matches share 19.99 cM or less, they won’t show as shared matches even if they really do match one another.
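The threshold rule can be written down as a tiny check. This is only a sketch of the rule as described above, not AncestryDNA's actual implementation; the function name and the 250 cM and 50 cM example values are illustrative assumptions.

```python
# Shared-match threshold described in the post (in centimorgans).
SHARED_MATCH_THRESHOLD_CM = 20.0


def shows_as_shared_match(cm_with_you, cm_with_relative,
                          threshold_cm=SHARED_MATCH_THRESHOLD_CM):
    """Return True if a third person would be listed as a shared match.

    Both amounts must be at or above the threshold: the cM this person
    shares with you, and the cM they share with the relative of interest.
    """
    return cm_with_you >= threshold_cm and cm_with_relative >= threshold_cm


# Illustrative (hypothetical) amounts, plus the 19.99 cM case from the text.
print(shows_as_shared_match(250.0, 50.0))   # True
print(shows_as_shared_match(250.0, 19.99))  # False
```

The second call is the case that matters for negative evidence: a real relative can be missing from the shared match list simply because one of the two amounts falls below 20 cM.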
For Bob, the threshold doesn’t matter. First cousins once removed always share more than 20 cM. In fact, the lowest of 3,700 self-reported values in the Shared cM Project was 102 cM, and the lowest I found in 10,000 simulations using Ped-Sim was 50 cM. (For reasons I won’t go into here, Ped-Sim slightly underestimates shared centimorgans, but it’s close enough for our purposes.)
The picture changes for second cousins once removed and more distant relationships. It’s possible to have a true 2C1R who shares less than 20 cM, and the chances increase for 3C, 3C1R, and so on. So how can we use negative evidence for these more distant connections when we don’t have direct access to the DNA kit in question?
Was Mae William’s Daughter?
Consider William and Nancy. Nancy had a daughter named Mae who was born suspiciously close to the time Nancy married William. Was William Mae’s biological father?
Jacob, the grandson of William and Nancy, has tested, as has Mae’s great-granddaughter, Justine. They share 117 cM. Jacob and Justine would be 1C2R if William were Mae’s father and half 1C2R if not. The total of 117 cM favors half 1C2R, but with a WATO score of only 4. This is weak evidence. Can we do better?
William had two other wives, and at least 14 descendants from those other two marriages have tested at AncestryDNA. If William were Mae’s father (that is, if Hypothesis 1 is true), these matches would represent two half 1C2R, two half 2C1R, five half 3C, and five half 3C1R to Justine. None of them are shared matches with her. (I have access to Jacob’s kit but not Justine’s.)
I feel pretty comfortable concluding that William was not, in fact, Mae’s father. But how confident can we be? Technically, it’s possible for each of these relationships to share less than 20 cM, but all 14 of them? What are the odds of that happening?
Dr. Andrew Millard, a professor at Durham University in the UK, kindly performed some simulations to arrive at an answer. He came up with probabilities that a match of known relationship would share less than 20 cM.
Now let’s work some math mojo on this problem! If Hypothesis 2 is true and William was not Mae’s father, there’s a 100% chance that Justine will share less than 20 cM with each of the 14 matches, so the probability is P_H2 = 1. In fact, she won’t share any DNA with them at all, and AncestryDNA’s threshold doesn’t matter.
If instead Hypothesis 1 is true and Mae was William’s daughter, the probability that Justine would share less than 20 cM with both of the half 1C2Rs is P_H1 = 0.029 × 0.029 = 0.00084. We can convert to a WATO score (an odds ratio) by dividing P_H2 by P_H1: 1/0.00084 = 1,190. That’s a pretty convincing score!
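The arithmetic for the two half 1C2Rs can be sketched in a few lines of Python. The only input taken from the post is the 0.029 probability; the variable names are illustrative. Note that the unrounded product is 0.000841, which gives an odds ratio of about 1,189. Rounding the probability to 0.00084 first gives the 1,190 quoted above.

```python
# Probability that a true half 1C2R shares less than 20 cM (value quoted in the post).
P_HALF_1C2R_BELOW_20 = 0.029

# Hypothesis 2: William is not Mae's father, so Justine is certain to fall
# below the threshold with his other descendants.
p_h2 = 1.0

# Hypothesis 1: both half 1C2Rs happen to fall below the threshold.
# Multiplying assumes the two matches are independent of each other.
p_h1 = P_HALF_1C2R_BELOW_20 ** 2

# Odds ratio in favor of Hypothesis 2.
odds_ratio = p_h2 / p_h1

print(f"P(H1) = {p_h1:.6f}")              # P(H1) = 0.000841
print(f"odds ratio = {odds_ratio:,.0f}")  # odds ratio = 1,189
```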
We don’t need to stop there. We can include the other 12 descendants of William in the calculations. (Remember: The probability of several independent events all occurring is the product of the individual probabilities.) When we do this, we get a combined probability of 3.52 × 10⁻⁷ for Hypothesis 1 and an odds ratio of 1/(3.52 × 10⁻⁷) ≈ 2,839,994. Wowza!
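Extending the calculation to any number of matches is just a longer product. The sketch below is a hypothetical helper, not the tool used for the post. It takes (count, probability) pairs and multiplies in log space so that many small probabilities don't underflow. Only the half 1C2R probability (0.029) is quoted in the text, so the other three groups are left for you to fill in from the simulation results. The last lines take the reciprocal of the rounded combined probability, which lands within about 0.03% of the figure quoted above. That small gap is presumably rounding, though that is an assumption.

```python
import math


def combined_probability(groups):
    """Probability that every match falls below the threshold.

    groups: iterable of (count, p) pairs, where p is the probability (0 < p <= 1)
    that one match of that relationship shares less than 20 cM.
    Assumes all matches are independent of one another.
    """
    return math.exp(sum(count * math.log(p) for count, p in groups))


# Only this group's probability appears in the post. Append the half 2C1R,
# half 3C, and half 3C1R groups once you have their probabilities.
groups = [(2, 0.029)]
print(f"{combined_probability(groups):.6f}")  # 0.000841

# Combined probability for all 14 matches, as reported (rounded) in the post.
P_H1_REPORTED = 3.52e-7
odds_from_rounded = 1.0 / P_H1_REPORTED
print(f"{odds_from_rounded:,.0f}")  # 2,840,909
```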
Even when WATO was inconclusive and we did not have access to Justine’s match list, negative evidence gave us overwhelming support for the conclusion that William was not Mae’s father.
Thus far, I’ve neglected to mention an important statistical consideration: independence. If two events are correlated with one another, you can’t use both in the calculations. In the Vincent example, notice that Barton 2’s child has also tested but is labeled “Ignored Barton”. That’s because how much DNA the child shares with Vincent is dependent on how much Barton 2 shares. The child is not independent.
In the Mae example, there were actually 10 other descendants of William that I excluded from the analysis, because they were closely related to another match. That is, they weren’t independent. I chose to draw the line at matches who were more closely related than first cousins (children, grandchildren, siblings, and niblings were ignored).
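One way to apply this kind of exclusion rule by hand is sketched below. It is a hypothetical helper, not the method used for the analysis above: you supply the list of matches and the pairs you consider too closely related, and it keeps the first member of each close pair it encounters. Which member to keep is a judgment call, so the ordering of the list is an assumption you control. The match names are placeholders.

```python
def drop_dependent_matches(matches, close_pairs):
    """Keep a match only if it is not closely related to a match already kept.

    matches: match ids in priority order (earlier ids win).
    close_pairs: pairs of ids more closely related than first cousins
                 (parent/child, grandparent/grandchild, siblings, aunt or uncle/nibling).
    """
    close = {frozenset(pair) for pair in close_pairs}
    kept = []
    for match in matches:
        if all(frozenset((match, other)) not in close for other in kept):
            kept.append(match)
    return kept


# Placeholder ids: a parent and child who both tested, plus an unrelated cousin.
matches = ["parent", "child", "cousin"]
close_pairs = [("parent", "child")]
print(drop_dependent_matches(matches, close_pairs))  # ['parent', 'cousin']
```

Only the matches that survive this filter should go into the probability product.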
In the future, we’ll have tools that can account for non-independence automatically. Until then, be sure to exclude closely related matches from calculations like these.
This blog post is based on a discussion in The DNA Roundtable Facebook group. Many thanks to all who participated, especially Dr. Andrew Millard and Malcolm Peach, who both performed computer modeling to arrive at the probabilities. Special thanks to William Best for reminding me to explain the importance of independence in statistical analysis.
Updates to this Post
15 August 2022 — Added an explanation of independence.