Negative evidence is the absence of something you would expect to see under the circumstances. We use negative evidence in everyday life, often without being aware of it. For example, if your neighbor always parks in her driveway and her car isn’t there, you would naturally assume she isn’t home.
We can also use negative evidence consciously and deliberately to draw conclusions about genealogy. An earlier post described Vincent, who was expected to share DNA with a group of five people. When he didn’t, we concluded that he was not related to them the way we assumed.
We can make inferences about our DNA matches even if we don’t have direct access to their DNA kits. Consider Cousin Bob, who shares 650 cM with you. This amount is well within range for a first cousin, but it’s just as likely (in fact, slightly more likely) to be a half first cousin. Given the right DNA matches, you can use negative evidence to figure out which is true.
How? If a senior 1C1R (first cousin once removed in the older generation) has tested on each side, you can use shared matches to see whether Bob matches both of them.
If he’s your full first cousin, he should. But, if Bob matches the 1C1R on your grandmother’s side and not the one on your grandfather’s side, then he’s a half cousin rather than full. That’s because 1C1Rs always share DNA.
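The reasoning about Bob can be written as a small decision rule. This is only an illustrative sketch: the function name and the wording of the returned labels are made up for this example, and it assumes that a senior 1C1R has tested on each grandparent's side.

```python
def classify_first_cousin(matches_grandmother_side_1c1r, matches_grandfather_side_1c1r):
    """Apply the negative-evidence rule for a match in the first-cousin range.

    Hypothetical helper: each argument is True if the match (Bob) appears as
    a shared match with the senior 1C1R on that grandparent's side.
    """
    if matches_grandmother_side_1c1r and matches_grandfather_side_1c1r:
        return "full first cousin"
    if matches_grandmother_side_1c1r or matches_grandfather_side_1c1r:
        # A full first cousin would match both 1C1Rs, so a missing match
        # on one side is negative evidence for a half relationship.
        return "half first cousin"
    return "unresolved: related some other way"


# Bob matches the 1C1R on the grandmother's side only.
print(classify_first_cousin(True, False))
```

The rule only works because 1C1Rs always share DNA, so a missing match on one side carries real information.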
Shared matches allow us to draw conclusions about our matches even if we can’t see their match lists ourselves. All of the main DNA databases have some version of this. It’s called “Relatives in Common” at 23andMe and “ICW” (in common with) at FamilyTreeDNA, but they do basically the same thing: for any given DNA match (like Bob), the feature shows you which of your other matches also share DNA with that person.
There’s one exception, though. AncestryDNA’s version of this feature has a threshold. It only shows shared matches who share at least 20 cM with both you and the relative of interest. If two matches share 19.99 cM or less, they won’t show as shared matches even if they really do match one another.
For Bob, the threshold doesn’t matter. First cousins once removed always share more than 20 cM. In fact, the lowest of 3,700 self-reported values in the Shared cM Project was 102 cM, and the lowest I found in 10,000 simulations using Ped-Sim was 50 cM. (For reasons I won’t go into here, Ped-Sim slightly underestimates shared centimorgans, but it’s close enough for our purposes.)
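AncestryDNA's threshold rule, as described above, can be sketched in a few lines. The function and constant names are hypothetical, and the rule is taken from this post rather than from any official AncestryDNA documentation.

```python
ANCESTRY_SHARED_MATCH_THRESHOLD_CM = 20.0  # threshold as stated in this post


def shows_as_shared_match(cm_with_you, cm_with_relative,
                          threshold=ANCESTRY_SHARED_MATCH_THRESHOLD_CM):
    """Return True if a third person would be listed as a shared match.

    Hypothetical helper: the person must share at least `threshold` cM
    with both you and the relative of interest.
    """
    return cm_with_you >= threshold and cm_with_relative >= threshold


# Even the lowest simulated 1C1R value (50 cM) clears the threshold.
print(shows_as_shared_match(50, 50))
# A real match of 19.99 cM on one side is hidden from the shared-match list.
print(shows_as_shared_match(300, 19.99))
```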
The picture changes for second cousins once removed and more distant relationships. It’s possible to have a true 2C1R who shares less than 20 cM, and the chances increase for 3C, 3C1R, and so on. So how can we use negative evidence for these more distant connections when we don’t have direct access to the DNA kit in question?
Was Mae William’s Daughter?
Consider William and Nancy. Nancy had a daughter named Mae who was born suspiciously close to the time Nancy married William. Was William Mae’s biological father?
Jacob, the grandson of William and Nancy, has tested, as has Mae’s great-granddaughter, Justine. They share 117 cM. Jacob and Justine would be 1C2R if William were Mae’s father and half 1C2R if not. The total of 117 cM favors half 1C2R, but with a WATO score of only 4. This is weak evidence. Can we do better?
William had two other wives, and at least 14 descendants from those other two marriages have tested at AncestryDNA. If William were Mae’s father (that is, if Hypothesis 1 is true), these matches would represent two half 1C2R, two half 2C1R, five half 3C, and five half 3C1R to Justine. None of them are shared matches with her. (I have access to Jacob’s kit but not Justine’s.)
I feel pretty comfortable concluding that William was not, in fact, Mae’s father. But how confident can we be? Technically, it’s possible for each of these relationships to share less than 20 cM, but all 14 of them? What are the odds of that happening?
Dr. Andrew Millard, a professor at Durham University in the UK, kindly performed some simulations to arrive at an answer. He came up with probabilities that a match of known relationship would share less than 20 cM.
Now let’s work some math mojo on this problem! If Hypothesis 2 is true and William was not Mae’s father, there’s a 100% chance that Justine will share less than 20 cM with the 14 matches, or a probability PH2 = 1. In fact, she won’t share any DNA with them at all, and AncestryDNA’s threshold doesn’t matter.
If instead Hypothesis 1 is true and Mae was William’s daughter, the probability that Justine would share less than 20 cM with both of the half 1C2Rs is PH1 = 0.029 x 0.029 = 0.00084. We can convert to a WATO score (an odds ratio) by dividing PH2 by PH1: 1/0.00084 = 1,190. That’s a pretty convincing score!
We don’t need to stop there. We can include the other 12 descendants of William in the calculations. (Remember: The probability of several events all occurring is the product of the individual probabilities.) When we do this, we get a combined probability of 3.52 x 10⁻⁷ for Hypothesis 1 and an odds ratio of 1/(3.52 x 10⁻⁷) = 2,839,994. Wowza!
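The arithmetic above can be reproduced with a short sketch. The function names are made up for illustration. The only per-relationship probability used is the 0.029 quoted for a half 1C2R; the probabilities for the other relationships are not listed here, so the sketch takes the published combined value of 3.52 x 10⁻⁷ as given rather than guessing them.

```python
import math

P_HALF_1C2R_BELOW_20 = 0.029  # from Dr. Millard's simulations, as quoted in the post


def probability_all_below_threshold(groups):
    """Multiply the probabilities for independent matches.

    `groups` is a list of (count, probability) pairs, one per relationship
    type, where `probability` is the chance of sharing less than 20 cM.
    """
    return math.prod(p ** count for count, p in groups)


def odds_ratio(p_h1, p_h2=1.0):
    """Odds in favor of Hypothesis 2 (P_H2 = 1 when no DNA is shared at all)."""
    return p_h2 / p_h1


# Two half 1C2Rs, both absent from the shared-match list.
p_h1_two = probability_all_below_threshold([(2, P_HALF_1C2R_BELOW_20)])
print(f"{p_h1_two:.6f}", round(odds_ratio(p_h1_two)))

# All 14 matches, using the combined probability published in the post.
print(round(odds_ratio(3.52e-7)))
```

Without intermediate rounding, the two-match odds ratio comes out near 1,189 rather than 1,190, and the rounded combined probability gives about 2.84 million. The small difference from the 2,839,994 quoted above presumably reflects rounding of the published probability.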
Even when WATO was inconclusive and we did not have access to Justine’s match list, negative evidence gave us overwhelming support for the conclusion that William was not Mae’s father.
Thus far, I’ve neglected to mention an important statistical consideration: independence. If two events are correlated with one another, you can’t use both in the calculations. In the Vincent example, notice that Barton 2’s child has also tested but is labeled “Ignored Barton”. That’s because how much DNA the child shares with Vincent is dependent on how much Barton 2 shares. The child is not independent.
In the Mae example, there were actually 10 other descendants of William that I excluded from the analysis, because they were closely related to another match. That is, they weren’t independent. I chose to draw the line at matches who were more closely related than first cousins (children, grandchildren, siblings, and niblings were ignored).
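One way to apply this exclusion rule by hand is a simple greedy filter: walk through the matches and keep each one only if it is not closely related to a match already kept. The sketch below is an assumption-laden illustration, not the procedure actually used for the Mae analysis. The match labels and the list of close pairs are invented, and "closely related" means whatever cutoff you choose (here, closer than first cousins).

```python
def select_independent_matches(matches, close_pairs):
    """Keep a match only if it is not closely related to one already kept.

    Hypothetical helper. `close_pairs` lists pairs of matches who are more
    closely related than first cousins (parent/child, siblings, and so on).
    The result depends on the order of `matches`, so list the matches you
    most want to keep first.
    """
    close = {frozenset(pair) for pair in close_pairs}
    kept = []
    for match in matches:
        if all(frozenset((match, other)) not in close for other in kept):
            kept.append(match)
    return kept


# Invented labels: A is B's parent, and C and D are siblings.
matches = ["A", "B", "C", "D", "E"]
close_pairs = [("A", "B"), ("C", "D")]
print(select_independent_matches(matches, close_pairs))
```

Only the kept matches would then go into the probability calculation.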
In the future, we’ll have tools that can account for non-independence automatically. Until then, be sure to exclude closely related matches from calculations like these.
This blog post is based on a discussion in The DNA Roundtable Facebook group. Many thanks to all who participated, especially Dr. Andrew Millard and Malcolm Peach, who both performed computer modeling to arrive at the probabilities. Special thanks to William Best for reminding me to explain the importance of independence in statistical analysis.
Updates to this Post
15 August 2022 — Added an explanation of independence.
11 thoughts on “DNA as Negative Evidence, Revisited”
You prompt me to ask this question. What do you make of these mystery matches (Ancestry) to M:
JB: 176 cM / 8 seg / 69L
AH: 55 cM / 2 seg / 57L
K12: 20 cM / 2 seg / 21L
AH and K12 are 1C
JB is 2C to AH and K12
These 3 are unlikely to be related to M in the last 3 gen, based on geography. They are Aussie immigrants. M has no connection to Australia.
How should I view this family connection to M?
They could be 3rd cousins. There are other possibilities as well. Have you tried the WATO tool?
Thanks. Yes, I did do WATO from M’s perspective. I have not done it from JB’s perspective, although I now have the data (same three points) to do so.
If I had just JB’s data, I would have expected 1/2-2C. But, I am clear from JB’s wife that the real relationship with AH and K12 is as full 2C (shared DNA lines up). The conclusion seems to be an NPE on either of the g-grandparents of JB, but I wanted to check in with you, given the very different amts of shared DNA with M. Is the amt of shared DNA amongst the poss 3Cs extreme? Seems so to me.
If not, illustrative.
Third cousins have a wide range of possible share amounts, from over 200 cM down to zero. The values you’re seeing aren’t extreme.
Great example. Many people come to family history for the stories – as I did. But few are comfortable with stats, especially when a degree of emotion is involved. This blog is a very readable guide.
When doing WATO, is it correct to remove hypotheses that do not SEEM likely? Refer to earlier comment about Aussie connection vs M not having any connection to Australia?
If you are certain the hypothesis cannot be true, for example if you’re looking for an unknown father and WATO places your hypothesis as the child of a woman, it’s fine to remove the hypothesis.
You are Brilliant! Thanks for writing about this topic. It is very helpful.
One caution is that if you’re using Ancestry’s numbers, the spread between Timber-adjusted and unweighted sharing can be quite significant. I have matches for whom the difference is greater than 50 cM.
Here’s a *slightly* less extreme example. Ancestry reports sharing with one of my matches as 42 cM across five segments. Based on this amount, the company predicted us to be 4th cousins. However, our respective trees show that we are in fact 3rd cousins, and our unweighted sharing is 89 cM, meaning that Timber’s “adjustment” represented a reduction of 47 cM, to less than half the original calculated amount. The longest segment alone is reported as 49 cM, and it’s just one of five shared segments.
In addition, this information is based on the v2 chip. On the v1 chip — which regrettably I deleted — we showed sharing of 92 cM, which would have been safe from Timber.
Amusingly — though it isn’t *really* funny — based on the same v2 chip Ancestry reports my daughter’s sharing with this same match as 65 cM across four segments, with a longest segment of 48 cM. This is still a downward adjustment, since her unweighted sharing is shown as 85 cM.
My point is, pay close attention to unweighted sharing versus Timber’s reported sharing.