NOTE: The original version of this post used an inappropriate simplification of the math. Ann Turner, Louis Kessler, and Andrew Millard were all kind enough to point out my error, and I have incorporated their corrections into this revised version. Many thanks to the three of them for helping to make this a much better post!
I’m in a math kind of mood today, so let’s talk about MPEs (mis-attributed parentage events, usually of the father) and probability. First, a quick overview of the two cardinal rules of probability: the AND rule and the OR rule.
Math Is Fun!
The AND rule goes like this: the probability that two independent events will both happen is the probability of the first times the probability of the second. If I flip a coin that has a 50-50 chance of being heads or tails, the probability that I will get heads two times in a row is 0.5 x 0.5 = 0.25. (Go ahead, try it: flip a coin twice in a row, repeat 100 times, and tally up how many times you get heads–heads; it should be pretty darned close to 25%.)
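If you would rather let a computer do the flipping, here is a minimal Python sketch of that experiment. The function name, the number of trials, and the fixed seed are choices made for illustration only. It simulates many pairs of fair coin flips and reports the share that came up heads–heads, which should land close to the 0.25 the AND rule predicts.

```python
import random


def heads_heads_fraction(trials: int, seed: int = 0) -> float:
    """Simulate `trials` pairs of fair coin flips; return the share that are heads-heads."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so runs are repeatable
    both_heads = 0
    for _ in range(trials):
        first_is_heads = rng.random() < 0.5
        second_is_heads = rng.random() < 0.5
        if first_is_heads and second_is_heads:
            both_heads += 1
    return both_heads / trials


if __name__ == "__main__":
    print(f"AND rule prediction: {0.5 * 0.5:.2f}")
    print(f"Simulated share:     {heads_heads_fraction(100_000):.3f}")
```

With only 100 pairs of flips, as in the by-hand version, the result will bounce around more; larger trial counts settle closer to 25%.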
The OR rule is: the probability that either of two mutually exclusive events will happen is the probability of the first plus the probability of the second. (Mutually exclusive means they can’t both happen at once: a single flip can’t be both heads and tails.) The chance that I will flip either heads or tails when I flip my coin is 0.5 + 0.5 = 1.0. In other words, there’s a 100% chance that I’ll flip one or the other.
I know what you’re thinking: Does this have anything to do with DNA, or does she just like to hear herself type? (You would add those probabilities, by the way.) The answer is: Both! (Multiply ’em.)
MPEs in Your Family Tree
Has a genealogist ever told you that they don’t need to take a DNA test because they’ve got a solid paper trail back to the 1600s? They’re wrong, and here’s why: even in families where there is no reason to suspect misattributed paternity, its rate of occurrence is about 1–2%. (The ISOGG Wiki has a good overview of studies on misattributed paternity rates.) That rate is not universal: it’s affected by parental age, marital status, and socioeconomics, and it’s higher in some countries than in others. It also doesn’t include other forms of misattributed parentage, like an undocumented adoption, a step-parent who is mistakenly assumed to be the biological parent, or grandparents who raise their grandchild as their own.
Consider a “family tree” with just me in it. Without any DNA evidence proving otherwise, there is a 1–2% chance that one of my parents is misidentified. For simplicity, let’s use 2%. That means there’s a 98% chance, or 0.98 probability, that both of my parents are who I think they are.
Now, let’s add my parents to the mix. This looks like an OR situation: the MPE could be with me or my father or my mother. If we add the numbers, we get 2% + 2% + 2% = 6% chance, or 0.06 that one of us has misattributed parentage. Alternately, we could ask “What is the chance that there is no MPE among these three people?” That would be an AND situation, because the answer requires no MPE for me and no MPE for my father and no MPE for my mother. The math is 0.98 x 0.98 x 0.98 = 0.98^3 = 0.941192, or 94.1% chance.
Wait, what? Those two different approaches give different answers! How can that be? Turns out I made an error using the OR strategy, because the three different events are not exclusive of one another. That is, more than one of us could have misattributed parentage. The AND strategy is the better approach here. (Thank you to Ann Turner, Louis Kessler, and Andrew Millard for pointing out my error.)
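To see the two approaches side by side, here is a small Python sketch. The helper names are hypothetical, and it assumes, as this post does, that each person independently has the same 2% chance of misattributed parentage. It compares the naive sum with the complement-of-AND calculation and, for three people, with the inclusion-exclusion expansion that corrects the sum for overlapping events.

```python
def naive_sum(p: float, n: int) -> float:
    """The flawed OR approach: add the per-person rates.

    Only valid when the events are mutually exclusive, which MPEs are not.
    """
    return n * p


def at_least_one(p: float, n: int) -> float:
    """Chance of one or more MPEs among n independent people: 1 - P(no MPE for anyone)."""
    return 1 - (1 - p) ** n


def inclusion_exclusion_three(p: float) -> float:
    """Exact OR for three independent events: singles - pairs + triple."""
    return 3 * p - 3 * p**2 + p**3


if __name__ == "__main__":
    p = 0.02  # illustrative rate used in this post
    print(f"Naive sum:           {naive_sum(p, 3):.6f}")
    print(f"1 - 0.98^3:          {at_least_one(p, 3):.6f}")
    print(f"Inclusion-exclusion: {inclusion_exclusion_three(p):.6f}")
```

For three people the naive sum overshoots only slightly (6% versus about 5.88%), but the gap widens as people are added, and with 63 people the sum passes 100%, which no probability can do.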
There are seven people in the tree once I include my four grandparents, so the overall chance that none of us has an MPE is 0.98^7 = 0.868, or 86.8%.
Extending back to my eight great grandparents, the math is 0.98^15 = 0.739, or 73.9%.
By the time an average tree includes eight generations, there’s almost no chance that every single parent is correctly identified. So tell me again why you don’t need to check your paper trail with DNA. Just as you can’t prove biological relationships with DNA alone, you can’t do it solely with documents. We need both. (Blaine Bettinger makes this point with an excellent example comparing a niece to the identical twin of her mother in the Facebook group Genetic Genealogy Tips & Techniques. You can read the thread here.)
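The generation-by-generation arithmetic can be extended to any depth with a few lines of Python. This sketch assumes a complete tree with no pedigree collapse (so a tree of g generations, counting me as generation 1, holds 2^g - 1 people) and a flat, independent 2% rate per person. Both are simplifications, and the function names are made up for the example.

```python
def people_in_tree(generations: int) -> int:
    """Number of people in a complete tree: 1 + 2 + 4 + ... = 2^g - 1."""
    return 2**generations - 1


def prob_no_mpe(generations: int, rate: float = 0.02) -> float:
    """Chance that nobody in the tree has misattributed parentage (AND rule)."""
    return (1 - rate) ** people_in_tree(generations)


if __name__ == "__main__":
    print("gens  people  P(no MPE)  P(at least one MPE)")
    for g in range(1, 9):
        clean = prob_no_mpe(g)
        print(f"{g:>4}  {people_in_tree(g):>6}  {clean:>9.3f}  {1 - clean:>19.3f}")
```

Changing the `rate` argument shows how sensitive the later generations are to the assumed per-person rate.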
My Tree and MPEs
I have done autosomal DNA tests on myself and my parents at AncestryDNA (mom) and 23andMe (dad), and they both match me as parent–child. Because they match me across my entire genome and neither of them was an identical twin, I know that they are my parents and that there is a 0% chance that I’m the product of an MPE.
My parents match with the expected amounts of DNA to first cousins and/or first cousins once removed through all four of their grandparents, so I can consider my relationships to my great grandparents, and by extension my grandparents, supported as well. Similarly, DNA matches to 2nd, 3rd, and more distant cousins indicate that all eight of my great grandparents were almost certainly who I thought they were.
With autosomal DNA evidence, my family tree going back three generations has a roughly 0% chance of including an MPE. In addition, a yDNA test on my father confirmed that our surname lineage is Larkin.
By virtue of DNA testing, I’ve lowered the chances of an MPE in this part of my tree from 26.1% (100% – 73.9%) to nearly zero. (I use the ≈ symbol in the figure to indicate that it’s “about 0%”, because some alternate scenarios could have led to the DNA matching patterns I see. For example, one of my ancestresses could theoretically have had a child with her husband’s brother, and the DNA might not be able to tell.)
In fact, the great grandfather who is marked with a yellow star in the tree was sort of an MPE. “Sort of” because documentation listed the names of his parents but, prior to having DNA evidence, my family believed that he’d been born in France and came to America as a runaway. Using DNA testing, I was able to show that his parents were locals but not married to one another. That information, in turn, allowed me to extend that branch of my family tree back several more generations. You can read about that search here.
Yes, Even If You Think You Don’t
No amount of paper can prove that the father listed on a record is the biological parent, birth certificates are often amended or outright forged in adoption cases, and one can falsely infer who a child’s parent was from census records when the other parent has remarried. For all of these reasons, DNA is needed to support the conclusions we draw from documentary evidence.
One Last Comment
As a final aside, the 2% MPE rate in this post is just an approximation used to demonstrate the math. It’s not meant to be an accurate representation of the expected MPE rate in my family or any other. Some families will have higher rates, others lower. And the truth is, we don’t know what the true MPE rate is overall. In fact, this is an area where good genealogical skills can contribute to academic research on the subject.
26 thoughts on “MPEs, Probabilities, and Why You Need DNA, Even if You Think You Don’t”
Leah: An excellent article about NPE likelihood among one’s ancestors, and one that, to my recollection, has never been presented before.
You just need to adjust your statistics a bit. The calculation is not an OR (adding) as you have it. You can tell once you get up to 3rd great grandparents, where you would calculate the upper bound as 63 x 2% = 126%, and that probability is impossible (over 100%).
The calculation is an AND (multiplying) of the inverse event, i.e. of not having an NPE. That would be 98% to 99%. You’d multiply as many as you need to get the probability of no NPEs, and then you’d subtract that from 1 to invert it again to get the probability of 1 or more NPEs. E.g., the upper bound for 3rd great grandparents would be 1 – (.98 to the power of 63) = 72%
The correct bounds are:
Gen 1: 1 person: 1% – 2%
Gen 2: 3 people: 3% – 6%
Gen 3: 7 people: 7% – 13%
Gen 4: 15 people: 14% – 26%
Gen 5: 31 people: 27% – 47%
Gen 6: 63 people: 47% – 72%
But everything else you say is perfectly valid.
That’s an excellent point. I was a bit sloppy with the math here because I didn’t want to have to explain why the inverse is a better way to calculate it. That was a mistake on my part. You and Ann Turner have both pointed out my error, and you’re both right. I’m going to think some about the best way to explain the “inverse approach” to a general audience and update the post tomorrow. Thanks so much for your feedback!
I was just about to complain about the probability theory here until I read the comments section
… and I tend to use the term NPE (Not the Parent Expected) and you use MPE, but they are the same thing.
But I see the ISOGG uses NPE as Non-Paternity Event, so the statistics there might apply only to paternity events and not maternity. One would think that there would be a different, likely lower probability of a non-maternity event since usually you know who the mother is.
Correct about the NPE stats being only for fathers; those numbers will underestimate the MPE rate. There’s also a lot of variation in NPE/MPE rates among populations. The specific rates, though, are not the key point of this post. The point is that no matter how well documented your tree is, you need to account for a not-inconsequential MPE rate.
Absolutely. We have absolutely NO handle on MPE rates, as there are no good studies. They could be much higher than the 1-2% for Y-DNA, or much lower (although experience suggests that isn’t the case!), and could vary widely from place to place and time to time. And NPE is no longer used in the community because it suggests there is no father. There’s always a “paternal event” even if we don’t know who did it!
Louis has pointed out that you need to consider the inverse of the MPE event. I’ll elaborate a bit. The OR rule of adding probabilities applies only when the events are mutually exclusive, which is true of the coin-tossing example but not of two MPEs in your tree.
Another interesting calculation is the expected number of MPEs. By gen 7 the average person has 1-2% x 127 = 1.27-2.54 MPEs, and it is more likely than not that there is an MPE somewhere in the tree.
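The expected-count figure and the "more likely than not" threshold above can both be checked with a short Python sketch. It assumes a complete tree of 2^g - 1 people and an independent, uniform per-person rate, and the function names are illustrative only. The expected number of MPEs is the rate times the number of people, while the threshold generation is found by searching for where 1 - (1 - rate)^n first exceeds 50%.

```python
from typing import Optional


def expected_mpes(rate: float, generations: int) -> float:
    """Expected number of MPEs in a complete tree of the given depth."""
    return rate * (2**generations - 1)


def first_generation_over_half(rate: float, max_generations: int = 20) -> Optional[int]:
    """Smallest tree depth at which P(at least one MPE) exceeds 50%, if any."""
    for g in range(1, max_generations + 1):
        people = 2**g - 1
        if 1 - (1 - rate) ** people > 0.5:
            return g
    return None  # threshold not reached within max_generations


if __name__ == "__main__":
    for rate in (0.01, 0.02):
        print(
            f"rate {rate:.0%}: expected MPEs by gen 7 = {expected_mpes(rate, 7):.2f}, "
            f"more likely than not from gen {first_generation_over_half(rate)}"
        )
```

Note that the expected count can exceed 1 even though the probability of at least one MPE never does; the two quantities answer different questions.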
Thanks Andrew. I’ll update the post soon. I was hoping to keep the math simple, but that was a mistake on my part.
A fascinating subject – thanks for posting. With any kind of models like this, it’s “better to be roughly right than precisely wrong”
In addition to the point above, I’m afraid there are three other significant inaccuracies in your calculations.
Firstly, the stats you have quoted are about non-paternal events. Mis-attributed maternity is, for obvious reasons, considerably less common than mis-attributed paternity. However, you have applied the same stats to both parents.
Actually, I’m applying the stats to the child, not to either of the parents. That is, I assume that there’s a 2% chance that documents have misidentified one or both of my parents, but don’t consider which parent was misidentified. I was revising the post while you commented, so hopefully that point is clearer.
Secondly, you have mixed up illegitimacy with concealed illegitimacy. Would you expect them to be the same?
I’m not concerned with illegitimacy at all. The question isn’t whether or not the parents were married but whether the parents are who you think they are.
Thirdly, the compounding of odds is only accurate if the events are entirely independent – in statistical terms, they have a correlation of zero. Correlation has a bigger and bigger impact, the more generations back you go.
I suspect, based on nothing more than anecdotal evidence from my own research, that firstly bastardy tends to run in families and secondly that illegitimate children are less likely than average to have children of their own.
For both of these reasons, I suspect the odds you are calculating are overestimated.
Thanks so much for your feedback! I was revising the post while you commented, so your (correct) concerns about independence of events and variation in MPE rates among families should be addressed now. Please let me know if it’s still unclear. Also, the precise MPE rate is not the point here; it’s that there *is* an MPE rate that we need to think about when we consider how accurate our trees are.
Louis makes a good point about the NPE rates only referring to fathers. I would imagine the false maternity rate must be negligible and that would reduce the probabilities significantly though, as you say, the basic point remains the same that most of us are likely to have false paternities sooner or later in our trees.
NPE is still used in the community as a catch all to describe all the reasons why a surname does not match the expected Y-DNA signature. Misattributed parentage only covers unexpected events. Many NPEs are well documented (eg illegitimate births, name changes). No alternative term has yet caught on though I like Maurice Gleeson’s term “surname switch”.
I think there is a misunderstanding of how I used the 2% rate. I did not assume that there’s a 2% chance that the father has been misidentified (which would be in keeping with the literature) and a 2% chance that the mother has been misidentified (which would be a gross overestimate). Instead, I looked at the child and assumed there was a 2% chance that at least one of their parents has been misattributed. As a result, I probably underestimated the rate, because the real rate would be P(MPE) = P(only father misattributed) + P(only mother misattributed) + P(both misattributed), whereas I simply assumed that P(MPE) = P(misattributed father).
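To make that breakdown concrete, here is a hedged Python sketch. The 2% paternal rate is the illustrative figure used in this post; the 0.2% maternal rate is purely a placeholder invented for the example, not a measured value; and treating the two events as independent is an assumption. Splitting P(MPE) into three cases that cannot overlap (father only, mother only, both) gives the same total as the complement calculation.

```python
def mpe_breakdown(p_father: float, p_mother: float) -> dict:
    """Split P(at least one misattributed parent) into three disjoint cases.

    Assumes misattributed paternity and maternity are independent events.
    """
    father_only = p_father * (1 - p_mother)
    mother_only = (1 - p_father) * p_mother
    both = p_father * p_mother
    return {
        "father_only": father_only,
        "mother_only": mother_only,
        "both": both,
        "total": father_only + mother_only + both,
    }


if __name__ == "__main__":
    P_FATHER = 0.02   # illustrative rate from this post
    P_MOTHER = 0.002  # hypothetical placeholder, not a measured rate
    parts = mpe_breakdown(P_FATHER, P_MOTHER)
    for name, value in parts.items():
        print(f"{name:>12}: {value:.5f}")
    print(f"  complement: {1 - (1 - P_FATHER) * (1 - P_MOTHER):.5f}")
```

Because the three cases are mutually exclusive, the OR rule applies to them and simple addition is valid here.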
In any case, the numerical value of the rate will vary depending on a lot of factors, and it isn’t the issue. The issue is that there is almost certainty that we have MPEs in our trees, and many of them are likely to be in the range that can be targeted by autosomal DNA. The 50-50 mark is in the 3rd cousin range, and by the time you’re looking at the ancestors you share with 4th cousins, there’s more than a 70% chance that you’re dealing with at least one misattributed parentage.
I often think of this, and wonder how often it occurs, as it is somewhat hard to prove via paper trail, and still challenging to trace via dna.
Excellent and timely article.
How often it occurs is something we can, hopefully, address with enough careful paper+DNA studies.
And thank you!
I hate to sound pedantic here, but I’m wondering if there is a slight problem in the way your data has been presented. Your example starts with a family tree of only you. By definition you do not know who your parents are in THIS family tree, so you can’t say that there are any errors in the tree, so it must be 100% correct (even though, as you state, you are only 98% likely to be the child of the people who think they are your parents).
Only when your family tree has two generations in it does the first possibility of an NPE error in the family tree come in. At that point the likelihood of it being correct is 98%, since there is only one situation (you) where there is the possibility of an NPE. Obviously both your parents could be NPEs, but their parents are not being shown in the family tree.
As a result I think you need to shift your probability calculation back one generation in the commentary you have underneath the chart marked “Probability of an NPE”, so that a family tree with 9 generations in it is unlikely (0.6%) to have “every single parent is correctly identified”.
It’s just a visual. I only put one person in that “tree” (and put “tree” in quotes to emphasize what I was doing) to avoid some people interpreting the example as being about one person and others interpreting the same example as being about three people. As I defined them, the probabilities are the same either way.
except the effect isn’t cumulative? It’s statistically around 0.8-1.3% per generation. I have DNA matches to people who descend from different brothers of my ten-generations-back colonial immigrant ancestor in the direct male line. I have DNA matches to people who descend from other immigrant ancestors of more recent generations along other lines. I have DNA matches with descendants of all 32 of my 3rd great-grandparents. My experience with DNA testing has been that my paper genealogy has been pretty much verified, for at least as far back as autosomal testing is useful; either my experience is an outlier, or the NPE rate isn’t actually as high as your model estimates.
As mentioned in the post, the MPE rate varies by age, marital status, socioeconomic level, country, etc. Overall, the average is about 2% per generation. You may come from a low-MPE group, or it’s possible that there are MPEs in your tree that you’re missing.
In my research I come across children attributed to the wrong mother all the time! If I can find a marriage record, it’s usually clear and easily provable by census records (since 1850 in the US) who the previous wives were.
I assume those who simply decided that all the children called “dau” or “son” in later census records were children of the wife listed either didn’t have access to earlier census records or didn’t care enough to dig further. atDNA clearly shows different matches though.
Sometimes adopted children were actually natural children of the husband, whether or not it was acknowledged.
DNA doesn’t lie, and MMEs are a thing too.
What do you suggest if you have a set of 5th great grandparents that you have no matches back to (neither does your sibling)? Assuming we just didn’t inherit DNA from those ancestors, what would be the next step in “proving” the connection?
There’s a small chance that you wouldn’t inherit DNA from a 5GGP (7 generations in the chart linked below), and an even smaller chance that neither you nor your sibling inherited from them. So there’s hope! Whether you will find matches through that couple is a different question and will depend on factors such as (1) where they lived (US is better because more Americans have tested), (2) did they have a lot of children who survived to adulthood, and (3) have their descendants tested.
To “prove” a relationship that far back you’ll need a multi-pronged approach. Can you apply either yDNA or mtDNA to the problem? Can you test other cousins through that couple with atDNA, preferably some that are a generation older than you? Good luck!
Thank you! This is very helpful! The family is from Connecticut and were loyalists that moved to Quebec. We have no DNA matches going back to the parents (my 5th great grandparents) of the Loyalist (who is my 4th great grandfather). I am descended from one of his daughters. Descendants of his sons have all tested, and their Y-chromosomal DNA has confirmed the relationships. We just don’t have any atDNA matches and have no cousins or older generations to test. Thank you again!