The probability approach to testing hypotheses that I described in the “Science the Heck Out of Your DNA” series relies on the underlying values being accurate. The calculations are based on this graph from AncestryDNA’s Matching White Paper.
Each curve shows how likely the possible relationships are for a given amount of shared DNA. For example, when two people share 200 cM, the relationship is most likely in Group E or Group F, with much lower chances that it could be in Groups D or G.
While the shapes of the distributions (curves) are probably accurate, the graph may not fully represent the very extremes of each relationship group. That is, there certainly exist “outliers” for each category that are so rare that the probability in the graph appears to be zero when it’s really, say, 0.002.
When you’re eyeballing relationships, your brain makes allowances for possible outliers (probably too many allowances!), but a computerized calculator, like the one described in the “Science the Heck” series, can’t. A value of zero is always zero, and multiplying it by any other value still gives zero. That means that in very rare cases, data from an outlier could rule out a hypothesis that is actually true.
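To make the zero problem concrete, here is a minimal Python sketch, assuming the calculator multiplies per-match probabilities for each hypothesis and then normalizes them with equal priors. The hypothesis names and probability values are invented for illustration and are not read from the AncestryDNA graph, and the `floor` parameter is a hypothetical smoothing tweak, not necessarily what the “Science the Heck” calculator does.

```python
from math import prod

# Hypothetical per-match probabilities for two competing hypotheses.
# Illustrative numbers only; NOT taken from the AncestryDNA white paper.
likelihoods = {
    "Hypothesis A": [0.60, 0.45, 0.0],  # third match is an outlier: charted as 0
    "Hypothesis B": [0.30, 0.40, 0.05],
}


def combined(probs, floor=0.0):
    """Multiply per-match probabilities, raising each one to at least `floor`."""
    return prod(max(p, floor) for p in probs)


def normalize(scores):
    """Scale scores so they sum to 1 (assumes equal prior odds for each hypothesis)."""
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}


# Without a floor, the single zero wipes out Hypothesis A entirely.
raw = normalize({h: combined(p) for h, p in likelihoods.items()})

# With a small hypothetical floor, A is disfavored but not eliminated.
smoothed = normalize({h: combined(p, floor=0.002) for h, p in likelihoods.items()})

for name in likelihoods:
    print(f"{name}: raw={raw[name]:.3f}  smoothed={smoothed[name]:.3f}")
```

With these made-up numbers, Hypothesis A drops to exactly zero in the raw calculation even though two of its three matches fit it better than Hypothesis B, while the floored version leaves it as a small but nonzero possibility. Real outlier data would let the floor be replaced by measured values.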
To improve the calculator, I am collecting data on proven outliers. These are matches whose relationship can be confirmed by other DNA evidence but who share a centimorgan amount outside the known range for that relationship. For example, if two first cousins match one another as expected and also match each other’s children as expected, but their children share less DNA than the known range for 2nd cousins, that pair is an outlier whose data could benefit the community.
If you think you have an outlier example, please run the shared DNA amount through the Shared cM Tool with Probabilities. If the known relationship is not listed as an option, please report the outlier using this online form.