Do You Have a DNA Outlier?

March 22, 2018 thednageek 12 Comments

The probability approach to testing hypotheses that I described in the “Science the Heck Out of Your DNA” series relies on the underlying values being accurate. The calculations are based on this graph from AncestryDNA’s Matching White Paper.

Figure 5.2 from the AncestryDNA Matching White Paper edited to use the groups defined by the DNA Detectives chart.

Each curve shows how likely the possible relationships are for a given amount of shared DNA. For example, when two people share 200 cM, the relationship is most likely in Group E or Group F, with much lower chances that it could be in Groups D or G.

While the shapes of the distributions (curves) are probably accurate, the graph may not fully represent the very extremes of each relationship group. That is, there certainly exist “outliers” for each category that are so rare that the probability in the graph appears to be zero when it’s really, say, 0.002.

When you’re eyeballing relationships, your brain makes allowances for possible outliers (probably too many allowances!), but a computerized calculator, like the one described in the “Science the Heck” series, can’t. A value of zero is always zero, and when multiplied by other values it still gives zero. That means that in very rare cases, data from an outlier could rule out a hypothesis that is actually true.
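To make the zero problem concrete, here is a minimal sketch in Python. It is not the actual calculator from the series. The probabilities are made-up illustrative values rather than readings from the AncestryDNA graph, and the function name `combined_score` and its `floor` parameter are hypothetical. It shows how a single zero wipes out an otherwise plausible hypothesis, and how a small floor (here the 0.002 mentioned above, used purely as an example) would keep it in play.

```python
from math import prod  # available in Python 3.8+

# Hypothetical per-match probabilities for ONE hypothesis.
# Illustrative values only; the third match is an outlier that the
# graph would show as zero.
match_probs = [0.42, 0.31, 0.0, 0.55]


def combined_score(probs, floor=0.0):
    """Multiply the per-match probabilities together.

    Any value below `floor` is raised to `floor` first, so a rare
    outlier cannot force the whole product to zero.
    """
    return prod(max(p, floor) for p in probs)


strict = combined_score(match_probs)                # one zero -> product is 0.0
floored = combined_score(match_probs, floor=0.002)  # small but non-zero

print("without floor:", strict)
print("with floor:   ", floored)
```

With no floor, the hypothesis is eliminated outright. With the floor, it survives with a very low score, which is closer to what really happens with a true outlier. What the floor should actually be for each relationship group is exactly the kind of thing that real outlier data would help pin down.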

To improve the calculator, I am collecting data on proven outliers. These are matches for whom the relationship can be confirmed by other DNA evidence but who share centimorgan amounts outside the known ranges. For example, if two first cousins match one another as expected and also match their children as expected, but their children don’t match one another as 2nd cousins, that is data that could benefit the community.

If you think you have an outlier example, please run the shared DNA amount through the Shared cM Tool with Probabilities. If the known relationship is not listed as an option, please report the outlier using this online form.

Thank you!


12 thoughts on “Do You Have a DNA Outlier?”

Rebecca says:

March 26, 2018 at 6:18 pm

I have a near-outlier, but because the relationship is doubled, it’s hard to know what the actual expected DNA match would be. The Shared cM Tool doesn’t deal in double relationships.

My great-great-grandparents (Crumpton and Emerson) each had a sibling, and those siblings married each other. My great-grandfather Crumpton would have been a double first cousin to the other couple’s children. I have a match with that other couple’s third-great-grandson (my double fourth cousin once removed) at 99 cM, and my mother (his double third cousin twice removed) matches him at 174 cM.

1. thednageek says:
  
  March 27, 2018 at 1:09 pm
  
  You’re absolutely correct that a double-cousin relationship can bump someone into the “outlier” category. Eventually, I hope the community will have stats on double relationships similar to the ones we have for trees without pedigree collapse.
  
  1. Luisa A Shupe says:
    
    May 17, 2019 at 11:48 am
    
    I have a string of multiple double cousin occurrences in my tree if you would like to study it for statistical data for your research. To be specific, my GGGrandmother was a Siebenthaler, and one of 14 children. They lived on the adjacent farm to the Hubers, who had 11 children. Five Siebenthaler girls married Huber boys. Several Huber girls married Siebenthaler boys. Even better, a set of Siebenthaler fraternal twins (girl/boy) each respectively married a Huber boy/girl. As a result, I have copious Siebenthaler/Huber matches that I believe normally wouldn’t have registered as closely to me, but because of an increased pool of DNA, they do, thus many “outliers.” I have tested with Ancestry. My three daughters have as well. To further add dizziness to this family line, the father of the 14 Siebenthaler children had a brother. His children also married Hubers. In essence, a sibling’s child married another sibling’s child’s wife’s/husband’s sibling. And I will most likely find more double cousins as I proceed.
    
    1. thednageek says:
      
      May 18, 2019 at 9:37 pm
      
      That many double cousins will definitely pose a challenge when working through your DNA matches. Good luck!
Douglas W Fisher says:

March 31, 2018 at 1:37 pm

First off, thanks for the helpful answers in the past. My daughter Sarah just got her ancestry results in. I compared her to my 1st cousin 1x removed, who matches me 997cMs on 38 segments, and Sarah matches H.W. 631cMs on 28 segments.

H.W.’s maternal great-grandmother was my maternal grandmother, hence 1st cousin 1x removed. Sarah’s dna match to H.W. looks more like a 1st cousin 1x removed, but the genealogy shows 2nd cousin.

My birth parents were 2nd cousins, shared the same paternal great-grandfather, which is probably why I match 1st cousin dna with H.W. instead of 1st cousin 1x removed, which we are.

I believe the dna match between Sarah & H.W. confirms their common ancestor was their great-grandmother who was my maternal grandmother.

Mac says:

May 4, 2018 at 11:14 pm

So helpful. My sister and I (also female) may be an outlier situation. Your thoughts would be greatly appreciated. My sister and I have 1307 cms across 50 segs per Ancestry test. We thought that we had the same mother but now are not so sure. When we compare our shared matches we show a wide variance in our degree of match with known relatives of our mother. Our two top matches are sisters and their mother is the half sister of our mother. My sister has 513 cm with 21 seg to each of them and I have 373 cm with 19 segs. Other examples are: 67 vs. 182, 168 v 99, 205 v 239, 47 v 92, 67 v 182, 261 v 173, 125 v 159, 212 v 156 (my sister is always the 1st number). I do know that some variability in numbers would be expected but this seems to be more than expected to me. Could one of us be the daughter of a sibling to our mother? TIA for your thoughts.

1. thednageek says:
  
  May 5, 2018 at 5:41 pm
  
  Those variations are normal, but 1307 cM is outside the known range for full siblings. Is it possible that you have different fathers?
  
  1. Mac says:
    
    May 5, 2018 at 8:45 pm
    
    My apologies, I didn’t explain the situation very well, I am new to this. To clarify, yes, we do have different fathers, which we have only come to know about recently. Because we overlap on your chart (groups B & C) and have some bizarre family history of sisters adopting other sisters’ children, we can not be sure that we are not in fact 1st cousins vs. half siblings. When looking at other family members’ matches, I expected to have closer cM numbers if we were sisters vs. cousins. My sister having 513 and me having 373 cms to the same other family member seemed like a wide range for sisters. So, if I understand correctly, those number variations would be normal for half sisters and cousins when compared to another/same family member and should not be considered an indicator of either a sister or cousin relationship? Is there a way to determine this with no other DNA available? Thank you so much.
    
    1. Mac says:
      
      May 5, 2018 at 9:09 pm
      
      oops, there’s more. To further clarify myself. In some cases our numbers put us in two different groups for the same person:
      67 = G vs. 182 = F
      261 = E vs. 173 = F
      212 = E vs. 156 = F
      47 = G vs. 92 = F
      Being in two different groups for the same person makes me wonder if this points to a cousin relationship vs. half sib between the two of us. I would think we would have the same relationship/be in the same group to other people within the same family if we were half sibs. Thanks again.
    2. thednageek says:
      
      May 6, 2018 at 10:24 am
      
      Yes, those ranges are normal for siblings. One way to determine whether you are low-sharing half sibs versus high-sharing 1st cousins would be to test other known children of your mother’s siblings to see how much you and your sister share with them. Barring that, the only other way I know would be to work with simulated datasets, as in this post: https://thednageek.com/julies-story/. You can’t use the simulated data that Julie used because it’s specific for a paternal half sibling. We’d have to simulate a new dataset for maternal half siblings.
Nancy Custer says:

May 31, 2018 at 2:32 pm

Leah,

I sent you a PM on FB about an outlier I just uploaded. You can email me if you don’t find it.

I enjoyed the “Science the Heck out of your DNA” series!

1. thednageek says:
  
  June 1, 2018 at 6:49 am
  
  Thanks!
  