The Limits of Predicting Relationships Using DNA  

The one thing we genealogists probably want most from our autosomal DNA matches is something they can’t give us: an exact relationship prediction based on shared DNA alone. Unfortunately, with the exceptions of identical-twin, parent–child and full-sibling matches, that’s simply not possible.

Why not? One reason is that multiple different relationships can give the same patterns of shared DNA. For example, a woman who shares 1750 cM with you could be your grandmother, granddaughter, aunt, or half sister. Those relationships are indistinguishable based solely on the amount of shared DNA. (In this case, you can narrow the possibilities using age.) Someone sharing 950 cM with you could be a great-grandparent/grandchild, first cousin, great-uncle/aunt/nephew/niece, or half-uncle/aunt/nephew/niece.

The DNA Detectives Facebook team has designed a nifty chart that categorizes relationships into groups based on the expected amounts of shared DNA. In the two examples above, grandparent/grandchild, aunt/uncle, and half sibling would be Group B, and great-grandparent/grandchild, first cousin, great-uncle/aunt/nephew/niece, and half-uncle/aunt/nephew/niece would be Group C. I will use the DNA Detectives group names in the rest of this post for ease of reference.

Shared centimorgan ranges for different relationship groups. The original chart is available in the files of the DNA Detectives Facebook group.

 

To complicate matters, each group is defined not so much by an average or “expected” amount of shared DNA but by a range. That is, someone in Group B might share 1750 cM with you, but they could also share as little as 1300 cM or as much as 2300 cM, according to the DNA Detectives chart. Group C can range from 575 cM to 1330 cM.

Notice another problem? The low end of the Group B range overlaps the high end of the Group C range. Put another way, someone who shares 1315 cM with you could be in either group (and remember that each group includes multiple possible relationships). Worse, the more distantly related the group, the broader the range of shared centimorgans relative to the average and the more overlap there is with other groups. Someone who shares 1315 cM with you can only fall into Groups B or C, but someone who shares 100 cM could belong to Group E, F, or G, according to the DNA Detectives chart.
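To make the overlap concrete, here is a small Python sketch that lists every group whose range contains a given shared cM value. It uses only the DNA Detectives ranges quoted in this post (Group B, 1300–2300 cM; Group C, 575–1330 cM; Group E, 65–600 cM). The dictionary and function names are hypothetical, and because the other groups' ranges are left out, it under-reports the overlap at low values: at 100 cM it returns only Group E, whereas the full chart also allows Groups F and G.

```python
# Ranges (cM) quoted in this post from the DNA Detectives chart.
# Only the groups whose ranges appear in the text are included;
# the full chart defines more groups (A, D, F, G, ...).
DNA_DETECTIVES_RANGES = {
    "B": (1300, 2300),
    "C": (575, 1330),
    "E": (65, 600),
}


def candidate_groups(shared_cm, ranges=DNA_DETECTIVES_RANGES):
    """Return every group whose [low, high] range contains shared_cm."""
    return sorted(
        group for group, (low, high) in ranges.items()
        if low <= shared_cm <= high
    )


for cm in (1750, 1315, 1000):
    print(cm, candidate_groups(cm))
```

A value in an overlap zone, such as 1315 cM, comes back with more than one group, which is exactly the ambiguity described above.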

When you have a match in an overlap zone, the best approach is to consider the most likely group first. AncestryDNA’s Matching White Paper (31 March 2016) presents an informative graph (their Figure 5.2) that shows the likelihood of each group (the x axis) given the amount of shared DNA (the y axis). Their graph is based on simulated data, rather than empirical (real) data, but as long as the model they used to do the simulations is reasonable, the data should be reliable.

Distributions of shared centimorgans for different relationship categories based on simulated data. This graph was taken from the AncestryDNA Matching White Paper published 31 March 2016 (their Figure 5.2).

 

Unfortunately, they used a logarithmic scale, which is a great space saver but is intuitive to precisely no one. They also misuse the word “meioses”, confusing people who aren’t familiar with the term as well as those who are. To make the information easier to understand, I edited the image labels to use the groups from the DNA Detectives chart. Here’s what the modified figure looks like.

Figure 5.2 from the AncestryDNA Matching White Paper edited to use the groups defined by the DNA Detectives chart. Note that the numbered ranges to the right of the graph mark regions where that group is the most likely one, not the full range for that group.  For example, between 200 cM and 340 cM, the most probable relationship is Group E, but the full range for that group is 65–600 cM (see below).

 

The figure gives you a visual sense of how broad the ranges are for each relationship group and how much overlap there is. It also shows us which centimorgan values represent only one possible group; those are the zones along the vertical y axis that only have one colored line crossing them. Between about 2400 cM and 3200 cM, the only line is the medium blue one for Group A, and between about 1550 cM and 2000 cM, the only line is the forest green one for Group B. There’s a short interval around 1000 cM that can only be Group C, but for all other centimorgan values, more than one group of relationships could apply.

Because of the log scale, the graph is hard to interpret if you’re interested in a specific centimorgan amount. To get around the problem, I approximated x and y values for each curve using an online plot digitizer. Geek power!
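If you digitize a curve yourself, you will usually need values that fall between the points you captured. Here is a minimal sketch, assuming the digitized points for one curve are stored as (cM, value) pairs sorted by cM. Because the original axis is logarithmic, it interpolates linearly in log10(cM) rather than in raw cM. The function name and the sample points are hypothetical, and this isn't necessarily how the table in this post was produced.

```python
import bisect
import math


def interpolate_log_cm(points, shared_cm):
    """Linearly interpolate a digitized curve in log10(cM) space.

    points: list of (cm, value) pairs sorted by cm, e.g. exported
    from a plot digitizer. Returns None outside the digitized span.
    """
    xs = [math.log10(cm) for cm, _ in points]
    ys = [value for _, value in points]
    x = math.log10(shared_cm)
    if x < xs[0] or x > xs[-1]:
        return None
    i = bisect.bisect_left(xs, x)
    if xs[i] == x:
        return ys[i]  # exactly on a digitized point
    # x lies strictly between xs[i - 1] and xs[i]
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


# Hypothetical digitized points, for illustration only.
curve = [(100, 0.2), (1000, 0.8)]
print(interpolate_log_cm(curve, 316.2))
```

On a log axis, the point visually halfway between 100 cM and 1000 cM sits near 316 cM, not 550 cM, which is why the interpolation works on log10(cM).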

 

What does this tell us? It gives us an indication of which group of relationships is most likely to apply to a match who shares a specific amount of DNA. For example, a match sharing 750 cM with you is in an overlap zone, but they are far more likely to be in Group C (probability p = 0.85, or 85% chance) than in the overlapping Group D (p = 0.15, or 15% chance). Of course, the numbers don’t guarantee that the match is in Group C, but that’s where I’d start looking for the connection.

The probabilities can be more complicated. Consider a match who shares 110 cM with you. That person could belong to Group E (p = 0.08, 8% chance), Group F (p = 0.39, 39% chance), Group G (p = 0.30, 30% chance), Group H (p = 0.20, 20% chance), or Group I (p = 0.06, 6% chance). Again, the best approach would be to look for a shared ancestor in the most likely relationship range first, so Group F > Group G > Group H > Group E > Group I.
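The ranking step is simple enough to sketch in a few lines of Python. This assumes you already have, for one shared cM value, a probability for each group read off the curves. It rescales them so they sum to 1 (digitized values won't add up exactly: the 110 cM figures quoted above total 1.03) and sorts them from most to least likely. The function name is hypothetical.

```python
def rank_groups(probabilities):
    """Normalize group probabilities and sort from most to least likely.

    probabilities: dict mapping group name -> probability read off the
    curves at one shared cM value. Normalizing absorbs small rounding
    errors from digitizing.
    """
    total = sum(probabilities.values())
    return sorted(
        ((group, p / total) for group, p in probabilities.items()),
        key=lambda item: item[1],
        reverse=True,
    )


# Values quoted above for a match sharing 110 cM.
at_110_cm = {"E": 0.08, "F": 0.39, "G": 0.30, "H": 0.20, "I": 0.06}
for group, p in rank_groups(at_110_cm):
    print(f"Group {group}: {p:.2f}")
```

The printed order, F, G, H, E, I, is the search order suggested above.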

You may also be familiar with the Shared cM Project by Blaine Bettinger. This project compiles self-reported data from the genetic genealogy community for different relationships. Thus, it gives us both the extremes (maximum and minimum values) as well as histograms (bar graphs showing how common given centimorgan values are for each relationship). The histograms are comparable to the colored lines on the AncestryDNA graph.

For comparison, I’ve aligned the ranges from the three datasets below. For the Shared cM Project, I’ve combined data for relationships that belong to the same group (e.g., first cousins once removed and second cousins both belong to Group E, so they were treated together).
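Here is a minimal sketch of one way to pool relationships into a group, assuming each relationship's data is available as a (minimum, maximum) pair and as a histogram of counts per cM bin. The function names and the input format are hypothetical, the numbers are placeholders rather than real Shared cM Project values, and the post doesn't say exactly how the combining was done.

```python
def pool_ranges(ranges):
    """Combine (min_cm, max_cm) pairs from relationships in one group.

    The pooled range runs from the smallest minimum to the largest maximum.
    """
    lows, highs = zip(*ranges)
    return min(lows), max(highs)


def pool_histograms(histograms):
    """Add up per-bin counts from several relationships' histograms.

    Each histogram maps a bin's starting cM value to a submission count.
    Summing counts weights each relationship by how many people reported it.
    """
    pooled = {}
    for hist in histograms:
        for bin_start, count in hist.items():
            pooled[bin_start] = pooled.get(bin_start, 0) + count
    return dict(sorted(pooled.items()))


# Placeholder numbers for illustration only; NOT Shared cM Project data.
first_cousin_once_removed = (100, 400)
second_cousin = (50, 350)
print(pool_ranges([first_cousin_once_removed, second_cousin]))
```

Summing raw counts lets the more commonly reported relationship dominate the pooled histogram. If you would rather give each relationship equal weight, convert each histogram to proportions before adding.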

 

The ranges given by the DNA Detectives are consistently narrower than those from the other two sources. That is mainly because the DNA Detectives chart intentionally omits extreme outliers, which are especially challenging to deal with in the unknown-parentage searches for which the chart was created. Their dataset is also the smallest, although it has the advantage that each data point has been carefully vetted by an expert. The Shared cM Project ranges are similar to those of AncestryDNA, but not exactly the same. Differences between the two could result from errors in the self-reported Shared cM Project data, the relative sizes of the datasets (the simulated dataset is almost certainly much larger than the empirical data), or assumptions made by AncestryDNA’s scientists in designing the simulations. Regardless of which source you prefer to use in your own genealogical work, it is wise to keep in mind the strengths and weaknesses of each dataset.

For more on how AncestryDNA’s figure correlates to the Shared cM Project, read “You Can’t Get There from Here.”

Note:  The probabilities and cM ranges discussed in this post assume little or no endogamy.  Endogamy is the practice of members of a population marrying within the same group over multiple generations.  If practiced for enough time, the present-day members of the population will all be related to one another multiple different ways.

Acknowledgements: Thanks to Dr. Tracy Vogler for alerting me to the online plot digitizer. CeCe Moore and Christa Stalcup kindly agreed to let me reproduce the DNA Detectives chart here.

 

Updates to This Post:

14 Oct 2022 — Added link to “You Can’t Get There from Here.”

 

181 thoughts on “The Limits of Predicting Relationships Using DNA”

  1. Dear DNAgeek,

    Nice story! Perhaps it’s an idea to convert this information to an online tool or even a phone app? Let users fill in the largest cM, total cM etc etc and the tool gives a nice visual explanation what would be the most probable connection.

    Best

    EJ

      1. Pretty wrong results. I entered my info, and the results showed a parent/child result when the actual result should have been half-brother.

        1. Are you sure you entered the correct information? There is no overlap between the amount of DNA a parent–child share and the amount half siblings share.

      2. “No Results Found. The page you requested could not be found. Try refining your search, or use the navigation above to locate the post.”

        I clicked on the link you posted, and that’s the message I got. ☹

        1. This article is nearly 5 years old. Some of the links may no longer work. Which one is giving you trouble?

      1. You can use AddThis.com. It’s a free service that will add sharing buttons to your site.

        1. If you are using WordPress, you can add sharing buttons via a plugin… Go to Plugins in your menu, find the one you want, and apply it to your page. Hope that helps… If ya need more help, just contact me and I’ll do a walk-through with you. Great info, by the way… this DNA stuff is a bit hard for me to understand.

  2. Excellent article. Is the table you constructed with the online digitizer available in a spreadsheet? I think I could use it to assist in an effort to help a distant DNA match to identify her birth parents. It would save me from having to key in your data from the graphic.

    1. Thank you! I emailed you an Excel version of the table, and I’ll see about uploading it here to the blog.

      1. Hi, did you ever post the excel sheet referenced above? If not I would love to get a copy by email.

  3. In this area we should also be including the number of matching segments as a further determinant. For example, grandmother-matches and niece-matches should have the same expected percentage of DNA but the niece match is expected to have the matching DNA broken up into more segments.

    1. That’s an excellent point. The number (and size) of matching segments can help distinguish between grandparent and avuncular relationships, but not other relationships. Scientists from 23andMe published a paper in 2012 that includes simulation data showing the distinction. I’ve digitized that data as well, but it was too much to tackle in this blog post.

      It’s Figure 3A in this open-access paper: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0034267

  4. Thank you for the table of probabilities. I am currently working on my own DNA search for a biological parent, and this will help guide me a bit more… This is actually one of the most understandable charts I have seen for a layperson who understands some basic statistics.

    1. I hope it helps. If you haven’t already, join the “DNA Detectives” group on Facebook for free advice and moral support in your search. Good luck!

  5. Nicely done. I do wonder why you feel that the AncestryDNA white paper authors “also misuse the word “meioses”, confusing people who aren’t familiar with the term as well as those who are.”

    While it is a bit startling to see the term “meiosis” or its plural “meioses” without any introduction or explanation, I do not see what constitutes misuse, or why someone who understands the process would find the use of the word confusing. Isn’t the number of meiotic divisions at the base of all theoretical tables or formulae showing expected shared DNA? With every transfer of DNA from parent to child, the child receives half of the parent’s DNA, divided through the process known as meiosis. Because of crossing over during that meiotic division (as well as upstream meiotic divisions), however, the child does not receive equal amounts of DNA of each line above the parents, and the accumulated “error” with each meiosis explains the increased range in expected or actual shared DNA.

    1. These are good questions. I thought about addressing them in the post, but the explanation would have distracted from the main points I wanted to make here. I used the word meiosis/meioses, because AncestryDNA’s Figure 5.2 uses it. I then switched to the DNA Detectives’ term “group”, because it is both more accurate and less intimidating to the non-biologist.

      As you know, meiosis is the process of forming the egg or sperm in the parent’s body. It results in the egg/sperm getting half of the parent’s DNA. When the mother’s egg fuses with the father’s sperm, the offspring is restored to a full complement of DNA (half from each parent).

      The relationship between a parent and a child involves a single meiosis. That between a grandparent and grandchild involves two meioses (one in the grandparent, one in the parent). Similarly, half siblings are separated by two meioses, one in the shared parent to produce the first child, and a second in that same parent to produce the second child. This is where AncestryDNA misuses the term. In their figure, they label the group that includes half siblings and grandparents/grandchildren (forest green in the figure, Group B per the DNA Detectives) as three meioses, not two.

      AncestryDNA labels full siblings as being separated by two meioses, but that’s not the right way to look at it. They *are* separated by two meioses, but they’re also related twice over: once through their mother and once through their father. Essentially, they are double half-sibs, which isn’t quite the same as two meioses. (I’m sure AncestryDNA made this decision to try to make the concept easier for the novice to understand rather than out of ignorance. Unfortunately, in doing so, they’ve used the term incorrectly.)

      Interestingly, although a full aunt/uncle is expected to share the same amount of DNA as a grandparent or half sibling, the aunt/uncle is separated by three meioses, not two. The reason they share in that closer range is because they’re a double relative (i.e., double half aunt/uncle). This fact is potentially useful for relationship predictions; although a full aunt/uncle is in the same group as a half sib or grandparent, the extra round of meiosis means that the shared segments will be smaller, on average, so in most cases, we should be able to distinguish an aunt/uncle from those other two possibilities. (See Figure 3A in this paper: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0034267)

      This is obviously a topic worthy of its own post!
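The counting in this reply can be checked against the standard coefficient of relationship: each line of descent linking two people through a common ancestor contributes (1/2)^m, where m is the number of meioses along that line, and the contributions are added. Below is a minimal Python sketch, assuming no endogamy or inbreeding. The function name and the path lists are illustrative. It shows why full siblings (two lines of two meioses each) and aunts/uncles (two lines of three meioses each) land where they do. Keep in mind that the result is an expected fraction, not a cM prediction, and that how a testing company counts regions where full siblings share both copies changes the cM it reports.

```python
from fractions import Fraction


def expected_share(paths):
    """Expected fraction of DNA shared (coefficient of relationship).

    paths: one entry per line of descent through a common ancestor,
    giving the number of meioses along that line. Assumes no
    inbreeding or endogamy.
    """
    return sum(Fraction(1, 2) ** m for m in paths)


relationships = {
    "parent-child": [1],
    "grandparent-grandchild": [2],
    "half siblings": [2],
    "full siblings": [2, 2],        # via mother and via father
    "aunt/uncle": [3, 3],           # via both grandparents
    "first cousins": [4, 4],
    "first cousins once removed": [5, 5],
}
for name, paths in relationships.items():
    print(f"{name}: {expected_share(paths)}")
```

The aunt/uncle line comes out equal to the grandparent and half-sibling lines (1/4) despite the extra meiosis, because there are two lines of descent instead of one.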

      1. I’m sorry that I did not reply for almost a year! I came back to this blog post today to see the stats underlying an online calculator and read through the comments until I found your response. Your response to the comment about number of segments brought my response into clearer focus.

        I think Ancestry did use the term meiosis correctly if one is counting back to a shared ancestral couple as the MRCA. This is parallel to the use of “gen” by GEDmatch. Thinking about it this way also helps explain, I suspect, why grandparent/grandchild and avuncular relationships can be differentiated by number of segments, if you count the number of meioses back to a shared couple.

        Counting meioses back to the common ancestral couple, in my opinion, makes it clear that we are looking at matching segments on shared lines, not on total DNA. The furthest back we can go between two matches is the ancestral couple they share. For half-siblings sharing a father, for instance, that would be the paternal grandparents, not the father. Recognizing this also should help in trying to use more distant relationships in searching for an unknown biological line.

        Thinking about it this way, and recognizing that segments with full matches are not double-counted (except in 23andMe, apparently), clarifies the confusion people have with “gen” with close relationships. I still have no kits on 23andMe so am not sure whether one can adjust for the double-counting in close relationships such as full sibs in tables and calculators.

        I think the term “meiosis,” including as used in the context of a shared ancestral couple, clarifies so much that it would be worth neophytes learning one more term. And it might jog memories of learning about the two major forms of recombination that occur during meiosis (crossing over and independent assortment) and therefore set the stage for a more advanced discussion of other aspects of inheritance of matching segments that puzzle people, for neophytes who want to go further.

        Thanks for the extended discussion; it has clarified a lot for me.

      2. The meiosis topic is one that I’ve only recently been giving thought to, now two years into my “DNA Deep Dive” to find my birth parents. And yes, it would be an excellent subject for a post of its own.

        While pondering how to sort my matches into family lines, it simply jumped out at me that there should be a way to discriminate between removed relationships of the same degree (or meioses) because of exactly what you mention about aunt/uncle relationships. For example, a 1C1R+ (forward one generation from self) and a 1C1R- (back one generation from self) have different lineages and DNA inheritance patterns from each other as well as from yourself, even though the number of meioses (and some of the people) are the same. It seems like we need a better, more discriminating term for the individual DNA passed by each parent during meiosis. Also, as is unfortunately the case with removed cousins, knowing whether there actually IS an easily detected difference (segment totals or segment sizes?) is impossible on a large scale, because I have yet to see a study or survey that uses terminology that would discriminate between the two, and therefore it hasn’t been studied. (I’m hoping you know of one and can refer me to it if it exists 🙂 .)

        1. A 1C1R will have the same inheritance pattern whether you’re the “up” generation or the “down” one. There’s a chance that a half 1C could be different, but I think there’s too much overlap between total cM shared and segment size for any clear differentiation.

          We can sometimes differentiate grandparent from aunt/uncle from half sibling, but the latter two are only distinguishable from one another when they’re on your father’s side.
          https://thednageek.com/escape-from-the-overlap-zone/

        2. Thank you for your reply. I read the link and I understand the differences between the 3 relatives mentioned. Can’t quite get my head around why paternal is different, but I believe the results and will ponder it, lol.

          Back to the removed relationships, it would be interesting if Dr. Millard has done (or would do) simulations on say, 1C1R relationships. All of the surveys/studies I’ve run across have not provided a means for respondents to differentiate between the two, and I understand why.

          But, here is my logic. I’m sure I may be overlooking something, but here it is anyway.

          The MRCA between these two and me would be Gr Grandparents in the case of 1C1R-, passed via Gr A/U, and both Gr Grandparents as well as Grandparents in the case of 1C1R+, passed via A/U. In addition to the “double dose” of DNA from the avuncular connection, I understand that my 1C1R- would inherit ~25% from Gr Grandparent only, while my 1C1R+ inherits ~3.125% from Gr Grandparent as well as ~12.5% from Grandparents.

          I understand that I would share 6.25% with either 1C1R, but it would be differing portions of these MRCA, thus it just seems to me that it might manifest itself somehow other than through 1 to 1 chromosome comparison.

          Perhaps you can elaborate in a future blog post?

        3. The paternal side is different because women have a higher crossover rate than men. Fewer crossovers means fewer (but larger on average) segments passed down.

      3. If the shared segments are probably smaller, you can probably expect some to be below the standard 7 cM cutoff, meaning they’re more likely to be on the lower end of the probability curve.

  6. I need to thank you for this good read!!

    I definitely enjoyed every little bit of it.

    I’ve got you book-marked to check out new stuff you

  7. Another thing to take into account is that in earlier times it was common for siblings from one family to marry siblings of another family, all descendants now likely sharing more DNA than typically expected, and impacting accuracy of estimations, depending on how far back this happened.

  8. There is a typo in the asterisk note in your final table comparing the three sources of data. I don’t know if this correctable here. “* The DNA Detectives… and is not comparable to the other to sources.” In the final “to”, the “w” went missing. I realize this is not particularly pertinent to the Blog, but it bugs me.

  9. I really enjoyed your blog. My mother and I recently tested on 23andMe, and we were lucky enough to find 2 really close matches. My mother matched a male at 15.1%, 1124 cM, 32 segments, and no match on the X. She also matched a female (the sister of the male above) at 15.9%, 1181 cM, 33 segments, and 3 of those segments are on the X. 23andMe lists both of them as my mother’s 1st cousins. What gets me is that she is in the overlap of all of the charts that I looked at, or borderline from one to the other. I know you said that when there is an overlap, one group has a higher probability than the other, but it is still hard not to feel some doubt. Would you happen to have any suggestions as to what we should focus on? Where we should look? Please forgive any typos, as it is hard to focus when your daughter is climbing on you while typing. 😉

    On a positive note, my mom and I have reached out to them, shared photos, and received a family tree of their known relatives. The comparisons between my mom and their family are scary due to how much they resemble each other. We just don’t know who in the family to focus on at this point.

    What do you think?

    1. For both of them, the most likely relationship is one in Group C, with a (much) lower chance of being in Group B. Group C includes first cousins, great-grandparent/grandchild, half aunt/uncle/niece/nephew, and great aunt/uncle/niece/nephew. You can probably rule some possibilities out based on their ages.

      1. Thanks. My mom is 20 yrs older than her predicted 1st cousin match. Her 1st cousin match’s father is the youngest of five; the match has two uncles and two aunts on that side. I’m thinking that one of those two uncles is my mom’s dad. One of the uncles was 19 yrs old and the other was 17 in 1943, when my mom was born. They both registered in the military in 1942. The older uncle was sterilized due to the eugenics program in California, never married, and doesn’t have any known children, but he could have had relations before joining the military in 1942. The younger uncle did marry and had two daughters (who are still alive); one is 70 and the other is in her 60s. My mom is 73, so I’m thinking he could also have had relations before he joined the military in 1942. Is it possible that the grandfather of my mom’s predicted 1st cousin could also be her dad, given the 1181 cM match? He and his wife were also in the eugenics program. They let him out, but his wife supposedly lived out the rest of her life in the state hospital.

  10. Nice work! This is very interesting.

    I’d like to know if there’s more info anywhere on probabilities for the more distant relations. It seems 23andMe considers any match in the range of (about) 15 to 42 cM (0.20% to 0.57%) on a single segment as a predicted 4th cousin with a “range” of “3rd to 6th” or “3rd to Distant”. Is there any info on what this really means probabilistically?

    I’d love to see a table like the above that goes down several more rows to 10 or 15 cM and has more columns to show 5th and 6th cousins.

    Is there a “standard” definition of “distant” cousin? It seems the 23andMe uses it to mean beyond 6th cousin, while the table here seems to mean beyond 4th cousin. Obviously the more distant the relation the more different possibilities there are, but if someone’s got a 25 cM overlap, they can’t be only, say 10th, cousins, can they? I can see why 23andMe doesn’t show more beyond a certain point, but I want to see all the geeky details!

    1. The challenge with distant cousins is that we’re not likely to share DNA with them at all. For 4th cousins, estimates of the chance that they won’t match range from 25% to 50%. For 5th cousins, it’s 70–85%, and for 6th cousins, the chance that they match at all is 10% or less. Of course, we typically have enough 4th and 5th and 6th cousins that we’ll find a few who match, but by that point they’re statistically likely to share only one segment with us. Basically, there’s no way to distinguish a more-likely 4th cousin from a rare 6th cousin or even rarer 8th cousin based on a single segment.

      Here are some sources for those estimates:
      https://isogg.org/wiki/Cousin_statistics
      https://gcbias.org/2013/12/02/how-many-genomic-blocks-do-you-share-with-a-cousin/

      Another complication that doesn’t get addressed much is that a match who shares one 25 cM segment is likely to be closer than one who shares three 8.3 cM segments, even though the total cM shared is the same. In the latter case, I’d suspect that there are, in fact, multiple connections between the two DNA testers, possibly quite distant.
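Why we still find a few matching distant cousins comes down to simple probability. Here is a minimal sketch, assuming each cousin matches independently with the same probability (a simplification, since cousins who descend from the same ancestors aren't truly independent). The chance that at least one of n cousins matches is 1 − (1 − p)^n, and the expected number who match is n × p. The function name is hypothetical; the 15% rate is the low end implied above for 5th cousins, and the count of 50 tested 5th cousins is purely hypothetical.

```python
def prob_at_least_one_match(p_match, n_cousins):
    """Chance that at least one of n cousins shares detectable DNA.

    Treats each cousin as an independent trial with the same
    probability of matching, which is a simplification: cousins
    descended from the same ancestors are not truly independent.
    """
    return 1 - (1 - p_match) ** n_cousins


# Illustration: a 15% match rate for 5th cousins and a
# hypothetical 50 fifth cousins who have tested.
print(round(prob_at_least_one_match(0.15, 50), 4))
print(round(0.15 * 50, 1))  # expected number of matching cousins
```

Even a low per-cousin match rate makes at least one match very likely once enough cousins have tested, which is why single-segment matches at these distances are common but hard to place.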

      1. I can see your point about the complication of matching on multiple segments. It’s not obvious to me whether having multiple segments totaling a certain share amount would be closer or more distant than a single segment share of the same amount, but it seems that 23andMe does interpret this issue oppositely from your assertion.

        If I download my DNA relatives from their site and sort by % of shared DNA, I can see several cases where they predict a closer relationship when there are multiple, short segments shared vs one longer one. For example, I have a predicted 3rd cousin there with whom I share 3 segments which total 37.5 cM. There is another person there with whom I share a single segment of 38.2 cM, and they predict 4th cousin.

        Almost everyone out of 1000+ DNA relatives I have there are predicted 4th cousins, but they also provide a “range” where they say “3rd to 5th”, “3rd to 6th”, or “3rd to Distant”. I can see many cases of predicted 4th cousins of mine there where the range moves up by one (closer) when there are multiple segments shared for a given total shared DNA amount. For example, a 27.2 cM share on 1 segment is predicted as (4th cousin with a range of) 3rd to 6th, but a total 27.2 cM share on 2 segments is a (4th cousin with a range of) 3rd to 5th.

        I tried to figure out what their ranges are for mapping shared DNA % (or cM) to particular predictions, and I couldn’t find any exact thresholds, even when I separate the groups by the number of shared segments. Among my predicted 4th cousins with only a single segment shared, there are several with a predicted range of “3rd to Distant” which have a higher cM amount than many others which are predicted as “3rd to 6th”. I wonder if there is some error in their algorithms here or if there really is some legitimate reason for this. I have not found any explanations on their site.

    2. What is the probability of being provided with false distant cousin relationships? I am finding DNA “distant cousin” relationships with people who cannot possibly be more closely related than 10th or 11th cousins. What is the chance that these are accidental and not actual cousins at this level?

      1. The smaller the segment, the more likely it is to be a “false positive”, that is, a segment that appears to match you but doesn’t match either parent. Below 7 cM, segments are more likely to be false than real. Above 7 cM, they’re more likely to be real than false. Ancestry probably has a lower false-positive rate than the other companies because they try to account for the underlying causes when they do the initial matching, but I can’t say what their false-positive rate is.

        That’s a separate issue from distant matches, though. It’s possible for even large segments (above the danger zone for false positives) to go back 10 or 20 generations.

  11. Can you extrapolate on what the relationships may look like for endogamous populations and for those who then marry outside the endogamous population? Does the count of shared cMs revert to the standard population or will endogamy play a part for many generations?

    1. That’s an excellent question. And, of course, there’s no easy answer, for a few reasons. First, even in non-endogamous populations, there is a range of shared DNA that’s normal for any given relationship (other than parent–child, which is always 50%). We expect the *average* to be higher in an endogamous group, but any two people could still be on the low end of that range and therefore not look like the endogamy affected their shared amount of DNA. Second, different populations have different amounts of endogamy. Cajuns and Polynesians and Ashkenazim and Puerto Ricans are all endogamous, but we wouldn’t necessarily expect them all to have the same outcome, because they have different overall population sizes and have been endogamous for different lengths of time. Third, the expected amount of shared DNA is affected by how many relationships there are, and most of our matches won’t have 100% complete trees past about 2nd-great grandparents. As a result, there will be connections that can’t be accounted for.

      Ultimately, we’ll need a combination of simulated data and crowdsourced information from well-studied populations to tease these issues apart.

      As for those that marry outside the population, the effects of endogamy do taper off, but you can still find yourself matched to very distant cousins.

      Like I said, no easy answers. I wish there were.

  12. Great article; but one should speak of DNA in terms of quantity not “amount”.

    1. I think it’s perfectly acceptable to speak of the “amount” of shared DNA. I suspect you’re suggesting that the word “quantity” is more accurate because many aspects of DNA are discrete and countable. While it is true that base pairs, SNPs, STRs, and physical distance on a chromosome are all countable, shared DNA is measured in centimorgans, which is a calculation based on many factors. Shared DNA in cM is not a discrete quantity.

  13. Mea culpa! On further reflection, I think either amount or quantity could be correct, depending on the context.

    1. If you are on Facebook, a great group to share images and get feedback is “Genetic Genealogy Tips & Techniques”. If you’re not on Facebook, you can email me at theDNAgeek (at) gmail.com.

  14. Hello, I have some very specific needs, and maybe you could help me with choosing
    the correct kit, and what possibilities / expectations to have.
    (Maternal mitochodrial line is not so important)
    Father’s Line, (main interests)
    Pennsylvania, most likely before 1700 (traced to 1771on paper) apparently cousins
    to politically active Rush. Wish to locate Rush relationships from this Pennsylvania
    era, to confirm where I fit in. (6 and 7 generations back).
    ALSO looking to find how this family fits in with the Rush’s in the Tudor Era England.
    I have done fairly extensive research in this area, but there are some holes, and the
    information on Ancestry (from members) is horribly corrupted.
    (side note, Benjamin Rush used a Coat of Arms that tied him to one specific family
    branch, but seemed unaware in his writings of this tie-in.)

    On my mother’s side, I have one branch solidly back to 1770, which is most interesting: apparently a heavily intermarried cluster in Lancashire, where I lose the paper trail. The families are documented to the 1200s, but I cannot make the exact fit. Is there anything that can give hints that far back (roughly the 1600s, mother’s side)? Does the extensive intermarriage help or hinder?

    Also, on my father’s side, my father’s grandmother (born 1879 in Bohemia) has four names on her Catholic baptism papers that tie into Ashkenazi Jewish/Catholic conversion (Frankism). I would like to get some idea of whether her bloodlines were Jewish, as it appears.

    I am most thankful for any help directing me to the testing that can help with these specific questions. If you guide me to one of the kits on your website, I will purchase it here.

    Thank You.
    R. Rush

    1. To investigate your Rush surname line, take the Y-37 DNA test at Family Tree DNA. There’s no guarantee you’ll find matches, of course, but if you do, they can be quite valuable. The test is currently on sale for $129, and you can save an additional $10 with a coupon code.

      I don’t sell DNA tests myself, but if you use this referral link to make a purchase, I will get a small commission. The cost is the same for you:
      https://affiliate.familytreedna.com/idevaffiliate.php?id=1830
      Use this coupon code: R23SGIZZUZY5. It expires today (11/19/17), so if it doesn’t work for you, let me know and I’ll get you an updated one tomorrow.

      If you’re interested in your mother’s maiden surname line, you could ask someone from her family to do the Y-37 test (i.e., her brother, her brother’s son, or maybe a cousin with her maiden surname). I can get you another coupon code if you go that route.

      The Y-DNA tests can be upgraded later if you have too many matches and want to refine the results.

      Y-DNA will only track the direct paternal lineages (the ones usually associated with surname). It can’t help you with your father’s mother’s line, for example. For that, you need what’s called autosomal DNA. With autosomal DNA, the further back in generations you go, the harder it is to find evidence for your ancestors. For that reason, your best bet is to test members of the oldest generation in your family still living. Testing your two parents, if you can, would be better than testing yourself. Testing your four grandparents (if possible) would be better than testing your two parents.

      For autosomal DNA, I recommend starting with the AncestryDNA test. You can transfer those results to other sites, usually for free, to get more bang for your buck. Right now, it’s $79 for the first test, $69 for subsequent ones. They should go on sale for even less over Black Friday weekend. You can buy them through this link (again, I will get a commission, no extra cost for you): http://www.tkqlhce.com/mt80hz74z6MVQQSRPVMONRSTVVS

      1. OK, thank you.
        I got the answers that I had expected: Ancestry is the best for autosomal, Family Tree DNA for Y. I think that with the combination of generations plus distant past intermarriage, much of the autosomal data will turn into a fog, but in a way, that alone is a positive answer. It seems like it might take some time just to sort out the data and match it up to info in archives. I’ll try to get back to you by email in the future if this turns out to be interesting.
        And sadly, I am sixty, and there is no one older to go to, except my first cousins.

  15. I’ve got a crazy DNA puzzle involving the unknown parents of my great-grandfather, John Crumpton (1871-1935). DNA testing has shown me who his grandfather was (William Crumpton, Jr. (1805-1860)). However, there are descendants of three different children of William Crumpton who all match me at higher values than should be likely. They can’t all be my 2nd-great-grandparent, but the DNA makes them all look like they should be.

    As an example, one DNA cousin (R.A.M.) is the 2nd-great grandchild of William Crumpton through his son Lemuel. William Crumpton is my mother’s 2nd-great grandfather, as well. (I use my mother’s DNA results for comparison, as she is a generation closer to the source.) If R.A.M. and my mother descended from different children of William Crumpton, they would be third cousins. R.A.M. and my mother share 171 cM (probability value 0.13).

    Second, I have two matches, R.T. and A.O., who are 3rd-great grandchildren of William Crumpton, through his daughter Mary Ann Crumpton. Mary Ann is definitely not a candidate as my 2nd-great grandmother, as she was busy having a legitimate child with her husband the same year my great grandfather was born, and had three more after that. R.T. and A.O. should therefore be my mother’s third cousins once removed, and yet they match at 157 cM and 113 cM (probabilities of about 0.04 and 0.16).

    The closest relationship found with DNA is to R.E.C., a great-grandson of William Crumpton through his son Jonathan. He could, therefore, be either my mother’s half first cousin once removed or her second cousin once removed. Their match is 303 cM, for a probability of 0.10 of being 2C1R.

    There is one line of Crumpton matches which is complicated by the fact that I’m doubly related to them: my great-grandfather’s mother was an Emerson (based on DNA matches), and one Crumpton son (Marion) married an Emerson daughter. My great-grandfather is not one of their children, but was at least a double first cousin to their children, skewing the DNA results. Marion is also a good candidate as my 2nd-great-grandfather, if he was cheating on his wife with his sister-in-law, which would make John Crumpton about a 3/4 sibling to his legitimate children. The double DNA connection makes it hard to figure out. One match, a 4th-great-grandson of William Crumpton, shares 174 cM with my mother. (No idea how to calculate the relationship probabilities here.)

    I lean towards the R.E.C. match, since there are fewer generations between him and my mother, so the variation in DNA match values ought to be less. The probability of their match being Group E (half 1C1R) is 0.57. (Not to mention that Jonathan was single at the time of John Crumpton’s conception, though he did marry not long after, and not to an Emerson.)

    1. Sounds like you’re making great progress with this mystery! It’s difficult when you’re dealing with double relationships, because the probability table does not apply. At I4GG this weekend, I’ll be presenting a new approach for working with cases like this, and I hope to blog about it soon after. Stay tuned!

  16. I have a question about total segment length in relation to the cutoff point and what it means. FTDNA and GEDmatch typically use a cutoff of 7 cM. I was “playing” around and set the GEDmatch cutoff to 2 cM. I went from no match to 44 matching segments, the longest 4.8 cM, totaling 119.9 cM. This was with a person with whom I have a Y-DNA match, and with whom I have a high probability of absolutely no relation of any kind for at least 225-250 years, and likely longer than that.
    Further, I looked at several people who have the same ancestral surname. On a few of the segments there was some segment overlap for some of the people (not all).
    What am I seeing, or am I imagining it? Thanks.

    1. Small segments (smaller than 7 cM) are statistically more likely to be false positives than real IBD matches. If you’ve tested your parents and can phase your DNA — and better yet, if your match can also phase — you will see that most, if not all, of those segments will no longer match.
      There are some good blogs on small segments here:
      https://thegeneticgenealogist.com/2014/12/02/small-matching-segments-friend-foe/
      http://www.yourgeneticgenealogist.com/2014/12/the-folly-of-using-small-segments-as.html
      https://thegeneticgenealogist.com/2017/09/03/sharing-large-segments-with-a-match-does-not-validate-small-segments-shared-with-that-match/
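      A minimal Python sketch of what a lower cutoff does to a match total: it keeps only segments at or above a minimum length and sums them. It models only the cM threshold (real matching at GEDmatch or FTDNA also applies SNP-count thresholds and its own boundary rules, which aren't reproduced here), and the segment list is hypothetical, chosen just to show the effect.

```python
def total_above_cutoff(segment_lengths_cm, cutoff_cm):
    """Return (number of segments, total cM) for segments at or above cutoff_cm."""
    kept = [length for length in segment_lengths_cm if length >= cutoff_cm]
    return len(kept), sum(kept)


# Hypothetical segment lengths (cM) for one match, for illustration only.
segments = [2.5, 3.5, 4.5, 8.0, 16.0]

for cutoff in (7, 2):
    count, total = total_above_cutoff(segments, cutoff)
    print(f"cutoff {cutoff} cM: {count} segments, {total} cM")
```

      Lowering the cutoff from 7 to 2 cM pulls the three small segments into both the count and the total, which is exactly the size range where the reply warns that false positives concentrate.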

  17. First cousins share, on average, about 900 cM, ranging from around 575 to 1300 cM. I share 1513 cM with my first cousin and 1305 cM with her brother.

    We know why it’s so high. Our mutual grandparents were full siblings.

    My question is: what would the expected shared range be given that scenario?

    1. That’s an interesting question technically and mathematically (not to mention all the other ways!). I thought about this for a while and this is what I came up with, purely theoretically:

      Note that I’m ignoring the X chromosome difference between the sexes and using the round number of 3600 cM for each copy of the genome for a total of 7200 cM in each person.

      I would first consider how related two children of full siblings would be.

      For two full siblings whose parents are not related, the answer would be 50% on average, or 3600 cM. Of course, for such close relatedness, you must consider full DNA matches and half matches separately. I think ignoring that distinction is why people say siblings share 2700 cM on average. That 2700 cM is really 1800 cM of half match and 900 cM of full match (and the other 900 cM of no match). You have to double the 900 cM of full match and add it to the 1800 cM of half match to get 3600 out of 7200, or 50%.

      I won’t show all the details here, but I think using those averages, you can predict that two full siblings of two full siblings would have 1125 cM of full match sharing and 2025 cM of half match. Doubling the 1125 and adding to 2025 gives 4275 cM, or 59.375% (of 7200) shared DNA.

      It makes intuitive sense that the amount of shared DNA between such siblings would be higher than 50%, so this number seems reasonable to me. So this means that such siblings would share 18.75% more DNA than siblings whose parents are not related, on average.

      In the situation described above, the 59.4% (on average) related siblings are the children of the mutual grandparents. I think it’s valid to just scale up the typical cousinship relatedness by that 18.75%, so the 900 cM “standard” for 1st cousins becomes 1069 cM and the 575 to 1300 cM range becomes 683 to 1544 cM. The two relationships listed are indeed both in this range, so maybe this does make sense.
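      The arithmetic in this reply can be reproduced in a few lines of Python. This is a sketch of the calculation only: it takes the reply's 1125/2025 cM figures as given (their derivation isn't shown), uses the same 7200 cM round number, and inherits the reply's own assumption that first-cousin sharing scales in proportion to the extra relatedness of the parents.

```python
GENOME_CM = 7200  # 3600 cM per genome copy x 2 copies, the round number used above


def fraction_shared(full_match_cm, half_match_cm, genome_cm=GENOME_CM):
    """Fully identical cM counts twice (both copies match); half-identical counts once."""
    return (2 * full_match_cm + half_match_cm) / genome_cm


# Siblings whose parents are unrelated: 900 cM full match, 1800 cM half match.
typical = fraction_shared(900, 1800)
# Siblings whose parents are full siblings, using the reply's averages.
inbred = fraction_shared(1125, 2025)
scale = inbred / typical

# Scale the "standard" first-cousin average and range by the same factor.
first_cousin_cm = {"average": 900, "low": 575, "high": 1300}
scaled_cm = {key: round(value * scale) for key, value in first_cousin_cm.items()}

print(f"{typical:.1%} vs {inbred:.3%} shared, factor {scale}")
print(scaled_cm)
```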

      1. Sorry, I just realized I left out part of a sentence in my last post. When I said “two full siblings of two full siblings” I meant to say “two full siblings who are children of two full siblings”.

        1. I would think that would make your cousins “double first cousins” (Group B in the first chart).

        2. Hi Tanya, to clarify: double first cousins are cousins who share all 4 grandparents (and are not otherwise more closely related). Being double cousins doesn’t require any marriage between related people. If your mom’s sister and your dad’s brother were married to each other, their children would be double cousins to you. The situation Mick presented here is very different, where a brother and sister had children together. Those children would have only 2 grandparents. Their children would still only be regular (not double) first cousins, but they would be more closely related than typical first cousins because of their grandparents being siblings to each other.

      2. Following this thread (belatedly): I have a match with 94 cM of shared DNA on three segments. I have hypothesized that they could be a 5th cousin and a 6th cousin through two lines inherited through my great-grandfather. But that amount of shared DNA seems to be outside the range of expected shared cM even for double 5th cousins. How should I evaluate the likelihood that we have a closer relationship than I have hypothesized? We have shared matches, whom I think share the same lineages in common, in the 38-40 cM range, which seems to line up better with the expectations. Is this just an outlier result, or a sign that this one individual must have another common ancestor?

        1. That’s a great question. Depending on how much endogamy you have in your background, you could easily have distant cousins who appear to be outliers because they’re related multiple ways. (Remember, the stats we use all assume there’s a single connection between two DNA matches.)

          How many segments does your 94-cM match share with you? That can be a good indicator of whether they’re really a distant-cousin-multiple-ways or a more recent connection you haven’t found yet.

  18. Thanks, Nick, for your thoughts, and for expressing them so clearly for someone who is fairly new to this game.

    Of course, the situation I describe will be fairly rare and, probably, many people with this sort of ancestry will be unaware of it anyway. Over the generations, though, unusual unions will have occurred in many families. It makes me wonder to what extent such unions contribute to the range extremities shown on, e.g., the Shared cM Project. Had I submitted my data, without qualification, to that project, would we have a different range for first cousins?

  19. Hi,
    Is there any info as to how much fully identical DNA we might share by chance, on average, with someone else? What follows may be impressionistic, but I don’t think so. With some of my more distant matches, e.g., a third cousin once removed, I seem to have quite a few fully identical regions, quite or very short, it’s true. Often they seem to cluster quite close together and make up 30-50% of a 7+ cM block, which GEDmatch shows with its blue marker. If these are really identical by chance (and I suppose they must be), are they distorting the ‘longest segment’ calculation and also the shared cM calculation?

    My background is UK, so endogamy ought not to be an issue.

    1. A fully identical region (FIR) will be solid green. If you’re seeing a large number of thin vertical green lines in an otherwise yellow block in the one-to-one comparison, that’s not an FIR. That’s just someone who coincidentally shares the same SNP(s) at those spots. It’s not unusual if you both share deep roots (thousands of years) in the same area.

      1. Thanks. That’s clarified it somewhat. One further question, if I may, how wide does the thin green line need to be to be considered a FIR?

  20. For the sake of correctness…
    It would seem you have a typo in the fifth paragraph… “Someone who shares 3015 cM with you can only fall into Groups B or C, but someone who shares 100 cM could belong to Group E, F, or G, according to the DNA Detectives chart.”
    – Easy to see that ‘1315 cM’ (rather than 3015) would correctly continue the subject of the paragraph. Perhaps an error in speech recognition?

  21. My older daughter Sarah recently tested, and not surprisingly, we share 3,384 cM, with a longest block of 267 cM. Sarah’s half aunt transferred her raw results from MyHeritage to FTDNA and matches Sarah at 886 cM on chromosomes 1-22, but nothing on chromosome 23, aka the X. Sarah’s mother and Sarah’s half aunt have the same mother (Sarah’s grandmother) but different fathers. Is it possible that Sarah and her half aunt do not match on any X-chromosome DNA?

    1. Yes, it’s possible for Sarah and her aunt to not share any DNA on the X. There are probably other chromosomes on which they don’t match, as well.

  22. Will probability analysis come up with a common ancestor in the 1700s? I have collected my DNA cousins who share Native American DNA with me, and am putting their tribes, trees, ancestors, and things like that into a spreadsheet…

    1. The more distant the shared ancestor, the more DNA evidence you would need to support the relationship.

  23. I am 75 years old and am so confused about my DNA test. I recently found out that my father may not be my father. My sibling and I had a test done. It says we match on 42 segments, totaling 2449 cM. BUT my sister is 56% English, and I have none. I am Irish, and she has none. Can you please help me? I need to know before I die.

    1. That amount (2449 cM) is in range for a full sibling. I wouldn’t fret about the different ethnicity estimates. One reason for the differences is that you share both parents, but you didn’t necessarily inherit the exact same bits of DNA from each parent. Another reason is that ethnicity estimates are still a developing science. Which company did you test with?

  24. This is very handy, as is the probability calculator on the DNAPainter site.

    Here’s my issue. I share (according to Ancestry) 64 cM in 5 segments with a bloke somewhere in N. America. He descends from Alice Dixon, a sister of my gt. grandmother, so, assuming we are of the same generation (and I think we are) we are 3rd cousins.

    Problem is that I don’t know for sure who was the father of his ancestor. The mother had been married before. Alice’s birth certificate shows her surname as Cole, and the father as William John Cole, the actual husband at the time. Five weeks after the birth registration, Alice was baptised with the surname Dixon, and the father as Samuel Dixon. My gt. grandmother came along later so was definitely a Dixon.

    So, depending on which is right – the birth certificate or the baptism, my match is either my 3rd cousin, or my half-3rd cousin.

    The probability calculator gives a 32% probability that we are half-3rd cousins, and 22% that we are full-3rd cousins.

    But the Shared cM Project Table 3 states that Ancestry averages/medians are 64/53 for a 3rd cousin and only 33/39 for a 3C1R (same as a half-3C in DNA terms). This seems to reverse the likelihoods shown by the probability calculator.

    So, my question is: given the substantial differences between the different companies in calculating shared cMs, is it really viable to have averages that derive from the aggregation of various companies’ results?

  25. The amount you share with your sister is for a FULL SIBLING. Unless you are identical twins you won’t have the same ethnic make up.

  26. I have a surprise atDNA match who is either my 1C1R or 2C1R, depending on which of two brothers was my biological grandfather. Our shared ancestors were French-Canadian, with a single recent instance of cousin marriage, which also makes us 3C2R.

    We share 303 cM across 16 segments on AncestryDNA. Being somewhat familiar with the Timber effect when endogamous populations are involved, I asked my match to upload his results to GEDmatch, where our One-to-Many match is 337 cM and our One-to-One match at the default settings (7 cM and 500 SNPs) is 326 cM across 13 segments.

    Based on these numbers, it appears we are far more likely to be 1C1R than 2C1R, even when the potential 3C2R “contribution” is taken into account. How would you go about quantifying that likelihood?

    1. That is such a great question, and unfortunately, I don’t have an easy answer. The best way to go about it would be to simulate each combination of relationships (1C1R + 3C2R and 2C1R + 3C2R) enough times to create distributions of expected shared DNA amounts, and then compare the distributions to the real numbers.

      I have a fudge that I use in cases like this, but I’m not ready to make it public because I haven’t validated it. I ran it on your scenario and it only very slightly favored the 1C1R hypothesis. Definitely not by enough to have any confidence in the result. Is there someone else you can test?
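      The comparison step described in this reply (simulate each hypothesis many times, then ask how probable the observed total is under each resulting distribution) can be sketched in Python. This is a generic sketch, not the author's unpublished "fudge": it assumes you already have simulated shared-cM totals for each hypothesis from some pedigree simulator (not shown), and it uses a Gaussian kernel density estimate with an arbitrary bandwidth to turn each set of samples into a likelihood. The sample values below are placeholders, not real relationship distributions.

```python
import math
import random


def kde_density(x, samples, bandwidth):
    """Gaussian kernel density estimate of the samples' distribution, evaluated at x."""
    coeff = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return coeff * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)


def likelihood_ratio(observed_cm, samples_h1, samples_h2, bandwidth_cm=20.0):
    """How many times more probable observed_cm is under hypothesis 1 than under 2."""
    return (kde_density(observed_cm, samples_h1, bandwidth_cm)
            / kde_density(observed_cm, samples_h2, bandwidth_cm))


# PLACEHOLDER samples from made-up normal curves, only to exercise the code.
# Real use needs shared-cM totals from a pedigree simulator for each hypothesis.
rng = random.Random(7)
samples_h1 = [rng.gauss(100, 30) for _ in range(2000)]
samples_h2 = [rng.gauss(200, 30) for _ in range(2000)]

print(likelihood_ratio(120, samples_h1, samples_h2))
```

      A ratio near 1 means the observed total doesn't discriminate between the hypotheses; the bandwidth is a smoothing choice, and results in a distribution's far tail become unstable when there are few simulated samples there.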

      1. Thanks. Fortunately, there is another person and she just agreed to test. Same relationship (either 1C1R or 2C1R) but a different grandfather than the other.

        That said, I am mystified by your conclusion that the existing results are close to a toss-up.

        1. It’s close to a toss-up because 303 cM is low for a 1C1R and even lower given that some of the shared DNA could have come through the 3C2R connection. Then again, it’s high for 2C1R + 3C2R. (Also, I make no claims to how well my “fudge” works, so maybe that’s where the problem lies.)

          Please update me when the new tester’s results come in. I’d love to see whether my tentative prediction holds or is rejected by more evidence.

  27. Thanks again. I will definitely let you know when we get the new results. At least 6-8 weeks, possibly more. How best to contact you with the results at that time?

    One more question for now. Why did you go with the 303 cM number from Ancestry instead of the 326 cM number from GEDmatch? (Perhaps the difference would be insignificant.)

    1. You can email me at theDNAgeek —a— gmail.com.

      I use AncestryDNA numbers over other estimates because Timber is, in theory, removing segments that are pileups. Pileups are unlikely to reflect recent shared ancestry, so we actually want to downweight them.

      1. One more thing (sorry). In my own crude analysis of this, I assumed that if the total match was 70% of average, then each of the two components (the 2C1R and the 3C2R) would be 70% of average. I gather that’s not right?

  28. Hi,
    Can you please help me…If two brothers each have a child with the same woman, what relationship will those children have? Will their centimorgan values be on the high side of half siblings? Or are they just regular half siblings?

    1. We call those “three-quarter” siblings. They’d be expected to share an amount of DNA between that of half sibs and full sibs.
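      One way to see where three-quarter siblings land is to treat the paternal and maternal sides separately. This Python sketch assumes about 3500 cM per genome copy (companies differ slightly), unrelated grandparents, independence between the two sides, and fully identical regions counted once, as most companies report them. On the paternal side, children of two full brothers carry identical-by-descent DNA at a given spot about a quarter of the time, the same chance that puts ordinary first cousins near 875 cM on this scale.

```python
GENOME_CM = 3500  # approximate cM in one genome copy; varies slightly by company


def expected_shared_cm(p_paternal, p_maternal, genome_cm=GENOME_CM):
    """Expected reported cM when fully identical regions are counted once.

    p_paternal / p_maternal: chance that, at a given position, the two people's
    paternal (or maternal) copies are identical by descent. A position counts
    as shared if either side matches; the two sides are treated as independent.
    """
    p_any = 1 - (1 - p_paternal) * (1 - p_maternal)
    return p_any * genome_cm


print("half siblings:", expected_shared_cm(0.0, 0.5))            # mother only
print("three-quarter siblings:", expected_shared_cm(0.25, 0.5))  # fathers are brothers
print("full siblings:", expected_shared_cm(0.5, 0.5))            # both parents
```

      On average, three-quarter siblings sit between half and full siblings, as the reply says; individual pairs vary around these averages just as other relationships do.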

  29. Help! I have a mystery guest who is either a half first cousin or a second cousin on my paternal side. How can you determine a more precise match? I match my known paternal uncle at 1,825 centimorgans shared across 55 DNA segments. I match his daughter, my known first cousin (female), at 1,042 centimorgans shared across 33 DNA segments. I match the mystery guest at 542 centimorgans shared across 22 DNA segments. My uncle matches the mystery guest at 497 centimorgans, and his daughter (my known first cousin) matches the mystery guest at 287. I see how the numbers seem to indicate a first cousin once removed relationship for my uncle and a second cousin relationship for my first cousin, but my number is higher than both of theirs and indicates a half first cousin. The mystery guest’s father is the same age as my paternal uncle, and the mystery guest is close to my age. Unfortunately, all the parents/grandparents and my uncle are deceased, so there is no way to get a DNA sample from further up the line.

      1. Thank you. I still had strange results, probably due to only having five cM values to compare. If all the cM values from all five sources (my uncle, my 1C, and my 2C) are correct, then the mystery guest is likely my half 1C.

        I am going to have a test done at 23andMe to confirm or deny my AncestryDNA results.

        Removing my uncle’s cM gives a score of 89 and points to the mystery guest being a half 1C.

  30. Many thanks for the Utility and Blog.

    I’ve been somewhat perplexed over Siblings being given as sharing 50% of their DNA, but 2629cM rather than 3487cM… after reference to Blaine’s 2 books, I found a description on p104 of the first (as also given above) intimating that 25% is shared with both parents (FIR), 50% with either parent (HIR) and 25% with neither (on average!). This is then given as (25% + 50%) of 3487cM i.e. 2616cM – I can see you make the distinction between the testing companies in the DNA Detectives table above. AncestryDNA have then needed to qualify close matches above 1300cM as given in Figure 5.3 of their White Paper, identifying full siblings as 25% FIR and Identical Twins as 100% FIR to distinguish from Parent-Child and Half-Sibling relationships.

    It’s taken me a day to resolve the confusion, so I thought it was worth a mention; I’ve always been somewhat perplexed that at GEDmatch & Ancestry I match about 3500cM with either parent, but no more with myself!

    1. Technically, we each have about 7000 cM of DNA, once you account for both copies of each chromosome. Because we pass along only one copy of each chromosome to our children, they match us at 3500 cM (give or take … it varies slightly by company) in what we call half-identical regions. Those are the ones that show in yellow when you do a one-to-one comparison at GEDmatch with the graphics on. An identical twin would match on both copies (7000 cM total) and would show as fully-identical at GEDmatch (green instead of yellow), but most of the companies don’t count the fully identical regions twice. That’s why your match to yourself (or to an identical twin) would only be 3500 cM instead of 7000 cM. It’s also why the total for full siblings seems off.
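      The two counting conventions described in this reply can be compared with a short Python sketch. It assumes 3500 cM per genome copy and uses the standard average proportions of one-copy (half-identical) and two-copy (fully identical) matching for each relationship, assuming unrelated parents; real pairs of siblings vary around those averages.

```python
GENOME_CM = 3500  # approximate cM in one genome copy (~7000 cM for both copies)

# Expected fraction of the genome where two people match on one copy (IBD1)
# or on both copies (IBD2), assuming unrelated parents.
RELATIONSHIPS = {
    "parent-child": (1.0, 0.0),
    "full siblings": (0.5, 0.25),
    "identical twins": (0.0, 1.0),
}


def shared_cm(ibd1, ibd2, count_fully_identical_twice):
    """Shared cM under either counting convention."""
    weight = 2 if count_fully_identical_twice else 1
    return (ibd1 + weight * ibd2) * GENOME_CM


for name, (ibd1, ibd2) in RELATIONSHIPS.items():
    reported = shared_cm(ibd1, ibd2, count_fully_identical_twice=False)
    both_copies = shared_cm(ibd1, ibd2, count_fully_identical_twice=True)
    print(f"{name}: {reported:.0f} cM as usually reported, {both_copies:.0f} of ~7000 cM")
```

      Counting fully identical regions once is what caps an identical twin (or yourself) at the same ~3500 cM as a parent, and what pulls the full-sibling average down from 3500 to about 2625 cM.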

  31. Yes, that makes those upper level figures a bit tricky to interpret – it seems apparent that AncestryDNA do actually determine the amount that is FIR (as described in Figure 5.3 of the White Paper) and could presumably include the information with the match data. I think that would then lead to a bell-shaped curve at that top level, extending down from 7000cM. The lower curves then derive from that fundamental (i.e. comparison of two siblings is a comparison of two instances of the procreation process (avoiding the term meiosis!), which then repeats at each generation).

    I’ve also realised that the FIR/HIR mix inherited from parents must be 50%/50% for the mix between Siblings given (i.e. 25% FIR, 50% HIR, 25% no match)? i.e. 50% of the 50% FIR is shared between siblings (on average); the other percentages then follow.

    1. Sorry, I got a bit confused with the last paragraph (not difficult I guess!); I’ve been consulting Blaine’s first book again – p100 gives a nice diagram of the inheritance pattern from grandparents to grandchild. I think that 25% FIR content for a full siblings match derives from their sharing 50% of their father’s DNA on the paternal chromosomes and 50% of their mother’s on the maternal side; an expected 50% overlap between those would then lead to 25% FIR.

      The source of the variability (depicted in Figure 5.2 of Ancestry’s White Paper given above) must then come from the way in which the father’s paternal & maternal chromosomes are combined to provide the child’s paternal chromosomes, and likewise on the mother’s side. If that aspect can be modelled that might then reveal the background Endogamy in the more distant matches (i.e. they’re actually multiple matches)?

      Also wondering if the degree of FIR in a siblings match imparts some additional information not present in more distant matches.

      Think I’d better move over to the Facebook group…
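      The 25%-FIR reasoning and the question of where the variability comes from can both be explored with a small Monte Carlo sketch in Python. It is a simplified model, not the simulator behind the AncestryDNA white paper: it assumes 22 equal-length chromosomes totaling 3500 cM (real chromosomes differ), the same genetic map in fathers and mothers, and crossovers placed independently at random, on average one per 100 cM (Haldane's no-interference model). Each sibling's paternal chromosome is a patchwork of the father's two copies, and likewise on the maternal side; a stretch is fully identical where the siblings got the same piece from both parents and half identical where that happened on only one side.

```python
import bisect
import random

CHROM_LENGTHS_CM = [3500 / 22] * 22  # simplified: 22 equal chromosomes, 3500 cM total


def transmit(length_cm, rng):
    """One meiosis: which of the parent's two copies starts the chromosome,
    plus crossover positions (Poisson process, on average 1 per 100 cM)."""
    start = rng.randint(0, 1)
    crossovers = []
    pos = rng.expovariate(1 / 100)
    while pos < length_cm:
        crossovers.append(pos)
        pos += rng.expovariate(1 / 100)
    return start, crossovers


def copy_at(transmission, pos):
    """Which parental copy (0 or 1) the child carries at position pos."""
    start, crossovers = transmission
    return (start + bisect.bisect_right(crossovers, pos)) % 2


def full_sibling_sharing(rng, chrom_lengths=CHROM_LENGTHS_CM):
    """Return (FIR cM, HIR cM) for one simulated pair of full siblings."""
    fir = hir = 0.0
    for length in chrom_lengths:
        pat = (transmit(length, rng), transmit(length, rng))  # father -> sib 1, sib 2
        mat = (transmit(length, rng), transmit(length, rng))  # mother -> sib 1, sib 2
        cuts = sorted({0.0, length, *pat[0][1], *pat[1][1], *mat[0][1], *mat[1][1]})
        for left, right in zip(cuts, cuts[1:]):
            mid = (left + right) / 2
            same_pat = copy_at(pat[0], mid) == copy_at(pat[1], mid)
            same_mat = copy_at(mat[0], mid) == copy_at(mat[1], mid)
            if same_pat and same_mat:
                fir += right - left
            elif same_pat or same_mat:
                hir += right - left
    return fir, hir


rng = random.Random(2024)
total = sum(CHROM_LENGTHS_CM)
runs = [full_sibling_sharing(rng) for _ in range(1000)]
fir_frac = [f / total for f, _ in runs]
hir_frac = [h / total for _, h in runs]
print(f"mean FIR {sum(fir_frac) / len(runs):.3f}, mean HIR {sum(hir_frac) / len(runs):.3f}")
print(f"FIR range across pairs: {min(fir_frac):.3f} to {max(fir_frac):.3f}")
```

      The averages should settle near 25% FIR and 50% HIR, while individual pairs spread around them because each chromosome is passed down in only a few large pieces.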

    1. So-called false positive segments become more likely as the segment size decreases. They are rare for segments of 15 cM and higher, whereas most segments below 7 cM are false positives.

        1. To be honest, I don’t think it makes much of a difference. We’ll find out soon enough, though, as Ancestry now offers WGS as part of their health package.

        2. Current matching is based on a few thousand comparison points among billions of possibilities, meaning that a lot is imputed. If you compare the actual genome, your accuracy should improve dramatically, allowing you to go back at least several more generations.

        3. Because our genomes are 99.9% identical, the gain from WGS for genealogy will be incremental at best; we won’t be gaining 3 billion new bits of information, more like 3 million (compared to 700,000 carefully selected SNPs from microarrays). I highly recommend this scientific paper, which estimates “that WGS data provides a 5% to 15% increase in relationship detection power relative to high-density microarray data for distant relationships.” It’s open access (free). https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004144#:~:text=The%20most%20accurate%20methods%20for,of%20genetic%20sharing%20between%20individuals.&text=We%20show%20that%20ERSA%202.0,high%2Ddensity%20genetic%20marker%20data.

  32. Thank you for the article. I’m still trying to figure all this out.
    Would someone with 1123 cM across 40 DNA segments possibly be a half sister?
    Her daughter shares 860 cM across 37 DNA segments with me.

  33. Hi Leah.

    Through a good deal of hard work and a great deal of good luck, I have finally identified my French-Canadian paternal grandfather. I have a very robust “DNA Circle” on AncestryDNA of 18 first, second, or third cousin matches who share at least 45 cM of autosomal DNA with me, and I have been able to triangulate matching segments for those who are also on FTDNA or GEDmatch. I have personally researched and confirmed all of the associated family trees. I even have a confirming Y-DNA match.

    One of the striking aspects of the autosomal data is the huge variability of shared DNA amounts among matches who are identically related to me. But the aspect which intrigues me most is the fact that my second cousin matches all share much less than the “normal” amount of DNA with me (averaging roughly 0.5 normal) while my third cousin matches in the same ancestral line all share much more than normal (averaging roughly 2X normal). In fact, one of my 3C2R matches shares more DNA with me than any of my second cousin matches!

    What do you make of this? Thanks for your input.

    1. I can think of a few possibilities, which are not mutually exclusive:
      (1) Your second cousins are actually half second cousins or second cousins once removed.
      (2) Your third cousins have another connection(s) to you that’s increasing the amount of shared DNA.
      (3) You don’t fit into the family precisely where you think you do.
      (4) Some of your matches don’t fit into the family where you (and they) think they do.
      (5) One or more of the above.

      To get a better idea of which applies, focus on how much DNA you share with the first cousins.

  34. Hello! What a great post – thank you so much!

    I have a simple question, not sure how simple the answer will be. Looking at the tabular chart, 1400cM shared has a 9% chance of Group C and a 91% chance of Group B. The last table, however, shows that the highest end of Group C range ends in the 1370s (lower if using DNA Detectives numbers).

    Can this contradiction be explained? If the table is correct, why would three separate analyses conclude the range for Group C peters out in the 1370s or lower? Perhaps the error is with the graphic to table converter (geek power is really awesome but has its limits)?

    I ask because I’m trying to determine my birth father and I have a match on Ancestry that shares 1,385 cM of DNA. She’s only a few years older than me but (timewise) it is technically possible that she could be a half-aunt (vs half sibling), and if half-aunt is genetically possible (greater than 9% according to the big table!) then so is 1C.

    So…. how likely is Group C at 1,385 cM? If not possible, then she would have to be a half sibling.

    1. The numbers differ because they’re derived in different ways. The DNA Detectives vet every single relationship, but they intentionally leave out the high and low ends of each group, so their ranges will be narrower. The Shared cM Project is based on self-reported data, so each data point isn’t independently confirmed, but there’s a lot more data. The SCP also leaves out extreme data points. The AncestryDNA data is based on computer simulations, so they have a lot of data points, but the accuracy depends on how good their computer model was.

      At 1385 cM, your match could definitely be a 1C. You might find this tool helpful: https://dnapainter.com/tools/sharedcmv4
      It’s based on the AncestryDNA data.

  35. Thank you for your efforts here.

    I have a concern though. We have more 3rd cousins than 2nd cousins, so if we get a match that is intermediate between 2nd and 3rd, is it not more likely to be a 3rd cousin than the figures above suggest? To take a ridiculous example, if there were 1000 3rd cousins to every 2nd cousin, I would bet that any match that was intermediate would be a 3rd cousin. So shouldn’t we be allowing for the likely frequency of each relationship match (adjusted for the probability it won’t match at all)?

    1. Great point! Fortunately, the Ancestry simulations (Fig. 5.2) already factor in the frequencies of the relationships. Unfortunately, we don’t know what model of population growth they used (i.e., to use your example, what the ratio is between 2nd and 3rd cousins), so we can’t replicate it.
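The point raised above can be written as Bayes' rule: the weight for each candidate relationship is the probability of the observed cM under that relationship, times how many relatives of that kind a person has, times the chance such a relative shows up as a match at all. A minimal sketch, using the commenter's deliberately extreme 1000-to-1 ratio and made-up likelihoods (all numbers are hypothetical, and this is not a reconstruction of Ancestry's model):

```python
def relationship_posterior(likelihood, relative_counts, detection_rate=None):
    """Combine P(observed cM | relationship) with how common each relationship is.

    likelihood:      {relationship: P(observed cM | relationship)}
    relative_counts: {relationship: expected number of such relatives}
    detection_rate:  optional {relationship: P(such a relative shows up as a match)}
    """
    detection_rate = detection_rate or {}
    weights = {
        rel: likelihood[rel] * relative_counts[rel] * detection_rate.get(rel, 1.0)
        for rel in likelihood
    }
    total = sum(weights.values())
    # Normalize so the posterior probabilities sum to 1.
    return {rel: w / total for rel, w in weights.items()}


# Hypothetical intermediate match: equally likely under either relationship,
# but 1000 third cousins for every second cousin (the commenter's toy ratio).
post = relationship_posterior({"2C": 0.1, "3C": 0.1}, {"2C": 1, "3C": 1000})
print(post)
```

If the table you start from already builds in relationship frequencies, as the reply says Ancestry's Fig. 5.2 does, its numbers should not be reweighted a second time.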

  36. Thanks for this article
    Just to clarify: looking at the chart for someone who shares 100 cM, the probabilities of the individual groups (Group A, B, C, etc.) add up to 99% (39 + 30 + 20 + 6 + 5 = 99). Does this indicate that, statistically, the probability that the person is not related to you at all is 1% if you share a segment 100 cM in length?
    In other words, if genetic testing reveals a segment that long, is the person most likely related to you in some capacity?
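A quick check of the arithmetic in the question: the five percentages quoted actually add to 100. Rounding matters here too, because when each group's probability is rounded to a whole percent, the rounded figures need not total exactly 100, so a one-point gap can come from rounding alone. A minimal illustration:

```python
quoted = [39, 30, 20, 6, 5]  # group percentages as quoted in the question
print(sum(quoted))  # 100

# Three equal shares of exactly 100% each round down to 33%.
true_shares = [100 / 3] * 3
rounded_total = sum(round(x) for x in true_shares)
print(rounded_total)  # 99
```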

  37. I have a new found match from Ancestry (“close family-1st cousin” @ 1371 cM across 43 segments) with a previously unknown male family member. I only have his name, no location, and his last log-in was in late Dec 2018. I have sent several messages with no response.

    This match shares common cousins on my maternal grandfather’s side so I can rule out my father’s side. I can only surmise two possibilities: 1) a previously unknown half-sibling from my mom before her marriage to my dad or 2) a previously unknown half-uncle from my maternal grandfather (child out of wedlock).

    Because my mom is in her late 80s and has dementia and rapidly failing health, I can’t ask her about this without getting an unreliable answer, let alone potentially causing her great upset/agitation. Additionally, if this person were her half-sibling (my half-uncle), it is likely he would have already passed based on his approximate age (late 80s to 90s based on known family history). Not impossible, but at the very least he is VERY old and probably not tooling around on Ancestry.com. So I do think the age factor rules out his being a half-uncle.

    My conclusion is, this is a previously unknown half-sibling (BOOM! mind officially blown).

    Am I “probably right” about this? I’ve used your tool and it looks like it’s an 85% probability of being a half-sibling.

    I have a small family and all immediate members are known. No possibility of an unknown nephew, uncle or first cousin.

    Any suggestions on how to find out who/where this person is, or at the very least, confirming their relationship to me?

    Should I discreetly get my mom to spit into a test tube and send it in to Ancestry.com to see how she matches with this person? My (full-sibling) sister has just sent in her sample and we are awaiting her results.

    I hate the thought of never knowing who this person is. If he is my mother’s child, I would sure like for them to have a chance to meet before she dies, and of course, I would like to know him as well.

    Also, I loaded my results onto GEDMATCH but did not find him there.

    Thank you for any advice!

    1. Although half-sib has a higher probability than first cousin, both are still possible: https://dnapainter.com/tools/sharedcmv4/1371

      One way to gauge whether he’s a half sib (same generation as you) or uncle/nephew (different generation) is to use the What Are the Odds? tool with DNA share amounts for other members of your extended family. Would they be willing to help you?

      You might also get more information when your sister’s results come in. Her share with him might be a more definitive amount.

    1. The chart came from Ancestry data, so I can’t add to it. An identical twin at Ancestry will show as “self/identical twin”.

  38. I have a match with another female at 1287 cM / 42 segments, and we share 189.7 cM on the X chromosome, with the longest segment on the X being 105.1 cM (61,9400,828 – 154,916,845).

    We also share 109.8 cM on Chromosome #2.

    The original cM from the AncestryDNA test listed us as first cousins, but after reviewing the numbers on GEDmatch and checking the X, I believe we are half sisters on the paternal side.

    I have tested several females, and their results came back as first cousins (800s range) without a full X match with me.

    I have been testing the daughters of each son as one of the brothers is my biological father.

    Am I correct that she is a half sister based on this data?

    Thank you,
    J Smith

    1. Paternal half sisters would share the entire X chromosome. Occasionally it’s less than 196 cM but the mismatch will be at the ends of the X. Sounds like yours is in the middle, which would suggest 1C.
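The reasoning in this reply (a paternal half sister should match across the whole X, with any shortfall only at the tips) can be turned into a rough check on a list of shared X segments. A minimal sketch: positions are in cM, the 196 cM X length comes from the reply, and both tolerances are hypothetical values chosen for illustration, not published cutoffs.

```python
X_LENGTH_CM = 196.0          # approximate X length used in the reply
MAX_INTERNAL_GAP_CM = 1.0    # hypothetical tolerance for gaps between segments
MAX_END_GAP_CM = 10.0        # hypothetical tolerance for unmatched tips


def x_gap_profile(segments, x_length=X_LENGTH_CM):
    """Return (gap at start, total gap between segments, gap at end) in cM.

    segments: list of (start_cM, end_cM) shared on the X chromosome.
    """
    if not segments:
        return x_length, 0.0, 0.0
    merged = []
    for start, end in sorted(segments):
        # Overlapping or touching segments are merged into one run.
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    internal = sum(merged[i + 1][0] - merged[i][1] for i in range(len(merged) - 1))
    return merged[0][0], internal, x_length - merged[-1][1]


def fits_paternal_half_sisters(segments):
    lead, internal, tail = x_gap_profile(segments)
    return (internal <= MAX_INTERNAL_GAP_CM
            and lead <= MAX_END_GAP_CM
            and tail <= MAX_END_GAP_CM)


# Hypothetical examples, not the commenter's actual segment list.
print(fits_paternal_half_sisters([(0.0, 196.0)]))                # True
print(fits_paternal_half_sisters([(2.0, 85.0), (90.7, 195.5)]))  # False: mid-X gap
```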

  39. My most puzzling match has 94 cM of shared DNA on three segments (AncestryDNA, so no indication of segment lengths). We have very little endogamy in my tree, and I’ve found less on these branches than on some of my early New England colonial ancestors. I do have matches with folks who descend from two of my major lineages (i.e., they coincidentally have the same ancestral lineages that my grandfather did), but I have not found any close consanguineous marriages, just one possible second cousin marriage at the great-great-grandparent level.

    1. 94 cM over 3 segments is probably a legitimate 2nd–3rd cousin match, rather than a distant cousin multiple times over. If you both have trees that appear to be complete back to, say, 3rd great grandparents, I would consider that there’s a misattributed parentage event on either your side or theirs.

  40. Yes, the strength of the match has me thinking that something hidden must be there. I’ll try to get them onto GEDmatch to see the individual segment lengths and maybe run triangulation on those shared segments. I have all lines documented at the 2nd great-grandparent level (and beyond) except for my direct patrilineal 2nd great-grandfather.

  41. Some more data on possible levels of endogamy: one of my unconnected matches has 108 matches on Ancestry at the 3rd cousin level or closer (90 cM shared). Another has 48 such matches. A third, who seems to be of LDS ancestry, has 67 such matches. That led me to the realization that another “mystery match” (suggested by the New Ancestor Discoveries tool) has 374 such matches; I think he may be LDS too.
    For comparison, my dad has 19 such matches. 1/4 of his line is more recent immigrant ancestry (mid 1800s from Northern Ireland), but those represent 7 of his 19, so they seem over-represented.
    It seems that the high number of matches among those DNA cousins suggests that their lines either were very prolific, their descendants are very into genetic genealogy (which may be likely for the LDS folks) or there was more endogamy than might be obvious from their trees (also likely among the LDS folks).
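One way to check whether 7 of 19 is a real over-representation is a binomial tail probability. This is a minimal sketch under a strong simplifying assumption: that each close match is independently equally likely to come through any of the four grandparent lines, which ignores exactly the effects discussed above (family size, testing rates, endogamy). Under that assumption, 7 or more of 19 happens roughly 17% of the time.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


matches_total = 19        # dad's matches at the 3rd cousin level or closer
matches_immigrant = 7     # of those, through the immigrant quarter
expected_share = 0.25     # that line is 1/4 of his ancestry

print(matches_immigrant / matches_total)  # observed share, about 0.37
print(binomial_upper_tail(matches_immigrant, matches_total, expected_share))
```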

  42. Hi, I am a mom, and I share 3,304 cM with my son. Your algorithm shows me as 100% his sister. So am I a chimera, or is it just the limitations of the algorithm?

    1. I suspect you’re misreading the charts. Amounts of 3300 cM and above are parent–child. Parent–child comparison can also be distinguished from full sibling ones by the pattern of segment matching. A parent and child will match on one of their two chromosomes across the entire set of 22 autosomes. Full siblings will have sections where they match on both copies of the chromosome, sections where they match on one of their two copies, and sections where they don’t match at all.
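The segment-pattern test described in this reply can be sketched as a nearest-pattern comparison. Expected fractions of the genome are roughly: parent–child, all half identical and none fully identical; full siblings, about 25% fully identical, 50% half identical, and 25% not shared. The sketch assumes the inputs are cM matching on exactly one copy and cM matching on both copies (not 23andMe's combined total), plus a rough haploid genome length you supply; it is an illustration, not a validated classifier.

```python
from math import dist

# Expected (not shared, half identical only, fully identical) fractions.
PATTERNS = {
    "parent-child": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
}


def closest_pattern(half_only_cm, fully_identical_cm, genome_cm):
    """Return the relationship whose expected pattern is nearest the observed one.

    half_only_cm:       cM matching on exactly one chromosome copy
    fully_identical_cm: cM matching on both copies
    genome_cm:          rough haploid autosomal length (an assumption you supply)
    """
    half = half_only_cm / genome_cm
    full = fully_identical_cm / genome_cm
    observed = (1.0 - half - full, half, full)
    return min(PATTERNS, key=lambda rel: dist(observed, PATTERNS[rel]))


# Hypothetical numbers, assuming a 3400 cM genome.
print(closest_pattern(3380, 0, 3400))    # parent-child
print(closest_pattern(1700, 850, 3400))  # full siblings
```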

  43. Okay I’ve Got One For You,
    My “Fraternal Twin” Brother and I share 2,636 cM across 73 segments. (We do share the same Mother and Father.) We ended up matching a “Male – 1st Cousin” through Ancestry. I, being a female, share 1,261 cM across 59 segments with this person, while my Brother shares 1,419 cM across 63 segments. Although Ancestry has us in a 1st Cousin relationship, per your Chart and Probability Table, should this actually be a Half Brother? My Twin Brother’s shared cM says yes, but my cM, being the sister here, says it is 1st Cousin. I ask this because our DNA match seems to think we are Half 1st Cousins, meaning “his” Father had a Half-Brother who would be our Father. I think the cM amounts are too close for that scenario and that we all share the “same” Father.

    1. You’re absolutely right that this match shares too much DNA to be a half first cousin. It’s about 5 times more likely that your match is a half sibling than a first cousin, but there are some other possibilities, too. He could be your full nephew or full uncle (same probability as half sibling) or your half nephew or half uncle (same probability as first cousin).

  44. Re: the 1st chart – DNA-Detectives-Autosomal-Statistics-Chart.png?ssl=1
    The bottom 2 rows (Group G & H) under the “Notes” column indicate the % of “not sharing” (I couldn’t find the *footnote). It’s the 1st time I’ve seen a % of no relationship. Would there be further information on this subject, e.g., the footnotes?

    1. AncestryDNA’s help pages have a good explanation for why more distant cousins might not match. https://www.ancestry.com/cs/dna-help/matches/inheritance
      Note that the reported percentage of, say, third cousins that don’t match varies from company to company. For 3rd cousins, 23andMe and Family Tree DNA both say about 10%, while AncestryDNA and simulations by a professor at UC Davis put it at 2% and 1% respectively. The reports also vary for 4th cousins and more distant cousins.

  45. Leah, excellent information and discussion. Thank you!

    My full sister just tested at 23andMe, and we share 3843 cM (51.7%).
    Longest segment 181.32 cM
    74 shared segments
    Half identical 2897 cM, 40 segments
    Fully identical 947 cM, 34 segments

    Obviously, 23andMe does some double counting in their approach, which skews/inflates the final number. I wish they would join the norm/standard so everyone would be on the same page.

    1. You’re absolutely right! 23andMe counts the fully identical regions (FIRs) twice, so the total there for full siblings isn’t comparable to the same match at other companies.
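Assuming, as the reply says, that the fully identical regions are counted twice in 23andMe's total, subtracting them once gives a single-counted figure roughly comparable to totals at other companies. With the numbers in the comment above, this also lands (to within rounding) on the "half identical" figure 23andMe lists. A minimal sketch:

```python
def single_counted_total(reported_total_cm, fully_identical_cm):
    """Remove the second copy of fully identical regions from a 23andMe-style total."""
    return reported_total_cm - fully_identical_cm


# Numbers from the comment above: total 3843 cM, fully identical 947 cM.
print(single_counted_total(3843, 947))  # 2896, vs. 2897 cM listed as half identical
```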

  46. Four years ago I tested my DNA. The results were a big surprise: the man I knew as my father was not my biological father. He has passed away, as has my mother, so I could not get answers. My brothers took DNA tests, and they are my half siblings. I matched a 1st cousin in the 728 cM range, and he put me in contact with his 1st cousin, who tested and matched me in the 1092 cM range. For a time I thought I had found a half sibling, whose father (now deceased) might have been my bio father. With all the new tools, I’m beginning to question that. My cousin’s grandfather had 3 sons; two of them could not have been my father, and the third son was in France during WWII at the time of my conception. So now I’m thinking the grandfather could have been my bio father.
    Any Thoughts?

    1. A match of 1092 cM is much more likely to be a first cousin than a half sibling. If you’re sure you can rule out all three of the grandfather’s sons, then yes, it’s possible the grandfather was your father. You might be able to tell for sure by using the WATO tool. https://thednageek.com/science-the-heck-out-of-your-dna-part-7/
      We could also schedule a phone consultation to review your DNA matches to see whether there is enough information there to answer your question.

  47. So you are comparing a half-nephew against a first cousin relationship? Seems like those will be equivalent in terms of shared segments (for a somewhat comparable case see https://slate.com/technology/2019/10/23andme-family-secrets-half-siblings-cousins.html). You might be able to divine something based upon the lengths of shared segments, but a more straightforward approach would be to figure out the ancestry of the wife of your cousin’s grandfather, and see if you share cousins through those lines. If you do, then you descend from one of the sons. If you don’t then the grandfather as your biological father seems probable.

    1. That can be an effective approach if the cousin also has matches through the grandmother (meaning she didn’t come from an under-tested population) and if the population wasn’t endogamous. Another approach is to use WATO, which can often pinpoint what generation the tester is in, i.e., child of the grandfather versus grandchild of the grandfather.
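The idea behind What Are the Odds? can be sketched roughly as follows: each hypothesis about where the tester fits in the tree implies a relationship to every other match, each observed cM amount gets a probability under that relationship, and a hypothesis's score is the product of those probabilities. This is a simplified sketch in the spirit of the tool, not its actual code, and the probabilities below are toy placeholders, not real values from any relationship table.

```python
from math import prod


def hypothesis_odds(hypotheses, prob):
    """Score each tree hypothesis and return its share of the total score.

    hypotheses: {hypothesis: {match: relationship implied by that hypothesis}}
    prob:       {(match, relationship): P(that match's observed cM | relationship)}
    """
    scores = {
        hyp: prod(prob[(match, rel)] for match, rel in implied.items())
        for hyp, implied in hypotheses.items()
    }
    total = sum(scores.values())
    return {hyp: score / total for hyp, score in scores.items()}


# Hypothetical scenario: two matches who are grandchildren of the same man.
hypotheses = {
    "tester is his child": {"match_A": "half uncle", "match_B": "half uncle"},
    "tester is his grandchild (via another son)": {
        "match_A": "first cousin", "match_B": "first cousin"},
}
# TOY probabilities for illustration only, not real values.
toy_prob = {
    ("match_A", "half uncle"): 0.30, ("match_A", "first cousin"): 0.35,
    ("match_B", "half uncle"): 0.20, ("match_B", "first cousin"): 0.40,
}
print(hypothesis_odds(hypotheses, toy_prob))
```

With only matches whose implied relationships fall in the same group, the scores stay close; the comparison becomes more informative when other relatives' matches imply relationships in different groups under each hypothesis.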

  48. Thank you for your reply.
    I don’t think I have enough info to use the tool. This is what I have.

    My match with Russell is 1092
    My match with Ken is 728
    Russell’s match to Ken is 829

      1. I found two more matches.
        Ken matched me 749
        Ken matched Russell 852
        Russell matched me 1091
        A1 matched me 205
        A1 matched Ken 44
        A1 matched Russell 71
        A2 matched me 119
        A2 matched Ken 78
        A2 matched Russell 59

  49. The probabilities of each group of relationships given by my recent AncestryDNA results differ quite a lot from the table on this website: e.g., for a match of 128 cM (my highest match), Ancestry / DNAgeek results compare as follows:

    Group F : 50% / 48% – about the same
    Group E : 31% / 14% – quite different
    Group G : 15% / 25% – quite different
    Group H : 2% / 10% – quite different

    Similarly divergent results apply to all my lesser matches (108 cM, 101 cM, 52 cM etc).
    Has there been an update in the AncestryDNA methodology that is not yet reflected in the data on the DNAgeek website?

    1. Ancestry updated their probabilities when they came out with ThruLines. They haven’t published the new numbers, though, so we’re still using the earlier ones in tools like What Are the Odds? and the Shared cM Project Tool.

  50. So does that mean that Figure 5.2 in Ancestry’s 2016 DNA Matching White Paper and the table of probabilities per cM value derived from it on your website are both out of date then?

  51. Hi
    What would it mean if my father and his sister share some DNA matches, but there are some that only she has, not him?
    For some matches, she shares 305 cM and he shares 99 cM, which makes her a 2nd–3rd cousin and him a 3rd–4th cousin.
    There are matches to people who have links to the 1st cousins as well as to my dad’s sister, but not to him.

    1. It’s not unusual for siblings to share very different amounts of DNA with the same relative. My first guess is that the example you gave is a 2nd cousin (or similar) to your dad and aunt. Your aunt shares a bit more than average while your dad shares a bit less. By considering both matches together, we can get a better idea of what the relationship is. Once you get to 3rd cousins and further, it’s possible to not match someone at all, even though the relationship is real. As long as your father and aunt match one another as full siblings, you’re just looking at natural variation in shared DNA.
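The natural variation described in this reply can be illustrated with a toy simulation. A minimal sketch under strong simplifying assumptions: 22 equal-length autosomes totaling 3520 cM, 1 cM bins, a crossover between adjacent bins with probability 0.01 and no interference, no sex differences, and every shared bin counted (real companies apply minimum segment lengths). It is not the model Ancestry or anyone else used. It builds a second-cousin pedigree and reports how much each of two full siblings shares with the same cousin; running several seeds shows how far apart the two numbers can be.

```python
import random
from itertools import count

N_CHROM = 22
BINS = 160          # 1 cM bins per chromosome (toy value; 22 * 160 = 3520 cM)
CROSSOVER_P = 0.01  # chance of a crossover between adjacent 1 cM bins

_labels = count()  # every founder chromosome copy gets a unique label


def founder():
    """An unrelated person: two chromosome copies per autosome, uniquely labeled."""
    return [([next(_labels)] * BINS, [next(_labels)] * BINS) for _ in range(N_CHROM)]


def gamete(parent, rng):
    """One transmitted copy of each chromosome, switching copies at crossovers."""
    out = []
    for copy0, copy1 in parent:
        src = rng.randrange(2)
        chrom = []
        for i in range(BINS):
            if i > 0 and rng.random() < CROSSOVER_P:
                src = 1 - src
            chrom.append(copy0[i] if src == 0 else copy1[i])
        out.append(chrom)
    return out


def child(parent_a, parent_b, rng):
    return list(zip(gamete(parent_a, rng), gamete(parent_b, rng)))


def shared_cm(a, b):
    """cM where two people share at least one inherited copy (half identical or more)."""
    total = 0
    for (a0, a1), (b0, b1) in zip(a, b):
        total += sum(1 for i in range(BINS) if {a0[i], a1[i]} & {b0[i], b1[i]})
    return total


def siblings_vs_second_cousin(seed=None):
    rng = random.Random(seed)
    gg1, gg2 = founder(), founder()                    # shared great-grandparents
    x, y = child(gg1, gg2, rng), child(gg1, gg2, rng)  # their two children
    p1, p2 = child(x, founder(), rng), child(y, founder(), rng)  # first cousins
    spouse = founder()
    dad, aunt = child(p1, spouse, rng), child(p1, spouse, rng)   # full siblings
    cousin = child(p2, founder(), rng)                 # their second cousin
    return shared_cm(dad, cousin), shared_cm(aunt, cousin)


for seed in range(5):
    print(siblings_vs_second_cousin(seed))
```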

  52. Hi, my mother’s half uncle matches her at 1790 cM. We know 100% he is a half great uncle because of DNA testing. According to your chart, 1790 is 475 cM away from the maximum of half uncle.

    I know 1790 is closer to the average for full uncle than the maximum of half uncle. However, we have proven with DNA that my grandmother and my half great-uncle had different fathers.

    Is the 475 cM difference too large a deviation?

    1. A match of 1790 cM is well outside the range for a half uncle. Is it possible there’s another connection?

  53. My wife matches her sister at 2906 cM. She matches her half sister at 1975 cM. Her sister matches the same half sister at 2985 cM, indicating that they are full sisters. All the parents are deceased. On the face of it, my wife has one sister and one half sister, while her sister has two full sisters. This is, of course, physically impossible. Is there any way of filtering the results to determine which individual’s DNA is giving a false indication of their relationship to the other two?

    1. Well, that’s an interesting situation! Are they all three matching one another in the same database?

  54. It appears to me that the math behind the various relationships is flawed when it comes to predictability. I understand the generational influence of relationship on DNA value, but I am struggling to prove/predict the MRCA using match values. Applying the coefficient of relationship factor to the shared matches should allow you to predict the position of the MRCA. However, it is not that simple, as I believe the factor is a convenient number (an average), and of course it is influenced by any pedigree collapse through descendancy. I am working on a problem involving two lines that share the same surname, come from the same locale, and have all produced descendants with matching values to the subject. I can eliminate possibilities but am struggling to confirm the generational location of the MRCA. Am I applying the wrong calculation, or is the variability of shared DNA beyond the math?

    1. The only genealogical relationship with a fixed amount of shared DNA is parent–child. All other relationships involve a distribution of shared DNA amounts. To complicate matters, some relationships have the same (or nearly the same) distributions (like half sibling and aunt/uncle), and the distributions of different relationship groups overlap. The overlap gets worse the lower the cM amount. Relationship prediction becomes more accurate when you can compare multiple matches to one another.
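The coefficient-of-relationship calculation mentioned in the question gives only the average of each distribution. A minimal sketch using the standard IBD coefficients (k1, the expected fraction half identical, and k2, the expected fraction fully identical) for a few outbred relationships; the 3400 cM genome length is a rough assumption that varies by company. It also shows why the coefficient alone can mislead: twice the coefficient times the genome length matches the mean for single-line relationships, but not for full siblings, whose fully identical regions are counted once in single-counted cM totals.

```python
# Standard IBD coefficients (k1 = half identical, k2 = fully identical)
# for outbred relationships.
IBD = {
    "parent-child": (1.0, 0.0),
    "full siblings": (0.5, 0.25),
    "half siblings": (0.5, 0.0),
    "aunt/uncle": (0.5, 0.0),
    "first cousins": (0.25, 0.0),
    "second cousins": (0.0625, 0.0),
}
GENOME_CM = 3400  # rough haploid autosomal length (assumption; varies by company)


def coefficient_of_relationship(k1, k2):
    return k1 / 2 + k2


def expected_shared_cm(k1, k2, genome_cm=GENOME_CM):
    """Mean cM shared at least half identically, each region counted once."""
    return (k1 + k2) * genome_cm


for rel, (k1, k2) in IBD.items():
    r = coefficient_of_relationship(k1, k2)
    print(f"{rel:15} r={r:<7} 2r*L={2 * r * GENOME_CM:>6.0f}  mean={expected_shared_cm(k1, k2):>6.0f}")
```

Averages like these are only the center of each range; as the reply explains, the spread around them and the overlap between groups are what limit prediction.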

  55. I have an outlier. It doesn’t show as possible here or on the new tool that I was hoping would be more useful for endogamy (I’m not finding it useful), but it does show up as a possibility on AncestryDNA in the drop-down when adding a relationship. Are you still collecting information on outliers? It’s a 1/2 1C1R sharing 37 cM.
