Fact Check: The Misunderstood Centimorgan

The poor, misunderstood centimorgan!  It’s fundamental to most of what we do in genetic genealogy, yet it may well be the most frequently misdefined term in the field.

A centimorgan is a unit of measure for autosomal DNA segments.  The more DNA we share with someone in centimorgans, the more closely related we probably are.  Yet even experts in the field sometimes get the definition wrong.

Take this example from a social media post by a DNA testing company:

 
A centimorgan (cM) is the unit of measure for the amount of DNA shared between two people, but what is it really?

The centimorgan is actually a measurement of the probability of genetic recombination occurring; a process that happens during the formation of sperm cells and egg cells when 2 copies of each chromosome is reduced to just one copy (meiosis).

One centimorgan represents a 1% chance that a recombination event will occur between two places along the length of the chromosome.
 

 

Let’s Think This Through

If a centimorgan were a probability, then 100-cM segments wouldn’t exist.  They would have had a 100% chance of being broken up in the previous generation, so we would never see them.  (The exception would be parent–child and full-sibling matches, where segments might appear to be ≥100 cM for technical reasons.)

So what do we see in the real world?  I share a segment of 107 cM with my uncle.  Both of my kids share segments greater than 101 cM with each of their three grandparents available for comparison.  (Their fourth grandparent tested in a different database.)

Thus, from first principles, a centimorgan cannot be a probability.

So What Is a Centimorgan?

First, some background information.  Briefly, we have 22 pairs of autosomal chromosomes, but we only pass on one copy of each to our children.  (The child gets the second set of chromosomes from their other parent.)

We don’t usually pass on chromosomes exactly as we inherited them from our own parents, though.  When our bodies make eggs or sperm during the process of meiosis, the chromosome pairs essentially swap bits so that the child’s copy is unique.  This swapping is called crossing over or recombination.

Two crossover events along the mother’s chromosome pair create a unique chromosome in the egg.

 

A centimorgan is, by definition, an average rather than a probability.  A 100-cM segment crosses over, on average, once per generation.  It can cross over more than once, or not at all, in any given individual, but over thousands of meioses, the average for that segment is 1.  Similarly, a 150-cM segment averages 1.5 crossovers, a 90-cM segment averages 0.9, and so on.

The relationship between centimorgans and the probability of crossing over is not linear.  The table below summarizes the estimated values.  (For the super-nerds amongst us, these calculations assume a Poisson distribution.)
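
For readers who want to check the arithmetic, here is a minimal Python sketch of the Poisson assumption mentioned above.  It treats the number of crossovers in a segment as Poisson-distributed with a mean of cM/100, so the chance of at least one crossover in a single meiosis is 1 - e^(-cM/100).  The function names are illustrative only, and real recombination departs from this simple model, so treat the output as rough estimates.

```python
import math


def expected_crossovers(cm: float) -> float:
    """Average number of crossovers per meiosis for a segment of `cm` centimorgans."""
    return cm / 100.0


def prob_at_least_one_crossover(cm: float) -> float:
    """Chance of one or more crossovers, assuming a Poisson model (an assumption)."""
    return 1.0 - math.exp(-expected_crossovers(cm))


def prob_no_crossover(cm: float) -> float:
    """Chance the segment is passed on unbroken under the same Poisson model."""
    return math.exp(-expected_crossovers(cm))


if __name__ == "__main__":
    print("cM   avg crossovers   P(at least one)   P(none)")
    for cm in (1, 10, 50, 100, 150, 200):
        print(
            f"{cm:<4} {expected_crossovers(cm):<16.2f} "
            f"{prob_at_least_one_crossover(cm):<17.3f} {prob_no_crossover(cm):.3f}"
        )
```

Under this model, a 100-cM segment averages one crossover per meiosis but has only about a 63% chance of being broken at all.  The remaining 37% or so of the time it is passed on intact, which is why 100-cM segments show up in real matches.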

The true nature of a centimorgan makes individual segments challenging to interpret.  Large segments can be misleading.  For example, in computer simulations, the average longest segment for half first cousins was about 69 cM, but the longest could be as much as ≈225 cM in rare cases.  For third cousins, the average longest was ≈31 cM but could (rarely) be over 100 cM.

Looking at these numbers another way might be more useful.  If a match has a longest segment of 100 cM or more, they should be a third cousin or closer.  If the longest segment is ≥150 cM, the match should be a second cousin or closer.  And if the longest segment is ≥200 cM, the match should be a half first cousin or closer.
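
The rule of thumb above can be written as a small lookup.  This is only a sketch: the function name is hypothetical, the cutoffs are simply the ones quoted in this post from the simulations, and rare exceptions are still possible.

```python
def longest_segment_bound(longest_segment_cm: float) -> str:
    """Return the rough relationship bound suggested by the longest shared segment.

    Thresholds (100, 150, 200 cM) are taken from the rule of thumb in this post;
    they are guidelines from simulations, not hard limits.
    """
    if longest_segment_cm >= 200:
        return "half first cousin or closer"
    if longest_segment_cm >= 150:
        return "second cousin or closer"
    if longest_segment_cm >= 100:
        return "third cousin or closer"
    return "no bound from longest segment alone"


if __name__ == "__main__":
    for cm in (31, 107, 163, 225):
        print(f"{cm} cM longest segment: {longest_segment_bound(cm)}")
```

A result like "third cousin or closer" is only an outer limit.  The 107-cM segment I share with my uncle falls in that bucket, even though an uncle is far closer than a third cousin.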

~~~~~~~~~~

Do you have a suggestion for a DNA fact check? Leave it in the comments!

24 thoughts on “Fact Check: The Misunderstood Centimorgan”

  1. Very helpful! Thank you.
    I just want to be sure I understood–are you saying that a 100 cM segment will, on average, be split once by a crossover in the next “recombination event”? So, supposing a chromosome were 100 cM in total a single crossover over its length is most probable?

    1. Sorta. An average isn’t quite the same thing as a probability. But for everyday work, that’s a reasonable assumption.

  2. Try this one: If a person has two tested parents and matches someone neither parent shares with, then that must be a false match.

    Only, not so fast. It likely would be, at most companies. But at Ancestry, you cannot know because you do not know what Timber may have done to the sharing of the connecting parent (if the match is real).

    It is possible that no matter how much such a parent may actually share with the match, Timber may have “adjusted” the amount to less than 8.0 cM. If so, the match would then drop entirely from the connecting parent.

    A possible example of this is that my daughter has a match with whom she shares 21 cM in two segments (post-Timber). Unweighted sharing is 28 cM, with a longest shared segment of 16 cM.

    Neither parent is shown as a shared match, which tells us they can’t share more than 20.0 cM (post-Timber). However, neither parent shares any DNA with the match even on direct comparison. But rather than saying the match isn’t real, this only tells us that neither parent shares at least 8.0 cM post-Timber. They *could* share much more, unweighted, but it doesn’t matter. We only see Timber-reported sharing of 8.0 cM or more.

    In the case in question, however, my daughter actually has 21 shared matches with this match. All 21 are paternal, and in fact at least 17 of them share common ancestors with me. All of them are descendants of my paternal grandfather’s paternal grandfather’s paternal grandparents. Some are even more closely related.

    Yet another problem with Timber, at least as used by Ancestry. I would not object if Timber were used as a “caution” flag, but when used in a way that completely eliminates possible matches, I do object.

  3. Okay, I think you are saying that average is only most probable in a symmetric distribution. Or something like that…

  4. Something I’ve been wondering: it sounds like the crossover mechanism and probability are heavily conserved. Do you know if a chromosome has ever been observed to not cross over, but instead pass directly to the child? I’m thinking if this occurs, a grandparent could appear to be the parent.

    1. That’s a fun thought!

      Short answer: no, I don’t know that it’s ever been observed and I doubt it ever will. It’s not uncommon for a grandchild to share all of one chromosome with a grandparent, though.

      Longer answer: It’s complicated! Even if none of the chromosomes crossed over, there’s only a 0.5^22 (one in 4.2 million or so) chance that all of grandma’s chromosomes would end up in the same gamete. That would need to be multiplied by the probability that none of the chromosomes crossed over. I’m on my phone right now and can’t do the calculation, but I’m thinking it’s NOPE level of small!

  5. And of course DNA recombination is not uniform across the length of the chromosome either; that would be too simple. Chromosomes are much more likely to recombine in some regions (areas of high recombination frequency) and less likely in others. It’s all to do with how tightly coiled they are and how far you are from the centromere (where the two ‘strands’ are connected to make a whole chromosome) or whether you are way out near the tips… The closer you look, the more magnificent and fascinating the whole process is, never mind allowing us a window into genealogical time.

  6. It’s not observed naturally, but it can be observed. Dolly the sheep was a clone of her mother, born to a surrogate sheep of a different breed. BUT even Dolly was not 100% genetically her mother. Due to the crazy haphazard way our heritable information has evolved, Dolly’s mitochondrial DNA (responsible for energy production in each of her cells) came from the ewe that donated the egg cell, so Dolly’s nuclear genes were 100% mum and her mtDNA was 100% egg donor. More on Dolly here: https://dolly.roslin.ed.ac.uk/facts/the-life-of-dolly/index.html
    You’ll also find out why she was named ‘Dolly’… who says geneticists don’t have a sense of humour…

  7. I have a match; and my great granddaughter and I share almost the same number of cMs with him. Is that odd?

      1. My memory was not serving me well, but for clarity, I now realize I was thinking of longest segment, not total.

        Yes, Eric Sweet, is my 4-5th Cousin, with whom I share 5 segments at FTDNA with total of 102, and longest 51.

        My great granddaughter shares 1 segment with Eric Sweet, 34 total, and 34 longest.

        So, my longest is 51 and my great granddaughter’s longest is 34.

        So, four of my small segments disappeared as they were passing down to my ggranddaughter, and the one longest stays strong?

  8. So, I went back and looked at my tree and this match is my great granddaughter’s 7th Cousin. I am happy to see that because I never believed autosomal dna would/could pick up dna back that far but now it has been proven to me.

  9. I think it may also depend on the other partner’s DNA ‘strength’ as it were. So not so much ‘luck of the draw’ but an all out battle if you will. Not that I’ve ever seen that mentioned anywhere (aside from dominant vs recessive; is that what that is?) but I do have an excellent example.

    My 2x gr.grandfather was, ehm, ‘fond of women’. After tirelessly delving through family history because I was certain my gr.grandfather was *not* the child of the alleged mother who passed away 2 years prior to his birth (during childbirth even!) it took me 10 years to finally discover gr.gr.granddad had a whole other family.

    In all, he fathered 9 kids with his legal wife and at least 4, but 7 suspected, kids with his ‘other woman’. I found my gr.granddad baptized (the date perfectly coinciding with his DoB) as the son of this woman’s husband. Except that man had passed away 6 years prior. She repeated this with several more of her kids.

    I suddenly had a heap of DNA matches that now made perfect sense, and so managed to figure out which kids were my gr.granddad’s actual full siblings.

    However..

    I have a 108cM match that theoretically, and according to the genealogical paper trail, is my HALF 3rd cousin once removed, and another 95cM match that is my half 3rd cousin. They descend from my gr.granddad’s half-siblings through his father’s marriage to his official wife. Both matches fall *just* within the realm of possibility according to the Shared cM Tool.

    I then have a 163cM match with a 2nd cousin once removed, a 51cM and a 42cM match with 3rd cousins, a 31cM and 45cM match with 3rd cousins once removed, and a 25 and 23 cM match with 4th cousins. All these matches go through my 2x gr.grandfather’s ‘other wife’, the mother of my gr.granddad.

    I find it remarkable that the descendants of the ‘official’ wife and my gr.granddad’s half-siblings relatively share more DNA with me than those of my real gr.grandmother do. And yes, I’ve gone over the official paper sources, the family trees and the im-/possibilities of the complex relationships within this family many times, and it all ends up pointing in the same direction.

    Very, very confusing.

  10. Saskia, much as we’d like to think of ‘DNA strength’ or ‘stickiness’, there is no ‘battle’ between segments. The process of crossing over and segregation is, broadly speaking, entirely random (there are some very rare quirks, but exceptions prove the rule, right?). The phenomenon that you may be observing will have a lot more to do with where on the chromosome the matching segments are located. If they are in a region where crossovers are more frequent (see my earlier post), then on balance you will inherit fewer and shorter fragments over several generations. If they are in an area where crossovers are less frequent, then longer segments may persist for longer down the generations. It’s a lottery, depending on which strand your first ancestors after the MRCA inherited.

    Dominant vs recessive refers to gene function, not their ‘staying power’. The fact that we can do DNA genealogy at all is due to the fact that 95% of our DNA is non-coding junk and so chopping it up and rearranging it makes no difference to the functioning of the organism. If by chance the DNA gets rearranged in the middle of a gene, then you have a problem and most often the resulting cell/embryo is not viable, so if you are a healthy grown-up, it is most likely that any crossovers happened in areas and in a way that no genes (dominant or recessive) were impacted.

  11. If fraternal twins share a female cousin match (Val, age 67) on her father’s side, why is there such a huge discrepancy in the twins’ cM? Everyone is different, but:
    93 yr old twin male (Uncle John) : 233cM and his sons are also a match -Son 1: 157 cM son 2: 150 cM .
    93 yr old twin female: 154 cM – is my mom, -I have only 48 cM

    I find it odd that my uncle shares so much of the paternal DNA and my mom has 79 cM less. Even my uncle’s sons are closer with 150 and 157 cM, and I have even less.

    1. We inherit exactly half of our autosomal DNA from each parent, but we don’t necessarily inherit exactly 25% of our DNA from each grandparent, thanks to the randomness of crossing over. In your family’s case, Uncle John inherited more of the DNA segments that your grandparent shared with Val than your mom did. An analogy might help. Think of your grandparent’s shared DNA with Val as 10 coins that get flipped to decide how much is passed on to each child. Your uncle got more “heads” than your mom did, just by chance.

  12. An example of dominant/recessive genes is eye color. Genes for lighter color produce less melanin, while genes for darker color produce more melanin. Therefore dark genes override light genes.

    There turn out to be many sources of eye color, so this is a big simplification. The actual shade, for example, is controlled by other genes.

    The inheritance issue you mention is probably the flip of the coin. SOMEONE has to be at the leading and trailing edges of the probability curve.

  13. A 2-3rd cousin (202 cM) shares a match with a 3-4th cousin (65 cM), and both share a match with a 4-6th cousin (36 cM). What is the family connection to me?

    I’m finding the centimorgan relationships confusing – any help would be greatly appreciated

    1. Those estimates (2-3C, 3-4C) are just estimates based on the total amount of shared DNA. To figure out the actual relationship, you’ll need to build back your tree and the trees of your matches to see where they intersect.
