MyHeritage Overhauls Their Matching Algorithm

ANNOUNCEMENT: MyHeritage is running a Winter Sale from January 16 through January 22, 2018. During the sale, DNA tests will sell for $59 / €69 / £69, plus shipping.


MyHeritage has exploded onto the genetic genealogy scene, growing from scratch to a database of more than one million people in just over a year. They’ve done this both by allowing free transfers of DNA data files from AncestryDNA, Family Tree DNA, and 23andMe and by selling their own test kit at competitive prices.

 

 

However, their matching system (the computer code they use to find relatives) has had a number of flaws, which I reviewed in July. Specifically, the amount of shared DNA estimated by MyHeritage differed substantially from comparisons to the same relatives at other sites, and fully 60% of my matches at the time of that review did not match either of my parents, indicating either false positives for me or false negatives for my parents.

Subsequently, MyHeritage announced milestones for the size of their database that were not reflected in the small number of DNA matches their customers had. This suggested another problem: that they were failing to match large numbers of people who do, in fact, share measurable DNA.

MyHeritage was aware of the problems and promised a major overhaul of their matching system during the recent i4GG conference in San Diego. On January 11, 2018, they quietly rolled out those changes.

 

Changes to the Matching Algorithm

The MyHeritage blog nicely summarizes the changes and explains some of the science behind them. This diagram from their blog shows the “pipeline” through which they process DNA data.  I’ve added the red check marks to show which steps have been revised.

Their blog does an excellent job of describing these steps and the improvements, so I will refer you to that article for details. Briefly, imputation assists comparisons between people who took different versions of the DNA test, and phasing helps to assign which genetic data was maternally inherited and which paternally. Both imputation and phasing are done by comparing to reference genomes, and many of the improvements are due to MyHeritage increasing the size of that reference database more than 10-fold.

Matching was enhanced by optimizing how small mismatches (which could be real genetic differences or simply errors in reading the data) are treated and by lowering the matching threshold from 12 cM to 8 cM. With better phasing and matching, these smaller segments are more likely to be real identical-by-descent DNA as opposed to false positives. Stitching is simply a way to correct for artifacts that happen during phasing, and the classifiers are confidence levels (high, medium, low) to guide our research efforts.

Taken together, these changes should both increase the number of matches we have at MyHeritage and make them more accurate.

 

How Does the New System Stack Up?

On December 26, 2017, I had 221 matches at MyHeritage, my mother had 385, and my father had 23. Shortly after the overhaul, we had 2228, 3160, and 880, respectively. Quite a change! That’s about 10 times more matches for us, in line with what MyHeritage is reporting for their users overall. The new totals also make more sense for a database of more than a million people.
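To double-check the “about 10 times” figure, the fold increase can be computed directly from the counts above. This is only a quick sketch; the dictionary and function names are made up for illustration.

```python
# Match counts quoted above: (December 26, 2017, shortly after the overhaul)
counts = {
    "me": (221, 2228),
    "mother": (385, 3160),
    "father": (23, 880),
}


def fold_increase(before, after):
    """Return how many times larger the new count is than the old one."""
    return after / before


fold = {who: fold_increase(b, a) for who, (b, a) in counts.items()}

for who, value in fold.items():
    print(f"{who}: {value:.1f}x")
```

The three ratios are not identical: the smaller the starting count, the larger the jump, which is why the figure for the 23-match list comes out well above 10.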

One way I evaluated the matching system back in July was to compare how much DNA I share with two known relatives, a 1C1R and a 2C1R, who happen to be in all of the major databases. The numbers for my 2C1R at MyHeritage were in line with the other databases but at the time, MyHeritage substantially underestimated how much DNA and how many segments I share with my 1C1R.

As you can see from the table above, that no longer appears to be a problem. The matching data for my 1C1R is now in line with the other sites.

I also examined how many of my DNA relatives didn’t match my parents. Fully 60% of my matches at MyHeritage in July didn’t match either of them, meaning they were either false positives for me or false negatives for my parents. I used a different estimation method this time because we now have too many matches for me to examine each one. Instead, I took my total number of matches (2228) and subtracted the numbers shared with my parents (1534 shared with my mother + 283 shared with my father = 1817) to get 2228 – 1817 = 411, or 18.4%. That’s much more in line with the false match rates at the other companies. (Hat tip to Paula Williams for alerting me to this nifty workaround.)
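The subtraction above is easy to repeat for any kit with two tested parents. Here is a minimal Python sketch of it; the function name is hypothetical, and it assumes that no match is shared with both parents (such a match would be counted twice and make the estimate too low).

```python
def unshared_match_rate(total, shared_with_mother, shared_with_father):
    """Estimate how many matches are shared with neither parent.

    Assumes no match is shared with both parents; any such match would be
    counted twice in the sum and push the estimate down.
    """
    shared = shared_with_mother + shared_with_father
    unshared = total - shared
    return unshared, unshared / total


unshared, rate = unshared_match_rate(2228, 1534, 283)
print(unshared, f"{rate:.1%}")  # 411 18.4%
```

To use it with other results, substitute the total number of matches and the two shared-match counts from each parent's comparison.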

One match in particular, Arlene, previously shared 85.0 cM with me, with a largest segment of 19.6 cM, yet she didn’t match my parents at all in July. Another match, Philip, demonstrated that even largest segment size was not previously reliable; he shared a largest segment of 45.7 cM (78.5 cM total) with me but nothing with my parents. After the overhaul, Arlene shares 32.7 cM with me (largest 17.5 cM) and 41.8 cM (largest 20.9 cM) with my mother. Philip shares 50.9 cM (largest 44.3 cM) with me and 64 cM (largest 43.8 cM) with my mother. (The slight discrepancy in Philip’s largest segment size is not something I’m concerned about.)

In summary, I think MyHeritage got it right this time. I’m sure there will be exceptions, of course, but I am heartened that they’re committed to improving their system.

 

Chromosome Browser

Another exciting addition to MyHeritage is the chromosome browser. It’s a bit hidden, though. To find it, click the purple Review DNA Match button for someone in your DNA Matches, then scroll down. (If you’re on a mobile device, see the end of this post for instructions on how to access the chromosome browser.)

Segment data can even be downloaded. And if you are not comfortable sharing segment data with your matches, you can opt out of this feature in the privacy settings.

Here is a side-by-side comparison of the chromosome browser segments shared between me and my 1C1R at MyHeritage (left) and 23andMe (right). As you can see, they’re very similar.

 

23andMe finds three segments that MyHeritage doesn’t, on chr 2 (5.03 cM), chr 8 (8.75 cM), and chr 18 (5.43 cM), while MyHeritage finds one on chr 15 (6.19 cM) that 23andMe doesn’t. GEDmatch (not shown) finds all four of those segments plus one other when I use a threshold of 5 cM.

I interpret this comparison as an indicator that MyHeritage is doing a pretty good job at assigning matching segments; the main discrepancies are with small segments that we are likely to ignore anyway. The one exception is the 8.75-cM segment on chr 8 that MyHeritage misses, and I’ll be interested to see what happens with it as MyHeritage continues to refine their algorithms.
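The same sorting of discrepancies can be sketched in a few lines of Python. The segment list comes from the comparison above; the 8 cM cutoff is an assumption, borrowed from the new matching threshold mentioned earlier, and the names are only illustrative.

```python
# Segments reported by only one of the two companies, as listed above:
# (company that reports it, chromosome, length in cM)
discrepancies = [
    ("23andMe", 2, 5.03),
    ("23andMe", 8, 8.75),
    ("23andMe", 18, 5.43),
    ("MyHeritage", 15, 6.19),
]

# Assumed cutoff: the 8 cM matching threshold mentioned earlier in the post.
CUTOFF_CM = 8.0


def notable(segments, cutoff=CUTOFF_CM):
    """Return the one-sided segments at or above the cutoff."""
    return [seg for seg in segments if seg[2] >= cutoff]


for company, chrom, length in notable(discrepancies):
    print(f"chr {chrom}: {length} cM, reported only by {company}")
```

With this cutoff, only the chr 8 segment is flagged; a lower cutoff would pull in the smaller segments as well.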

 

Coming Features

Here are some of the new things on the horizon at MyHeritage:

  • A database that continues to grow rapidly in size.
  • Ongoing improvements to the matching algorithm.
  • The ability to compare three or more people simultaneously in the chromosome browser. (Currently, you can only compare yourself to one match at a time.)
  • A print feature for chromosome browser segments.
  • Matching for DNA tests done on the GSA chip used by Living DNA and the most recent version of 23andMe. (These test versions are not currently supported at MyHeritage.)
  • Updated ethnicity estimates.

 

How to Get in on the Action

If you are not yet in the MyHeritage database but would like to be, you can purchase your own test kit here, or you can use this link to do a free transfer of autosomal DNA results from AncestryDNA, Family Tree DNA, and 23andMe (versions prior to v5).

 

Accessing the Chromosome Browser from a Mobile Device

UPDATE: The chromosome browser should now be visible on mobile devices even without this work-around. I’m leaving the instructions here in case anyone needs them.

From a mobile device, the chromosome browser isn’t readily apparent in an internet browser. Once again, Paula Williams comes to our rescue: To view the chromosome browser, go to your device’s internet browser’s settings and check “Desktop Site” or “Request Desktop Site” (or similar).

In Safari on an Apple device, click the “forward” button, then select “Request Desktop Site” from the options.

The chromosome browser should now be visible when you view one of your DNA matches. This method has been tried on an Android phone, an Android tablet, and an iPad. Please leave a comment if it works for you, too!

30 thoughts on “MyHeritage Overhauls Their Matching Algorithm”

  1. My Matches went from about 460 to 3,633. This is great but much of this has to do with them lowering the requirements. Previously, a match had to have a single matching block of 12 cM or 2 matching blocks of at least 8 cM (I think that’s right). Now they’re including people with a matching block of 8 cM or 2 matching blocks of 6 cM. Obviously, that will mean a lot more matches. At least 20% of my matches don’t match my parents; isn’t that number awfully high?

    Also, I have 3,634 matches on Family Tree DNA, and yet we’re supposed to believe that My Heritage has nearly twice as many people in their database? That doesn’t add up.

    1. 20% is roughly in keeping with the other companies.

      It’s hard to compare to FTDNA because FTDNA doesn’t phase. Phasing will reduce the number of matches. I wish FTDNA would just tell us how big their database is.

  2. And it looks like maybe 10-12% of my matches at FTDNA don’t match with either of my parents. That’s higher than I expected, but still quite a bit better than My Heritage.

  3. One final thought. I now have a number of My Heritage matches from Sweden, Norway, Finland and the Netherlands. This is likely correct, but these matches almost certainly share a common ancestor with me who lived in the early 1600s (Dutch ancestry from early New York and the rest from New Sweden Colony). This would make them no closer than 10th-12th cousins, but they are typically listed as 3rd-5th cousins or even closer. Clearly My Heritage has more work to do.

        1. I would not prioritize the low confidence matches. Remember that, because of the random inheritance of DNA segments, there’s no way to tell the difference between a 4th cousin and a 10th cousin using DNA alone. This isn’t a problem specific to MyHeritage; it’s simply not possible.

          1. I suppose this is true enough. It’s just that when My Heritage required 12 cM to consider someone a match, they had a 4th-6th cousin category. Now that they’ve lowered the requirements, they have 3rd-5th cousin as their most distant category… strange.

            Other companies put distant matches in a “5th cousin/remote cousin” or “5th-8th cousin” category (which seems more accurate).

  4. I am on an Android & did the above mentioned workaround to see the chromosome browser, & it worked.
    I went from IIRC 36 matches to 609. Two of them definite 4th-?th cousins who are not elsewhere, one of them has surname which is one of my ancestral surnames which is a 1st for a match to carry. Pretty happy about the new matches.

  5. Is there a way to ask My Heritage to use a maiden name default for the trees? this is a true genealogical standard. I am sure most people don’t know they can go to settings and correct it for their own trees. However, why should they need to do this? Using the married name is a cumbersome and outdated method.

    1. Yes. Click on your name up at the top of the page, then Site Settings, then Genealogy. On the following page, change the setting for Names of Married Women. And I agree that maiden names should be the default. When I first used MyHeritage, I was not pleased that they changed my surname to my husband’s, even though I never have.

      1. Yes, funny thing is endogamy is not something I knew I had! But yes, Ancestry for me is by far the most accurate. I would say My Heritage is closer to FTDNA as far as confidence in matches goes. I would have fewer matches if they used the Ancestry method of phasing.

        1. MyHeritage is using the Ancestry method of phasing (their own implementation, but the same principle). What they’re not doing, though, is removing pile-ups the way Ancestry does with their Timber algorithm.

  6. Yeah, I’m not buying the 832 3rd-5th cousins My Heritage has given me. For one thing, I have an ancestry with very high endogamy, which I presume is partially the reason for having matches who, given the geography, could only be related very distantly. I did check the chromosome browser segments; many fall on pile-up segments I am aware of through GEDmatch. A select few of my closer matches could be confirmed through family surnames, but many are in countries altogether different from my documented tree, which in my case would not make sense.

    1. Endogamy will affect your matches no matter where you test, and the more endogamy you have, the greater the effect will be. Only AncestryDNA has an algorithm to reduce pile-up regions, but of course they don’t provide a chromosome browser.

  7. Hi. Yes, I know how to change my own tree, but why is their default the “wrong” way? I don’t see a way to get this concern directly to My Heritage website managers

  8. Thanks! You have been very helpful. Now I seem to have many pile-up segments. Is this an outcome of having endogamy? It’s the one frustrating area that I have.

    Alan

  9. Hi Leah, I agree that the new matching algorithm is a huge improvement. Still a few problems though. I’m managing four kits and have noticed that a large proportion of all of their DNA matches include a 20 cM matching segment adjacent to the centromere on chromosome 15. Despite the length of the segments they invariably contain 500 or fewer SNPs. I’m certain that they are just IBC as the segments don’t appear when the same pair of people are compared at FTDNA or on GEDmatch. The extra 20 cM on these matches distorts the match lists, promoting some DNA matches much higher than they should be and making their estimated relationship appear much closer. I have observed quite a few matches sharing two segments: a small one around 6 cM plus the 20 cM adjacent to the centromere on chromosome 15, giving them a total of 26 matching cM. In reality they only share the 6 cM (although that could be IBC too!) and would otherwise not appear on the match lists at all. I’m wondering if anyone else is observing these pseudo-segments on chromosome 15?

      1. On MyHeritage, 14 of my father’s top 30 DNA matches contain this pseudo-segment on chromosome 15. The starting position is always 20004966 and there are five different end positions. The length and SNP counts for the five alternatives are: 14.65/256, 19.48/384, 20.1/512, 21.15/640, 21.68/768. Despite all 14 people sharing this segment with my father at the same location none of the 14 people match each other. This confirms that the segments are IBC but doesn’t it also imply that the matching is not based on phased data? My mother’s results are very similar with 16 of her top 30 matches containing the pseudo-segment on chromosome 15. I can’t find any of these matches on GEDmatch today so I could have been mistaken about that. Most of them I wouldn’t expect to appear in my parents’ match lists because if you take out the false 20 cM they are only left with 6 or 7 cM of genuine matching and you need a lot more than that to get on my parents’ top 2000 lists.

        1. When you say they don’t match one another, is that based on comparing them at GEDmatch or on the Shared DNA Matches feature at MyHeritage?

          1. That’s on the shared matches feature at MyHeritage. I have found one pair of Mum’s DNA matches that also match each other but they both match Mum at the same location on another chromosome. Their total match length with each other was only 10 cM so doesn’t include the 20 cM on chr 15.

          2. Please consider bringing this to the attention of MyHeritage’s science team. They seem open to improving their algorithms.

  10. I have pile-ups on Chrom 15- mainly 37-47 & 79-85 roughly speaking. There also is the 20-30 range but I don’t get those. Pile-ups include people of Finnish ancestry and eastern europe.
