MyHeritage Overhauls Their Matching Algorithm

MyHeritage has exploded onto the genetic genealogy scene, growing from scratch to a database of more than one million people in just over a year. They’ve done this both by allowing free transfers of DNA data files from AncestryDNA, Family Tree DNA, and 23andMe and by selling their own test kit at competitive prices.

 

 

However, their matching system (the computer code they use to find relatives) has had a number of flaws, which I reviewed in July. Specifically, the amount of shared DNA estimated by MyHeritage differed substantially from comparisons to the same relatives at other sites, and fully 60% of my matches at the time of that review did not match either of my parents, indicating either false positives for me or false negatives for my parents.

Subsequently, MyHeritage announced milestones for the size of their database that were not reflected in the small number of DNA matches their customers had. This suggested another problem: that they were failing to match large numbers of people who do, in fact, share measurable DNA.

MyHeritage was aware of the problems and promised a major overhaul of their matching system during the recent i4GG conference in San Diego. On 11 January, 2018, they quietly rolled out those changes.

 

Changes to the Matching Algorithm

The MyHeritage blog nicely summarizes the changes and explains some of the science behind them. This diagram from their blog shows the “pipeline” through which they process DNA data.  I’ve added the red check marks to show which steps have been revised.

Their blog does an excellent job of describing these steps and the improvements, so I will refer you to that article for details. Briefly, imputation assists comparisons between people who took different versions of the DNA test, and phasing helps to assign which genetic data was maternally inherited and which paternally. Both imputation and phasing are done by comparing to reference genomes, and many of the new improvements are due to MyHeritage increasing the size of that reference database more than 10-fold.

Matching was enhanced by optimizing how small mismatches (which could be real genetic differences or simply errors in reading the data) are treated and by lowering the matching threshold from 12 cM to 8 cM. With better phasing and matching, these smaller segments are more likely to be real identical-by-descent DNA as opposed to false positives. Stitching is simply a way to correct for artifacts that happen during phasing, and the classifiers are confidence levels (high, medium, low) to guide our research efforts.

Taken together, these changes should both increase the number of matches we have at MyHeritage and make them more accurate.

 

How Does the New System Stack Up?

On December 26, 2017, I had 221 matches at MyHeritage, my mother had 385, and my father had 23. Shortly after the overhaul, we had 2228, 3160, and 880, respectively. Quite a change! That’s about 10 times more matches for us, in line with what MyHeritage is reporting for their users overall. The new totals also make more sense for a database of more than a million people.
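For anyone who wants to check the "about 10 times" figure, here is a minimal sketch of the arithmetic using the match counts quoted above. The variable and function names are my own, chosen for illustration.

```python
# Match counts quoted in the post: before (26 Dec 2017) and after the overhaul.
before = {"me": 221, "mother": 385, "father": 23}
after = {"me": 2228, "mother": 3160, "father": 880}


def fold_change(old, new):
    """Return how many times larger each new count is than the old one."""
    return {name: new[name] / old[name] for name in old}


folds = fold_change(before, after)
# Combined increase across all three kits.
combined = sum(after.values()) / sum(before.values())

for name, fold in folds.items():
    print(f"{name}: {fold:.1f}x")
print(f"combined: {combined:.1f}x")
```

The individual kits vary quite a bit (my father's list grew far more than my mother's), but the three kits taken together come out very close to a 10-fold increase.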

One way I evaluated the matching system back in July was to compare how much DNA I share with two known relatives, a 1C1R and a 2C1R, who happen to be in all of the major databases. The numbers for my 2C1R at MyHeritage were in line with the other databases but at the time, MyHeritage substantially underestimated how much DNA and how many segments I share with my 1C1R.

As you can see from the table above, that no longer appears to be a problem. The matching data for my 1C1R is now in line with the other sites.

I also examined how many of my DNA relatives didn’t match my parents. Fully 60% of my matches at MyHeritage in July didn’t match either of them, meaning they were either false positives for me or false negatives for my parents. I used a different estimation method this time because we now have too many matches for me to examine each one. Instead, I took my total number of matches (2228) and subtracted the numbers shared with my parents (1534 shared with my mother + 283 shared with my father = 1817) to get 2228 – 1817 = 411, or 18.4%. That’s much more in line with the false match rates at the other companies. (Hat tip to Paula Williams for alerting me to this nifty workaround.)
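The workaround above is easy to reproduce for your own kits. This is a minimal sketch, not anything MyHeritage provides; the function name is hypothetical, and it assumes that no match is shared with both parents (anyone who is would be counted twice, making the estimate too low).

```python
def unshared_matches(total, shared_with_mother, shared_with_father):
    """Estimate how many matches are shared with neither parent.

    Assumes no single match is shared with both parents.
    Returns (count, fraction of total).
    """
    shared = shared_with_mother + shared_with_father
    unshared = total - shared
    return unshared, unshared / total


# Numbers from the post: 2228 total, 1534 shared with mother, 283 with father.
count, fraction = unshared_matches(2228, 1534, 283)
print(f"{count} matches ({fraction:.1%}) match neither parent")
```

Swap in your own totals and shared-match counts to get a rough rate for your own family.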

One match in particular, Arlene, previously shared 85.0 cM with me, with a largest segment of 19.6 cM, yet she didn’t match my parents at all in July. Another match, Philip, demonstrated that even largest segment size was not previously reliable; he shared a largest segment of 45.7 cM (78.5 cM total) with me but nothing with my parents. After the overhaul, Arlene shares 32.7 cM with me (largest 17.5 cM) and 41.8 cM (largest 20.9 cM) with my mother. Philip shares 50.9 cM (largest 44.3 cM) with me and 64 cM (largest 43.8 cM) with my mother. (The slight discrepancy in Philip’s largest segment size is not something I’m concerned about.)

In summary, I think MyHeritage got it right this time. I’m sure there will be exceptions, of course, but I am heartened that they’re committed to improving their system.

 

Chromosome Browser

Another exciting addition to MyHeritage is the chromosome browser. It’s a bit hidden though. To find it, click the purple Review DNA Match for someone in your DNA Matches, then scroll down. (If you’re on a mobile device, see the end of this post for instructions on how to access the chromosome browser.)

Segment data can even be downloaded. And if you are not comfortable sharing segment data with your matches, you can opt out of this feature in the privacy settings.

Here is a side-by-side comparison of the chromosome browser segments shared between me and my 1C1R at MyHeritage (left) and 23andMe (right). As you can see, they’re very similar.

 

23andMe finds three segments that MyHeritage doesn’t, on chr 2 (5.03 cM), chr 8 (8.75 cM), and chr 18 (5.43 cM), while MyHeritage finds one on chr 15 (6.19 cM) that 23andMe doesn’t. GEDmatch (not shown) finds all four of those segments plus one other when I use a threshold of 5 cM.

I interpret this comparison as an indicator that MyHeritage is doing a pretty good job at assigning matching segments; the main discrepancies are with small segments that we are likely to ignore anyway. The one exception is the 8.75-cM segment on chr 8 that MyHeritage misses, and I’ll be interested to see what happens with it as MyHeritage continues to refine their algorithms.
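To make the "small segments we are likely to ignore" point concrete, here is a short sketch that filters the four discrepant segments listed above by size. The 7 cM cutoff is purely an illustrative assumption on my part, not a threshold used by either company, and the names are hypothetical.

```python
# Segments reported by only one of the two companies, as listed in the post:
# (company that reports it, chromosome, length in cM)
discrepant_segments = [
    ("23andMe", 2, 5.03),
    ("23andMe", 8, 8.75),
    ("23andMe", 18, 5.43),
    ("MyHeritage", 15, 6.19),
]


def notable_segments(segments, min_cm=7.0):
    """Keep only segments at or above min_cm (an assumed, illustrative cutoff)."""
    return [seg for seg in segments if seg[2] >= min_cm]


for company, chromosome, length in notable_segments(discrepant_segments):
    print(f"chr {chromosome}: {length} cM (reported only by {company})")
```

With that cutoff, the only discrepancy left is the chr 8 segment; the other three fall below it.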

 

Coming Features

Here are some of the new things on the horizon at MyHeritage:

  • A database that continues to grow rapidly in size.
  • Ongoing improvements to the matching algorithm.
  • The ability to compare three or more people simultaneously in the chromosome browser. (Currently, you can only compare yourself to one match at a time.)
  • A print feature for chromosome browser segments.
  • Matching for DNA tests done on the GSA chip used by Living DNA and the most recent version of 23andMe. (These test versions are not currently supported at MyHeritage.)
  • Updated ethnicity estimates.

 

How to Get in on the Action

If you are not yet in the MyHeritage database but would like to be, you can purchase your own test kit here, or you can use this link to do a free transfer of autosomal DNA results from AncestryDNA, Family Tree DNA, and 23andMe (versions prior to v5).

 

Accessing the Chromosome Browser from a Mobile Device

UPDATE: The chromosome browser should now be visible on mobile devices even without this work-around. I’m leaving the instructions here in case anyone needs them.

From a mobile device, the chromosome browser isn’t readily apparent in an internet browser. Once again, Paula Williams comes to our rescue: To view the chromosome browser, go to your device’s internet browser’s settings and check “Desktop Site” or “Request Desktop Site” (or similar).

In Safari on an Apple device, click the “forward” button, then select “Request Desktop Site” from the options.

The chromosome browser should now be visible when you view one of your DNA matches. This method has been tried on an Android phone, an Android tablet, and an iPad. Please leave a comment if it works for you, too!

35 thoughts on “MyHeritage Overhauls Their Matching Algorithm”

  1. My Matches went from about 460 to 3,633. This is great but much of this has to do with them lowering the requirements. Previously, a match had to have a single matching block of 12 cm or 2 matching blocks of at least 8 cm (I think that’s right). Now they’re including people with a matching block of 8 cm or 2 matching blocks of 6 cm. Obviously, that will mean a lot more matches. At least 20% of my matches don’t match my parents; isn’t that number awfully high?

    Also, I have 3,634 matches on Family Tree DNA, and yet we’re supposed to believe that My Heritage has nearly twice as many people in their database? That doesn’t add up.

    1. 20% is roughly in keeping with the other companies.

      It’s hard to compare to FTDNA because FTDNA doesn’t phase. Phasing will reduce the number of matches. I wish FTDNA would just tell us how big their database is.

  2. And it looks like maybe 10-12% of my matches at FTDNA don’t match with either of my parents. That’s higher than I expected, but still quite a bit better than My Heritage.

  3. One final thought. I now have a number of My Heritage matches from Sweden, Norway, Finland and the Netherlands. This is likely correct, but these matches almost certainly share a common ancestor with me who lived in the early 1600’s. (Dutch ancestry from early New York and the rest from New Sweden Colony). This would make them no closer than 10th-12th cousins, but they are typically listed as 3rd-5th cousins or even closer. Clearly My Heritage has more work to do.

        1. I would not prioritize the low confidence matches. Remember that, because of the random inheritance of DNA segments, there’s no way to tell the difference between a 4th cousin and a 10th cousin using DNA alone. This isn’t a problem specific to MyHeritage; it’s simply not possible.

        2. I suppose this is true enough. It’s just that when My Heritage required 12cm to consider someone a match, they had a 4th-6th cousin category. Now that they’ve lowered the requirements, they have 3rd-5th cousin as their most distant category…. strange.

          Other companies put distant matches in a “5th cousin/remote cousin” or “5th-8th cousin” category (which seems more accurate).

  4. I am on an Android & did the above mentioned workaround to see the chromosome browser, & it worked.
    I went from IIRC 36 matches to 609. Two of them definite 4th-?th cousins who are not elsewhere, one of them has surname which is one of my ancestral surnames which is a 1st for a match to carry. Pretty happy about the new matches.

  5. Is there a way to ask My Heritage to use a maiden name default for the trees? this is a true genealogical standard. I am sure most people don’t know they can go to settings and correct it for their own trees. However, why should they need to do this? Using the married name is a cumbersome and outdated method.

    1. Yes. Click on your name up at the top of the page, then Site Settings, then Genealogy. On the following page, change the setting for Names of Married Women. And I agree that maiden names should be the default. When I first used MyHeritage, I was not pleased that they changed my surname to my husband’s, even though I never have.

      1. Yes funny thing is endogamy is not something I knew I had! But yes ancestry for me is by far the most accurate. I would say my heritage is closer to ftdna as far as confidence in matches goes- I would have fewer matches if they used the ancestry method of phasing.

        1. MyHeritage is using the Ancestry method of phasing (their own implementation, but the same principle). What they’re not doing, though, is removing pile-ups the way Ancestry does with their Timber algorithm.

  6. Yeah I’m not buying the 832 3rd-5th cousins my heritage has given me. For one thing I have an ancestry with very high endogamy- which I presume is partially the reason for having matches that, given geography, could not be related to me except very distantly. I did check the chrom. browser segments- many fall on pile-up segments I am aware of through gedmatch. A select few of my closer matches could be confirmed- through family surnames- but many are in countries altogether different than my documented tree which in my case would not make sense.

    1. Endogamy will affect your matches no matter where you test, and the more endogamy you have, the greater the effect will be. Only AncestryDNA has an algorithm to reduce pile-up regions, but of course they don’t provide a chromosome browser.

  7. Hi. Yes, I know how to change my own tree, but why is their default the “wrong” way? I don’t see a way to get this concern directly to My Heritage website managers

  8. Thanks! You have been very helpful. Now I seem to have many pile-up segments. Is this an outcome of having endogamy? It’s the one frustrating area that I have.

    Alan

  9. Hi Leah, I agree that the new matching algorithm is a huge improvement. Still a few problems though. I’m managing four kits and have noticed that a large proportion of all of their DNA matches include a 20 cM matching segment adjacent to the centromere on chromosome 15. Despite the length of the segments they invariably contain 500 or fewer SNPs. I’m certain that they are just IBC as the segments don’t appear when the same pair of people are compared at FTDNA or on GEDmatch. The extra 20 cM on these matches distorts the match lists, promoting some DNA matches much higher than they should be and making their estimated relationship appear much closer. I have observed quite a few matches sharing two segments: a small one around 6 cM plus the 20 cM adjacent to the centromere on chromosome 15, giving them a total of 26 matching cM. In reality they only share the 6 cM (although that could be IBC too!) and would otherwise not appear on the match lists at all. I’m wondering if anyone else is observing these pseudo-segments on chromosome 15?

      1. On MyHeritage, 14 of my father’s top 30 DNA matches contain this pseudo-segment on chromosome 15. The starting position is always 20004966 and there are five different end positions. The length and SNP counts for the five alternatives are: 14.65/256, 19.48/384, 20.1/512, 21.15/640, 21.68/768. Despite all 14 people sharing this segment with my father at the same location none of the 14 people match each other. This confirms that the segments are IBC but doesn’t it also imply that the matching is not based on phased data? My mother’s results are very similar with 16 of her top 30 matches containing the pseudo-segment on chromosome 15. I can’t find any of these matches on GEDmatch today so I could have been mistaken about that. Most of them I wouldn’t expect to appear in my parents’ match lists because if you take out the false 20 cM they are only left with 6 or 7 cM of genuine matching and you need a lot more than that to get on my parents’ top 2000 lists.

        1. When you say they don’t match one another, is that based on comparing them at GEDmatch or on the Shared DNA Matches feature at MyHeritage?

        2. That’s on the shared matches feature at MyHeritage. I have found one pair of Mum’s DNA matches that also match each other but they both match Mum at the same location on another chromosome. Their total match length with each other was only 10 cM so doesn’t include the 20 cM on chr 15.

        3. Please consider bringing this to the attention of MyHeritage’s science team. They seem open to improving their algorithms.

  10. I have pile-ups on Chrom 15- mainly 37-47 & 79-85 roughly speaking. There also is the 20-30 range but I don’t get those. Pile-ups include people of Finnish ancestry and eastern europe.

  11. Hi! I’m so glad I found this website and thank you so much for such a gift :). I hope you can help me with two things.

    I just got my results back from MyHeritage and I was a bit disappointed. I am first generation American (Latin and Middle Eastern parents). I was hoping to see a bit more granularity. 34% “Central America” and 31% “Greek” seemed a bit frustrating to me. Someone suggested I transfer to FTDNA.

    Others suggested I do a different test entirely. I did plan on another test for my mom while she’s still with us, so I wouldn’t be opposed to this, but do you think I stand to gain anything in the way of granularity with FTDNA?

    And what happens when you transfer anyway? Does that mean they won’t keep my report at MyHeritage? All my mom’s relatives are there so I want to be sure and I can’t seem the find the information. Thanks.

    1. Transferring won’t affect your results at MyHeritage; it just allows you to download your data then upload it into another company’s database without doing another test. Another database may not have the granularity you’re after, though, in part because the “reference panels” (groups of people known to have deep family roots in specific areas) they use aren’t large enough to capture the detail we sometimes want. There’s good news, though. All of the companies are actively working to improve their reference panels and ethnicity estimates. MyHeritage is expected to roll out an update to theirs soon.

      AncestryDNA has a feature called Genetic Communities or Migrations that might give you more precise information about where your ancestors came from. I found it surprisingly accurate for my mother’s side of the family, although I didn’t get anything at all on my father’s side, which is to say, it’s hit-or-miss whether you’ll get them. 23andMe has recently introduced a similar feature, but I haven’t had a chance to evaluate it yet. I wrote about Genetic Communities here: http://thednageek.com/genetic-communities-are-here/

      Perhaps the best way to learn about your family’s origins is through your matching relatives. Seeing where their families are from (via their trees) and connecting them to your own family tree can tell you a lot about your ancestors.

      It’s a great idea to test your parents (or their siblings) while you can. I recommend starting at AncestryDNA. You can transfer to FTDNA and MyHeritage from there. I keep an updated list of prices on my website. The next big sale should be around DNA Day (April 25). http://thednageek.com/dna-tests/

  12. Still some anomalies with Chromosome Browser. Let me explain. I recently used the DNA Match app for my Aunt’s dna which has been transferred to MyHeritage from FTDNA. Her strongest match (outside known family members) is to a woman showing these stats: Shared DNA: 3% (219.2‎ cM); Shared segments: 10; Largest segment: 95.7‎ cM– obviously a meaningful match. So I went to review the match on the Chromosome Browser. I added the match to an empty box and clicked ‘Compare’. The browser then shows me that all of the matching shown in the stats occurs in chromosomes 1-10 and there is zero matching in chromosomes 11-22. The matching segments range in size from 6.2 to 95.7 cMs. There are no segments between 2cM and 6.2cMs in size. Then I began to adjust the size of triangulated segments (the default number is 2cM). I went up to 4cMs: no change in number or size of triangulated segments. Then to 6cMs and still no change. And then to 8cMs and yes now there are only 7 segments according to the summary at the top of the page, but still the same 10 segments appearing in the painted browser.

    I’ve been browsing chromos for some time and these results strike me as being indicative of a problem somewhere. I don’t know if it’s the compatibility of the two test results (mine is from FTDNA and I cannot be certain about the other). There are several problems I sense here: 1) that there are no matches at all on 13 consecutive chromosomes even at the 2cM threshold, and 2) there are no matches at all sized between 2cM and 6.2cM. Looking over at the SNP sizes I see that there are no segments of less than 1,536.

    It could be that the reason for the result described here is that MyHeritage uses a very high SNP threshold, even for smaller cM segments. I don’t quite see the point of allowing the user to adjust cM minimums from 2 to 4 to 6 if these changes do not alter the app’s output. I suppose one fix would be to allow adjustment of the SNP size as well. But that still does not explain how chromosomes 1-10 could be matching at this level (3%, 10segs, 95.7large) and yet no matching at all on chromosomes 11-22. This just seems impossible, does it not?

    Anyhow, I just wanted to point out something that still looks a little flukey to me.

    1. What you’re describing is a display problem rather than a matching problem. At the very top of the CB window is a line that says “You and all of the selected DNA Matches share N triangulated segments”, where N is a number. When you’ve only selected one person to appear in the CB, that line shouldn’t appear at all, because triangulation isn’t possible with only two people. But, the number N is correct. In this case, it’s counting the number of matching (but not triangulated) segments that meet the threshold.

      I confirmed this by comparing myself to my mom’s cousin (323 cM shared with me). When the triangulation threshold is at 2 cM, N = 19, which is the total number of segments we share. When the threshold is 8 cM, N = 15, which is the number of segments we share greater than 8 cM. Because MyHeritage doesn’t report matching segments less than 6 cM, there’s no effect until you hit the 8 cM threshold. The visual isn’t affected at all.

      To address your two concerns: (1) The fact that you only match on chr 1–10 is luck of the draw, no different, really, than if you only matched on even numbered chromosomes, just more visually striking. (2) MyHeritage’s threshold for matching is 6 cM, so we don’t expect to see segments between 2 and 6 cM. They go down that far for triangulation because you could conceivably have two matches who both share 20 cM with you but who only overlap one another by 2 cM. (To be honest, I’d be dubious of that triangulation myself; I’d prefer they didn’t go down that far.)

      Try putting two matches that you know triangulate into the CB and play around with the thresholds. Hopefully, you’ll see what I mean.
