MyHeritage has exploded onto the genetic genealogy scene, growing from scratch to a database of more than one million people in just over a year. They’ve done this both by allowing free transfers of DNA data files from AncestryDNA, Family Tree DNA, and 23andMe and by selling their own test kit at competitive prices.
However, their matching system (the computer code they use to find relatives) has had a number of flaws, which I reviewed in July. Specifically, the amount of shared DNA estimated by MyHeritage differed substantially from comparisons to the same relatives at other sites, and fully 60% of my matches at the time of that review did not match either of my parents, indicating either false positives for me or false negatives for my parents.
Subsequently, MyHeritage announced milestones for the size of their database that were not reflected in the small number of DNA matches their customers had. This suggested another problem: that they were failing to match large numbers of people who do, in fact, share measurable DNA.
Changes to the Matching Algorithm
The MyHeritage blog nicely summarizes the changes and explains some of the science behind them. This diagram from their blog shows the “pipeline” through which they process DNA data. I’ve added the red check marks to show which steps have been revised.
Their blog does an excellent job of describing these steps and the improvements, so I will refer you to that article for details. Briefly, imputation assists comparisons between people who took different versions of the DNA test, and phasing helps to assign which genetic data was maternally inherited and which paternally. Both imputation and phasing are done by comparing to reference genomes, and many of the improvements are due to MyHeritage increasing the size of that reference database more than 10-fold.
Matching was enhanced by optimizing how small mismatches (which could be real genetic differences or simply errors in reading the data) are treated and by lowering the matching threshold from 12 cM to 8 cM. With better phasing and matching, these smaller segments are more likely to be real identical-by-descent DNA as opposed to false positives. Stitching is simply a way to correct for artifacts that happen during phasing, and the classifiers are confidence levels (high, medium, low) to guide our research efforts.
Taken together, these changes should both increase the number of matches we have at MyHeritage and make them more accurate.
How Does the New System Stack Up?
On December 26, 2017, I had 221 matches at MyHeritage, my mother had 385, and my father had 23. Shortly after the overhaul, we had 2228, 3160, and 880, respectively. Quite a change! That’s about 10 times more matches for us, in line with what MyHeritage is reporting for their users overall. The new totals also make more sense for a database of more than a million people.
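For readers who want to run the same before-and-after comparison on their own accounts, here is a minimal Python sketch. The counts are the ones quoted above; the dictionary keys and the `fold_change` helper are names made up for illustration.

```python
# Match counts quoted in the post: before (December 26, 2017) and after the overhaul.
before = {"me": 221, "mother": 385, "father": 23}
after = {"me": 2228, "mother": 3160, "father": 880}


def fold_change(old: int, new: int) -> float:
    """Return how many times larger the new count is than the old one."""
    return new / old


# Fold change for each person, plus one figure for the three kits combined.
per_person = {name: fold_change(before[name], after[name]) for name in before}
combined = fold_change(sum(before.values()), sum(after.values()))

for name, fold in per_person.items():
    print(f"{name}: {before[name]} -> {after[name]} ({fold:.1f}x)")
print(f"combined: {combined:.1f}x")
```

The increase varies quite a bit from kit to kit, so the combined figure is the one closest to the "about 10 times" summary.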
One way I evaluated the matching system back in July was to compare how much DNA I share with two known relatives, a 1C1R and a 2C1R, who happen to be in all of the major databases. The numbers for my 2C1R at MyHeritage were in line with the other databases, but at the time MyHeritage substantially underestimated how much DNA and how many segments I share with my 1C1R.
As you can see from the table above, that no longer appears to be a problem. The matching data for my 1C1R is now in line with the other sites.
I also examined how many of my DNA relatives didn’t match my parents. Fully 60% of my matches at MyHeritage in July didn’t match either of them, meaning they were either false positives for me or false negatives for my parents. I used a different estimation method this time because we now have too many matches for me to examine each one. Instead, I took my total number of matches (2228) and subtracted the numbers shared with my parents (1534 shared with my mother + 283 shared with my father = 1817) to get 2228 – 1817 = 411, or 18.4%. That’s much more in line with the false match rates at the other companies. (Hat tip to Paula Williams for alerting me to this nifty workaround.)
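The subtraction above is easy to repeat with your own totals. The sketch below is one way to write it in Python; the function name and its parameters are hypothetical. It rests on the same simplifying assumption as the arithmetic in the text: that no match is shared with both parents. A match shared with both would be counted twice, making the result an underestimate.

```python
def unshared_matches(total: int, shared_with_mother: int, shared_with_father: int):
    """Estimate how many matches are shared with neither parent.

    Assumes the two shared-match lists do not overlap (an assumption,
    not something the match totals alone can confirm).
    """
    shared = shared_with_mother + shared_with_father
    unshared = total - shared
    return unshared, unshared / total


# Totals quoted in the post.
count, rate = unshared_matches(total=2228, shared_with_mother=1534, shared_with_father=283)
print(f"{count} of 2228 matches ({rate:.1%}) are shared with neither parent")
```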
One match in particular, Arlene, previously shared 85.0 cM with me, with a largest segment of 19.6 cM, yet she didn’t match my parents at all in July. Another match, Philip, demonstrated that even the largest segment size was not previously reliable; he shared a largest segment of 45.7 cM (78.5 cM total) with me but nothing with my parents. After the overhaul, Arlene shares 32.7 cM with me (largest 17.5 cM) and 41.8 cM (largest 20.9 cM) with my mother. Philip shares 50.9 cM (largest 44.3 cM) with me and 64 cM (largest 43.8 cM) with my mother. (The slight discrepancy in Philip’s largest segment size is not something I’m concerned about.)
In summary, I think MyHeritage got it right this time. I’m sure there will be exceptions, of course, but I am heartened that they’re committed to improving their system.
Another exciting addition to MyHeritage is the chromosome browser. It’s a bit hidden, though. To find it, click the purple Review DNA Match button for someone in your DNA Matches, then scroll down. (If you’re on a mobile device, see the end of this post for instructions on how to access the chromosome browser.)
Segment data can even be downloaded. And if you are not comfortable sharing segment data with your matches, you can opt out of this feature in the privacy settings.
23andMe finds three segments that MyHeritage doesn’t, on chr 2 (5.03 cM), chr 8 (8.75 cM), and chr 18 (5.43 cM), while MyHeritage finds one on chr 15 (6.19 cM) that 23andMe doesn’t. GEDmatch (not shown) finds all four of those segments plus one other when I use a threshold of 5 cM.
I interpret this comparison as an indicator that MyHeritage is doing a pretty good job at assigning matching segments; the main discrepancies are with small segments that we are likely to ignore anyway. The one exception is the 8.75-cM segment on chr 8 that MyHeritage misses, and I’ll be interested to see what happens with it as MyHeritage continues to refine their algorithms.
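To make the "small segments" point concrete, the sketch below lists the four segments that only one of the two companies reported and keeps those at or above a minimum size. The segment values come from the comparison above. The 7 cM cutoff (`MIN_CM`) is a hypothetical research threshold chosen for illustration, not a setting used by MyHeritage or 23andMe.

```python
# Segments reported by only one of the two companies (values from the post).
discrepant_segments = [
    {"chr": 2, "cm": 5.03, "found_by": "23andMe"},
    {"chr": 8, "cm": 8.75, "found_by": "23andMe"},
    {"chr": 18, "cm": 5.43, "found_by": "23andMe"},
    {"chr": 15, "cm": 6.19, "found_by": "MyHeritage"},
]

MIN_CM = 7.0  # hypothetical cutoff; adjust to suit your own research

# Keep only the discrepancies large enough to be worth following up.
worth_a_look = [seg for seg in discrepant_segments if seg["cm"] >= MIN_CM]

for seg in worth_a_look:
    print(f"chr {seg['chr']}: {seg['cm']} cM, reported only by {seg['found_by']}")
```

With this cutoff, only the chr 8 segment remains on the list.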
Here are some of the new things on the horizon at MyHeritage:
- A database that continues to grow rapidly in size.
- Ongoing improvements to the matching algorithm.
- The ability to compare three or more people simultaneously in the chromosome browser. (Currently, you can only compare yourself to one match at a time.)
- A print feature for chromosome browser segments.
- Matching for DNA tests done on the GSA chip used by Living DNA and the most recent version of 23andMe. (These test versions are not currently supported at MyHeritage.)
- Updated ethnicity estimates.
How to Get in on the Action
If you are not yet in the MyHeritage database but would like to be, you can purchase your own test kit here, or you can use this link to do a free transfer of autosomal DNA results from AncestryDNA, Family Tree DNA, and 23andMe (versions prior to v5).
Accessing the Chromosome Browser from a Mobile Device
UPDATE: The chromosome browser should now be visible on mobile devices even without this work-around. I’m leaving the instructions here in case anyone needs them.
From a mobile device, the chromosome browser isn’t readily apparent in an internet browser. Once again, Paula Williams comes to our rescue: To view the chromosome browser, go to your device’s internet browser’s settings and check “Desktop Site” or “Request Desktop Site” (or similar).
In Safari on an Apple device, click the “forward” button, then select “Request Desktop Site” from the options.
The chromosome browser should now be visible when you view one of your DNA matches. This method has been tried on an Android phone, an Android tablet, and an iPad. Please leave a comment if it works for you, too!