MyHeritage Has Nearly 700,000 in Their Database!

I received word today from Yaniv Erlich, the Chief Science Officer at MyHeritage, that their database contains 670,000 individuals! That’s remarkable growth since the launch of their testing service a year ago in November 2016 and cause for me, once again, to update the Autosomal DNA Growth Chart.

Here’s the latest, greatest version:

 

Dr. Erlich tells me that the majority of the individuals in their database tested directly with MyHeritage (as opposed to doing the free transfer into the database from another testing company) and that most are from the United States, although sales in Europe are strong.

 

Wait, But …

I know what you’re thinking.

If you are in MyHeritage’s database, you almost certainly have far fewer matches there than at the other companies. Like, an order of magnitude fewer. So many fewer, in fact, that it’s hard to believe that MyHeritage’s database is as large as it is.

For example, this table shows how many matches I have at MyHeritage, 23andMe, Family Tree DNA, and AncestryDNA as of 13 November 2017. It also shows the most recently reported size of each database and the percentage of that database that matches me.

| Company | # Matches | Database Size | % of Database Matching Me |
|---|---|---|---|
| MyHeritage | 194 | 670,000 | 0.03% |
| 23andMe | 1,657 | 3,000,000 | 0.06% |
| Family Tree DNA | 2,051 | 550,000 | 0.37% |
| AncestryDNA | 17,974 | 6,000,000 | 0.30% |
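
For anyone who wants to run the same comparison on their own match lists, the percentage column is just matches divided by database size. Here is a minimal Python sketch of that arithmetic using the figures from the table; the names `companies` and `percent_matching` are only illustrative.

```python
# Match counts and database sizes as reported in the table (13 November 2017).
companies = {
    "MyHeritage": (194, 670_000),
    "23andMe": (1_657, 3_000_000),
    "Family Tree DNA": (2_051, 550_000),
    "AncestryDNA": (17_974, 6_000_000),
}


def percent_matching(matches, database_size):
    """Return the share of a database that matches one tester, as a percentage."""
    return 100 * matches / database_size


# Round to two decimal places, as in the table.
percentages = {
    name: round(percent_matching(matches, size), 2)
    for name, (matches, size) in companies.items()
}

for name, pct in percentages.items():
    print(f"{name}: {pct:.2f}%")
```

Swap in your own match counts to see how your percentages compare.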

 

If MyHeritage’s database really contains almost 700,000 individuals, why do I match so few of them? Put numerically, why do I match 0.30% of the database at AncestryDNA and 0.37% at FTDNA but only 0.03% at MyHeritage? (The percentage at 23andMe is not a reliable estimator because that company puts an artificial cap on the number of matches you can have.)

My guess is that the matching algorithm at MyHeritage is causing the difference. For one thing, my most distant match there shares only 15.3 cM total, whereas my most distant matches at AncestryDNA share 6 cM. If I count only my matches at AncestryDNA who share 15 cM or more with me, I match 0.04% of the database (2,696 of 6 million) rather than 0.30%. That’s spittin’ range.
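
To see how much of the gap the threshold alone might explain, the comparison above can be redone with a few lines of Python. This is only a rough sketch using the numbers quoted in this post. It assumes that a 15 cM cutoff at AncestryDNA is a fair stand-in for MyHeritage's effective minimum, which is my guess rather than anything MyHeritage has confirmed.

```python
def percent_matching(matches, database_size):
    """Return the share of a database that matches one tester, as a percentage."""
    return 100 * matches / database_size


ANCESTRY_DB = 6_000_000
MYHERITAGE_DB = 670_000

ancestry_all = 17_974       # all AncestryDNA matches, down to about 6 cM
ancestry_15cm_plus = 2_696  # AncestryDNA matches sharing 15 cM or more
myheritage_all = 194        # all MyHeritage matches (smallest shares 15.3 cM)

myheritage_pct = percent_matching(myheritage_all, MYHERITAGE_DB)
ancestry_pct = percent_matching(ancestry_all, ANCESTRY_DB)
ancestry_adjusted_pct = percent_matching(ancestry_15cm_plus, ANCESTRY_DB)

# How many times larger the AncestryDNA percentage is than the MyHeritage one,
# before and after applying the same 15 cM cutoff.
ratio_before = ancestry_pct / myheritage_pct
ratio_after = ancestry_adjusted_pct / myheritage_pct

print(f"Before adjusting: {ratio_before:.1f}x")
print(f"After adjusting:  {ratio_after:.1f}x")
```

With my numbers, the gap shrinks from roughly tenfold to roughly one and a half times once the same cutoff is applied.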

Another factor is that the MyHeritage matching algorithm is still a work in progress. I reviewed it here, and I advised caution with their relationship estimates. I have matches there who don’t match either of my parents and/or who don’t match me at the other companies, despite being in those other databases. MyHeritage also failed to find eight matching segments with my first cousin once removed, so presumably they are failing to match me entirely to people who might really share measurable DNA with me.

These two aspects of the matching algorithm at MyHeritage—a high minimum threshold and unreliable matching overall—are probably why most of us have so few matches at MyHeritage relative to the true database size.

The good news is that Dr. Erlich tells me they’re actively working on a new, improved matching algorithm. All of the existing customers will be reanalyzed with the new method when it’s ready, so we can expect big changes to our match lists. A member of the MyHeritage science team will be doing a talk about their new matching pipeline, as well as other features, at the Institute for Genetic Genealogy (I4GG) conference the weekend of December 9, 2017. (Yours truly will be speaking there, as well.)

No word as of yet about when the matching changes will be implemented.

 

20 thoughts on “MyHeritage Has Nearly 700,000 in Their Database!”

  1. My smallest match at MyHeritage is 12.1 cM in 1 segment, and 4 of my total of 42 matches are smaller than your smallest. Their algorithm clearly goes to lower values than you have experienced.

    My matches are Ancestry 11095 (0.18%), FTDNA 1651 (0.30%), MyHeritage 42 (0.006%) but my 850 Ancestry matches larger than my smallest MyHeritage match are 0.014% of the Ancestry database. Perhaps my figures are more divergent than yours because I’m not from the USA.

    1. Thanks for the info, Andrew. I have largest segments as small as 9.6 cM at MyHeritage but no matches with a total less than 15.

      1. My smallest largest segment is 7.9 cM in a match with 21.7 cM over 3 segments. I also have one at 8.0 cM, but no others below 10 cM.

  2. Out of all the places in the world they do not allow DNA testing in Alaska! This is ridiculous as all the other sites do.

  3. I have 880 matches at FTDNA with a longest block of at least 15 cM. At My Heritage, I have 406 matches. I’m sorry, but this doesn’t add up at all. Either FTDNA has far more than 550,000 people in their database or My Heritage has far fewer than 730,000 people in theirs.

    As for My Heritage’s matching problems, they seem to add a lot of phantom matches, but they don’t miss much. That should mean that their database is actually smaller than it appears, not larger.

    1. I just checked: I have many My Heritage matches with a longest block of less than 13 cM (406 matches total). I have 1,058 FTDNA matches with a longest block of 13 cM or more.

      FTDNA almost certainly has more than twice as many kits in their database as My Heritage.

      1. You could only conclude that if (a) their customers were all from the same family groups and (b) they had the same matching algorithms.

    2. We do not have the data to determine whether they’re adding phantom matches or missing matches for our parents. We do know that their matching algorithm is flawed, so that’s the most reasonable explanation for why we have so few matches there relative to their database size.

  4. Matches on FTDNA vs. MH: Me: 1196 vs. 84, Dad: 1043 vs. 134, Mom: 980 vs. 73. No way that FTDNA has only 550,000 Family Finder kits and MyHeritage has nearly 700,000 (sold or tested?). I also get some curious results, such as sharing more cM with my dad than with my mom, and other strange matches. I wouldn’t trust their algorithms.

    1. We know that MyHeritage’s matching algorithm is faulty, so using the relative number of matches at each site is not a valid way to compare. I was also surprised that MyHeritage’s database is so large (tested, not just sold).

      Re FTDNA, Tim Janzen and I have used two different methods to estimate their database size. We did our estimates independently, and we strove to be objective. In the absence of official numbers from FTDNA, these two estimates are the best information we have.

    1. Agreed. The Alaska restriction is an unfortunate consequence of a lawsuit there against another of the genetic genealogy companies.

  5. Ancestry got 5 million though, so kinda hard for MyHeritage to become that big in the end. Hopefully Ancestry and MyHeritage will make a joint venture and exchange data 🙂
