MyHeritage Has Nearly 700,000 in Their Database!

November 14, 2017 thednageek 20 Comments

I received word today from Yaniv Erlich, the Chief Science Officer at MyHeritage, that their database contains 670,000 individuals! That’s remarkable growth since the launch of their testing service a year ago in November 2016 and cause for me, once again, to update the Autosomal DNA Growth Chart.

Here’s the latest, greatest version:

Dr. Erlich tells me that the majority of the individuals in their database tested directly with MyHeritage (as opposed to doing the free transfer into the database from another testing company) and that most are from the United States, although sales in Europe are strong.

Wait, But …

I know what you’re thinking.

If you are in MyHeritage’s database, you almost certainly have far fewer matches there than at the other companies. Like, an order of magnitude fewer. So many fewer, in fact, that it’s hard to believe that MyHeritage’s database is as large as it is.

For example, this table shows how many matches I have at MyHeritage, 23andMe, Family Tree DNA, and AncestryDNA as of 13 November, 2017. It also shows the most recently reported database sizes and the percentage of the database that matches me.

Company	# Matches	Database Size	% of Database Matching Me
MyHeritage	194	670,000	0.03%
23andMe	1,657	3,000,000	0.06%
Family Tree DNA	2,051	550,000	0.37%
AncestryDNA	17,974	6,000,000	0.30%

If MyHeritage’s database really contains almost 700,000 individuals, why do I match so few of them? Put numerically, why do I match 0.3% of the database at AncestryDNA and 0.37% at FTDNA but only 0.03% at MyHeritage? (The percentage at 23andMe is not a reliable estimator because that company puts an artificial cap on the number of matches you can have.)
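For anyone who wants to run the same comparison on their own match lists, here is a minimal sketch of the arithmetic behind the table above. The match counts and database sizes are the ones reported in this post; the names `companies`, `match_rate`, and `rates` are just illustrative.

```python
# Match counts and reported database sizes from the table above
# (my own matches as of 13 November 2017).
companies = {
    "MyHeritage": (194, 670_000),
    "23andMe": (1_657, 3_000_000),
    "Family Tree DNA": (2_051, 550_000),
    "AncestryDNA": (17_974, 6_000_000),
}


def match_rate(matches, database_size):
    """Return the percentage of a database that shows up in a match list."""
    return 100 * matches / database_size


# Percentage of each database that matches me.
rates = {name: match_rate(m, size) for name, (m, size) in companies.items()}

for name, pct in rates.items():
    print(f"{name:16s} {pct:.2f}%")
```

Swap in your own match counts to see how your percentages compare. Keep in mind that the 23andMe figure is capped, so it understates the true rate.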

My guess is that the matching algorithm at MyHeritage is causing the difference. For one thing, my most distant match there shares only 15.3 cM total, whereas my most distant matches at AncestryDNA share 6 cM. If I count only my matches at AncestryDNA who share 15 cM or more with me, I match 0.04% of the database (2,696 of 6 million) rather than 0.30%. That’s spittin’ range.
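The threshold adjustment can be sketched the same way. This is only a rough illustration using my own numbers from this post; the variable names are made up, and it assumes that counting AncestryDNA matches of 15 cM or more is a fair stand-in for MyHeritage's apparent cutoff.

```python
# Figures from this post (my own matches, November 2017).
ANCESTRY_DB = 6_000_000
MYHERITAGE_DB = 670_000

ancestry_all = 17_974      # all AncestryDNA matches (down to about 6 cM)
ancestry_15cm = 2_696      # AncestryDNA matches sharing 15 cM or more
myheritage_all = 194       # all MyHeritage matches (smallest total: 15.3 cM)


def pct(matches, database_size):
    """Percentage of a database represented in a match list."""
    return 100 * matches / database_size


rate_mh = pct(myheritage_all, MYHERITAGE_DB)
rate_anc_all = pct(ancestry_all, ANCESTRY_DB)
rate_anc_15 = pct(ancestry_15cm, ANCESTRY_DB)

# How many times higher the AncestryDNA rate is than the MyHeritage rate,
# before and after applying the 15 cM cutoff to the AncestryDNA list.
gap_before = rate_anc_all / rate_mh
gap_after = rate_anc_15 / rate_mh

print(f"Unfiltered:     {rate_anc_all:.2f}% vs {rate_mh:.2f}% ({gap_before:.1f}x)")
print(f"15 cM or more:  {rate_anc_15:.2f}% vs {rate_mh:.2f}% ({gap_after:.1f}x)")
```

With the cutoff applied, the gap shrinks from roughly tenfold to well under twofold, which is why I think the minimum threshold explains most of the difference.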

Another factor is that the MyHeritage matching algorithm is still a work in progress. I reviewed it here, and I advised caution with their relationship estimates. I have matches there who don’t match either of my parents and/or who don’t match me at the other companies, despite being in those other databases. MyHeritage also failed to find eight matching segments with my first cousin once removed, so presumably they are failing to match me entirely to people who might really share measurable DNA with me.

These two aspects of the matching algorithm at MyHeritage—a high minimum threshold and unreliable matching overall—are probably why most of us have so few matches at MyHeritage relative to the true database size.

The good news is that Dr. Erlich tells me they’re actively working on a new, improved matching algorithm. All of the existing customers will be reanalyzed with the new method when it’s ready, so we can expect big changes to our match lists. A member of the MyHeritage science team will be doing a talk about their new matching pipeline, as well as other features, at the Institute for Genetic Genealogy (I4GG) conference the weekend of December 9, 2017. (Yours truly will be speaking there, as well.)

No word as of yet about when the matching changes will be implemented.


20 thoughts on “MyHeritage Has Nearly 700,000 in Their Database!”

Andrew Millard says:

November 15, 2017 at 5:49 am

My smallest match at MyHeritage is 12.1 cM in 1 segment, and 4 of my 42 total matches are smaller than your smallest. Their algorithm clearly goes to lower values than you have experienced.

My matches are Ancestry 11095 (0.18%), FTDNA 1651 (0.30%), MyHeritage 42 (0.006%) but my 850 Ancestry matches larger than my smallest MyHeritage match are 0.014% of the Ancestry database. Perhaps my figures are more divergent than yours because I’m not from the USA.

1. thednageek says:
  
  November 15, 2017 at 10:14 am
  
  Thanks for the info, Andrew. I have largest segments as small as 9.6 cM at MyHeritage but no matches with a total less than 15.
  
  1. Andrew Millard says:
    
    November 16, 2017 at 11:55 am
    
    My smallest largest segment is 7.9 cM in a match with 21.7 cM over 3 segments. I also have one at 8.0 cM, but no others below 10 cM.
    
Shelley says:

November 16, 2017 at 5:36 am

Out of all the places in the world they do not allow DNA testing in Alaska! This is ridiculous as all the other sites do.

1. thednageek says:
  
  November 16, 2017 at 9:11 am
  
  The state of Alaska has a genetic privacy statute that resulted in a lawsuit against Family Tree DNA. I imagine that MyHeritage will stay out of the state until the issue is resolved.
  https://www.genomicslawreport.com/index.php/2017/07/18/a-constitutional-challenge-to-alaskas-genetic-privacy-statute/
  
Frank Kelch III says:

November 26, 2017 at 7:13 am

I have 880 matches at FTDNA with a longest block of at least 15 cM. At MyHeritage, I have 406 matches. I’m sorry, but this doesn’t add up at all. Either FTDNA has far more than 550,000 people in their database or MyHeritage has far fewer than 730,000 people in theirs.

As for MyHeritage’s matching problems, they seem to add a lot of phantom matches, but they don’t miss much. That should mean that their database is actually smaller than it appears, not larger.

1. Frank Kelch III says:
  
  November 26, 2017 at 7:44 am
  
  I just checked. Many of my MyHeritage matches (406 total) have a longest block of less than 13 cM. I have 1,058 FTDNA matches with a longest block of 13 cM or more.
  
  FTDNA almost certainly has more than twice as many kits in their database as MyHeritage.
  
  1. thednageek says:
    
    November 26, 2017 at 10:04 am
    
    You could only conclude that if (a) their customers were all from the same family groups and (b) they had the same matching algorithms.
    
2. thednageek says:
  
  November 26, 2017 at 10:01 am
  
  We do not have the data to determine whether they’re adding phantom matches or missing matches for our parents. We do know that their matching algorithm is flawed, so that’s the most reasonable explanation for why we have so few matches there relative to their database size.
  
David Widerberg Howden says:

November 26, 2017 at 12:35 pm

Matches on FTDNA vs. MH: Me: 1196 vs. 84, Dad: 1043 vs. 134, Mom: 980 vs. 73. No way that FTDNA has only 550,000 Family Finder kits and MyHeritage has nearly 700,000 (sold or tested?). I also get some curious results, such as sharing more cM with my dad than with my mom, and other strange matches. I wouldn’t trust their algorithms.

1. David Widerberg Howden says:
  
  November 26, 2017 at 1:38 pm
  
  Me: Ancestry/FTDNA/MH: 8263/1196/84
  
2. thednageek says:
  
  November 26, 2017 at 3:23 pm
  
  We know that MyHeritage’s matching algorithm is faulty, so using the relative number of matches at each site is not a valid way to compare. I was also surprised that MyHeritage’s database is so large (tested, not just sold).
  
  Re FTDNA, Tim Janzen and I have used two different methods to estimate their database size. We did our estimates independently, and we strove to be objective. In the absence of official numbers from FTDNA, these two estimates are the best information we have.
  
Shelley says:

January 12, 2018 at 8:20 am

Can’t wait for their legal team to allow customers in Alaska to do this.

1. thednageek says:
  
  January 14, 2018 at 12:11 pm
  
  Agreed. The Alaska restriction is an unfortunate consequence of a lawsuit there against another of the genetic genealogy companies.
  
Pingback: MyHeritage Overhauls Their Matching Algorithm – The DNA Geek
aces says:

February 5, 2018 at 2:13 am

I have nearly 4,000 matches on MyHeritage.

Malten says:

February 5, 2018 at 6:35 am

Ancestry got 5 million though, so kinda hard for MyHeritage to become that big in the end. Hopefully Ancestry and MyHeritage will make a joint venture and exchange data 🙂

1. thednageek says:
  
  February 5, 2018 at 8:16 am
  
  Competition is good for the industry!
  