I received word today from Yaniv Erlich, Chief Science Officer at MyHeritage, that their database now contains 670,000 individuals! That's remarkable growth for a testing service that launched only a year ago, in November 2016, and it's cause for me, once again, to update the Autosomal DNA Growth Chart.
Here’s the latest, greatest version:
Dr. Erlich tells me that the majority of the individuals in their database tested directly with MyHeritage (rather than uploading results for free from another testing company) and that most are from the United States, although sales in Europe are strong.
Wait, But …
I know what you’re thinking.
If you are in MyHeritage‘s database, you almost certainly have far fewer matches there than at the other companies. Like, an order of magnitude fewer. So many fewer, in fact, that it’s hard to believe that MyHeritage‘s database is as large as it is.
For example, this table shows how many matches I have at MyHeritage, 23andMe, Family Tree DNA, and AncestryDNA as of 13 November, 2017. It also shows the most recently reported database sizes and the percentage of the database that matches me.
| Company | # Matches | Database Size | % of Database Matching Me |
|---|---|---|---|
| Family Tree DNA | 2,051 | 550,000 | 0.37% |
If MyHeritage‘s database really contains almost 700,000 individuals, why do I match so few of them? Put numerically, why do I match 0.3% of the database at AncestryDNA and 0.37% at FTDNA but only 0.03% at MyHeritage? (The percentage at 23andMe is not a reliable estimator because that company puts an artificial cap on the number of matches you can have.)
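The "% of Database Matching Me" figure is simply the number of matches divided by the database size. A minimal sketch of that arithmetic, using the Family Tree DNA row from the table and the rounded AncestryDNA and MyHeritage percentages quoted in the text (the function name `percent_matching` is just an illustrative label):

```python
def percent_matching(n_matches: int, database_size: int) -> float:
    """Share of a company's database that appears on one tester's match list, in percent."""
    return 100.0 * n_matches / database_size

# Family Tree DNA row from the table: 2,051 matches in a 550,000-person database.
ftdna_pct = percent_matching(2_051, 550_000)
print(f"FTDNA: {ftdna_pct:.2f}%")  # FTDNA: 0.37%

# Rounded percentages quoted in the text (not recomputed from raw counts).
ancestry_pct = 0.30
myheritage_pct = 0.03

# How many times larger the matching share is elsewhere than at MyHeritage.
print(f"FTDNA vs MyHeritage: {ftdna_pct / myheritage_pct:.1f}x")        # FTDNA vs MyHeritage: 12.4x
print(f"AncestryDNA vs MyHeritage: {ancestry_pct / myheritage_pct:.1f}x")  # AncestryDNA vs MyHeritage: 10.0x
```

Ratios of 10 to 12 are what "an order of magnitude fewer" looks like in numbers.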
My guess is that the matching algorithm at MyHeritage is causing the difference. For one thing, my most distant match there shares 15.3 cM in total, whereas my most distant matches at AncestryDNA share as little as 6 cM. If I count only my matches at AncestryDNA who share 15 cM or more with me, I match 0.04% of the database (2,696 of 6 million) rather than 0.30%. That's within spittin' range of MyHeritage's 0.03%.
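To see how much a higher cutoff shrinks a match list, here is a minimal sketch that applies a simple total-cM floor. The match list is made up for illustration, and treating 15 cM as a hard threshold is an assumption based only on my smallest MyHeritage match, not on any documented MyHeritage rule; the helper `count_at_or_above` is likewise hypothetical. The last lines recompute the AncestryDNA figure from the counts given above.

```python
def count_at_or_above(shared_cm: list[float], min_cm: float) -> int:
    """Count matches whose total shared DNA is at least min_cm centimorgans."""
    return sum(1 for cm in shared_cm if cm >= min_cm)

# Hypothetical match list (total shared cM per match); illustrative only, not real data.
example_matches = [6.0, 7.2, 8.5, 9.1, 12.0, 15.3, 22.4, 48.0, 210.0]

print(count_at_or_above(example_matches, 6.0))   # 9: a 6 cM floor keeps every match
print(count_at_or_above(example_matches, 15.0))  # 4: a 15 cM floor drops the five smallest

# AncestryDNA figures from the text: 2,696 matches at >= 15 cM out of 6 million testers.
ancestry_15cm_pct = 100.0 * 2_696 / 6_000_000
print(f"{ancestry_15cm_pct:.2f}%")  # 0.04%
```

Because small matches vastly outnumber large ones in most real match lists, raising the floor from 6 cM to 15 cM removes most of them, which is the effect the AncestryDNA comparison shows.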
Another factor is that the MyHeritage matching algorithm is still a work in progress. I reviewed it here, and I advised caution with their relationship estimates. I have matches there who don't match either of my parents and/or who don't match me at the other companies, despite being in those other databases. MyHeritage also failed to find eight matching segments with my first cousin once removed, so presumably it is also missing some matches entirely: people who really do share measurable DNA with me.
These two aspects of the matching algorithm at MyHeritage—a high minimum threshold and unreliable matching overall—are probably why most of us have so few matches at MyHeritage relative to the true database size.
The good news is that Dr. Erlich tells me they're actively working on a new, improved matching algorithm. All existing customers will be reanalyzed with the new method when it's ready, so we can expect big changes to our match lists. A member of the MyHeritage science team will be giving a talk about their new matching pipeline, as well as other features, at the Institute for Genetic Genealogy (I4GG) conference the weekend of December 9, 2017. (Yours truly will be speaking there, as well.)
There's no word yet on when the matching changes will be implemented.