‘So frustrating! 95% of my DNA matches don’t have trees!’
‘People at Company X only test for their ethnicity estimates. If you want to find people who are serious about genealogy, you have to use Company Y.’
‘Back in the day, we did real genealogy research. Today, people just want to take a DNA test and have their pedigrees handed to them. That’s why no one has trees!’
We see these complaints often. They don’t often jibe with my own experiences, though, so last year I did a comparative analysis of the percentage of autosomal DNA matches with trees at AncestryDNA, 23andMe, Family Tree DNA, and GEDmatch. I used 12 unrelated people who each had DNA at all four sites to minimize bias. At the time, I did not have access to many kits at MyHeritage, so that company was not included.
Since that study, AncestryDNA has added more than 4 million testers to their database, 23andMe more than 2 million, and MyHeritage more than 1 million. (The database sizes over time are graphed here.) The continued popularity of direct-to-consumer DNA testing means that the percentages I found 11 months ago may no longer represent the experience of genetic genealogists today. A re-analysis is warranted. I now also have access to multiple kits at MyHeritage, enabling me to include them this time around.
I had access to DNA test results for 10 unrelated people who have either taken DNA tests with or transferred their data to all five of the main databases: AncestryDNA, 23andMe, MyHeritage, Family Tree DNA, and GEDmatch. Each person agreed to have their information used anonymously. They were mostly Americans of European (including Jewish) or African descent, while one was Scottish and one British.
The table below represents the percentages of matches with trees at each site for each of the 10 people, along with averages, maximums, and minimums. (See below for an explanation of how the data were obtained.)
There is a huge disparity across companies in the percentage of matches with family trees, ranging from a high of 88.4% at MyHeritage to a low of 2.9% at 23andMe. As in last year’s analysis, every tester had the same rank order of companies. The only change from last year is that MyHeritage, which is new to the comparison, now ranks first, followed by, in order, AncestryDNA, FTDNA, GEDmatch, and 23andMe. MyHeritage has done a remarkably good job of encouraging their users to associate trees with their DNA results, especially for such a new competitor in the genealogical DNA testing market. 23andMe’s low numbers are undoubtedly the result of their decision not to host trees within their own system (although trees at other sites can be linked to a tester’s profile).
Are There Fewer Matches with Trees Than Last Year?
If the majority of new testers are only interested in their ethnicity estimates and not genealogy, we might expect so. I compared the percentages from last year to the newest ones.
At most sites, there was a slight decline (1 to 2.5 percentage points) in the percentage of users with trees over the past year, in line with expectation. FTDNA, on the other hand, logged an increase of 2.5 percentage points, from 37.5% to 40.0%. Keep up the good work, FTDNA!
In summary, more than three-quarters of users at both MyHeritage and AncestryDNA have at least some sort of tree associated with their DNA accounts, so the glass is truly more than half full at those sites. Multiplying the size of each database by the percentage of its users with trees and summing across all sites gives a total of 9,672,056 trees among 18,680,000 tested people, or 51.8% with trees. Rosy indeed!
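The pooled figure is a size-weighted average, which can be sketched in a few lines of Python. This is only an illustration: the function name and its input format are my own assumptions, and the only real numbers used are the two totals quoted above.

```python
def pooled_tree_percentage(sites):
    """Size-weighted percentage of testers with trees.

    `sites` is a list of (database_size, fraction_with_trees) pairs.
    This input format is hypothetical, chosen for illustration.
    """
    total_people = sum(size for size, _ in sites)
    total_trees = sum(size * fraction for size, fraction in sites)
    return 100 * total_trees / total_people


# Check using the totals quoted in the text.
overall = 100 * 9_672_056 / 18_680_000
print(f"{overall:.1f}%")  # 51.8%
```

Because the larger databases carry more weight, the pooled percentage is pulled toward the sites with the most testers rather than sitting at the simple average of the five site percentages.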
Get In On the MyHeritage Action Before the Fees
MyHeritage recently announced that their free transfer program, which gives the full complement of features to those who transfer their raw data from other sites, will be ending on December 1, 2018. After that date, data transfers will still be free, but some tools—most likely the chromosome browser and ethnicity estimates—at the site will incur a charge to use. To transfer for free before the deadline, use this link. (Instructions on how to download your raw data from AncestryDNA are provided here.)
How I Got the Percentages
If you’d like to tally trees for your own matches, here’s how I did it. Please post your results in the comments if you try it!
At each of the sites, I omitted the top matches to avoid bias. For example, my top matches include my parents, children, and cousins that I tested and linked to my own tree. Including them would bias the results by increasing the percentage of matches with trees.
I did not consider tree size, quality, accessibility, or documentation. Some of the sites allow a tree with a single person, and some trees contain only living people who are privatized. In this study, only presence/absence of a tree was tallied.
AncestryDNA: Starting with the second page of matches, I used my internet browser’s search feature to search each of the next 10 pages for the strings “people” (to count the linked trees) and then “unlinked” (to count the unlinked trees). The proportion of matches with trees of either kind was:
(“people” hits + “unlinked” hits) / (total matches on those 10 pages)
23andMe: 23andMe does not host trees, although they allow users to link to a tree on another site. Trees are linked near the bottom of each match’s individual comparison page, as shown.
To calculate the proportion, I opened the comparison pages for every match on pages 2–11 of the list (250 matches total), searched each page for the text “how you are related”, and tallied the matches with trees. The proportion of matches with trees was:
(matches with a linked tree) / 250
MyHeritage: At MyHeritage, I first set the number of matches per page to 50. Then, for pages 2–11, I searched the page for the string “View tree”. The proportion of matches with trees was:
(“View tree” hits on pages 2–11) / (10 pages × 50 matches per page = 500)
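If you would rather not count search hits by hand, the same tally can be approximated with a short script. This is a rough sketch under my own assumptions: that you have saved the text of each match-list page yourself, that every page is full, and that the marker string appears exactly once per match with a tree. The function name is hypothetical.

```python
def myheritage_tree_proportion(page_texts, marker="View tree", page_size=50):
    """Proportion of matches with trees across saved match-list pages.

    `page_texts` holds the text of each page (pages 2-11 in the article).
    Assumes full pages and one marker occurrence per match with a tree.
    """
    total_matches = page_size * len(page_texts)
    if total_matches == 0:
        raise ValueError("no pages supplied")
    hits = sum(text.count(marker) for text in page_texts)
    return hits / total_matches
```

The same approach would work for any site where a fixed string marks a match with a tree, provided the string does not also appear elsewhere on the page.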
Family Tree DNA: For pages 2–11 of the match list, I manually counted the number of matches per page with a blue tree icon, indicating that there was a tree attached.
FTDNA displays 30 matches per page. For 10 pages of matches, the proportion with trees was calculated as:
(matches with a blue tree icon on pages 2–11) / (10 pages × 30 matches per page = 300)
One volunteer had fewer than 11 pages of matches. For that person, the denominator in the equation was the total number of matches minus 30 (the matches on the first page).
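The denominator rule for FTDNA, including the short-list case, can be written out as follows. This is a minimal sketch; the function and parameter names are mine, and only the 30-per-page figure and the skip-the-first-page rule come from the method described above.

```python
FTDNA_PAGE_SIZE = 30  # matches per page, as stated in the text


def ftdna_tree_proportion(trees_counted, total_matches, pages_counted=10):
    """Proportion of matches with trees on the pages after the first.

    Uses pages_counted full pages as the denominator, or everything
    after the first page when the match list is shorter than that.
    """
    full_span = FTDNA_PAGE_SIZE * pages_counted
    after_first_page = total_matches - FTDNA_PAGE_SIZE
    denominator = min(full_span, after_first_page)
    if denominator <= 0:
        raise ValueError("no matches beyond the first page")
    return trees_counted / denominator
```

For example, a tester with 5,000 matches and 120 tree icons on pages 2–11 gets 120 / 300, while a tester with only 230 matches and 50 tree icons gets 50 / 200.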
GEDmatch: I ran a One-to-Many analysis on each person and copied the results into a spreadsheet. I then deleted the first 50 rows and counted the remaining rows that had entries in the GED/WikiTree column. The proportion of matches with trees was:
(rows with a GED/WikiTree entry) / (total rows – 50)
For all of the kits examined here, the total number of matches considered was 2000 – 50 = 1950, because the One-to-Many analysis returns a maximum of 2000 matches.
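The spreadsheet step can also be done in Python if you save the One-to-Many results as CSV. This is a sketch under assumptions: the file has a header row, the tree column is labeled exactly “GED/WikiTree”, and the sample data below are made up purely to show the mechanics.

```python
import csv
import io


def gedmatch_tree_proportion(csv_text, skip=50, column="GED/WikiTree"):
    """Proportion of rows with a tree entry after dropping the top matches.

    Assumes a header row and a column named `column`; both are
    assumptions about how the results were saved.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    kept = rows[skip:]
    if not kept:
        raise ValueError("no rows left after skipping the top matches")
    with_tree = sum(1 for row in kept if (row.get(column) or "").strip())
    return with_tree / len(kept)


# Hypothetical five-row sample, skipping one top match instead of 50.
sample = (
    "Kit,Name,GED/WikiTree\n"
    "A1,top match,GED\n"
    "A2,person 2,\n"
    "A3,person 3,GED\n"
    "A4,person 4,Wiki\n"
    "A5,person 5,\n"
)
print(gedmatch_tree_proportion(sample, skip=1))  # 0.5
```

With a full 2000-row result and the default skip of 50, the denominator works out to the 1950 matches described above.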