‘So frustrating! 95% of my DNA matches don’t have trees!’
‘People at Company X only test for their ethnicity estimates. If you want to find people who are serious about genealogy, you have to use Company Y.’
‘Back in the day, we did real genealogy research. Today, people just want to take a DNA test and have their pedigrees handed to them. That’s why no one has trees!’
We see these complaints often. They don’t often jibe with my own experiences, though, so last year I did a comparative analysis of the percentage of autosomal DNA matches with trees at AncestryDNA, 23andMe, Family Tree DNA, and GEDmatch. I used 12 unrelated people who each had DNA at all four sites to minimize bias. At the time, I did not have access to many kits at MyHeritage, so that company was not included.
Since that study, AncestryDNA has added more than 4 million testers to their database, 23andMe more than 2 million, and MyHeritage more than 1 million. (The database sizes over time are graphed here.) The continued popularity of direct-to-consumer DNA testing means that the percentages I found 11 months ago may no longer represent the experience of genetic genealogists today. A re-analysis is warranted. I now also have access to multiple kits at MyHeritage, enabling me to include them this time around.
I had access to DNA test results for 10 unrelated people who have either taken DNA tests with or transferred their data to all five of the main databases: AncestryDNA, 23andMe, MyHeritage, Family Tree DNA, and GEDmatch. Each person agreed to have their information used anonymously. They were mostly Americans of European (including Jewish) or African descent, while one was Scottish and one British.
The table below represents the percentages of matches with trees at each site for each of the 10 people, along with averages, maximums, and minimums. (See below for an explanation of how the data were obtained.)
There is a huge disparity across companies in the percentage of users with family trees, ranging from a high of 88.4% at MyHeritage to a low of 2.9% at 23andMe. As in last year’s analysis, every tester had the same rank order of companies; the only difference was that MyHeritage was the top company this year, followed by, in order, AncestryDNA, FTDNA, GEDmatch, and 23andMe. MyHeritage has done a remarkably good job of encouraging their users to associate trees with their DNA results, especially for such a new competitor in the genealogical DNA testing market. 23andMe’s low numbers are undoubtedly the result of their decision not to host trees within their own system (although trees at other sites can be linked to a tester’s profile).
Are There Fewer Matches with Trees Than Last Year?
If the majority of new testers are only interested in their ethnicity estimates and not genealogy, we might expect so. I compared the percentages from last year to the newest ones.
At most sites, there was a slight decline (1 to 2.5 percentage points) in the percentage of users with trees over the past year, in line with expectation. FTDNA, on the other hand, logged an increase of 2.5 percentage points, from 37.5% to 40.0%. Keep up the good work, FTDNA!
Take Home
In summary, more than three-quarters of users at both MyHeritage and AncestryDNA have at least some sort of tree associated with their DNA accounts, so the glass is truly more than half full at those sites. If I multiply the size of each database by the percentage of its users with trees, there are a total of 9,672,056 trees and 18,680,000 tested people across all sites, for 51.8% with trees. Rosy indeed!
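For anyone who wants to repeat the weighted calculation, here is a minimal sketch in Python. The function name and the example inputs are hypothetical (the per-site database sizes are not listed in this post); only the two totals in the final line come from the text above.

```python
def overall_tree_percentage(db_sizes, tree_pcts):
    """Return (estimated trees, total testers, overall % with trees).

    db_sizes:  site name -> number of tested people
    tree_pcts: site name -> percentage of matches with trees (0-100)
    """
    total_testers = sum(db_sizes.values())
    total_trees = sum(db_sizes[site] * tree_pcts[site] / 100 for site in db_sizes)
    return total_trees, total_testers, 100 * total_trees / total_testers


# Hypothetical inputs for illustration only (not the figures used in the post).
example_sizes = {"Site A": 1_000_000, "Site B": 3_000_000}
example_pcts = {"Site A": 80.0, "Site B": 40.0}
trees, testers, pct = overall_tree_percentage(example_sizes, example_pcts)
print(f"{trees:,.0f} trees / {testers:,} testers = {pct:.1f}%")

# Check of the totals quoted in the text: 9,672,056 trees, 18,680,000 testers.
quoted_pct = 100 * 9_672_056 / 18_680_000
print(f"{quoted_pct:.1f}%")
```

Because each site is weighted by its database size, the largest databases dominate the overall figure.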
Get In On the MyHeritage Action Before the Fees
MyHeritage recently announced that their free transfer program, which gives the full complement of features to those who transfer their raw data from other sites, will be ending on December 1, 2018. After that date, data transfers will still be free, but some tools—most likely the chromosome browser and ethnicity estimates—at the site will incur a charge to use. To transfer for free before the deadline, use this link. (Instructions on how to download your raw data from AncestryDNA are provided here.)
How I Got the Percentages
If you’d like to tally trees for your own matches, here’s how I did it. Please post your results in the comments if you try it!
At each of the sites, I omitted the top matches to avoid bias. For example, my top matches include my parents, children, and cousins that I tested and linked to my own tree. Including them would bias the results by increasing the percentage of matches with trees.
I did not consider tree size, quality, accessibility, or documentation. Some of the sites allow a tree with a single person, and some trees contain only living people who are privatized. In this study, only presence/absence of a tree was tallied.
AncestryDNA: Starting with the second page of matches, I used my internet browser’s search feature to search each of the next 10 pages for the strings “people” (to count the linked trees) and then “unlinked” (to count the unlinked trees). The proportion of matches with trees of either kind was:
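If you save the text of each match page, the browser search described above can be approximated with a short Python sketch. The function names are hypothetical, and the number of matches per page is left as a parameter because it is not stated here. Like the manual search, it assumes each search string appears once per match that has that kind of tree.

```python
def count_term(page_texts, term):
    """Count case-insensitive occurrences of term across all saved pages."""
    term = term.lower()
    return sum(text.lower().count(term) for text in page_texts)


def ancestry_tree_proportion(page_texts, matches_per_page):
    """Share of matches with a linked or unlinked tree on the saved pages."""
    linked = count_term(page_texts, "people")      # linked trees
    unlinked = count_term(page_texts, "unlinked")  # unlinked trees
    total_matches = matches_per_page * len(page_texts)
    return (linked + unlinked) / total_matches


# Synthetic one-page example with three matches (illustration only).
sample_pages = ["Match A 120 People\nMatch B Unlinked Tree\nMatch C No family tree"]
sample_proportion = ancestry_tree_proportion(sample_pages, matches_per_page=3)
print(f"{sample_proportion:.1%}")
```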
23andMe: 23andMe does not host trees, although they allow users to link to a tree on another site. Trees are linked near the bottom of each match’s individual comparison page, as shown.
To calculate the proportion, I opened the comparison pages for every match on pages 2–11 of the list (250 matches total), searched each page for the text “how you are related”, and tallied the matches with trees. The proportion of matches with trees was:
MyHeritage: At MyHeritage, I first set the number of matches per page to 50. Then, for pages 2–11, I searched each page for the string “View tree”. The proportion of matches with trees was:
Family Tree DNA: For pages 2–11 of the match list, I manually counted the number of matches per page with a blue tree icon, indicating that there was a tree attached.
FTDNA displays 30 matches per page. For 10 pages of matches, the proportion with trees was calculated as:
One volunteer had fewer than 11 pages of matches. For that person, the denominator in the equation was the total number of matches minus 30 (the matches on the first page).
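The FTDNA calculation, including the adjustment for a short match list, can be sketched in Python as follows. The function name and the counts in the usage lines are hypothetical; the page size of 30 and the 10 pages counted come from the description above.

```python
FTDNA_PAGE_SIZE = 30   # matches per page, as stated above
PAGES_COUNTED = 10     # pages 2-11 of the match list


def ftdna_tree_proportion(matches_with_trees, total_matches):
    """Proportion of counted matches (first page excluded) that have a tree."""
    full_sample = FTDNA_PAGE_SIZE * PAGES_COUNTED    # 300 matches
    available = total_matches - FTDNA_PAGE_SIZE      # everything after page 1
    denominator = min(full_sample, available)
    if denominator <= 0:
        raise ValueError("no matches beyond the first page")
    return matches_with_trees / denominator


# Hypothetical counts for illustration only.
long_list = ftdna_tree_proportion(120, total_matches=5000)   # 120 / 300
short_list = ftdna_tree_proportion(60, total_matches=180)    # 60 / (180 - 30)
print(long_list, short_list)
```

Taking the smaller of the two denominators covers both the usual case and a match list that runs out before page 11.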
GEDmatch: I ran a One-to-Many analysis on each person and copied the results into a spreadsheet. I then deleted the first 50 rows and counted the remaining rows that had entries in the GED/WikiTree column. The proportion of matches with trees was:
For all of the kits examined here, the total number of matches considered was 2000 – 50 = 1950, because the One-to-Many analysis returns a maximum of 2000 matches.
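The GEDmatch spreadsheet steps could also be scripted. The sketch below assumes the One-to-Many results were saved as CSV text with a header row that includes a column named “GED/WikiTree”; the function name and the export layout are assumptions, not a documented GEDmatch format.

```python
import csv
import io


def gedmatch_tree_proportion(csv_text, tree_column="GED/WikiTree", skip_top=50):
    """Drop the top matches, then return the share of rows with a tree entry."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    counted = rows[skip_top:]                 # omit the closest matches
    if not counted:
        raise ValueError("no matches left after skipping the top rows")
    with_tree = sum(1 for row in counted if (row.get(tree_column) or "").strip())
    return with_tree / len(counted)


# Tiny synthetic example (illustration only): skip 1 row, then 2 of 3 have a tree.
sample_csv = "Kit,GED/WikiTree\nA1,GED\nA2,\nA3,Wiki\nA4,GED\n"
sample_share = gedmatch_tree_proportion(sample_csv, skip_top=1)
print(f"{sample_share:.1%}")
```

With a full list of 2000 matches and the default `skip_top=50`, the denominator is the 1950 described above.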
Other analyses have shown that most trees at Ancestry are only 1 or 2 generations deep, are named private with no info, or aren’t public in the first place.
Of the large number of daily trees, less than 25% had any value to a genealogist (meaning more than 75% fell into the categories I highlighted in the first paragraph).
The analysis did not look at tree size or public/private status by design, because the same issues apply to all of the companies. There are many 1-person trees at both MyHeritage and Family Tree DNA.
That said, I strongly dispute the idea that small trees, or even private ones, are useless to a good genealogist. I’ve solved adoptee cases using trees with a single visible name. Private trees are usually searchable, so you can use surname searches to identify the MRCA couple. And once you can name some surnames and locations, the match is much more likely to respond to a message.
I think anyone who cares enough to post their DNA on all sites is much more likely to have a tree than those who use only one site. I transferred mine from Ancestry to MyHeritage, but do not find the site easy to use and have only a few matches there, whereas at Ancestry I have hundreds. Most do not have trees, which is frustrating. I was able to create my Ancestry tree easily from FamilySearch, and I do not know how to do that on any other site. Others may be in the same position.
If you want to put your Ancestry tree at the other sites, you can download a GEDCOM file in Tree Settings and upload it elsewhere. It’s much easier than trying to recreate it at each site!
The easiest calculation would be “No Trees” / total matches. 24% of my matches have no tree at all. The other 76% have a tree of some kind: private, not attached, public, or just a couple of names. The next question would be: how many of those trees are usable? Very few in my experience.
Yes, I started doing it that way but for my own purposes was interested in the number of unattached trees, so I did the extra work to separate them out.
In my experience, most trees are useful. Often, I have to build the match’s tree out myself to find the connection, but that doesn’t bother me. Genealogists do that for fun, after all!
Thanks for this information. It isn’t totally surprising to me. Like many others I imagine, I came to Ancestry first to do paper research, and I constructed a tree as part of that. Only later did I have my DNA tested. I imagine there are MyHeritage users who were the same. So you might expect these companies to have a higher percentage of people with trees.
On the other hand, a lot of DNA testers are what we might term “casual”, e.g. I have known people to be given a test as a gift and so only have a very fleeting interest in ethnicity and virtually none in family history. You’d expect these people to be less likely to have a tree, and you’d also expect them not to transfer to GEDmatch or FTDNA, which would lead to the conclusion that GEDmatch and FTDNA users might be more likely to have a tree because they are more serious about family history.
But you have to admire MyHeritage’s success. They started significantly later than the other companies, yet quickly surpassed FTDNA in their overall numbers and in their tree percentage. I personally think FTDNA is the most helpful of the testing companies, but for whatever reason, their database and their tree percentage aren’t so good.
I’m also impressed with how quickly MyHeritage has grown. My (unofficial) estimate based on their past growth trajectory puts them at more than 2 million.
It’s quite possible that FTDNA is larger than we think, but because they won’t release official numbers, all we have is an estimate.