Genealogists often complain about how many of our DNA matches don’t have trees, but is it true? Sure, some people test for fun and don’t care about genealogy, or they don’t have enough information to build a tree, but overall, is the tree situation as bad as it sometimes seems to be?
That’s a question I’ve asked and answered using actual data for the past two years. This post marks the third installment of the annual series. The gist of the analysis is that I had access to DNA kits for people who had tested in or transferred to all five main databases for genetic genealogy, and for each person I determined the percentage of their matches that had trees at each site.
Because genetic genealogy databases have continued to grow rapidly, and because the companies continue to offer new features to encourage tree-building, it’s time for a re-analysis.
Twenty unrelated people volunteered to have their information used anonymously for this year’s study. Each had their data in all five of the main matching databases: AncestryDNA, 23andMe, MyHeritage, Family Tree DNA, and GEDmatch. They were mostly Americans of European (including Jewish), African, or Korean descent.
The table below summarizes the percentages of matches with trees at each site for each of the 20 people, along with averages, maximums, and minimums. (See below for an explanation of how the data were obtained.)
As in previous years, the percentage of matches with trees varies greatly from company to company, from a high of 77.7% at MyHeritage to a low of 2.9% at 23andMe. All but Person 3 had the same rank order of companies, with MyHeritage > AncestryDNA > Family Tree DNA > GEDmatch > 23andMe.
Are There Fewer Matches with Trees Than Last Year?
A common concern is that the majority of new testers are only interested in their ethnicity estimates and not genealogy, so the number of matches with trees is going down over time. But is that true? To find out, I graphed the data for each company for the past three years.
Overall, there were slight year-to-year declines in the percentage of users with trees, as expected if newer testers aren’t interested in genealogy. The decreases were small, though, typically from 1% to 2.5%. Family Tree DNA, on the other hand, showed a slight increase, from 40.0% in 2018 to 40.7% in 2019.
MyHeritage dropped from 88.4% of users with trees in 2018 to 77.7% in 2019. I suspect that decline is an artifact, because last year I counted 1-person trees and this year I didn’t.
In summary, at both MyHeritage and AncestryDNA, more than 70% of the people who test have at least some sort of tree associated with their DNA accounts, so the glass is truly more than half full at those sites.
In recent weeks, both 23andMe and Family Tree DNA have introduced new features that should encourage tree building. 23andMe is currently beta testing an automated family tree that will eventually be editable, allowing users to build trees and link their DNA matches at the same time. Family Tree DNA’s myFamilyTree 2.0 will include an “onboarding wizard” to walk people without trees through the process of creating one. I expect both companies to have more matches with trees by the time I redo this analysis next year.
How I Got the Percentages
You can tally how many of your own matches have trees using the steps described below. Please share your results in the comments!
Because each site presents matches and their trees in a different way (and because those presentations vary over time), the exact protocol for collecting the data in this series varies from site to site and from year to year. I try to compensate for that variability by collecting data for a large number of matches to smooth out any artifacts of the method itself.
At each site except for GEDmatch, I omitted the top matches to avoid bias. Serious genealogists (the kind of people who volunteer for a study like this one) often test many relatives and link them to a single tree. Including them would bias the results by increasing the percentage of matches with trees, when really I’d just be counting the volunteer’s tree multiple times. At GEDmatch, I was able to easily sample 1,000 matches at a time, so a few close relatives had little effect on the totals.
With the exception of matches at MyHeritage and AncestryDNA, I did not consider tree size, quality, or accessibility, just presence/absence. At those two sites, and only at those two sites, I could easily tell which matches had 1-person trees, and I excluded those from the tallies. At the other sites, 1-person trees, if they existed, were counted as part of the total.
AncestryDNA: AncestryDNA uses an infinite scroll page design, with more matches added to the bottom of the list as you scroll to the bottom of the page. For each volunteer, I scrolled down to load 500–600 matches, then copied everything on the page to a word processor. There, I deleted all matches closer than the 3rd cousin category. Then, I used the word processor’s search feature to count occurrences of the strings “people” (the total number of linked trees), “1 people” (the number of linked 1-person trees), “unlinked” (the number of unlinked trees), “no trees” (the number of matches without trees), and “unavailable” (usually people who were syncing with third-party software at the time I did the analysis).
The proportion of matches with trees was (linked trees + unlinked trees minus 1-person trees) divided by (linked trees + unlinked trees + no trees). Matches listed as “unavailable” were left out of both totals.
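If you would rather script the tally than count in a word processor, the same arithmetic can be sketched in Python. This is only an illustration of the formula above, not the exact procedure used for this post. The function name `ancestry_tree_share` is made up, and the search strings are assumed to appear in the pasted text the way they are described above. One assumption worth flagging: a plain search for “1 people” would also hit “11 people” or “21 people”, so the sketch only counts a “1” that is not preceded by another digit or a comma.

```python
import re


def ancestry_tree_share(pasted_text: str) -> float:
    """Share of matches with multi-person trees in a pasted match list.

    Hypothetical helper: assumes the text was copied from the match
    list after the closest matches were deleted.
    """
    text = pasted_text.lower()

    # Every linked tree shows a size such as "57 people".
    linked = len(re.findall(r"\bpeople\b", text))

    # 1-person trees: "1 people" where the 1 is not the tail of a
    # larger number such as "11 people" (assumption, see lead-in).
    one_person = len(re.findall(r"(?<![\d,])1 people\b", text))

    unlinked = text.count("unlinked")
    no_trees = text.count("no trees")

    # "unavailable" matches are ignored in both numerator and denominator.
    total = linked + unlinked + no_trees
    if total == 0:
        raise ValueError("no matches found in the pasted text")

    return (linked + unlinked - one_person) / total
```

To try it, paste your copied match list into a string (or read it from a text file) and pass it to `ancestry_tree_share`; multiply the result by 100 for a percentage.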
23andMe: 23andMe did not host trees at the time of this analysis, although they allowed users to link to trees on other sites. Trees are linked under the Family Background section of a match’s individual comparison page, as shown in the screenshot. The first URL (to an Ancestry tree) is my own tree and the second (the MyHeritage tree) is the tree of my match.
I opened the comparison pages for every match on pages 2–11 of the list (250 matches total), searched each page for the text “Family tree”, and tallied the matches with trees. The proportion of matches with trees was the total number of matches with trees divided by 250.
MyHeritage: At MyHeritage, I first set the number of matches per page to 50. Then, for pages 2–11, I searched the page for the string “people”. This automatically excluded 1-person trees. The proportion of matches with trees was the total number of matches with multi-person trees divided by 500.
Family Tree DNA: For pages 2–11 of the match list, I manually counted the number of matches per page with a blue tree icon, indicating that there was a tree attached.
FTDNA displays 30 matches per page. For 10 pages of matches, the proportion with trees was calculated as the number of matches with trees divided by 300.
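The page-by-page tallies at 23andMe, MyHeritage, and Family Tree DNA all reduce to the same calculation: add up the per-page counts and divide by the number of matches sampled (250, 500, and 300, respectively). Here is a minimal Python sketch of that step. The function name `sampled_share` is made up for illustration, and you would supply your own per-page counts.

```python
def sampled_share(per_page_counts, matches_per_page):
    """Proportion of sampled matches that have trees.

    per_page_counts: number of matches with trees found on each
        page you checked (for example, pages 2-11).
    matches_per_page: how many matches the site shows per page
        (for example, 30 at Family Tree DNA).
    """
    if not per_page_counts:
        raise ValueError("need at least one page of counts")

    for count in per_page_counts:
        # Guard against typos: a page cannot have more trees than matches.
        if not 0 <= count <= matches_per_page:
            raise ValueError(f"impossible page count: {count}")

    sampled = len(per_page_counts) * matches_per_page
    return sum(per_page_counts) / sampled
```

For a Family Tree DNA tally, you would call `sampled_share(counts, 30)` with a list of ten counts, one for each of pages 2 through 11.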
GEDmatch: I ran a “One-to-Many Beta” analysis on each person with the maximum number of matches set to 1000. The list was copied to a word processor, where I then searched for and tallied the strings “GED Wiki”, “GED”, and “Wiki”. The proportion of matches with trees was the total number of matches with entries in the GED/Wiki field divided by 1000.
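Because “GED Wiki” contains both “GED” and “Wiki”, simple text searches can count the same match more than once. One way around that is to tally one GED/Wiki field value per match instead. The Python sketch below assumes you have already pulled that column out of the copied list (in a spreadsheet, say) as one string per match, with an empty string when the field is blank. The function name `gedmatch_tree_share` is hypothetical.

```python
from collections import Counter


def gedmatch_tree_share(ged_wiki_fields):
    """Return (share of matches with a GED/Wiki entry, counts per label).

    ged_wiki_fields: one string per match, e.g. "GED", "Wiki",
        "GED Wiki", or "" when the match has no tree entry.
    """
    if not ged_wiki_fields:
        raise ValueError("no matches supplied")

    # Collapse stray whitespace so "GED  Wiki" and "GED Wiki" agree.
    labels = Counter(" ".join(field.split()) for field in ged_wiki_fields)

    # Any non-empty label means the match has a tree of some kind.
    with_tree = sum(n for label, n in labels.items() if label)
    return with_tree / len(ged_wiki_fields), labels
```

The second return value shows how many matches fell into each label, which makes it easy to check the tally against the three search strings.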