Genealogists often complain about how many of our DNA matches don’t have trees, but is it true? Sure, some people test for fun and don’t care about genealogy, or they don’t have enough information to build a tree, but overall, is the tree situation as bad as it sometimes seems to be?
That’s a question I’ve asked and answered using actual data for the past two years. This post marks the third installment of the annual series. The gist of the analysis is that I had access to DNA kits for people who had tested in or transferred to all five main databases for genetic genealogy, and for each person I determined the percentage of their matches that had trees at each site.
Because genetic genealogy databases have continued to grow rapidly, and because the companies continue to offer new features to encourage tree-building, it’s time for a re-analysis.
Twenty unrelated people volunteered to have their information used anonymously for this year’s study. Each had their data in all five of the main matching databases: AncestryDNA, 23andMe, MyHeritage, Family Tree DNA, and GEDmatch. They were mostly Americans of European (including Jewish), African, or Korean descent.
The table below summarizes the percentages of matches with trees at each site for each of the 20 people, along with averages, maximums, and minimums. (See below for an explanation of how the data were obtained.)
As in previous years, the percentage of matches with trees varies greatly from company to company, ranging from a high of 77.7% at MyHeritage to a low of 2.9% at 23andMe. All but Person 3 had the same rank order of companies: MyHeritage > AncestryDNA > Family Tree DNA > GEDmatch > 23andMe.
Are There Fewer Matches with Trees Than Last Year?
A common concern is that the majority of new testers are interested only in their ethnicity estimates, not genealogy, so the share of matches with trees is going down over time. But is that true? To find out, I graphed the data for each company for the past three years.
Overall, there were slight year-to-year declines in the percentage of matches with trees, as expected if newer testers aren’t interested in genealogy. The decreases were small, though, typically 1 to 2.5 percentage points. Family Tree DNA, on the other hand, showed a slight increase, from 40.0% in 2018 to 40.7% in 2019.
MyHeritage dropped from 88.4% of matches with trees in 2018 to 77.7% in 2019. I suspect that decline is an artifact, because last year I counted 1-person trees and this year I didn’t.
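When the quantity being compared is itself a percentage, a change between years can be stated either in percentage points or relative to the earlier value. As an illustration, the small Python sketch below takes the FTDNA and MyHeritage figures quoted above and reports both; the function name `year_over_year` is hypothetical, not part of any tool used in the study.

```python
def year_over_year(previous: float, current: float) -> tuple[float, float]:
    """Return (change in percentage points, relative change in percent)."""
    points = current - previous
    relative = points / previous * 100
    return points, relative

# Figures quoted in the text (percent of matches with trees, 2018 -> 2019).
for site, before, after in [("Family Tree DNA", 40.0, 40.7), ("MyHeritage", 88.4, 77.7)]:
    points, relative = year_over_year(before, after)
    print(f"{site}: {points:+.1f} percentage points, {relative:+.1f}% relative")
```

For MyHeritage, the 10.7-point drop works out to roughly a 12% relative decline.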
Take Home
In summary, at both MyHeritage and AncestryDNA, more than 70% of matches have at least some sort of tree associated with their DNA accounts, so the glass is truly more than half full at those sites.
In recent weeks, both 23andMe and Family Tree DNA have introduced new features that should encourage tree building. 23andMe is currently beta testing an automated family tree that will eventually be editable, allowing users to build trees and link their DNA matches at the same time. Family Tree DNA’s myFamilyTree 2.0 will include an “onboarding wizard” to walk people without trees through the process of creating one. I expect both companies to have more matches with trees by the time I redo this analysis next year.
How I Got the Percentages
You can tally how many of your own matches have trees using the steps described below. Please share your results in the comments!
Because each site presents matches and their trees in a different way, and because those presentations vary over time, the exact protocol for collecting the data in this series varies from site to site and from year to year. I try to compensate for that variability by collecting data for a large number of matches to smooth out any artifacts of the method itself.
At each site except for GEDmatch, I omitted the top matches to avoid bias. Serious genealogists—the kind of person who volunteers for a study like this one—often test many relatives and link them to a single tree. Including them would bias the results by increasing the percentage of matches with trees, when really I’d just be counting the volunteer’s tree multiple times. At GEDmatch, I was able to easily sample 1,000 matches at a time, so a few close relatives had little effect on the totals.
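For the page-based sites described below (23andMe, MyHeritage, and Family Tree DNA), skipping the top matches amounts to skipping page 1 and sampling pages 2–11. The hypothetical helper below is a sketch that assumes the match list is sorted from closest to most distant and that every page is full; it shows which match ranks that window covers at each site’s page size. The figure of 25 matches per page for 23andMe is implied by the 250-match total rather than stated directly.

```python
def sampled_ranks(per_page: int, first_page: int = 2, last_page: int = 11) -> range:
    """Match ranks (1 = closest match) covered by pages first_page..last_page."""
    start = (first_page - 1) * per_page + 1
    end = last_page * per_page
    return range(start, end + 1)

# Matches per page for each page-based site, as described in this post.
for site, per_page in [("23andMe", 25), ("MyHeritage", 50), ("Family Tree DNA", 30)]:
    ranks = sampled_ranks(per_page)
    print(f"{site}: matches {ranks[0]}-{ranks[-1]} ({len(ranks)} sampled)")
```

The window length is the denominator used at each site: 250, 500, and 300 matches.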
I did not consider tree size, quality, or accessibility, just presence or absence, with one exception: at MyHeritage and AncestryDNA, and only at those two sites, I could easily tell which matches had 1-person trees, so I excluded those from the count of matches with trees. At the other sites, 1-person trees, if they existed, were counted as part of the total.
AncestryDNA: AncestryDNA uses an infinite-scroll design, loading more matches at the bottom of the list as you scroll down. For each volunteer, I scrolled down to load 500–600 matches, then copied everything on the page to a word processor. There, I deleted all matches closer than the 3rd cousin category. Then I used the word processor’s search feature to count occurrences of the strings “people” (the total number of linked trees), “1 people” (the number of linked 1-person trees), “unlinked” (the number of unlinked trees), “no trees” (the number of matches without trees), and “unavailable” (usually people who were syncing with third-party software at the time I did the analysis).
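The string counting can be sketched in Python, assuming the copied page has been saved as plain text. The function name is hypothetical, and the sample wording in any test text only approximates what AncestryDNA displays. One design choice worth noting: a plain search for “1 people” would also match “11 people” or “21 people”, so this sketch uses a regular expression with a word boundary (`\b`) to count only trees of exactly one person. The text is lowercased first so the search is case-insensitive, as most word-processor searches are by default.

```python
import re

def tally_ancestry(page_text: str) -> dict[str, int]:
    """Count the marker strings in a copied AncestryDNA match list."""
    text = page_text.lower()
    return {
        # Every linked tree shows "N people", including 1-person trees.
        "linked": text.count("people"),
        # \b stops "11 people" or "21 people" from counting as a 1-person tree.
        "one_person": len(re.findall(r"\b1 people\b", text)),
        "unlinked": text.count("unlinked"),
        "no_trees": text.count("no trees"),
        "unavailable": text.count("unavailable"),
    }
```

For example, `tally_ancestry(open("ancestry_matches.txt", encoding="utf-8").read())` would tally a saved copy of the page (the file name is illustrative).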
The proportion of matches with trees was (linked trees + unlinked trees − 1-person trees) divided by (linked trees + unlinked trees + matches with no trees).
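The same formula can be written as a small Python function and checked against the UK AncestryDNA figures a reader posted in the comments (198 linked trees, 15 of them 1-person trees, 64 unlinked trees, and 83 matches with no trees). Using `Fraction` keeps the ratio exact; the function name is illustrative. Note that 1-person trees drop out of the numerator but stay in the denominator, so they are treated as matches without trees, while “unavailable” matches are left out entirely.

```python
from fractions import Fraction

def ancestry_tree_proportion(linked: int, one_person: int, unlinked: int, no_trees: int) -> Fraction:
    """Share of matches with a tree, counting 1-person trees as no tree.

    Matches marked "unavailable" are excluded from numerator and denominator.
    """
    with_trees = linked + unlinked - one_person
    total = linked + unlinked + no_trees
    return Fraction(with_trees, total)

share = ancestry_tree_proportion(linked=198, one_person=15, unlinked=64, no_trees=83)
print(share, f"= {float(share):.1%}")  # prints: 247/345 = 71.6%
```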
23andMe: 23andMe did not host trees at the time of this analysis, although it allowed users to link to trees on other sites. Trees are linked under the Family Background section of a match’s individual comparison page, as shown in the screenshot. The first URL (to an Ancestry tree) is my own tree, and the second (to a MyHeritage tree) is my match’s tree.
I opened the comparison pages for every match on pages 2–11 of the list (250 matches total), searched each page for the text “Family tree”, and tallied the matches with trees. The proportion of matches with trees was the total number of matches with trees divided by 250.
MyHeritage: At MyHeritage, I first set the number of matches per page to 50. Then, for pages 2–11, I searched the page for the string “people”. This automatically excluded 1-person trees. The proportion of matches with trees was the total number of matches with multi-person trees divided by 500.
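A search for “people” presumably skips 1-person trees because MyHeritage writes a one-person tree in the singular (“1 person”) and larger trees in the plural (“N people”). That wording is an assumption of this sketch, as are the function names. Under that assumption, the tally over the ten copied pages can be sketched as follows; any other occurrence of “people” in the copied page text would inflate the count.

```python
def myheritage_multi_person_trees(page_texts: list[str]) -> int:
    """Count multi-person trees across copied match pages.

    Assumes tree sizes appear as "N people" (plural) and a one-person tree
    as "1 person" (singular), so "people" never matches a 1-person tree.
    """
    return sum(text.lower().count("people") for text in page_texts)

def myheritage_proportion(page_texts: list[str], per_page: int = 50) -> float:
    """Multi-person trees divided by the number of matches sampled."""
    return myheritage_multi_person_trees(page_texts) / (len(page_texts) * per_page)
```

With ten full pages of 50 matches, the denominator is 500, as in the text.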
Family Tree DNA: For pages 2–11 of the match list, I manually counted the number of matches per page with a blue tree icon, indicating that there was a tree attached.
FTDNA displays 30 matches per page. For 10 pages of matches, the proportion with trees was calculated as the number of matches with trees divided by 300.
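Because the denominator is simply the number of pages sampled times the matches per page, the calculation from manual per-page counts can be sketched as below. The same arithmetic applies to the 23andMe (25 per page) and MyHeritage (50 per page) samples. The function name is hypothetical, and it assumes every sampled page was full; a short final page would make the fixed denominator too large.

```python
def proportion_from_page_counts(counts_per_page: list[int], per_page: int) -> float:
    """Proportion of sampled matches with trees, from per-page tallies.

    Assumes every sampled page is full (per_page matches each).
    """
    if not counts_per_page:
        raise ValueError("no pages tallied")
    for count in counts_per_page:
        # A page cannot have more matches with trees than matches shown.
        if not 0 <= count <= per_page:
            raise ValueError(f"page count {count} outside 0..{per_page}")
    return sum(counts_per_page) / (len(counts_per_page) * per_page)
```

For Family Tree DNA, `proportion_from_page_counts(counts, per_page=30)` with ten page counts divides their sum by 300.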
GEDmatch: I ran a “One-to-Many Beta” analysis on each person with the maximum number of matches set to 1000. The list was copied to a word processor, where I then searched for and tallied the strings “GED Wiki”, “GED”, and “Wiki”. The proportion of matches with trees was the total number of matches with entries in the GED/Wiki field divided by 1000.
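A plain substring search for “GED” would also hit every “GED Wiki” entry, so a scripted version needs to count each match once. The sketch below assumes the GED/Wiki field has already been extracted for each match in the one-to-many list (how to split the copied columns is left open), and classifies each value exactly once. The function and constant names are hypothetical.

```python
TREE_LABELS = {"GED", "Wiki", "GED Wiki"}

def gedmatch_proportion(ged_wiki_fields: list[str]) -> float:
    """Share of matches whose GED/Wiki field shows GED, Wiki, or both."""
    if not ged_wiki_fields:
        raise ValueError("no matches to tally")
    # Each match contributes at most 1, even when the field reads "GED Wiki".
    with_tree = sum(1 for value in ged_wiki_fields if value.strip() in TREE_LABELS)
    return with_tree / len(ged_wiki_fields)
```

With the maximum set to 1,000 matches, the list passed in would have 1,000 entries, matching the denominator in the text.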
Very interesting. I assume the word Ancestry is missing in the section describing how you counted linked and unlinked trees. (For some reason the word Ancestry doesn’t show up for me there or in an earlier paragraph.) If I read it correctly, it looks like you discounted trees with only one person. But even a tree with two people is pretty useless. In fact, any tree with fewer than five people is of limited utility. I am on Ancestry, FTDNA, MyHeritage, 23andMe, and GEDmatch, and I know that I am not finding a useful tree attached to even a quarter of my matches on any of the sites.
If you use an ad-blocker, it may be blocking the affiliate links.
I did not consider tree “quality” because it’s so subjective. That said, when the tree is small, or even absent, I try to build it myself. Quite often I’m successful and able to answer the question I’m asking. Even if I can’t build the tree myself, I can usually come up with some likely surnames or places to use when I message them. I get a much higher response rate when I do that.
I could see creating a tree if I had a limited number of matches, but with thousands, it’s just impossible when I have absolutely no idea how, if at all, that person is connected to me. Since I have researched almost all my ancestors’ descendants out to my fourth to fifth cousins, I know who they are, so 90% of my matches are not nearly as closely related as estimated.
Most people don’t put all of their matches into their tree, just their direct ancestors and maybe a few collateral lines.
I don’t understand what you are suggesting. That most people wouldn’t have located their fourth or fifth cousins, so the DNA helps? Thus, if I already have, it won’t be of much help to me? Or are you saying something else?
Most genealogists aren’t as thorough as you are!
Aw, shucks. Thanks, Leah. 🙂
Thanks for this reminder, Leah!
It might be more useful for genealogical purposes to look not only at the total number of trees but also at their sizes. Among my matches, many have trees that let me see how we are related (and the newer technologies that weave together multiple trees are even better at finding connections we likely wouldn’t find ourselves), but a great number of testers put only their parents, or possibly their grandparents, in their trees, making it difficult to determine how we are related. While it’s certainly everyone’s prerogative to post as much or as little of their trees as they like, or none at all, from a genealogical standpoint matches aren’t nearly as useful if we don’t know how they are related to us. In my experience, I can determine the relationship only about 10–15% of the time.
I would love to be able to do that, but it’s not feasible for a number of reasons. First, it’s too much work. I would have to click through to every single tree (more than 20,000 of them) and manually count how many direct ancestors there are. Second, just because a tree is big doesn’t mean it’s accurate, and I can’t vet every single person in every single tree. Finally, even if the match’s tree is small, it’s usually easy to build it out myself to find a connection.
By no means was I suggesting that you do this. Rather that the percentage of trees that are useful for determining how people are related to each other is far smaller than the total number of trees that are posted with DNA profiles. I do think that the technologies that have been rolled out by the vendors in the last year greatly aid in tying trees together that would be difficult, if not impossible, to do by oneself. As more people test and more people enter their trees, this will only get better.
If there were an objective way to evaluate trees that was feasible from a workload standpoint, I would do it. For the survey, I assumed that the proportion of “useful” trees was the same across the companies. The exceptions might be 23andMe and GEDmatch, because only the most serious genealogists will bother to link a tree, and those trees are therefore likely to be both larger and better researched.
Interesting. I was surprised you used only US volunteers for this, so I tried my own (UK) Ancestry results using your method. The result was (198 + 64 − 15) / (198 + 64 + 83) = 247/345 = 71.6%.
Exactly the same as two of your Persons. So no different, and I suppose there was no reason to expect that it would be.
Most, but not all, of the volunteers were Americans. Thanks for sharing your results!
I didn’t know any of my relatives, on either side, so I just messaged my closest DNA matches and built my tree from there. I am trying to unite our fifth and sixth generations, which is very difficult, as I can’t go by family trees. The DNA helps a bit, but I have to look for lots of records.