The past year has seen a chilling in the genetic genealogy industry: DNA kit sales are down drastically since April 2018.
I don’t work for any of the testing companies, nor do I have any special insight into their official sales figures. However, I have been tracking their database sizes for a couple of years now (with data retroactive to 2013), and the decline in growth rate is obvious. Yes, the databases are still growing, but they’re growing more slowly than before.
Consider the graph below. The slope (steepness) of each line indicates how fast the database is growing at that point in time. Notice that the slopes for AncestryDNA (green) and 23andMe (purple) got steeper and steeper until April 2018, after which growth of both databases slowed (the region in the grey box).
Growth at the other databases (with the possible exception of MyHeritage; see below) has also slowed since April 2018, although it’s harder to see from the graph because the scale on the y-axis is set to the larger companies.
How much has growth slowed?
We can use curve fitting to predict how large the databases would be had they continued to grow at their pre-April 2018 rates. Curve fitting is a mathematical process that finds an equation matching the real-life data as closely as possible. Once a good equation is found, it can be used to extrapolate the expected values beyond the range of the existing data.
We can gauge how well the equation fits the data using a metric called R² (pronounced R-squared). For any reasonable fit, the R² value falls between zero and one. The closer it is to one, the better the equation fits the real data.
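For readers who want to see what goes into R², here is a minimal Python sketch of the standard definition (one minus the ratio of residual to total variation). The function name `r_squared` and the sample numbers are made up for illustration. They are not any company's data, and this is not necessarily how MyCurveFit computes the value internally.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mean_obs = sum(observed) / len(observed)
    # Variation the fitted equation fails to explain.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total variation of the observed values around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (hypothetical, not real database sizes).
observed = [1.0, 2.0, 4.0, 8.0]
good_fit = [1.1, 1.9, 4.2, 7.9]   # tracks the observations closely
poor_fit = [3.0, 3.0, 5.0, 5.0]   # misses the shape of the data

print("good fit R^2:", round(r_squared(observed, good_fit), 4))
print("poor fit R^2:", round(r_squared(observed, poor_fit), 4))
```

The closer the predicted values track the observed ones, the smaller the residual term and the nearer R² gets to one.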
I used an online curve fitting tool called MyCurveFit to fit exponential equations to each company’s growth trajectory through April 2018. In exponential growth, the rate of change increases over time, which is what we see in the graph above prior to April 2018.
For each database, I plotted the database sizes as currently known on the same graph as the values calculated from the fitted equation.
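The fit-then-extrapolate idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it fits an exponential by ordinary least squares on the logarithm of the sizes, which may differ from the method MyCurveFit uses, and the data are synthetic (a database that doubles every year), not real figures. The names `fit_exponential` and `project` are hypothetical helpers.

```python
import math


def fit_exponential(times, sizes):
    """Fit size = a * exp(b * t) via least squares on log(size).

    Assumption: a log-linear fit stands in for whatever nonlinear
    routine an online curve-fitting tool actually runs.
    """
    n = len(times)
    logs = [math.log(s) for s in sizes]
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    b = sxy / sxx                      # growth-rate constant
    a = math.exp(mean_y - b * mean_t)  # size at t = 0
    return a, b


def project(a, b, t):
    """Evaluate the fitted curve at time t (may lie beyond the data)."""
    return a * math.exp(b * t)


# Synthetic example, NOT real company data: size in millions,
# time in years since the first data point, doubling each year.
times = [0, 1, 2, 3, 4]
sizes = [0.5 * 2 ** t for t in times]

a, b = fit_exponential(times, sizes)
projected = project(a, b, 5)   # extrapolate one year past the data
actual = 12.0                  # hypothetical later announcement

print(f"fitted curve: size = {a:.3f} * exp({b:.3f} * t)")
print(f"projected at t=5: {projected:.1f} million; hypothetical actual: {actual:.1f} million")
```

Comparing the projected value with a later reported value is the same comparison made for each company below.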
AncestryDNA has the largest database of the genealogical testing companies, larger than the others combined. In May 2019, they announced that their database contained more than 15 million people. Previously, they’ve announced growth milestones three or four times per year, giving me 19 data points prior to April 2018 and two after that date.
The graph below compares the actual reported values from Ancestry with the values projected by the equation.
The two lines overlap nearly perfectly prior to April 2018. In fact, the R² value is 0.9970, or almost one. However, the lines diverge sharply after that date. Had AncestryDNA’s database continued to grow at the previous rate, the equation projects it would have had more than 21 million people in May 2019 rather than the reported 15 million.
Put another way, from April 2018 to May 2019, the database added 6 million people, when it was predicted to add more than 12 million. That’s a decline in growth of 51%.
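The "decline in growth" figure is simply the shortfall in people added, expressed as a share of the projected addition. A minimal sketch, using the rounded AncestryDNA figures quoted above (the function name `growth_decline` is my own label for illustration):

```python
def growth_decline(actual_added, projected_added):
    """Shortfall of actual growth as a fraction of projected growth."""
    if projected_added <= 0:
        raise ValueError("projected_added must be positive")
    return 1.0 - actual_added / projected_added


# Rounded figures from the text, in millions of people.
actual_added = 6.0       # added from April 2018 to May 2019
projected_added = 12.0   # text says "more than 12 million"

decline = growth_decline(actual_added, projected_added)
print(f"decline in growth: {decline:.0%}")
```

With these rounded inputs the result is 50%. The 51% quoted above reflects the unrounded projection, which was slightly more than 12 million.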
23andMe is the second largest genealogical database, with more than 10 million people as of April 2019. The company reports their database size once or twice a year, giving 14 data points prior to April 2018 and one after.
The graph below compares the actual reported values from 23andMe with the values calculated from the equation.
The two lines overlap well prior to April 2018, with an R² value of 0.9612. As with AncestryDNA, the lines then follow different trajectories. Had 23andMe’s database continued to grow according to the equation, it would have had nearly 14 million people rather than the 10 million reported in April 2019.
From February 2018 to April 2019, the database added 5 million people. It was projected to add nearly 9 million, a decline in growth of 43%.
FamilyTreeDNA is the smallest of the databases discussed here, and they have never officially announced how many autosomal DNA testers they have. The values used here were estimated by Tim Janzen, a long-time customer of the company, and published on the ISOGG wiki. There were 20 data points prior to April 2018 and three after.
The graph below compares Tim Janzen’s estimates for FamilyTreeDNA’s autosomal database with the values projected by the equation.
The R² value prior to April 2018 is 0.9901 and, again, we see a decline in growth after that point. Had FamilyTreeDNA’s autosomal database continued to grow according to the equation, it would have had about 1.5 million people rather than the 1 million estimated in February 2019.
Assuming the estimated numbers are correct, FamilyTreeDNA added 200,000 people from March 2018 to February 2019, when it was projected to add 700,000. In other words, growth declined 71%.
The owners of GEDmatch have kindly reported their database size to me personally at intervals since January 2016, giving seven data points prior to April 2018. Either directly from GEDmatch or from media reports, I had five data points after April 2018.
The graph below compares the actual values with those projected by the best-fit equation for GEDmatch.
The two lines are almost identical before April 2018, with an R² value of 0.9977. After that point, the GEDmatch database initially grows faster than expected, then falls below the values predicted by the equation. Had GEDmatch continued to grow as projected, it would have had more than 1.4 million people in May 2019 rather than 1.2 million.
GEDmatch added 387,000 people from February 2018 to May 2019, when it was projected to add nearly 650,000. Growth declined 40%.
MyHeritage is the most recent entrant to the DNA testing market discussed here, so there were only four data points prior to April 2018, not enough to fit a reliable curve. (The projected database size based on those four points was 79 million, which is simply not credible.) For MyHeritage, and only for MyHeritage, I therefore included one data point from May 2018.
The graph below compares the actual growth trajectory for MyHeritage with that projected based on those five points.
The R² value prior to May 2018 is 0.9891. Like the other databases, growth was slower than expected after that point. Had MyHeritage’s database continued to grow according to the equation, it would have had nearly 3.8 million people rather than the 3 million reported in May 2019.
Between May 2018 and May 2019, MyHeritage added 1.6 million people. If the projections are correct, it was expected to add nearly 2.4 million, a decline in growth of 32%.
The Obvious Question
The pattern is clear: something happened early in 2018 to cause database growth to slow across the board, from 32% at MyHeritage to as much as 71% at FamilyTreeDNA. The question is: What? What caused the decline?
One possibility is market saturation. Perhaps genetic genealogy is approaching its natural consumption level, where those who are inclined to purchase a test already have. The counter to that argument is that 23andMe is not a genealogy company; it’s a biomedical one. Theirs is a different market, yet the company’s growth declined along with that of the genealogy companies.
What’s more, one might reasonably argue that the market in the United States is approaching saturation, but relatively few people in Europe have tested, meaning there’s still ample room for growth there. Yet MyHeritage, which is based in Israel and sells most of their DNA kits in Europe, also experienced a decline in growth.
It’s also possible that my numbers for past growth are wrong because the testing companies don’t report the exact database size on a specific date. Rather, they usually report that their database is “larger than X”, and the precise date it hit that threshold is not publicly known. However, the numbers I have for GEDmatch are largely date-specific, and GEDmatch’s growth slowed, as well.
The elephant in the room, of course, is the use of some genealogy databases, specifically GEDmatch and FamilyTreeDNA, by law enforcement. That fact first became public knowledge on April 25, 2018, when the Golden State Killer story broke. And April 2018 is precisely when we see a decline in growth across the board.
Public concern over law enforcement using genetic databases seems the most likely explanation for the cooling of the market. In fact, Anne Wojcicki, the CEO of 23andMe, publicly speculated that law enforcement and privacy concerns were indeed behind their decline in growth. And she actually does have inside information on their sales figures!
This explanation fits a few observations well. First, the company showing the smallest decline, MyHeritage (32%), is also the company whose market is furthest removed from the American judicial system. Its customers are thus likely to be either unaware of or unthreatened by US law enforcement using genealogy databases.
Second, the company most welcoming of law enforcement, FamilyTreeDNA, showed the largest decline in growth (71%).
Third, the graph for GEDmatch shows an increase in slope immediately after the Golden State Killer arrest—a change widely attributed to the positive press that GEDmatch received at the time—followed by a sustained decline. If the GSK case could have caused the short-term increase, fallout from that case could also have caused the long-term decline.
Whether public concerns over law enforcement truly are the explanation for the market slow-down is still an open question. The community should be aware that a decline in growth is occurring and discuss rationally and maturely the possible reasons behind it.