Fool Me Once …

The Autosomal Database Growth Graph is a popular feature of the DNA Geek site.  But it’s wrong.  And it’s been wrong for some time.

Let me explain.  AncestryDNA and MyHeritage both report on their websites the number of people in their DNA databases.  23andMe reports the number of test kits sold.  I’ve dutifully plotted those numbers—the only ones available—on the graph all these years assuming that “kits sold” and “kits in database” would track fairly closely to one another.

But I was wrong.  To the tune of about 25%.  Mea culpa.

Let’s review.  As noted above, 23andMe has historically reported “kits sold” rather than “kits genotyped”.  That little sleight of hand, while honest, was misleading about what we genealogists care about:  the size of their database.  However, in their Spring 2021 Investor Presentation, they came clean with the actual genotyped numbers, dating back to fiscal year 2017, which ended 31 March 2017.

At the time, I was stumped as to what to do.  I didn’t want to redo the database growth graph until I knew whether they would continue to report “kits genotyped” going forward or whether they would stick to “kits sold”.  Now we have three fiscal quarters in a row that they’ve reported “kits genotyped”, so it’s time for a revamp.

We can now compare how “off” we were in our understanding of the database.  As you can see in the graph, the two lines are quite different.  The number of kits sold (pale line) is roughly 25% higher than the number genotyped (dark line) in 2019 and 2020.  The difference isn’t as stark in 2018, and the values are the same in 2017.
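For anyone who wants to check the "roughly 25% higher" arithmetic against their own numbers, here is a minimal sketch. The function name `percent_higher` and the example figures are my own illustrative assumptions, not values from 23andMe's reports; the gap is measured relative to the genotyped count.

```python
def percent_higher(sold: float, genotyped: float) -> float:
    """Return how much larger `sold` is than `genotyped`, as a percentage of `genotyped`."""
    if genotyped <= 0:
        raise ValueError("genotyped must be positive")
    return (sold - genotyped) / genotyped * 100


# Hypothetical round numbers in millions of kits, NOT 23andMe's reported figures.
example_sold = 10.0
example_genotyped = 8.0
gap = percent_higher(example_sold, example_genotyped)
print(f"Kits sold exceed kits genotyped by {gap:.0f}%")
```

The choice of denominator matters: the same gap expressed as a share of kits sold, rather than kits genotyped, would be 20% in this illustration.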

How did this happen?

Late 2016 is when genealogical DNA testing really took off.  Around then, 23andMe began selling kits in some drugstores for about $30, not including the lab fee.  My guess is two things happened.  First, some people bought the $30 kit on impulse but balked at the $169 lab fee.  Second, avid genealogists began buying kits in hopes that family and friends would test.  (Hands up:  Who has a couple of extra kits lying around?  💁‍♀️)  Both situations resulted in kits that were sold but never sent back to be genotyped.  I would never have guessed there were so many, certainly not 2 million!

I won’t make that mistake again.

Now, without further ado, here’s the updated graph, as of December 2, 2021.  Based on recent trends, I estimate that AncestryDNA has 21.2 million kits, 23andMe 12.1 million, MyHeritage 5.4 million, GEDmatch 1.7 million, and FamilyTreeDNA 1.6 million.  With only two data points for Living DNA, I’m not comfortable projecting their current size just yet.

17 thoughts on “Fool Me Once …”

  1. Investors are pleased with sold vs. genotyping, I imagine. I wonder what the profit ratio of kit cost:price is compared to genotyping cost:price.

    1. 23andMe is a biomedical research company. Their value line doesn’t lie in kits sold but in the size of the database, specifically the 85% or so that has opted in to medical research and answered the surveys.

  2. There is something I can’t explain when I put these graphs alongside my own DNA matches. My DNA is on all three leading sites, so I might expect to have found matches that I can reliably place on my tree in numbers comparable to the numbers on the graph. But in fact, while I have just one or two with MyHeritage and 23andMe, I have forty or so with Ancestry, well above the ratio suggested.

    1. Depends on local advertising and preferences.
      I live in Australia, which is saturated with AncestryDNA ads.
      My local cousins with German descent test with them.
      Meanwhile in Germany, my DNA cousins test with MyHeritage.
      Have just tested with 23andMe so here’s hoping.
      Some friends find their families decide to test with different siblings in different places. But then they need to copy all of their results to one place.
      Leah, what do you find works best for siblings?

      1. It depends on the goals and budget. For genealogy, adding a sibling or two to the mix helps, but the returns diminish with each additional one. For health reports, each sibling needs their own test. I typically recommend testing with Ancestry first, transferring if you’re comfortable, then testing at 23andMe.

  3. This reminds me so much of what happened in the UK at the height of the Covid crisis. We did not have enough testing capacity (and the NHS system was not capable of responding fast enough) – so every day in Parliament and the media there were questions on testing capacity. So the government sort of cheated, as it was far easier to measure the Kits sent out than to correlate the figures with what came back. That requires far more people, database inputs, etc. So the figures were massaged to show Kits being sent out – which is never the same as Kits returned, for one very obvious reason. The latter version involves humans doing something, with all the associated vagaries that can happen.

  4. Have you posted this to the main ISOGG Facebook page? It is so interesting to know. I posted it to the Swann/Swan Facebook page we use to communicate on that surname project. I have a lovely expression for this sort of thing which is attributable to a Conservative minister from the early 1990s – Alan Clark: “We were economical with la verite (the truth)”.

    1. Feel free to post it to the ISOGG group. I’m not a member. That quote is hilarious (in a painful sort of way).

  5. Ancestry a few years ago had a string of ads focusing on people testing their DNA just for ethnicity. Maybe some of those people went on to study their family. But I have a couple of non-responsive matches known by other cousins who told me that that is all they did. And friends tell me the same thing. Those ads are gone, so maybe that effect is being diluted.
    With 23andMe, some people test just for the medical stuff*. They don’t offer that in Australia, so at least I will know when I get them that my local 23andMe matches are interested in genealogy.
    *Am hearing reports that while most of those people used not to respond to enquiries, more are now beginning to do so.

  6. Thanks for this, great information.

    Does this mean that the numbers for websites like do NOT count DNA data uploaded from other companies? That is, they count only those tests that they genotype themselves? That would also be very useful to know.

  7. Thanks for that interesting info. I have been wondering about this. I know I have some extra kits around (both 23andme and ancestry, I believe). I presume they have expired. At one point, perhaps both companies sent me new kits when old ones expired. Any idea whether they still have that practice?

    1. If they still have preservative liquid, you can use them and send them back. If they fail, the company will send you a replacement for free.
