Yaniv Was Right

At a family history event in mid-August, a company employee let slip that AncestryDNA was considering allowing DNA files created by other companies to be uploaded into their database.  This report was confirmed to me directly by three independent sources.

If this were to happen, it would rock the genetic genealogy world, in more ways than one.  Let’s think this through.

The Good

Genetic genealogy is a numbers game, and AncestryDNA has—by far—the largest genealogical DNA database out there. It’s about as large as the other direct-to-consumer databases combined.  Almost everyone will benefit by having their family members in the AncestryDNA database.

That is not always possible, however.  There are thousands of people in the smaller databases who have since passed away.  Many of them tested at the request of avid genealogists who would dearly love to leverage AncestryDNA for their research but can’t.

These genealogists have often invested thousands of dollars to test extended family and would be willing to pay a little more to get those DNA kits into AncestryDNA’s database.  It would be a win-win:  a new revenue stream for AncestryDNA and a new life for legacy kits that are now stranded.

The Bad

Uploads at AncestryDNA would upend the genetic genealogy industry.  Genealogists managing legacy kits would no longer need to cajole their AncestryDNA matches to transfer to the smaller databases to evaluate matches; they could simply transfer their own family kits the other way.

Smaller companies would falter, because there would be little incentive to upload there anymore.  I’d hazard that 40% of the kits at MyHeritage and FamilyTreeDNA are uploads from people who tested elsewhere, so growth at both places would be seriously impacted.  MyHeritage has a solid foothold in Europe, so I think they’d adapt and survive.  I’m not so sure about FamilyTreeDNA’s autosomal database.

As much as I’ve questioned FamilyTreeDNA’s actions over the past few years, I don’t want them to go under.  A strong market needs competition, and genealogists need the Y-DNA and mitochondrial DNA tests that only FamilyTreeDNA offers.

Sites like GEDmatch that are 100% uploads might well collapse.

The Ugly

There is a more sinister side to the possibility that AncestryDNA might accept uploaded files:  law enforcement and bad actors could, nay would, invade the database, regardless of Ancestry’s terms of service or what consumers want.

We’ve already seen that some law enforcement agents are uploading to sites forbidden to them by both Department of Justice policy and the databases themselves.

We also recently learned that prominent forensic genetic genealogists—some of whom have spoken eloquently about ethics and trust—had been using a privacy hole at GEDmatch to see people who had not consented to law-enforcement matching.

It only takes a few bad apples to spoil everything.  If leaders in forensic genetic genealogy cannot be trusted to play by the rules, why should anyone else?  Ethical practitioners can’t compete with people who cheat to solve cases and draw fawning press coverage.

If AncestryDNA starts taking unvetted uploads, law enforcement will be there in a heartbeat.  The entire industry will suffer like it did in the wake of the Golden State Killer’s arrest.  You can see the damage in the graph below.

The long straight lines represent the growth rates at AncestryDNA (green) and 23andMe (purple) in the year prior to the arrest (thicker lines) and since then (thinner lines).  Sales took a huge hit when the public learned that law enforcement had infiltrated some databases.  The industry still hasn’t recovered.

There is a solution, though.

Cryptosignatures

In 2018, Yaniv Erlich, then the Chief Science Officer at MyHeritage, proposed that DNA testing companies cryptographically sign their raw data files.  (See the last paragraph of this scientific article.)  The cryptosignatures could then be used by other sites to authenticate a file’s origin before accepting it.

Such a system requires collaboration, though.  If AncestryDNA wants to accept uploads from 23andMe, or vice versa, both companies would have to agree to use cryptosignatures.  Each would have to negotiate its own arrangement with MyHeritage.
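The sign-then-verify flow described above can be sketched in a few lines of Python.  This is only an illustration under loud assumptions, not the scheme from the proposal:  the company names, keys, and file contents are made up, and it uses HMAC (a shared-secret tag) purely because it ships with Python's standard library.  A real system would more likely use public-key signatures, so that the site receiving an upload holds only a public key and could never forge a file itself.

```python
import hashlib
import hmac

# Hypothetical registry: one shared secret per testing company.
# ASSUMPTION: a real deployment would use public-key signatures so that
# receiving sites never hold a signing secret; HMAC is a stdlib stand-in.
TRUSTED_KEYS = {
    "company-a": b"demo-secret-for-company-a",
    "company-b": b"demo-secret-for-company-b",
}


def sign_raw_data(raw_data: bytes, key: bytes) -> str:
    """Return a hex authentication tag over the exact bytes of the file."""
    return hmac.new(key, raw_data, hashlib.sha256).hexdigest()


def verify_upload(raw_data: bytes, claimed_source: str, tag: str) -> bool:
    """Accept an upload only if its tag matches the claimed source's key."""
    key = TRUSTED_KEYS.get(claimed_source)
    if key is None:
        return False  # unknown source: reject
    expected = sign_raw_data(raw_data, key)
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected, tag)


# Made-up file contents, for illustration only
raw_file = b"rsid\tchromosome\tposition\tgenotype\nrs0000001\t1\t12345\tAG\n"
tag = sign_raw_data(raw_file, TRUSTED_KEYS["company-a"])

genuine_ok = verify_upload(raw_file, "company-a", tag)
tampered_ok = verify_upload(raw_file + b"rs0000002\t1\t67890\tCC\n", "company-a", tag)
wrong_source_ok = verify_upload(raw_file, "company-b", tag)
unknown_ok = verify_upload(raw_file, "company-z", tag)
print(genuine_ok, tampered_ok, wrong_source_ok, unknown_ok)
```

The point of the sketch is that any change to the file, or any claim that it came from a different company, makes verification fail.  A file assembled outside a testing company's own pipeline could not pass as a genuine customer kit.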

Back in 2018, there wasn’t much incentive for the big companies to agree to all that.  But now that AncestryDNA is considering uploads, the calculus has changed.  (Calculus does that. 😉)

FamilyTreeDNA and GEDmatch, which both collaborate with law enforcement, would also benefit from cryptosignatures.  They charge $700 per kit for forensic uploads, so law enforcement has a dual incentive to skirt the rules and upload as “normal” kits:  the agents avoid a hefty fee and get to see kits that have not consented to forensic matching.  Cryptosignatures would protect both the regular users at FamilyTreeDNA and GEDmatch and the companies’ bottom lines.

Yaniv was right!  It’s time for the DNA testing companies to adopt cryptographic signatures to protect the entire industry.

28 thoughts on “Yaniv Was Right”

  1. Change is the major constant in almost anything . . . and there will be changes . . . it just remains to be seen what will change and in what ways . . .

  2. Leah, how is a forensic upload identified in our match lists? I am so curious, and would like to know. I have seen a few squirrelly-looking ones in my match lists. Years ago, I had 4 siblings as 4th C matches, and when the LE opt-in/out at GEDmatch started, all 4 siblings’ kits disappeared from my match list. Too bad, I had already done a TG on them and saved. Does that make me wonder if they had something to hide? LOL

    1. They aren’t identified in our match lists. The agents uploading against ToS for the most part know not to self-identify.

  3. The reason we “cajole” matches to upload elsewhere is that Ancestry still lacks a chromosome browser. More matches without that are not fully useful.

      1. Good for you that you rarely need one. Personally, I use one extensively, as it’s the only way to identify which line a match comes to a kit through with any degree of certainty.

        Shared matches aren’t foolproof, as your matches can be related to each other by means other than their common ancestors with you. Without any segment-level tools, be they a proper browser or just a triangulation indicator, you’re just speculating.

        I’ve thought about signing raw data files at length for my own purposes, and whilst it would be good if only used to flag the provenance of uploaded kits, if it was used to block them that would be awful.

        Take the use case for synthetic kits for deceased ancestors. You can create these yourself if you’re skilled enough, but restricting uploads to only certified sources kills all that potential stone dead.

        1. I’ve “speculated” my way through several hundred solved cases at this point, and thousands more have been solved using WATO without any triangulation whatsoever. 🙂

        2. I think we may be talking broadly when I find many differences. For me Ancestry is unparalleled in finding matches out to 4th cousin.
          Leah, if you are working heavily for others, I suspect their mysteries lie within this range.
          I got that far many years ago, by documentation requested by snail mail over a year or so. Now I work out at 5C to 9C. Few trees go out that far and a high proportion have accepted an incorrect Ancestry suggestion or found something too convenient for themselves and are wildly wrong at the extremities. Trying to extend their trees for them is hampered by too many possibilities with identical names.
          I have a bucket load of part trees in Unlinked Family Clusters. They are being solved by chromosome segments although I continue to try every other available technique as well.

        3. How do you prove that the segments came from a specific ancestor and not through a different relationship path?

  4. My crystal ball only sees collapse at GEDmatch, mainly because the user threshold is too steep for the average Joe/Jane. FTDNA’s owner (Gene by Gene) has licensed their kits to MyHeritage. My guess is that they will be able to restructure/downsize and survive for at least 5 years. Other companies like Living DNA will suffer more.

    Interoperability is one of the pillars of the EU’s GDPR. The right to move your data will at some point force Ancestry and 23andMe to accept imports to keep growing.

    The single most negative factor for Ancestry is the lack of chromosome data. Many might need it, but I will never trust Ancestry unless I can validate by chromosome data.

  5. Why doesn’t law enforcement open their own searchable forensic website?
    Then we could see how many family matches we have with them.

    1. AncestryDNA announced that they would extend SideView to the grandparent level, so I suspect they’ll do that instead of clustering. Personally, I’d like both.

      Ancestry has long argued that a chromosome browser is a privacy risk. They’re right. I don’t see them introducing one any time soon, if ever. And if they do, it would be an opt-in system.

      1. The privacy aspect is often underrated. There are still relatives who are reluctant to test. Some come around if they come to realize they won’t be prosecuted by accident. Recent stories of police appearing on the doorsteps of non-offenders who just happen to be DNA cousins are not helping.

        1. It also doesn’t help that LE/FGGs are doing genealogical testing (as opposed to CODIS STRs) on relatives without their knowledge or consent.

  6. it’s not like millions of people would rush to get their DNA into the databases of the law enforcement folks – we have put our DNA into various databases at various testing companies because our kids gave us a Christmas gift so we could see whether we should wear Lederhosen or a kilt (not that we cared, but we obliged the kids), or we wanted to see who our cousins were, or we wondered when someone got Finnish blood, or whatever . . . we just were a little crazy. And we aren’t crazy to get our DNA into some database for the cops to find a serial rapist, at least not on the same level over the same number of years . . . and we are all alive . . . the dead ones are also in the current databases.

  7. I guess the DNA testing company field is like any other, in that the company that becomes the favorite or most popular among customers is the one that ultimately wins the market. I don’t think that this is happening so far with DNA testing, but in the U.S. 23andMe and Ancestry.com have the largest presence, so large that most people don’t even know of any other companies.

    Both Ancestry and MyHeritage offer research databases separately, which is a plus for those who use those plans and then decide to do a DNA test. The databases and the DNA test results are well linked by both companies. 23andMe and FTDNA don’t have that advantage, but do have large entities behind them (Gene by Gene for FTDNA) which help to pay for the cost of operation. But I agree, it could be damaging for FTDNA if Ancestry starts accepting transfers.

    For my own ancestry, transferring kits to MyHeritage has been more productive. I, myself, have tested at 23andMe and Ancestry, as well as transferring to FTDNA and then to MyHeritage. Around ten years ago, I had managed to persuade a good number of relatives to test at FTDNA, then did mtDNA and Y-DNA testing on a few, and later a couple of them tested on their own at Ancestry or 23andMe.

    I don’t think transferring any of the kits to Ancestry for my relatives, deceased or otherwise, would yield more of the matches I need (which would be those living in Europe), but I could be wrong. Most of the matches I have there are from the U.S., and I don’t have colonial or otherwise long in the U.S. ancestors. Mine range from Irish famine immigrants, and later those immigrating otherwise from Europe, north and south.

    I have used the matching segment data with DNA Painter (for matches at the companies that do have a chromosome browser) for some of my relatives. Although I’ve neglected that for a while, I found it very interesting and potentially valuable, if I would make the time to utilize the information about which segments come from which ancestors with new matches.

    1. You make some good points. As for winning “the market,” we should consider that there’s not one market. Ancestry is a genealogy company. 23andMe is a biomedical company that happens to be useful for genealogy. MyHeritage is also a genealogy company, albeit with a different geographic footprint than Ancestry. FTDNA is the only real option for Y-DNA and mtDNA. They may all be able to co-exist for a very long time.

      1. 23andMe recently changed their comparison statuses of Yes and No to Compare. As my Yes rate is between 5 and 10% this is now just a time waster for me. I might as well watch the lawn grow. And I told them so.

  8. I don’t understand all of this…….but I truly fail to see why anybody would be sorry to see Murderers (Especially of more than one person) brought to justice…and grieving families at least given “answers” & “closure”!! If it were my child or other loved one…I would certainly like to know that the utmost of available “intelligence” & “methods” was used to solve the case!! (& perhaps those innocent persons charged , exonerated!!!) I have a fairly large Tree on Ancestry, a good bit of which is due to DNA. I am a 75 year old woman….

  9. You make a good case for Yaniv. I really think someone somewhere is guilty of fraud and should be prosecuted. At the very least there clearly is unprofessional behaviour occurring and sanctions are warranted. As none of this is apparently happening, I truly hope someone is planning a class action. This has gone on too long.

  10. Here’s a con: what I see coming is the government stepping in over the monopolizing of ancestry DNA. Putting the little guys out of business. Unfair competition. This is what usually happens when companies get too big.

    I happen to like that FTDNA has surname groups you can join. I truly hope they do not go out of business.

    I appreciate the person you are not naming for helping to solve cold cases. That peace of mind to those families is far more important than earning another dollar for a large corporation.

    I do love the improvements Ancestry has made and I’ve used them from the beginning of my family search.

    I am wondering if the Y-DNA I have on FTDNA will still be usable the way Y-DNA is intended to be used, to solve paternal lines, particularly with popular surnames.

    I hope they go a little slowly with this process. I know someone who has been using FTDNA since they began and had all of his findings published on RootsWeb. Only now that’s gone. He has been posting all he has found on the FTDNA site. If that goes down, where does that leave all of his work again? I’m concerned that we will have lost history. Sort of like the One Million (or something similarly named on Ancestry) records or index that were lost, and we are left wondering why we used the information. Now you can’t even search the source.

    I’m up in the air. On one hand I like this, and on the other I am concerned.

  11. If Ancestry did provide a chromo browser, can we even imagine how dastardly that would impact their phone/customer service? This whole genetic genealogy business is a huge learning curve. Can you imagine the zillions of calls Ancestry would get each day from people who have not even started their learning curve? And then the complaints about not getting through to customer service? With Ancestry’s huge database, never will they have a chromo browser. Their “Shared Matches” feature is somewhat helpful for me.
