Yaniv Was Right

At a family history event in mid-August, a company employee let slip that AncestryDNA was considering allowing DNA files created by other companies to be uploaded into their database.  This report was confirmed to me directly by three independent sources.

If this were to happen, it would rock the genetic genealogy world, in more ways than one.  Let’s think this through.

The Good

Genetic genealogy is a numbers game, and AncestryDNA has—by far—the largest genealogical DNA database out there. It’s about as large as the other direct-to-consumer databases combined.  Almost everyone will benefit by having their family members in the AncestryDNA database.

That is not always possible, however.  There are thousands of people in the smaller databases who have since passed away.  Many of them tested at the request of avid genealogists who would dearly love to leverage AncestryDNA for their research but can’t.

These genealogists have often invested thousands of dollars to test extended family and would be willing to pay a little more to get those DNA kits into AncestryDNA’s database.  It would be a win-win:  a new revenue stream for AncestryDNA and a new life for legacy kits that are now stranded.

The Bad

Uploads at AncestryDNA would upend the genetic genealogy industry.  Genealogists managing legacy kits would no longer need to cajole their AncestryDNA matches to transfer to the smaller databases to evaluate matches; they could simply transfer their own family kits the other way.

Smaller companies would falter, because there would be little incentive to upload there anymore.  I’d hazard that 40% of the kits at MyHeritage and FamilyTreeDNA tested elsewhere, so growth at both places would be seriously impacted.  MyHeritage has a solid foothold in Europe, so I think they’d adapt and survive.  I’m not so sure about FamilyTreeDNA’s autosomal database.

As much as I’ve questioned FamilyTreeDNA’s actions over the past few years, I don’t want them to go under.  A strong market needs competition, and genealogists need the Y-DNA and mitochondrial DNA offerings that only FamilyTreeDNA offers.

Sites like GEDmatch that are 100% uploads might well collapse.

The Ugly

There is a more sinister side to the possibility that AncestryDNA might accept uploaded files:  law enforcement and bad actors could, nay would, invade the database, regardless of Ancestry’s terms of service or what consumers want.

We’ve already seen that some law enforcement agents are uploading to sites forbidden to them by both Department of Justice policy and the databases themselves.

We also recently learned that prominent forensic genetic genealogists—some of whom have spoken eloquently about ethics and trust—had been using a privacy hole at GEDmatch to see people who had not consented to law-enforcement matching.

It only takes a few bad apples to spoil everything.  If leaders in forensic genetic genealogy cannot be trusted to play by the rules, why should anyone else?  Ethical practitioners can’t compete with people who cheat to solve cases and draw fawning press coverage.

If AncestryDNA starts taking unvetted uploads, law enforcement will be there in a heartbeat.  The entire industry will suffer like it did in the wake of the Golden State Killer’s arrest.  You can see the damage in the graph below.

The long straight lines represent the growth rates at AncestryDNA (green) and 23andMe (purple) in the year prior to the arrest (thicker lines) and since then (thinner lines).  Sales took a huge hit when the public learned that law enforcement had infiltrated some databases.  The industry still hasn’t recovered.

There is a solution, though.

Cryptosignatures

In 2018, Yaniv Erlich, then the Chief Science Officer at MyHeritage, proposed that DNA testing companies cryptographically sign their raw data files.  (See the last paragraph of this scientific article.)  The cryptosignatures could then be used by other sites to authenticate a file’s origin before accepting it.

Such a system requires collaboration, though.  If AncestryDNA wants to accept uploads from 23andMe, or vice versa, both companies would have to agree to use cryptosignatures.  Each would have to negotiate its own arrangement with MyHeritage.
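To make the idea concrete, here is a minimal sketch of how a testing company might sign a raw data file and how a receiving site might check the signature before accepting an upload. Everything in it is an assumption for illustration: the vendor name, the function names, and the sample file contents are hypothetical, and the key is a deliberately insecure toy (textbook RSA built from two publicly known primes, with no padding) so that the example runs with only the Python standard library. A real deployment would use a vetted cryptography library and properly generated keys.

```python
import hashlib

# TOY KEY ONLY (assumption for illustration): two well-known Mersenne primes.
# Anyone can factor this modulus, so it offers no real security.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, kept by the testing company


def digest(raw: bytes) -> int:
    """Hash the raw data file and return the SHA-256 digest as an integer."""
    return int.from_bytes(hashlib.sha256(raw).digest(), "big")


def sign(raw: bytes, d: int, n: int) -> int:
    """Testing company: sign the file's hash with the private exponent."""
    return pow(digest(raw), d, n)


def verify(raw: bytes, signature: int, e: int, n: int) -> bool:
    """Receiving site: check the signature against the vendor's public key."""
    return pow(signature, e, n) == digest(raw)


# Hypothetical registry of public keys the receiving site has agreed to trust.
TRUSTED_VENDORS = {"VendorA": (E, N)}


def accept_upload(vendor: str, raw: bytes, signature: int) -> bool:
    """Accept a file only if a trusted vendor's signature matches its contents."""
    key = TRUSTED_VENDORS.get(vendor)
    if key is None:
        return False  # no agreement with this vendor, so origin cannot be checked
    e, n = key
    return verify(raw, signature, e, n)


if __name__ == "__main__":
    # Made-up file contents, not a real genotype record.
    raw_file = b"rsid\tchromosome\tposition\tgenotype\nrs123\t1\t1000\tAG\n"
    sig = sign(raw_file, D, N)
    print(accept_upload("VendorA", raw_file, sig))            # genuine file
    print(accept_upload("VendorA", raw_file + b"x", sig))     # altered file
    print(accept_upload("UnknownLab", raw_file, sig))         # unrecognized source
```

The registry is where the collaboration comes in: a receiving site can only authenticate files from companies whose public keys it has agreed to trust, and a file that was altered or assembled outside a testing lab would fail the check.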

Back in 2018, there wasn’t much incentive for the big companies to agree to all that.  But now that AncestryDNA is considering uploads, the calculus has changed.  (Calculus does that. 😉)

FamilyTreeDNA and GEDmatch, which both collaborate with law enforcement, would also benefit from cryptosignatures.  They charge $700 per kit for forensic uploads, so law enforcement has a dual incentive to skirt the rules and upload as “normal” kits:  the agents avoid a hefty fee and get to see kits that have not consented to forensic matching.  Cryptosignatures would protect both the regular users at FamilyTreeDNA and GEDmatch and the companies’ bottom lines.

Yaniv was right!  It’s time for the DNA testing companies to adopt cryptographic signatures to protect the entire industry.

40 thoughts on “Yaniv Was Right”

  1. Change is the major constant in almost anything . . . and there will be changes . . . it just remains to be seen what will change and in what ways . . .

  2. Leah, how is a forensic upload identified in our match lists? I am so curious, and would like to know. I have seen a few squirrelly-looking ones in my match lists. Years ago, I had 4 siblings as 4th C matches, and when the LE Opt-in/out at Gedmatch started, all 4 siblings’ kits disappeared from my match list. Too bad, I had already done a TG on them and saved. Does that make me wonder if they had something to hide? LOL

    1. They aren’t identified in our match lists. The agents uploading against ToS for the most part know not to self-identify.

  3. The reason we “cajole” matches to upload elsewhere is that Ancestry still lacks a chromosome browser. More matches without that are not fully useful.

      1. Good for you that you rarely need one. Personally, I use one extensively, as it’s the only way to identify which line a match comes to a kit through with any degree of certainty.

        Shared matches aren’t foolproof, as your matches can be related to each other by means other than their common ancestors with you. Without any segment-level tools, be they a proper browser or just a triangulation indicator, you’re just speculating.

        I’ve thought about signing raw data files at length for my own purposes, and whilst it would be good if only used to flag the provenance of uploaded kits, if it was used to block them that would be awful.

        Take the use case for synthetic kits for deceased ancestors. You can create these yourself if you’re skilled enough, but restricting uploads to only certified sources kills all that potential stone dead.

        1. I’ve “speculated” my way through several hundred solved cases at this point, and thousands more have been solved using WATO without any triangulation whatsoever. 🙂

        2. I think we may be talking broadly when I find many differences. For me Ancestry is unparalleled in finding matches out to 4th cousin.
          Leah, if you are working heavily for others, I suspect their mysteries lie within this range.
          I got that far many years ago, by documentation requested by snail mail over a year or so. Now I work out at 5C to 9C. Few trees go out that far and a high proportion have accepted an incorrect Ancestry suggestion or found something too convenient for themselves and are wildly wrong at the extremities. Trying to extend their trees for them is hampered by too many possibilities with identical names.
          I have a bucket load of part trees in Unlinked Family Clusters. They are being solved by chromosome segments although I continue to try every other available technique as well.

        3. How do you prove that the segments came from a specific ancestor and not through a different relationship path?

  4. My crystal bowl only sees collapse at Gedmatch, mainly because the user threshold is too steep for the average Joe/Jane. FTDNA’s owners (Gene by Gene) have licensed their kits to MyHeritage. My crystal bowl says they will be able to restructure/downsize and survive for at least 5 years. Other companies like LivingDNA will suffer more.

    Interoperability is one of the pillars of the EU’s GDPR. The right to move your data will at some point force Ancestry and 23andMe to accept imports to keep growing.

    Single most negative factor for Ancestry is lack of chromosome data. Many might need it, but I will never trust Ancestry unless I can validate by chromosome data

    1. GEDmatch will survive. Even zero growth in their database wouldn’t be a problem, since they’re not trying to make money (they’re a nonprofit).

      1. GEDmatch is not and has never been a nonprofit. Curtis Rogers sold it to Verogen for $15 million in December 2019, and Verogen was sold to QIAGEN in January 2023 for $150 million, with the GEDmatch portion estimated at $60 million. GEDmatch charges law enforcement $700 per upload.

  5. Why doesn’t law enforcement open their own searchable forensic website?
    Then we could see how many family matches we have with them.

    1. AncestryDNA announced that they would extend SideView to the grandparent level, so I suspect they’ll do that instead of clustering. Personally, I’d like both.

      Ancestry has long argued that a chromosome browser is a privacy risk. They’re right. I don’t see them introducing one any time soon, if ever. And if they do, it would be an opt-in system.

      1. The privacy aspect is often underrated. There are still relatives who are reluctant to test. Some come around if they come to realize they won’t be prosecuted by accident. Recent stories of police appearing on the doorsteps of non-offenders who just happen to be DNA cousins are not helping.

        1. It also doesn’t help that LE/FGGs are doing genealogical testing (as opposed to CODIS STRs) on relatives without their knowledge or consent.

  6. it’s not like millions of people would rush to get their DNA into the databases of the law enforcement folks – we have put our DNA into various databases at various testing companies because our kids gave us a Christmas gift so we could see whether we should wear Lederhosen or a kilt (not that we cared, but we obliged the kids), or we wanted to see who our cousins were, or we wondered when someone got Finnish blood, or whatever . . . we just were a little crazy. And we aren’t crazy to get our DNA into some database for the cops to find a serial rapist, at least not on the same level over the same number of years . . . and we are all alive . . . the dead ones are also in the current databases.

  7. I guess the DNA testing company field is like any other, in that the company that becomes the favorite or most popular among customers is the one that ultimately wins the market. I don’t think that this is happening so far with DNA testing, but in the U.S. 23andMe and Ancestry.com have the largest presence, so large that most people don’t even know of any other companies.

    Both Ancestry and MyHeritage offer research databases separately, which is a plus for those who use those plans and then decide to do a DNA test. The databases and the DNA test results are well linked by both companies. 23andMe and FTDNA don’t have that advantage, but do have large entities behind them (Gene by Gene for FTDNA) which help to pay for the cost of operation. But I agree, it could be damaging for FTDNA if Ancestry starts accepting transfers.

    For my own ancestry, transferring kits to MyHeritage has been more productive. I, myself, have tested at 23andMe and Ancestry, as well as transferring to FTDNA and then to MyHeritage. Around ten years ago, I had managed to persuade a good number of relatives to test at FTDNA, then did mtDNA and Y-DNA testing on a few, and later a couple of them tested on their own at Ancestry or 23andMe.

    I don’t think transferring any of the kits to Ancestry for my relatives, deceased or otherwise, would yield more of the matches I need (which would be those living in Europe), but I could be wrong. Most of the matches I have there are from the U.S., and I don’t have colonial or otherwise long in the U.S. ancestors. Mine range from Irish famine immigrants, and later those immigrating otherwise from Europe, north and south.

    I have used the matching segment data with DNA Painter (for matches at the companies that do have a chromosome browser) for some of my relatives. Although I’ve neglected that for a while, I found it very interesting and potentially valuable, if I would make the time to utilize the information about which segments come from which ancestors with new matches.

    1. You make some good points. As for winning “the market,” we should consider that there’s not one market. Ancestry is a genealogy company. 23andMe is a biomedical company that happens to be useful for genealogy. MyHeritage is also a genealogy company, albeit with a different geographic footprint than Ancestry. FTDNA is the only real option for Y-DNA and mtDNA. They may all be able to co-exist for a very long time.

      1. 23andMe recently changed their comparison statuses of Yes and No to Compare. As my Yes rate is between 5 and 10% this is now just a time waster for me. I might as well watch the lawn grow. And I told them so.

  8. I don’t understand all of this…….but I truly fail to see why anybody would be sorry to see Murderers (Especially of more than one person) brought to justice…and grieving families at least given “answers” & “closure”!! If it were my child or other loved one…I would certainly like to know that the utmost of available “intelligence” & “methods” was used to solve the case!! (& perhaps those innocent persons charged , exonerated!!!) I have a fairly large Tree on Ancestry, a good bit of which is due to DNA. I am a 75 year old woman….

  9. You make a good case for Yaniv. I really think someone somewhere is guilty of fraud and should be prosecuted. At the very least there clearly is unprofessional behaviour occurring and sanctions are warranted. As none of this is apparently happening, I truly hope someone is planning a class action. This has gone on too long.

  10. Here’s a Con: What I see coming is the government for monopolizing the ancestry dna. Putting the little guys out of business. Unfair competition. This is what usually happens when companies get too big.

    I happen to like that FTDNA has surname groups you can join. I truly hope they do not go out of business.

    I appreciate the person you are not naming for helping to solve cold cases. That peace of mind to those families is far more important than earning another dollar for a large corporation.

    I do love the improvements Ancestry has made and I’ve used them from the beginning of my family search.

    I am wondering if the Ydna I have on FTDNA will be able to be used like Ydna is intended to be used to solve paternal lines particularly with popular surnames.

    I hope they go a little slowly with this process. I know someone who has been using FTDNA since they began and had all of his findings published on RootsWeb. Only now that’s gone. He has been posting all he has found on the FTDNA site. If that goes down, where does that leave all of his work again? I’m concerned that we will have lost history. Sort of like the One Million (or something similarly named on Ancestry) records or index that were lost and we are left wondering why we used information. Now you can’t even search the source.

    I’m up in the air. In one hand I like this and in the other I am concerned.

  11. If Ancestry did provide a chromo browser, can we even imagine how dastardly that would impact their phone/customer service? This whole genetic genealogy business is a huge learning curve. Can you imagine the zillions of calls Ancestry would get each day from people who have not even started their learning curve? And then the complaints about not getting through to customer service? With Ancestry’s huge data base………Never will they have a chromo browser. Their “Shared Matches” feature is somewhat helpful for me.

  12. Would the uploads to Ancestry be free (as they are at MyHeritage and FTDNA) vs this being a way for Ancestry to earn additional revenue (by charging a fee for uploads)?

    1. Two separate issues. First, 23andMe wasn’t technically breached; a cybercriminal used stolen credentials to log into user accounts. For those users, the breach was system wide. The cybercriminal had access to everything the users did, including raw data. Mothballing the chromosome browser is a move to protect the rest of us.

      AncestryDNA forbids law enforcement and their agents from using their services in investigations. Now that it’s known that some of the most prominent forensic genetic genealogists have been colluding to behave unethically, Ancestry can’t trust them to abide by ethical norms and Department of Justice policy. Again, to protect the 24 million people already in their database, they can’t accept uploads, at least not without cryptographic signatures.

  13. Leah, someone here mentioned finding cousins out to 9th Cousin. Years ago, it was “advertised” that autosomal Dna only picks up out to about 5th cousin. And how much shared Dna could a person possibly have with a 9th Cousin? Many people today only work with 15 cMs. You could not have a 15 cMs match with a 9th cousin. Has this school of thought changed?

    1. I’ve done computer simulations out to 8C1R, and even at that level there’s a (very slim) chance that any given cousin will share DNA. When they do, the average is about 11 cM and can be as high as 22 cM. Although only about 0.1% of 8C1R will match, when you consider that we have millions of 8C1R, there could be thousands who actually share DNA with us.

      The bigger question is: how would we use a segment like that? To use DNA as proof that two people are related through Ancestor X, you need to show that the DNA couldn’t have been inherited any other way. For close cousins, that’s easy to do. For 9th cousins, it seems almost impossible.

      1. It always depends on the situation, tested people, shared segments etc.

        On my paternal side, I have a few matches that are (according to papers) 9C to me, triangulate on the same segment and share the same ancestor. However, I cannot reliably verify that it is correct or just a co-incidence.

        On the other side, on my maternal side, I have a long list of matches that all connect to ancestors on one specific line, starting at 2C1R – 8C. Almost every ancestor on that line has some match and most of them share at least a small part of the specific segment on chr17 with me.

        1. If it looks like most of your ancestors have contributed DNA to you, that’s a good sign that something is off in your analysis. You probably did inherit at least some DNA from each of your 4GGP, but beyond that, the chance of inheriting DNA from any given ancestor drops sharply. If these matches are mainly on the same small segment, you’re probably looking at excess IBD (a.k.a., a pile-up) rather than an indicator of relationship.

        2. I’m talking about one specific line. If it was pile-up segment then there would be a lot of unidentifiable matches. But that’s not the case. All these matches are reliably connected to the ancestors on one specific line.

          If you have a segment inherited from your 4GGP then it had to naturally be inherited from his parents. So it still comes from your 5GGP. Although a part can be from his father and a part from his mother, it still can be whole from one parent only.

        3. We should always ask ourselves ‘what else can explain what I’m seeing?’ Then work to rule out those alternative explanations. This is especially true of distant matches, where the chances of matching at all are so slim.
