A Comparison of GEDmatch and the FBI’s CODIS Database

The arrest of the Golden State Killer has sparked debates about the ethics of government representatives using a private genealogy database for an official investigation and whether that violates the 4th Amendment protection against unreasonable searches.

The Federal Bureau of Investigation maintains an integrated database of genetic data expressly for use by law enforcement and the military. It’s called CODIS (for Combined DNA Index System). The DNA markers in the CODIS database were selected specifically for one-to-one identification and to have no biomedical relevance.

Comparing CODIS with GEDmatch is an interesting exercise; the criminals in the CODIS database have more privacy protections from government intrusion than do the genealogists at GEDmatch.

| | CODIS | GEDmatch |
|---|---|---|
| maintained by | FBI | private citizens |
| purpose of database | criminal, missing persons, & military investigations | genealogy |
| source of data | accredited forensic laboratories | any lab with the right equipment |
| # of participants | > 15 million | < 1 million |
| DNA participants | convicted offenders, missing persons, forensic samples, military | genealogists & their relatives |
| participation | mandatory for felons | voluntary |
| profile deletable | not for felons | yes |
| geographic coverage | US | global |
| identifying data | no | sometimes |
| contact information | no | email addresses |
| marker type | 20 atDNA STRs; mtDNA or ySTRs in some cases | 600,000+ SNPs |
| biomedical data | no | yes |
| who can access it | state, federal, and military officials | anyone, anywhere |
| use for familial searching | legal in 12 states with varying restrictions; banned in Maryland, DC | no restrictions |
| limitations | close relatives only | close to distant relatives |

29 thoughts on “A Comparison of GEDmatch and the FBI’s CODIS Database”

  1. Can a documentary be made about this whole case, only bigger and better than The French Connection and the A Scream in the Dark series?

  2. Why haven’t users of CODIS started using SNP data long before this? Then the same users can develop their own genealogy research labs to create “one world tree”, and (hopefully) far more accurately than what family historians have to deal with as an enjoyable hobby.

  3. I don’t think that you can map the CODIS STR markers to autosomal SNP data. So it seems that SNP-based or sequencing data needed to be collected from a criminal sample (semen?). What kind of data was collected? Who collected the data? Has this been discussed?

    1. No, you can’t map the CODIS STRs to SNPs. The forensics lab was able to create a data file that mimicked one of our genealogy SNP tests from a crime-scene sample.

  4. You seem to have left out some critical comparison categories such as participation: government mandatory, GEDmatch voluntary and ability of participant to delete data: government no GEDmatch always

    1. Thanks Curtis, I added a field for deletion. At the moment, I’m not comfortable saying that GEDmatch users volunteered to be searched by law enforcement, because they weren’t told that was happening when they uploaded. I would very much like to see the wording “intended solely for genealogical research” removed entirely from the Site Policy. It mitigates the import of the other uses, which are the ones people will find objectionable.

      1. I think saying criminals in CODIS have more privacy protections from government intrusion than GEDmatch users is misleading and obscures the real issues. Comparing CODIS and GEDmatch is comparing apples and oranges in nearly every aspect. You may as well say users of CODIS have more privacy protections than users of Facebook. Curtis is absolutely right. His Terms and Conditions are as clear as they can be, and participation has always been voluntary. You may personally feel that people needed and do need more help in interpreting the language and in imagining purposes beyond genealogy. But that is not the same as saying they participated involuntarily. Let’s continue the discourse surrounding “informed consent” in a more even-handed manner, as other bloggers are striving to do.

        1. The people in CODIS *do* have more privacy protections than the users of GEDmatch or Facebook. Unless they are a hit, LE can’t see anything about them. Their names aren’t even stored as part of the CODIS record.

          If the Site Policy (what you’re calling Terms and Conditions) said “You understand that anyone, anywhere in the world, can use your data for any reason”, and omitted “intended solely for genealogical research”, then they would be clear. And if the Policy was clear and informed consent could be assured, there would be no issues with law enforcement using the database.

        2. Felons are compelled to be in CODIS. Of course the protections should be different. Are you advocating that GEDmatch users’ results be locked down in the same way, or that privacy protections should be lifted for those in CODIS? The statement that “the criminals in the CODIS database have more privacy protections from government intrusion than do the genealogists at GEDmatch” is a specious argument and seems intended not to clarify but to provoke.

          I am betting that most of the people who are switching kits they manage to research mode, or deleting them, are themselves leaving their own kits public. I don’t argue with doing either. But it does seem to suggest that those experts, at the end of the day, are not themselves finding the privacy issue so onerous. (See Kitty Cooper’s blog yesterday: “Although I have sympathy with the concerns of people who fear false identification using DNA techniques, this is not my fear.”) You may be an exception. That’s fine. But the fundamental argument that’s been made over the past few days has been that everyone should make this choice for themselves, with informed consent. In that spirit, let’s give them objective information and balanced arguments pro and con (both do exist) so that consent is truly informed. (Again, Kitty Cooper’s blog is a great example of how to do that.) I don’t see your statement serving that purpose. None of us want to frighten people away from GEDmatch. It is a wonderful, wonderful resource. Give people some facts, educate them, and let them decide for themselves.

        3. This post is an objective, side-by-side comparison of basic facts. If I’ve left something out, please let me know and I’ll consider adding it. I’ve already edited the table based on input from Curtis Rogers of GEDmatch.

          Informed consent is precisely what I am advocating for. I would love to see an affirmative opt-in feature for law enforcement use. Because that is not currently an option, I want to give readers the information they need to decide for themselves whether to stay at GEDmatch. If they do, great! And if they are not comfortable with how GEDmatch is being used (and could be used in the future), they *should* be frightened away.

          I don’t have a dog in this race; I am not working on criminal cases or the DNA Doe Project, and I have used GEDmatch for years. If I did have a conflict of interest, I would disclose it.

  5. Thanks, but I don’t understand.
    I understand how a data file could be created to mimic the needed file format, but I don’t understand where the data came from to create this file. Given that an STR locus allele can’t be directly linked informatively to a SNP call, how is a connection made to SNP calls?

    I could understand this if an individual with a CODIS near match was found and if a SNP data file from that individual was “available” or if additional SNP or sequencing data was collected from the crime-scene sample.


    1. They used a rape kit from one of his old crimes to create a SNP profile for him. It was new lab work, not an analysis of the CODIS markers. Although I haven’t seen the methods officially stated, what they most likely did was whole genome sequencing on the rape kit sample; then they extracted the SNPs that are used in genealogy and created a file that GEDmatch would take.

      1. They might have done a targeted sequencing approach. With WGS all of the DNA in a sample is sequenced. Microbial DNA reads can be a problem when sequencing some kinds of DNA samples. Microbial DNA isn’t a problem with PCR or sequence capture methods.

        Do you know if they purify / enrich for sperm before they extract DNA?

        It was quite a nice piece of work. I hope they get to present or publish this.

        1. I wasn’t involved in the investigation, so I can only tell you what I think they did rather than definitive answers. Assuming they did WGS, they would have had to compare the results to a reference genome, which would have told them where the relevant SNPs were to extract.

  6. You’re quite right, people who don’t want their scumbag relatives to get their just deserts shouldn’t upload their DNA results to GEDmatch.

    Meanwhile, back in the real world most people are applauding what the police have done, not least the ladies who live in the cities where the crimes were committed.

    1. Try rephrasing it: “People who don’t want their scumbag relatives to get their just deserts shouldn’t object to the cops rifling through their underwear drawers without a warrant.” There’s a lot more personal information in my DNA than there is in my panty drawer, that’s for sure!

  7. The crux of the issue is whether LE conducted an unreasonable search of the 900,000 unwitting users of GEDmatch. There would be no question at all if the users had been informed that their DNA could be used in a criminal investigation and affirmatively opted in to that use. If such a database were created, I’d be right there helping LE put monsters behind bars.

  8. Is there something missing from your table?
    I cannot see anyone’s DNA at GEDmatch if they don’t match me.

    I can also use an anonymous id and an anonymous email address if I want.
    The most vociferous critics of GEDmatch keep not mentioning this to the point where I am starting to wonder whether this is intentional.

    I don’t understand. You are usually such a balanced and informative commentator.
    This is quite out of character.

    1. You most certainly can see someone’s DNA information at GEDmatch if they don’t match you. You can pick any random kit number from your match list to see who they match. You can search the User Lookup for an email address. There are entire Facebook groups dedicated to people posting their GEDmatch kit numbers.

      And if your matches are close enough, it doesn’t matter if you use an anonymous name and email address. A good genealogist can figure out who you are.

      I appreciate the “balanced” compliment. Believe it or not, I’m holding back. There are so many nefarious possibilities (all 100% scientifically plausible) with this information that I’m thinking of giving up blogging and becoming a dystopian novelist!

      1. Just to clarify, you can see someone else’s DNA matches, but you cannot see their raw DNA results. As it states on the GEDmatch home page:

        “April 27, 2018 To correct a BIG misunderstanding, we do not show any person’s DNA on GEDmatch. We only show manipulations of data such as DNA matches.”

        I appreciate you know what you meant but someone else might misinterpret it.
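
The replies under comment 5 describe a likely workflow: sequence the sample, compare the reads to a reference genome, pull out the SNPs that genealogy tests use, and write them to a file that GEDmatch would accept. The last two steps can be sketched roughly as below. This is only an illustration under stated assumptions, not the investigators' method, which has not been officially described. The target SNP list, the rsIDs, the positions, and the genotype calls are all made up, and the four-column tab-separated layout is an assumption modeled on the raw-data files that consumer genealogy tests commonly produce.

```python
from typing import Dict, List, Tuple

# Hypothetical target list: (rsid, chromosome, position) for SNPs on a
# genealogy test. A real test covers 600,000+ SNPs; these three are invented.
TARGET_SNPS: List[Tuple[str, str, int]] = [
    ("rs0000001", "1", 1000),
    ("rs0000002", "1", 2500),
    ("rs0000003", "2", 400),
]


def extract_genealogy_snps(
    calls: Dict[Tuple[str, int], str],
    targets: List[Tuple[str, str, int]],
) -> List[str]:
    """Build tab-separated rows for the target SNPs only.

    `calls` maps (chromosome, position) to a two-letter genotype, as might
    come out of comparing sequencing reads to a reference genome. Target
    positions with no call are written as "--" (an assumed no-call marker).
    """
    rows = ["# rsid\tchromosome\tposition\tgenotype"]
    for rsid, chrom, pos in targets:
        genotype = calls.get((chrom, pos), "--")
        rows.append(f"{rsid}\t{chrom}\t{pos}\t{genotype}")
    return rows


# Invented genotype calls; the call on chromosome 3 is not a target SNP,
# so it is ignored.
wgs_calls = {("1", 1000): "AG", ("2", 400): "CC", ("3", 77): "TT"}
rows = extract_genealogy_snps(wgs_calls, TARGET_SNPS)
print("\n".join(rows))
```

The point of the sketch is that the CODIS STR markers play no part: the file is built entirely from new sequence data, filtered down to the positions a genealogy test would report.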
