Genealogy and the Golden State Killer

This post has been updated.

On April 25, 2018, news broke that Joseph James DeAngelo had been arrested as the serial rapist and murderer who terrorized California in the 1970s and ’80s. The criminal was variously known as the East Area Rapist, the Original Night Stalker, the Visalia Ransacker, and the Diamond Knot Killer, and he committed brutal rapes and murders in 10 counties across the state. Not until years after his last known attack in 1986 did DNA methods become available to connect all of those crimes to the same man. Yet even then, for decades, no one knew who he was. The arrest was the result of newer DNA technology being brought to a very cold case that had resisted decades of prior investigation.

But what were those methods? The Sacramento County Sheriff and District Attorney have been very circumspect about what was done. Details are still trickling out, but it’s clear that they used the same methods that genetic genealogists use to identify unknown parents and grandparents.

Consider these quotes:

  • “In some way, a DNA link was identified between the suspect in the case and an unknown member of the public who also shared various components of DNA with a bunch of other people, several dozen people. And so the Sheriff’s Department then had to take that information, whittle out the people that couldn’t possibly be suspects, and then zero in on people who could be suspects. Which is how they ended up at Mr DeAngelo’s door.” – Bob Moffitt in an NPR interview, 25 April 2018
  • “It was DNA that was related. That’s all I can say,” [Sacramento County Sheriff Scott] Jones said. “I can’t say it’s a family member’s DNA. I can say there was, in employing this technology, there was a link between the DNA that we had and the potential for examining a universe of folks of which our guy was a member.” – Capital Public Radio, 25 April 2018
  • “His DNA was not in a criminal database. I would say it that way.” and “The arrest warrant is under seal, and the mechanism of the DNA will eventually come out but as I say it’s really some innovative DNA work that led to him being a suspect and then the sample ultimately which identified him through his DNA.” – Sacramento County District Attorney Anne Marie Schubert, in an interview with Megyn Kelly, 26 April 2018

All point to a genealogy-based search, which was confirmed today by multiple news outlets, including the New York Times.

You probably have questions about how they did it, and I’ll answer to the best of my ability. I want to be clear about a few things first. I do unknown parentage searches every day, but I was not involved in this case. And although I love the science and laud the outcome here, I have deep misgivings about a genealogical database being used by law enforcement without the knowledge or consent of its participants.

The Methods

Imagine that you are an adoptee who wants to find her biological parents. You would submit your DNA to one or more genealogy-oriented testing companies and hope to be matched to close relatives. In this case, “close” means they share enough DNA with you to be 3rd cousins or better. Sometimes you strike gold and find an uncle, cousin, half sibling, or even parent, but usually the search takes a lot of work.
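To give a rough sense of why “3rd cousins or better” is a useful cutoff, here is a minimal sketch of the textbook expectation for how much autosomal DNA cousins share. Each meiosis (one parent-to-child transmission) halves the expected sharing, and full cousins descend from an ancestral couple, so full nth cousins share (1/2)^(2n+1) on average: 1/8 for first cousins, 1/32 for second, 1/128 for third. The function name and parameters are illustrative assumptions, not any testing company’s method, and actual sharing scatters widely around these averages because recombination is random.

```python
from fractions import Fraction


def expected_shared_fraction(degree: int, removed: int = 0, half: bool = False) -> Fraction:
    """Expected share of autosomal DNA between two cousins (illustrative sketch).

    degree:  1 for first cousins, 2 for second cousins, and so on.
    removed: generational offset ("once removed" = 1).
    half:    True if the cousins share only one common ancestor
             rather than an ancestral couple.
    """
    if degree < 1 or removed < 0:
        raise ValueError("degree must be >= 1 and removed must be >= 0")
    # A path through one common ancestor crosses (2 * degree + 2 + removed)
    # meioses, and each meiosis halves the expected sharing.
    per_ancestor = Fraction(1, 2) ** (2 * degree + 2 + removed)
    # Full cousins are connected through two ancestors (a couple): two paths.
    return per_ancestor if half else 2 * per_ancestor


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"full cousins of degree {n}:", expected_shared_fraction(n))  # 1/8, 1/32, 1/128
    print("first cousin once removed:", expected_shared_fraction(1, removed=1))  # 1/16
```

Because the expected share drops by a factor of four with each additional degree of cousinship, matches beyond roughly the third-cousin level carry much less signal for this kind of search.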

Closer matches are better, of course, but the strategy stays the same: you pore through the family trees of your closest matches looking for connections among them. If you have a few matches who are all descended from John Jacobs and Mary Mallone, it’s a good bet that you’re descended from them as well. You then flesh out the families of these probable ancestors looking for candidates who were in the right place at the right time to be the person you’re seeking.
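The “look for connections among your matches” step can be sketched as a simple tally: reduce each match’s tree to the ancestral couples it contains, then rank couples by how many matches descend from them. This is a minimal illustration under assumed inputs (trees already reduced to clean name pairs). Apart from John Jacobs and Mary Mallone, every name is hypothetical, and real trees contain errors that must be checked before any lead is trusted.

```python
# Hypothetical representation: an ancestral couple as a pair of names.
Couple = tuple[str, str]


def rank_common_couples(
    match_trees: dict[str, set[Couple]], min_matches: int = 2
) -> list[tuple[Couple, list[str]]]:
    """Rank ancestral couples by how many DNA matches descend from them."""
    descendants: dict[Couple, list[str]] = {}
    for match, couples in match_trees.items():
        for couple in couples:
            descendants.setdefault(couple, []).append(match)
    ranked = [
        (couple, sorted(matches))
        for couple, matches in descendants.items()
        if len(matches) >= min_matches
    ]
    # Couples shared by more matches are stronger leads; ties sort by name.
    ranked.sort(key=lambda item: (-len(item[1]), item[0]))
    return ranked


if __name__ == "__main__":
    trees = {
        "Match A": {("John Jacobs", "Mary Mallone"), ("Peter Hale", "Ann Ross")},
        "Match B": {("John Jacobs", "Mary Mallone"), ("Samuel Webb", "Ruth Lane")},
        "Match C": {("John Jacobs", "Mary Mallone"), ("Peter Hale", "Ann Ross")},
        "Match D": {("Samuel Webb", "Ruth Lane")},
    }
    for couple, matches in rank_common_couples(trees):
        print(" & ".join(couple), "->", ", ".join(matches))
```

The top-ranked couple (here John Jacobs and Mary Mallone, shared by three matches) becomes the working hypothesis whose descendants are then researched.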

But how did the police get this guy’s DNA into a genealogy database, given that they didn’t know who he was in the first place?

We can use the case of the Jane Doe known as Buckskin Girl to understand the methodology. In 1981, a young woman was found murdered in a roadside ditch in Troy, Ohio. Her identity was unknown. Recently, a blood sample collected during her autopsy was rediscovered and subjected to whole genome sequencing. Although the sample had degraded over the years, the lab was able to recover more than half of her genome. That was enough to create a mock genealogy DNA test that could be uploaded to GEDmatch.com, a private, third-party database meant to be a common meeting ground for genealogists who tested at different commercial companies. Volunteers for the DNA Doe Project lucked upon a close match to Buckskin Girl who turned out to be a first cousin once removed. That cousin led them to the identity of Marcia King after 37 years of anonymity.

Facial reconstruction of “Buckskin Girl”, Marcia King, from the National Center for Missing and Exploited Children

This process is almost certainly what happened with the Golden State Killer, except that the DNA sample came from crime scene evidence rather than an autopsy, and the initial matches may not have been as close. (I infer this from statements by law enforcement that they had to filter through a hundred or so candidates.) AncestryDNA, 23andMe, MyHeritage, and Family Tree DNA all denied that authorities had asked them to use their databases. Two days after the arrest, The Mercury News confirmed that the database used was GEDmatch.
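Whittling a list of descendants down to plausible suspects is, at its core, a filter on facts such as sex, age, and where a person lived. The sketch below assumes a hypothetical Candidate record and hypothetical criteria; it only illustrates the “right place at the right time” logic and does not reconstruct what investigators actually did, which has not been disclosed.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical record for one descendant of the probable ancestors."""
    name: str
    sex: str  # "M" or "F"
    birth_year: int
    places: set[str] = field(default_factory=set)  # places the person is known to have lived


def filter_candidates(
    candidates: list[Candidate],
    sex: str,
    born_from: int,
    born_to: int,
    crime_places: set[str],
) -> list[Candidate]:
    """Keep candidates of the right sex and age who lived near at least one crime scene."""
    return [
        c
        for c in candidates
        if c.sex == sex
        and born_from <= c.birth_year <= born_to
        and bool(c.places & crime_places)
    ]


if __name__ == "__main__":
    people = [
        Candidate("Person 1", "M", 1945, {"Sacramento"}),
        Candidate("Person 2", "F", 1947, {"Sacramento"}),
        Candidate("Person 3", "M", 1980, {"Visalia"}),
        Candidate("Person 4", "M", 1946, {"Boston"}),
    ]
    suspects = filter_candidates(people, "M", 1935, 1960, {"Sacramento", "Visalia"})
    print([c.name for c in suspects])  # ['Person 1']
```

A real search would also have to cope with missing or wrong records rather than silently discarding people who lack data, which is one reason a genealogical lead still needs direct DNA confirmation.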

GEDmatch has since issued a statement:

April 27, 2018 We understand that the GEDmatch database was used to help identify the Golden State Killer. Although we were not approached by law enforcement or anyone else about this case or about the DNA, it has always been GEDmatch’s policy to inform users that the database could be used for other uses, as set forth in the Site Policy (linked to the login page and https://www.gedmatch.com/policy.php). While the database was created for genealogical research, it is important that GEDmatch participants understand the possible uses of their DNA, including identification of relatives that have committed crimes or were victims of crimes. If you are concerned about non-genealogical uses of your DNA, you should not upload your DNA to the database and/or you should remove DNA that has already been uploaded. To delete your registration contact [redacted]@gmail.com

Two Ideas, Both Important

There are two key ideas vying for attention in these stories. One, the most obvious, is the incredible power of genetic genealogy to bring closure and justice to truly horrendous tragedies. If enough DNA is left behind, cases like these can be solved, even decades later. And knowing that they will be caught eventually may be enough to deter violent criminals in the first place.

The second involves ethical considerations that might seem secondary immediately after the arrest of so evil a man. Nonetheless, the long-term implications are worth considering. I wish I had answers, but I have mostly questions:

  • Should law enforcement be accessing private DNA databases that were created by and for genealogists pursuing a hobby?
  • Does this violate the 4th Amendment right against unreasonable searches and seizures? Or perhaps the General Data Protection Regulation (GDPR) set to take effect soon in the European Union?
  • What happens when the wrong person is identified publicly?
  • What happens when a case is solved using DNA that was put into a database by someone other than the tester?
  • Should the genealogists doing this work have formal qualifications?
  • Could fear of government overreach deter people from testing in the first place or cause them to delete their results? This would harm both genealogy and forensics.
  • Worse, might people sue a database if they learn that their data was used for a purpose to which they hadn’t consented?
  • Could defense attorneys use an “unreasonable search” argument to throw out evidence against their clients?

Perhaps the best solution would be for law enforcement to create a separate database composed only of people who have given explicit informed consent for forensic uses. The responses to both the Buckskin Girl and Golden State Killer cases have been largely positive, so a volunteer database would likely grow rapidly.

In the meantime, those of us doing genetic genealogy as a hobby or profession should ensure that everyone we ask to test or transfer into a database first understands how the databases are being used. In addition to our standard warnings that family secrets might be uncovered, we must tell them that their data might be used by government authorities.

What Do We Tell Our Relatives?

When considering where to test or transfer, the Terms of Service are an important consideration. Here’s what the main genealogy databases have to say:

  • AncestryDNA is a testing company that does not accept data from other sources: “Any saliva sample you provide is either your own or the saliva of a person for whom you are a parent or legal guardian.” A sample submitted by law enforcement would probably violate these Terms.
  • 23andMe is a testing company that does not accept data from other sources for relative matching: “You are guaranteeing that any sample you provide is your saliva; if you are agreeing to these TOS on behalf of a person for whom you have legal authorization, you are confirming that the sample provided will be the sample of that person.” Law enforcement might be considered to have legal authorization over a forensic sample, but since 23andMe does not take transfers, getting a degraded sample into their database would be quite difficult.
  • MyHeritage is a testing company that does accept data from other sources: “you represent that any DNA sample you provide and any information that you transfer or upload that associates an individual with his/her DNA Results are either your DNA or the DNA of a person for whom you are a legal guardian or have obtained legal authorization to provide their DNA to us.” Law enforcement might be considered to have legal authorization over a forensic sample and could transfer a file from a private lab into their database.
  • FTDNA is a testing company that does accept data from other sources. They do not appear to have any restrictions on who may submit samples to their database. Law enforcement could transfer a file from a private lab into their database.
  • GEDmatch is a third-party site that only accepts data transferred from other sources: “Please acknowledge that any sample you submit is either your DNA or the DNA of a person for whom you are a legal guardian or have obtained authorization to upload their DNA to GEDmatch.” (From the upload form rather than their Site Policy.) Law enforcement could transfer a file from a private lab into their database.

Ensuring that we and our relatives are fully informed and consent to how their DNA is used will protect us all. We now know for a fact that law enforcement is using the GEDmatch database, so that must be taken into consideration when deciding whether to transfer data there.

Additional Information

Updates to This Post

  • 9 December 2020 — Some dates added to quotes for historical context.

55 thoughts on “Genealogy and the Golden State Killer”

  1. Totally disagree with your assessment. I feel like you are crying wolf a little too much here. Sorry but it’s just not as big a deal as you are making it to sound in my opinion.

    1. As long as users understand how the databases are being used, they can make their own decisions on whether to participate.

      1. Anhinga Opal Birdstein
        I am a huge fan of the GEDMATCH site and have no problem with using DNA to solve murders, and to identify veterans and the homeless. DNA is just another tool to ID people; just as fingerprint ID was once questioned, now DNA is questioned. It is just human nature not to understand and accept new things.

        1. The question is not whether you have no problem with using DNA to solve murders, but whether it is an unconstitutional search of the 900,000 people at GEDmatch who did not give informed consent for this use.

    2. You have every right to disagree. The courts will have to decide whether this was an unreasonable search.

  2. I share your concerns Leah, re: lack of informed consent. It also begs the question who else is fishing in databases that were established for genealogical purposes and what are the potential consequences of this? We all want to see the bad guys taken off the street. We all want Does to be given an identity and for their families to have closure. But using these genealogical databases for other secret purposes can have unintended and devastating consequences. They can be used in ways we never imagined. I’ve seen numerous discussions about this on social media, and the issue of informed consent for all people who have DNA in those databases has clearly gone over the heads of many of the people commenting. I then have to ask how many relatives who have tested for these people have given informed consent, when many of those commenting do not fully understand the issues?

    1. You realize that when YOU give consent, you are also giving consent for 50% of your mother’s and 50% of your father’s DNA too, right? Genomes may seem personal, but they are very much shared.

      How to handle a person’s DNA is not like anything that we have had to deal with before, and simplistic concepts of consent are no longer applicable.

  3. I don’t see a privacy problem with law enforcement using these databases for legitimate violent felony-solving purposes. Why should we be concerned about the privacy rights of a violent felon? By the same token, a ‘head in the sand’ approach to the government’s (and the private sector’s for that matter) potential data exploration and exploitation won’t do either. The key here is ‘legitimate violent felony-solving purposes’ and that will require judicial approval of any such governmental use (i.e. a dna database warrant, not a secret ‘FISA’ warrant, but a regular crime-enforcement warrant) so as to strictly limit the use to this, and only this, legitimate purpose. Anyone who has committed a violent felony, or the relatives of such a person who act as accessories, use these databases at their own peril. All others should continue to enjoy the benefits of genetic genealogy without fear or anxiety. I think it is clear that blood-relatives of a violent felon bear no responsibility for their relative’s criminal behavior unless they somehow participated in or attempted to conceal the crime after the fact.

    1. We aren’t concerned about DeAngelo’s privacy. We’re concerned about the privacy of everyone else in the database.

      1. So don’t upload your data. It’s not like there’s any need to sequence yourself or to put your DNA info into a public database. Everyone who consents to put this type of information into any sort of a public repository has surrendered any expectation that that information is private.

        1. That’s a bit like saying “if you don’t want to be frisked by the cops, don’t leave your house”.

    2. Clearly you do not understand the concept. The Constitutional protections are for EVERYONE, including felons. They exist to prevent government from running amok, which government REGULARLY does. DNA is not foolproof, but law enforcement and prosecutors have NO scientific understanding and little interest in protecting anyone’s rights. And I work with law enforcement and prosecutors every day. You know what interests them…the number of cases closed and the number of convictions obtained…irrespective of the ACTUAL guilt of the person convicted. Half of the people I went to school with have left criminal prosecution because of the lack of ethics. Schubert doesn’t care if DeAngelo is guilty or innocent, she only cares if making this arrest gets her re-elected in November…I have first hand knowledge of this.

      But aside from the government, what about employers? Insurance companies? Identity thieves? Or worse, terrorists out to identify an involuntary donation.

      Law enforcement had to violate the TOS of GEDMatch to enter the data. They LIED about the identity of the submission because they DIDN’T KNOW the identity. GEDMatch requires real name use when creating an account and only allows you to upload DNA data of the user or a person the user has legal control over. Law Enforcement met neither criterion. They LIED again.

      Do you really want Law Enforcement to lie? Schubert used Law Enforcement Officers to lie for her because it is actually illegal for her to tell this type of material lie.

      If they’ll lie today they’ll lie tomorrow, and maybe tomorrow it won’t be to catch a murderer but to frame the innocent.

  4. I have seen well-experienced and highly trained genealogists make mistakes in relative identification in these databases. I’m most worried that the LE folks using the databases for familial matching are likely less skilled at it than a typical hobby genealogist. How do they correct a mistake in identification of this magnitude if one is made? My hope is that whoever is working these cases truly knows what they’re doing, because if they don’t, the potential for errors is very high.

    1. That’s an important point, especially because there is currently no certification in genetic genealogy. While I have every reason to believe that the genetic genealogists who did the actual work on the Golden State Killer and Buckskin Girl cases were qualified, the attention these cases are getting will draw all manner of people in to offer their services to law enforcement. And LE will have no way of knowing which are skilled enough to do this work.

      That said, the killer’s ID was confirmed using CODIS markers before he was arrested, so LE had no doubt.

    2. Take it from someone in the industry. They don’t know and they don’t care that they don’t know. They only care about political advantage.

  5. This may be a problem that solves itself. Soon after I uploaded to My Heritage I got an email offering me the chance to add nine new ancestors to my tree from the Family site of person X. I was appalled. I have invested nearly fifty years in family history and did not know person X from Adam (so to speak), and so had no idea if the paper trail justified such action. And I try to protect my work from mistakes. (By the way, I will keep the clues, but not add to my tree until I am sure.) Then, hardly a day goes by that I do not see an egregious error in someone’s tree.
    My point is that the various databases are becoming so filled with erroneous information that in another few years they will be useless to anyone trying to simulate a paper trail (which law enforcement needs) from a DNA database.

    1. Those of us who do unknown parentage work (the same search strategy used to find the killer) do exactly what you’re planning to do with the hints: we take them under advisement and validate them before reaching any conclusions.

  6. “23andMe is a testing company that does not accept data from other sources for relative matching”
    You posted 2 days ago about 23andMe accepting transfers from Ancestry for one day only. So their statement is not entirely true. Perhaps they meant to add “Except when we feel like it.”
    I think it’s also important to note that some companies, 23andMe and Ancestry I think, will sell your anonymized data to 3rd parties and you cannot opt out of this. You can opt out of transfers to 3rd parties at FamilyTreeDNA and MyHeritage. If police found a match in an anonymized 3rd party database, you can bet that they would get a warrant to “de-anonymize” that sample.

    1. They do not accept transfers for relative matching. The one-day free transfer offer includes only ethnicity estimates and a few trait reports, but not cousin matching. So, even if law enforcement had uploaded the killer’s DNA to 23andMe, all they would be able to tell is his percentage of Southern European and his muscle composition.

      Also, it is incorrect that you cannot opt out of the research collaborations at AncestryDNA and 23andMe. AncestryDNA, 23andMe, and MyHeritage all have explicit consent policies that are available to read at any time. If you do not opt in, your data is not shared with their 3rd party affiliates. FTDNA says they also have a consent policy that they offer to select individuals, but I know of no one who has seen the text of it.

  7. Nice post. Also note that 23andMe violated their own TOS just this week on DNA Day, because they accepted raw AncestryDNA data into their database. A raw data file can easily be prepared to look exactly like it came from AncestryDNA.

    1. The transfers from AncestryDNA only get ethnicity reports, three trait reports, and something called “Your DNA Family”, which shows an overview of where your DNA matches come from and some of their traits. Transfers are not eligible to see their matches, so even if law enforcement had spoofed a kit to upload to 23andMe, it wouldn’t have helped their investigation.

  8. Leah I think it needs to be made clear that California has a DNA familial search law. The law specifically sets forth how family DNA can be used and when.

    1. That’s an excellent point. Whether using GEDmatch falls under the CA familial search law (which relates to criminal databases) will have to be decided by the courts.

      1. It certainly is interesting. As GEDmatch is a public database, you really aren’t entitled to any sort of privacy. Other than what you create yourself when you upload your DNA. My guess is that California followed their familial search law and then used the databases available to them, both the criminal and public database.

        1. The database is free and easy to access, but it is not public. You have to log in to see it.

        2. There is no legal expectation, unless it is stated explicitly, that any information you give to (or really, don’t explicitly withhold from) a private company is in any way private. Whether it’s an email address, a search history, or annual income, once you put that information into the hands of a private company it is de facto public information.

    2. CA doesn’t have a familial DNA search law; it has a “policy” issued by the California Department of Justice. It pertains exclusively to searches in CODIS, has existed since 2008, and was issued by then-Attorney General Jerry Brown.

  9. I never thought about this when I started DNA 2 years ago.
    However, thinking about it and doing some rough maths suggests that everyone is going to have approx 10000 4th cousins and 100000 5th cousins – some of which are going to be criminals.
    It puts the tester and uploader, and the kit administrator if they are a different person to the tester, in a difficult position – the criminal is an offender, so do you want to dob in your own relatives?
    Yes I do.
    But what happens if they are multi million $$$ drug barons and come to track me or you down in 20 years time?
    Are there corrupt police, and can evidence be tampered with?
    What happens if you are falsely accused and an executive order from the president which includes a deportation without trial clause based upon police evidence results in your permanent deportation?
    Will the golden state killer get off on a technicality such as a possible twin or tampered evidence?
    Can it implicate your own children or grandchildren in the future?
    Divergent comment – there was an article in a scientific journal several years ago suggesting that a non-US student didn’t attend Stanford University – he did attend and counter claimed that a pot plant or tree or similar had moved, strongly confirming his attendance.
    These scenarios strongly recommend that people keep a good diary and receipts for everything – you might need it one day.

  10. I wonder if forward-thinking criminals would send in their own samples under a foe’s name? This is likely to be an extremely rare occurrence. So far, DNA evidence has been thought to be free of this kind of error, unlike other sources of genealogical evidence.

    1. At the moment, final confirmation in a criminal case still needs to be through the CODIS database, so the mistake would be quickly discovered.

    2. Not so. DNA testing with minuscule sample amounts can capture transfer DNA. Cops thought they had a killer red-handed. They tracked a woman whose DNA showed up at numerous crime scenes back to the VINYL GLOVE manufacturer the police use for crime scene gloves. Just two weeks ago Scientific American had an entire article about a homeless guy who was locked up for months after being identified as a killer on DNA alone…turned out he had been admitted to the hospital at the time of the murder with an ETOH level almost 5 times the “legal limit”. He couldn’t open his eyes, let alone kill someone. Turns out that three days before the murder he and the victim were in the same bodega together for like 3 min. The cops had the guy so convinced he had to have committed the murder that he was ready to take a plea for LIFE IN PRISON until his PD found his hospital records.

      Scientific studies demonstrate that at any given moment more than 20% of the population on the planet has the DNA of someone else under their fingernails, sometimes from someone they have never even met, because it was transferred to someone else who met them. DNA found under fingernails is a prime source of “incriminating” DNA for rapes, murders, and great bodily injury cases. But if 20% of these victims will have DNA under their fingernails from people that had nothing to do with their crime, a whole lot of people have been and will be convicted of serious crimes on the basis of the “unassailable evidence” yet be factually innocent of any wrongdoing.

      DNA is not reliable for crime detection or prosecutions. Never has been. Never will be. In a free society DNA use for Law Enforcement purposes should be banned. Something more than being in the same store at the same time, or even three days later, should be required to convict, but sometimes all there is is DNA and the criminal justice system lacks the intelligence or desire to know there is a problem.

  11. Maybe you are right about FTDNA.
    But when AncestryDNA shaved a few locations off their test last year, FTDNA could not upload those AncestryDNA records.
    This strongly suggests that crime tests would NOT be able to be uploaded there.
    And people who have signed up for FTDNA tell me that they had to indicate that they had legal right to submit the DNA – the same as at the other companies.
    Yes, it is hard to find such info on the FTDNA website, so why did you not ask them???
    Please condemn their website all you like, but research their performance appropriately.
