This post has been updated.
Blaine Bettinger recently wrote a post entitled “Are You Doing Everything to Identify Your Matches?” with the dual goals of (1) helping genealogists research AncestryDNA matches who have little or no public information on that site and (2) educating test-takers about these tricks so that they can make informed decisions on what to reveal about themselves. Here, I want to use a similar model to show you how someone might determine whether you carry specific genetic markers so that you can protect your medical privacy, if that is a concern for you.
My goal is not to teach people how to invade your genetic privacy. However, there are voices in the genetic genealogy community who insist that what I am about to describe can’t be done or that it would be so difficult that the risks are negligible. To prove how easy it is, I’m going to have to explain how. I’ll use a case study for cystic fibrosis and the family of D.T. (I did this study at D.T.’s encouragement, and he has given me permission to share this story.)
Cystic fibrosis is a genetic disease that affects the respiratory and digestive tracts. It is caused by having two defective alleles of a gene called CFTR, short for cystic fibrosis transmembrane conductance regulator. (An allele is a variant form of a gene.) CFTR is located on chromosome 7 from base pair 116,907,253 to base pair 117,095,955, or roughly at base pair 117 million. People with one normal allele and one mutant allele of CFTR are called carriers; they do not have the disease, but they can pass on the defective version to their children.
D.T. discovered he was a carrier for cystic fibrosis by testing at 23andMe for genealogical purposes. He similarly learned that his mother, two of her maternal half siblings, and his own half sister were carriers, so he knew the defective copy of CFTR was inherited through his maternal grandmother. D.T. was curious about how the carrier status got into his family, because they are mainly of African descent, while CFTR mutations are most common in Northern Europeans.
To track the inheritance of the defective allele, all we have to do is triangulate its DNA segment, that is, position 117M on chromosome 7. Triangulation is a common technique in genetic genealogy in which you find groups of people who all match one another on the same spot in the genome, then try to infer which common ancestor contributed that segment by comparing family trees. Segment triangulation is done using a visual tool called a chromosome browser. Family Tree DNA (FTDNA), 23andMe, MyHeritage, and GEDmatch have chromosome browsers; AncestryDNA does not.
In D.T.’s case, I imported his mother’s matching segments from 23andMe, FTDNA, and GEDmatch into the program Genome Mate Pro (GMP), which also has a chromosome browser feature. Then, I filtered the segments on chromosome 7 to only show those that overlapped position 117M and checked that those segments also matched her half sister, another known carrier. This triangulation confirmed that the DNA matches in question shared the defective CFTR allele, meaning they are also carriers for cystic fibrosis.
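For readers who think in code, the filtering step is nothing more than an interval-overlap test. The sketch below is not how GMP works internally (I don't know its implementation); the `Segment` type, the match names, and the segment coordinates are all made up for illustration. Only the CFTR coordinates come from the text above.

```python
from typing import NamedTuple

# CFTR coordinates on chromosome 7, as quoted earlier in this post.
CFTR_CHROMOSOME = 7
CFTR_START = 116_907_253
CFTR_END = 117_095_955


class Segment(NamedTuple):
    """One shared segment between the test-taker and a DNA match (hypothetical layout)."""
    match_name: str
    chromosome: int
    start: int  # first base pair of the shared segment
    end: int    # last base pair of the shared segment


def overlaps_region(seg: Segment, chromosome: int, region_start: int, region_end: int) -> bool:
    """Return True if the segment is on the given chromosome and overlaps the region."""
    return (
        seg.chromosome == chromosome
        and seg.start <= region_end
        and seg.end >= region_start
    )


# Invented example data, not real matches.
segments = [
    Segment("Match A", 7, 110_000_000, 125_000_000),   # spans the gene
    Segment("Match B", 7, 90_000_000, 105_000_000),    # same chromosome, ends too early
    Segment("Match C", 16, 116_000_000, 118_000_000),  # right coordinates, wrong chromosome
    Segment("Match D", 7, 117_000_000, 130_000_000),   # starts inside the gene
]

candidates = [
    s.match_name
    for s in segments
    if overlaps_region(s, CFTR_CHROMOSOME, CFTR_START, CFTR_END)
]
print(candidates)
```

Note that "Match D" overlaps only part of the gene in this toy data. An overlap filter just produces candidates; they still have to be checked against another known carrier before drawing any conclusion.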
This screenshot from GMP shows what I found. The approximate location of the CFTR gene is circled in red.
I was able to confirm that D.T.’s half uncle and first cousin are carriers for cystic fibrosis. But, we already knew that. The real question was whether we could determine the medical status of someone outside the close family, someone who may themselves not know they are a carrier. Turns out, we can. The two distant cousins in the screenshot are also carriers. What’s more, they used their real names in the databases, so I know who they are.
This took me about 5 minutes to do.
What are the risks to privacy?
Anyone with an understanding of genetics and triangulation could find other DNA relatives who share specific alleles. But, before you freak out, there are some important considerations.
- First, most genetic traits aren’t that big a deal. I could list more than 100 of my DNA matches who have wet ear wax (chromosome 16), but you could probably tell just by looking at them (dry ear wax is mainly an Asian trait) and … really, who cares?
- Second, while I was able to find two “new” people who were carriers for cystic fibrosis in the example above, I couldn’t pick an individual person to target. The person would have to (a) have done a genealogy DNA test, (b) share DNA with a known carrier, (c) match that carrier in a place that correlated with the relevant gene, and (d) be part of a triangulation group.
- Third, this method will only work using someone with a known medical condition as a starting point. I could not check which of my own DNA matches are carriers for cystic fibrosis, because I am not a carrier myself.
- Fourth, very few of your DNA relatives will have genetic diseases that are controlled by a single gene. Those traits are called “qualitative”, because they can take on only a few discrete states or qualities. You either have cystic fibrosis or you don’t. Harmful qualitative traits are rare, precisely because they are harmful and the genetics are simple. What’s more, almost all harmful qualitative traits are recessive, meaning that you need two copies of the “bad” allele to have the condition. (A probable exception is described here.) Triangulation can only tell someone what one of your copies is. In the example above, I know that those distant cousins to D.T. are carriers of the CFTR mutation, but I can’t tell whether they have cystic fibrosis or not.
- Fifth, most genetic traits are quantitative. Genetically, that means they are controlled by multiple genes that interact with one another and the environment in complex ways, with the resulting traits falling along a range rather than having just a few discrete possibilities. Think height or skin color, which can vary widely, instead of blood type (A, B, or O) or cystic fibrosis (yes or no). From a privacy perspective, the chance that a random genetic match will be able to figure out your status at more than one of the controlling genes for a quantitative trait is almost nil.
- Sixth, even medical geneticists don’t fully understand how specific genes affect quantitative traits. One gene might increase your risk of Alzheimer’s disease 2-fold while another reduces it to 0.84 times the baseline, a third lowers it 3-fold, and a fourth increases it slightly. The outcome is anyone’s guess, even if your full genetic make-up is known.
- Finally, genetics are only part of the story for most quantitative traits. Environment also plays a role, and that won’t show up in a triangulation group.
How can you protect yourself?
Hopefully, I have given you enough information to know what your DNA matches might be able to determine about your genetic status and how likely that is. If you are concerned, you should know how to mitigate the risks. All four of the major DNA testing companies provide some level of privacy protection to their customers, although the safeguards are different for each one. I will take them, and GEDmatch, in turn.
No matter where you test, you can use an alias instead of your real name. For extra security, I would recommend using a different alias at each site where you have data to make it harder for other people to cross reference you among databases. Finally, for FTDNA and GEDmatch, which use email for communication, you may need to set up a new email address that does not identify you. There’s not much point in using an alias if the associated email address includes your real name.
Also, a word on triangulation, the technique I used to determine CFTR carrier status. To triangulate, I need to do three pairwise comparisons: person A to person B, person A to person C, and person B to person C. It’s only a triangulation group if all three people match one another in the same place on the same chromosome. (Remember: each person has DNA from both parents, so A could match B through her mother and match C through her father in the exact same spot. In that case, B and C wouldn’t match one another there, and I would not be able to infer medical status.)
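The three-way check can be sketched in a few lines of Python. This is only an illustration of the logic, under the assumption that you already have the shared segments for every pair of people; the `shared` dictionary, the person labels, and the coordinates are invented, and none of the testing companies exposes data in this form.

```python
from itertools import combinations


def pair_matches_at(shared, a, b, chromosome, position):
    """Return True if persons a and b share a segment covering the given position."""
    for chrom, start, end in shared.get(frozenset((a, b)), []):
        if chrom == chromosome and start <= position <= end:
            return True
    return False


def is_triangulation_group(shared, people, chromosome, position):
    """A group triangulates only if every pair in it matches at the position."""
    return all(
        pair_matches_at(shared, a, b, chromosome, position)
        for a, b in combinations(people, 2)
    )


# Hypothetical pairwise data: frozenset({person, person}) -> [(chromosome, start, end), ...]
shared = {
    frozenset(("A", "B")): [(7, 110_000_000, 125_000_000)],
    frozenset(("A", "C")): [(7, 112_000_000, 120_000_000)],
    frozenset(("B", "C")): [(7, 113_000_000, 119_000_000)],
    # D matches A in the same spot, but has no match with B there:
    # D is related through A's other parent.
    frozenset(("A", "D")): [(7, 115_000_000, 121_000_000)],
}

position = 117_000_000  # roughly where CFTR sits on chromosome 7

abc_triangulates = is_triangulation_group(shared, ["A", "B", "C"], 7, position)
abd_triangulates = is_triangulation_group(shared, ["A", "B", "D"], 7, position)
print(abc_triangulates, abd_triangulates)
```

In this toy data, A, B, and C form a triangulation group, while A, B, and D do not, because the B-to-D comparison is missing. That missing third comparison is exactly what some of the sites below prevent you from running.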
- AncestryDNA does not have a chromosome browser, so the safety concerns described here are moot. Your matches can see how much DNA you share with them but not where the segments are. I could not have done the CFTR case study there. They also have an internal messaging system, so your DNA matches won’t know your email address.
- 23andMe has a full chromosome browser that lets you do all of the pairwise comparisons necessary for segment triangulation. Their protection lies in letting you control whether to share segment information and with whom. You can (a) opt out of DNA relative matching entirely, (b) participate without sharing segment information, equivalent to AncestryDNA’s system, (c) share segment information with specific DNA relatives on a case-by-case basis, or (d) opt in to “Open Sharing”, which would allow any other DNA relative using Open Sharing to do segment triangulation with you and their other matches. Like AncestryDNA, 23andMe has an internal messaging system that protects your email address.
- FTDNA has a partial chromosome browser available to all autosomal testers. You cannot opt out except by opting out of matching altogether. I call the chromosome browser “partial” because you can’t do formal segment triangulation from a single user account. That is, if you were person A in my hypothetical example above, you could see which segments you share with person B and with person C, but you wouldn’t be able to compare B and C to one another to confirm a triangulation group. (There is a way to infer whether B and C match there, but it isn’t proof.) The exception, of course, would be if you also have access to person B’s account, which is pretty common for genetic genealogists to have. FTDNA users communicate via email.
- MyHeritage introduced a chromosome browser in March 2018. As at FTDNA, you cannot directly compare person B with person C, but the system will show if they both triangulate with you. For that reason, I could have determined someone’s genetic status for the CFTR gene or for another trait there. MyHeritage users communicate via an internal messaging system.
- GEDmatch is a third-party site to which you can upload your DNA data from your testing company. The upload is called a “kit”. GEDmatch is valuable to genetic genealogists because it lets you compare with people who tested at other companies and because it offers additional tools that your original testing company might not have. It has a full chromosome browser that allows segment triangulation. Unlike at the DNA testing companies, any GEDmatch user can compare any public kit to any other, even if it is not among their DNA relatives. You can designate a kit “research” to make it inaccessible to others. GEDmatch also uses email for communication.
To ensure a level of privacy that suits you, think carefully about where you test (or transfer), how much segment information to share, and whether to use an alias and non-identifying email address. If you are not concerned, your genealogy research will benefit most from being in all five of the databases described above and from being able to use segment information. If you are concerned, you will probably want to restrict yourself to AncestryDNA and 23andMe without segment sharing.
There is no right or wrong here; do what works for you.
The triangulation I describe here could potentially reveal genetic information to another private citizen. It does not reflect what insurance companies, employers, or government entities can see. In the United States, employment and medical insurance (but not life insurance) discrimination based on genetic status is prohibited by GINA (the Genetic Information Nondiscrimination Act of 2008). A bill, H.R. 1313, was recently introduced in the House of Representatives that would weaken those protections. I describe how you can resist the bill here.
This post has been updated as follows:
- On 2 April, 2018, to include the chromosome browser at MyHeritage.
- On 8 May, 2018, to clarify that life insurance companies, unlike health insurance companies, can use genetic information to make policy decisions.