Contribute to the Endogamy Study

Endogamy, the practice of marrying within a community or religion, is one of the biggest challenges in genetic genealogy, especially when using autosomal DNA.  In my talks for RootsTech Connect 2021, I explain what endogamy is, why it’s a problem, and some strategies for working with it.  It’s really one talk in two parts.  You can watch both of them for free until the next RootsTech, in early 2022.  Part 1 is here and Part 2 is here.

At the end of Part 2, I presented some data comparing different endogamous populations.

Without going into detail (watch the talks!), the “hotter” the colors, the more endogamous the population is, and the more caution you need to exercise when working with your autosomal DNA matches.

I also made an appeal for volunteers to contribute their own match data anonymously to an ongoing study of endogamy.  The goals are twofold:  to gauge how much endogamy is in different populations, and to develop best practices based on that information.

If two or more of your grandparents were from the same endogamous population, you can help!  The study can use match data from either MyHeritage or AncestryDNA.  The other companies don’t provide the information in a format that’s easy to use.  However, if you’ve tested elsewhere, you can upload your raw data from your testing company to MyHeritage and still participate in the endogamy study.  Instructions are here; scroll down to the section on MyHeritage.

The instructions below describe how to get the two or three columns of information needed for the study.  You can then email the file to me at theDNAgeek (at) gmail (dot) com.  Please also tell me the name or location of the population and the number of grandparents who belonged (e.g., ‘all four grandparents were Ashkenazim’ or ‘two of four grandparents were Samoan’).

If you manage kits for relatives who are also willing to contribute, you can send their match data, too.  Reassure them that I don’t need any identifying information about them or their matches.

 

MyHeritage

The data is easiest to obtain from MyHeritage.  At the top right of your match list, you’ll see three vertical dots.  Click them to get a pop-up with some options, and select “Export entire DNA Matches list”.

MyHeritage will ask you to confirm (click “OK”), then they’ll email you a csv file.  It may take a few hours for that file to come through, so don’t worry if you don’t see it right away.  The file will be called “Firstname Lastname DNA Matches list” with some additional code for the date and kit identifier.

When you receive the email, open the attached file in a spreadsheet program, like Excel or Google Sheets.  Delete every column except “Total cM shared”, “Number of shared segments”, and “Largest segment (cM)” (highlighted below).

Save the edited file (with just the three columns) in csv format with a descriptive name for the population and the number of grandparents from it, e.g., Iceland4.csv or Garifuna2.csv.  Note that the file will no longer contain any identifying information about your matches.
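If you'd rather not delete the columns by hand, the same trimming can be sketched in a few lines of Python.  This is only an illustration, not part of the study's official instructions:  the function name trim_matches is made up for this example, and it assumes the exported file's header row uses exactly the three column names quoted above.

```python
import csv

# Column names as quoted in the instructions above; this sketch assumes the
# export's header row uses exactly these strings.
KEEP = ["Total cM shared", "Number of shared segments", "Largest segment (cM)"]


def trim_matches(src_path, dst_path, keep=KEEP):
    """Copy only the `keep` columns from src_path to dst_path; return the row count."""
    # utf-8-sig drops a byte-order mark if the export happens to start with one
    with open(src_path, newline="", encoding="utf-8-sig") as src:
        reader = csv.DictReader(src)
        missing = [c for c in keep if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"columns not found: {missing}")
        with open(dst_path, "w", newline="", encoding="utf-8") as dst:
            # extrasaction="ignore" silently drops every column not listed in `keep`
            writer = csv.DictWriter(dst, fieldnames=keep, extrasaction="ignore")
            writer.writeheader()
            rows = 0
            for row in reader:
                writer.writerow(row)
                rows += 1
    return rows
```

For example, trim_matches("my matches.csv", "Iceland4.csv") would write the three-column file, where "my matches.csv" stands in for whatever name your exported file actually has.  Names and other identifying columns never reach the output because only the listed columns are written.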

Email the file(s) to me at theDNAgeek (at) gmail (dot) com.  I’ll let you know how your kit(s) rank compared to other endogamous populations.

 

AncestryDNA

The data isn’t quite as straightforward to get from AncestryDNA.  You have to use a third-party tool called the DNAGedcom Client, which is a stand-alone program you install on your computer.  It requires a nominal subscription of $5/month.  It’s worth trying it for a month to see if you find it useful.  (It does a lot more than I describe below.)

Alternatively, if you aren’t already in the MyHeritage database, you might consider uploading your data there, especially as they’re offering a free “unlock” of their tools this week.  Instructions are here.

However, if you’re at AncestryDNA and have (or can get) the DNAGedcom Client, here’s what to do.

Open the DNAGedcom Client and log into your account there.

Next, click “Gather” in the top menu bar, then click the AncestryDNA button.

On the next screen, log into your AncestryDNA account using your credentials.  This information will not be stored anywhere else.

Once you’re logged in, select the kit you’d like to scan, set the minimum cM value to 20, and make sure none of the tick boxes are checked.

Then click the green “Gather DNA Data” button.  The scan may take a while, depending on how many matches you have.  (Rescanning a kit is a lot faster.  I promise!)

When it’s done, you’ll see a note below the login panel that says “Creating Ancestry Reports Completed.”  It will have saved a file to your computer called “m_Firstname_Lastname.csv”.

Open that file in a spreadsheet program, like Excel or Google Sheets.  Delete every column except “sharedCM” and “sharedSegments” (highlighted below).

Save the edited file (with just the two columns) in csv format with a descriptive name for the population and the number of grandparents from it, e.g., Oaxaca4.csv or Afrikaans3.csv.  Note that the file will no longer contain any identifying information about your matches.
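Before emailing, you may want to double-check that nothing identifying is left in the saved file.  Here is a minimal Python sketch of such a check.  The function name unexpected_columns is invented for this example, and it assumes the header row uses exactly the two column names quoted above.

```python
import csv

# Column names as quoted in the instructions above (assumed to match the
# saved file's header row exactly).
EXPECTED = {"sharedCM", "sharedSegments"}


def unexpected_columns(path, expected=EXPECTED):
    """Return a sorted list of header names in `path` that are not in `expected`."""
    # utf-8-sig drops a byte-order mark if the spreadsheet program added one
    with open(path, newline="", encoding="utf-8-sig") as f:
        header = next(csv.reader(f), [])  # first row only; [] if the file is empty
    return sorted(set(header) - set(expected))
```

Calling unexpected_columns("Oaxaca4.csv") returns an empty list when only the two study columns remain.  Anything else it returns is a column you still need to delete.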

Email the file(s) to me at theDNAgeek (at) gmail (dot) com.  I’ll let you know how your kit(s) rank compared to other endogamous populations.

And thank you!

Updates to This Post

14 October 2021 — corrected the subscription price for DNAGedcom

29 thoughts on “Contribute to the Endogamy Study”

  1. Could you tell me what happens to the DNA data after the analysis is done? Is the data deleted, or do we risk it being shared with unknown actors?

    1. The study doesn’t collect any DNA data or even the names of matches. The only information needed is the total amount of DNA (in centimorgans), the number of segments, and the amount of the longest segment for each match.

  2. I manage many DNA tests at Ancestry, all for family members, and would be happy to send them to you to upload. You can reach me directly at Peggy Sue Druck

  3. I have one grandparent who is Ashkenazi and another who is 3/4 Ashkenazi and 1/4 Sephardi. Will that satisfy the criteria for the study?

  4. I would consider contributing to your study. I am 1/4 French Canadian (and the tree goes waaaay back), 1/2 Ashkenazi, and maybe 1/8 Nova Scotia.

    1. It might be hard to distinguish how much each group is affecting your matches. I’d be happy to take a look for you, though.

  5. My family comes from western Finland. When I did a cluster in December of 2019 with my My Heritage DNA, 70 of 100 results were one big cluster — which is actually better than my initial cluster in March of that year of 93 of 100. Is that a traditionally endogamous area? Would you like any of that info? Your talk certainly gave me renewed hope for being able to make any sense of this!

    1. I would love to see the data! I don’t have a good sense of how endogamous Finland is, so any new data would be helpful.

  6. I come from an endogamous population from an island off the coast of current-day Croatia. This is on my mother’s side, and I am on 23andMe. So is my mother’s brother (my uncle). I don’t want to upload DNA data to another site, nor to read their ToS, but I’m handy with spreadsheets and math. From the explanation you give in a response above, it sounds like a list I can generate from the relatives csv file I can download from 23andMe. I can even anonymize people’s names myself.

    The degree of endogamy I have found in my ancestors back to the 1700s: two ancestor pairs who appear four times, one ancestor pair that appears twice, and there are also repeat ancestors among the descendants of those pairs. I was also able to, through a lot of work, find three common ancestor pairs with a DNA cousin. I was heartened to learn from your talk that in five years there may be methods to make DNA relations clearer in endogamous populations.

    Whether you’re interested in my population or not, I salute your work and found your talks the most interesting and relevant to my efforts at RootsTech (of the tiny subset I watched, of course).

    1. If you’re willing to extract the data from your 23andMe matches, that would be great! I’m collecting total cM, total number of segments, and the cM of the longest segment (all excluding the X chromosome). Ideally, it would be in a spreadsheet with one match per row. No need to include match names.

  7. I’m the descendant of Colonial Americans. I will try to manage this, but the number of marriage combinations in my family fried my brain. My father is the descendant of John Price and his first wife, and my mother is a descendant of the same man with his second wife. Brothers married sisters and then their children married each other. A cousin married her uncle and her nephew. I was a flower girl in the wedding of my second cousins. I have third cousins who share triple the DNA of other same-generation cousins. I’ve stopped using “3rd cousin” and use cM instead. I will try this but I’m not sure it will give the big picture.

  8. I have three different endogamous groups in my ancestry. Both of my maternal grandparents had about 90% of their ancestors from the northern part of Essex County in Massachusetts, and the rest came from towns in eastern Massachusetts. Most of my grandmother’s ancestors came from Amesbury MA. There were 17 original settlers there; 8 are her ancestors, two of them five times over. All of their ancestors were here before 1700, and most by 1650. My paternal grandfather was from a small town in Nova Scotia CA and my paternal grandmother was from a small town in Newfoundland CA.

    I sent your article to my 4th cousin in Newfoundland (we connected on Gedmatch) and she would be perfect for your study. All 4 of her grandparents are from the same small area. She said that she would be interested. If you are interested in her send me an e-mail and I’ll send you her e-mail.

  9. I just re-read your comments about volunteers. I tested at 23andMe but transferred my data to MyHeritage and GEDmatch. I know that my 4th cousin tested at FTDNA but she has transferred her data to GEDmatch and MyHeritage. I have a 2nd cousin in Nova Scotia who has 2 1/2 grandparents from that small area. Her data is on 23andMe and GEDmatch. Would hers be any good?

  10. Not sure if I can be of help. One set of 2nd great grandparents were second cousins (Acadians) who likely had church approval to marry. I once heard Blaine Bettinger mention Acadians as an example of an endogamous population. Let me know if this meets your criteria.

    1. Yes, Acadians and their Cajun descendants (I’m Cajun) were endogamous. It’s quite common! Thank you for the offer to help. Right now, we’re focusing on people who have tested and are double second cousins to one another. We’d love your help later when we expand our testing to other scenarios.

  11. Just listened to your two RootsTech talks on endogamy today with great interest. I would love to participate in your study and have at least two and possibly a third set of data to offer. First of all, mine may not qualify in that I have only one grandparent who is Cajun (hi cuz). His wife was Colonial American. My father-in-law has 3 grandparents from Spanish Colonial New Mexico and a fourth grandparent from a different endogamous population, namely Ireland. Finally, my wife has 4 grandparents and 8 great grandparents from Spanish Colonial New Mexico. Do you want all 3 or just the two?

    Could you also include links to the other endogamy tools you mentioned in your presentation?

    Thanks

    1. That would be great! Right now the study is focusing on people who have at least two grandparents from the same endogamous population, so your wife and FIL would be perfect!

  12. Would be happy to share My Heritage data — I know I have endogamy, on both maternal and paternal sides, although different populations. What I don’t know is how to separate out the data into populations. The file I got from My Heritage has 13781 lines of data, and I know at least a few of the largest are paternal and many of the rest likely to be maternal. Would the data help? I can provide more detail if you like, or I can just go ahead and send with my best guess about likely groups! Thanks.

    1. I appreciate the offer. For now, the study is focusing on individuals from just one endogamous population. I’ll keep you in mind for future work, though!

  13. Hi, are you still accepting contributions to this? I’m waiting on test results at the moment, but I can send stuff your way later if you’re interested. I’m thoroughly Appalachian on both sides of my family and I have two 5th great-grandfathers that are brothers.

  14. Hi, Leah,

    Thoroughly enjoyed your Banyan talks at RootsTech. I manage several kits at MyHeritage for people with 4 Ashkenazic grandparents. Would you like me to download each one?

    1. That would be wonderful! Note that if you process the data as described in the blog, it will be completely anonymous.
