Contribute to the Endogamy Study

February 25, 2021 thednageek 35 Comments

Endogamy, the practice of marrying within a community or religion, is one of the biggest challenges in genetic genealogy, especially when using autosomal DNA. In my talks for RootsTech Connect 2021, I explain what endogamy is, why it’s a problem, and some strategies for working with it. It’s really one talk in two parts. You can watch both of them for free until the next RootsTech, in early 2022. Part 1 is here and Part 2 is here.

At the end of Part 2, I presented some data comparing different endogamous populations.

Without going into detail (Watch the talks!), the “hotter” the colors, the more endogamous the population is, and the more caution you need to exercise when working with your autosomal DNA matches.

I also made an appeal for volunteers to contribute their own match data anonymously to an ongoing study of endogamy. The goals are twofold: to gauge how much endogamy is in different populations, and to develop best practices based on that information.

If two or more of your grandparents were from the same endogamous population, you can help! The study can use match data from either MyHeritage or AncestryDNA. The other companies don’t provide the information in a format that’s easy to use. However, if you’ve tested elsewhere, you can upload your raw data from your testing company to MyHeritage and still participate in the endogamy study. Instructions are here; scroll down to the section on MyHeritage.

These instructions describe how to get the two or three columns of information needed for the study. You can then email the file to me at theDNAgeek (at) gmail (dot) com. Please also tell me the name or location of the population and the number of grandparents who belonged (e.g., ‘all four grandparents were Ashkenazim’ or ‘two of four grandparents were Samoan’).

If you manage kits for relatives who are also willing to contribute, you can send their match data, too. Reassure them that I don’t need any identifying information about them or their matches.

MyHeritage

The data is easiest to obtain from MyHeritage. At the top right of your match list, you’ll see three vertical dots. Click them to get a pop-up with some options, and select “Export entire DNA Matches list”.

MyHeritage will ask you to confirm (click “OK”), then they’ll email you a csv file. It may take a few hours for that file to come through, so don’t worry if you don’t see it right away. The file will be called “Firstname Lastname DNA Matches list” with some additional code for the date and kit identifier.

When you receive the email, open the attached file in a spreadsheet program, like Excel or Google Sheets. Delete every column except “Total cM shared”, “Number of shared segments”, and “Largest segment (cM)” (highlighted below).

Save the edited file (with just the three columns) in csv format with a descriptive name for the population and the number of grandparents from it, e.g., Iceland4.csv or Garifuna2.csv. Note that the file will no longer contain any identifying information about your matches.
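If you'd rather not edit the file by hand, the column-trimming step above can be sketched in a few lines of Python. This is only a rough illustration, not part of the study's official instructions: the function name `keep_columns` is made up here, and the column headers are assumed to match the names quoted above exactly.

```python
import csv

# Column names as quoted in this post; the exact headers in a real
# MyHeritage export are an assumption and may differ.
MYHERITAGE_COLUMNS = [
    "Total cM shared",
    "Number of shared segments",
    "Largest segment (cM)",
]


def keep_columns(src_path, dst_path, columns=MYHERITAGE_COLUMNS):
    """Copy src_path to dst_path, keeping only the named columns.

    Returns the number of match rows written (header not counted).
    """
    # "utf-8-sig" drops a byte-order mark if the export starts with one.
    with open(src_path, newline="", encoding="utf-8-sig") as src:
        reader = csv.DictReader(src)
        missing = [c for c in columns if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"Columns not found in {src_path}: {missing}")
        with open(dst_path, "w", newline="", encoding="utf-8") as dst:
            # extrasaction="ignore" silently drops every other column,
            # including names and other identifying details.
            writer = csv.DictWriter(dst, fieldnames=columns, extrasaction="ignore")
            writer.writeheader()
            count = 0
            for row in reader:
                writer.writerow(row)
                count += 1
    return count
```

For example, `keep_columns("my_export.csv", "Iceland4.csv")` would write the trimmed file, where `my_export.csv` stands in for whatever name your emailed attachment has. It's still worth opening the result to confirm that only the three columns remain.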

Email the file(s) to me at theDNAgeek (at) gmail (dot) com. I’ll let you know how your kit(s) rank compared to other endogamous populations.

AncestryDNA

The data isn’t quite as straightforward to get from AncestryDNA. You have to use a third-party tool called the DNAGedcom Client, which is a stand-alone program you install on your computer. It requires a nominal subscription of $5/month. It’s worth trying it for a month to see if you find it useful. (It does a lot more than I describe below.)

Alternatively, if you aren’t already in the MyHeritage database, you might consider uploading your data there, especially as they’re offering a free “unlock” of their tools this week. Instructions are here.

However, if you’re at AncestryDNA and have (or can get) the DNAGedcom Client, here’s what to do.

Open the DNAGedcom Client and log into your account there.

Next, click “Gather” in the top menu bar, then click the AncestryDNA button.

On the next screen, log into your AncestryDNA account using your credentials. This information will not be stored anywhere else.

Once you’re logged in, select the kit you’d like to scan, set the minimum cM value to 20, and make sure none of the tick boxes are checked.

Then click the green “Gather DNA Data” button. The scan may take a while, depending on how many matches you have. (Rescanning a kit is a lot faster. I promise!)

When it’s done, you’ll see a note below the login panel that says “Creating Ancestry Reports Completed.” It will have saved a file to your computer called “m_Firstname_Lastname.csv”.

Open that file in a spreadsheet program, like Excel or Google Sheets. Delete every column except “sharedCM” and “sharedSegments” (highlighted below).

Save the edited file (with just the two columns) in csv format with a descriptive name for the population and the number of grandparents from it, e.g., Oaxaca4.csv or Afrikaans3.csv. Note that the file will no longer contain any identifying information about your matches.
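Before emailing, you may want to double-check that the edited file really contains nothing but numbers. The sketch below is one rough way to do that in Python. The function name `check_anonymized` is hypothetical, the allowed headers are simply the column names quoted in this post, and the numeric test is crude (it would flag a value written with a thousands separator, for instance).

```python
import csv

# Headers quoted in this post for each company; real exports may use
# slightly different names, so treat these sets as assumptions.
ALLOWED_HEADERS = {
    "MyHeritage": {"Total cM shared", "Number of shared segments", "Largest segment (cM)"},
    "AncestryDNA": {"sharedCM", "sharedSegments"},
}


def check_anonymized(path):
    """Return a list of problems found in an edited match file.

    An empty list means no problems were detected by this rough check.
    """
    problems = []
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return ["file is empty"]
        # The header must match one company's expected column set exactly.
        if set(header) not in ALLOWED_HEADERS.values():
            problems.append(f"unexpected columns: {header}")
        # Every remaining cell should parse as a number; names will not.
        for line_no, row in enumerate(reader, start=2):
            for value in row:
                try:
                    float(value)
                except ValueError:
                    problems.append(f"line {line_no}: non-numeric value {value!r}")
    return problems
```

Calling `check_anonymized("Oaxaca4.csv")` on a properly trimmed file should return an empty list; anything else it reports is worth a look before you send the file.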

Email the file(s) to me at theDNAgeek (at) gmail (dot) com. I’ll let you know how your kit(s) rank compared to other endogamous populations.

And thank you!

Updates to This Post

14 October 2021 – corrected the subscription price for DNAGedcom

35 thoughts on “Contribute to the Endogamy Study”

Pingback: Contribute to the Endogamy Study — The DNA Geek | Ups Downs Family History
Egil Enaasen says:

February 26, 2021 at 10:17 am

Could you tell me what happens to the DNA data after the analysis is done? Is the data deleted, or do we risk it being shared with unknown parties?

Reply
1. thednageek says:
  
  February 26, 2021 at 5:23 pm
  
  The study doesn’t collect any DNA data or even the names of matches. The only information needed is the total amount of DNA (in centimorgans), the number of segments, and the amount of the longest segment for each match.
  
  Reply
Peggy Sue Druck says:

February 26, 2021 at 2:50 pm

I would be happy to send you the many DNA tests from Ancestry that I manage, since all are family, for you to upload. You can reach me directly at Peggy Sue Druck

Reply
Haakon Chevalier says:

February 26, 2021 at 5:25 pm

I have one grandparent who is Ashkenazi and another who is 3/4 Ashkenazi and 1/4 Sephardi. Will that satisfy the criteria for the study?

Reply
1. thednageek says:
  
  February 27, 2021 at 12:22 pm
  
  I’d love to see that data! Thank you for offering.
  
  Reply
Rgallica says:

February 27, 2021 at 8:39 pm

I would consider contributing to your study. I am 1/4 French Canadian (and the tree goes waaaay back), 1/2 Ashkenazi, and maybe 1/8 Nova Scotia.

Reply
1. thednageek says:
  
  February 28, 2021 at 4:59 pm
  
  It might be hard to distinguish how much each group is affecting your matches. I’d be happy to take a look for you, though.
  
  Reply
Elina J Filander says:

February 28, 2021 at 6:05 am

My family comes from western Finland. When I did a cluster in December of 2019 with my My Heritage DNA, 70 of 100 results were one big cluster — which is actually better than my initial cluster in March of that year of 93 of 100. Is that a traditionally endogamous area? Would you like any of that info? Your talk certainly gave me renewed hope for being able to make any sense of this!

Reply
1. thednageek says:
  
  February 28, 2021 at 5:01 pm
  
  I would love to see the data! I don’t have a good sense of how endogamous Finland is, so any new data would be helpful.
  
  Reply
Suzanne M says:

February 28, 2021 at 11:51 am

I come from an endogamous population from an island off the coast of current-day Croatia. This is on my mother’s side, and I am on 23andMe. So is my mother’s brother (my uncle). I don’t want to upload DNA data to another site, nor to read their ToS, but I’m handy with spreadsheets and math. From the explanation you give in a response above, it sounds like a list I can generate from the relatives csv file I can download from 23andMe. I can even anonymize people’s names myself.

The degree of endogamy I have found in my ancestors back to the 1700s: two ancestor pairs who appear four times, one ancestor pair that appears twice, and there are also repeat ancestors among the descendants of those pairs. I was also able to, through a lot of work, find three common ancestor pairs with a DNA cousin. I was heartened to learn from your talk that in five years there may be methods to make DNA relations clearer in endogamous populations.

Whether you’re interested in my population or not, I salute your work and found your talks the most interesting and relevant to my efforts at RootsTech (of the tiny subset I watched, of course).

Reply
1. thednageek says:
  
  February 28, 2021 at 5:04 pm
  
  If you’re willing to extract the data from your 23andMe matches, that would be great! I’m collecting total cM, total number of segments, and the cM of the longest segment (all excluding the X chromosome). Ideally, it would be in a spreadsheet with one match per row. No need to include match names.
  
  Reply
Julie says:

March 2, 2021 at 3:11 pm

I’m the descendant of Colonial Americans. I will try to manage this, but the number of marriage combinations in my family fried my brain. My father is the descendant of John Price and his first wife, and my mother is a descendant of the same man with his second wife. Brothers married sisters and then their children married each other. Cousin married her uncle and her nephew. I was a flower girl in the wedding of my second cousins. I have third cousins who share triple the DNA of other same-generation cousins. I’ve stopped using “3rd cousin” and use cM instead. I will try this but I’m not sure it will give the big picture.

Reply
1. thednageek says:
  
  March 3, 2021 at 7:26 pm
  
  It’s definitely complicated!
  
  Reply
Dorothy Greene says:

March 7, 2021 at 8:56 pm

I have three different endogamous groups in my ancestry. Both of my maternal grandparents had about 90% of their ancestors from the northern part of Essex County in Massachusetts and the rest came from towns in eastern Massachusetts. Most of my grandmother’s ancestors came from Amesbury MA. There are 17 original settlers there and 8 are her ancestors, two of them 5 times, and all of their ancestors were here before 1700 and most by 1650. My paternal grandfather was from a small town in Nova Scotia CA and my paternal grandmother was from a small town in Newfoundland CA.

I sent your article to my 4th cousin in Newfoundland (we connected on Gedmatch) and she would be perfect for your study. All 4 of her grandparents are from the same small area. She said that she would be interested. If you are interested in her send me an e-mail and I’ll send you her e-mail.

Reply
1. thednageek says:
  
  March 9, 2021 at 11:12 pm
  
  Thank you! Please have her contact me here: https://thednageek.com/contact/
  
  Reply
Dorothy Greene says:

March 7, 2021 at 9:11 pm

I just re-read your comments about volunteers. I tested at 23andMe but transferred my data to MyHeritage and GEDmatch. I know that my 4th cousin tested at FTDNA but she has transferred her data to GEDmatch and MyHeritage. I have a 2nd cousin in Nova Scotia who has 2 1/2 grandparents from that small area. Her data is on 23andMe and GEDmatch. Would hers be any good?

Reply
1. thednageek says:
  
  March 9, 2021 at 11:13 pm
  
  The data from MyHeritage is perfect!
  
  Reply
Dennis Hogan says:

April 1, 2021 at 4:30 pm

Not sure if I can be of help. One set of 2nd great grandparents were second cousins (Acadians) who likely had church approval to marry. I once heard Blaine Bettinger comment about Acadians as an example of endogamous population. Let me know if this meets your criterion.

Reply
1. thednageek says:
  
  April 2, 2021 at 4:40 pm
  
  Yes, Acadians and their Cajun descendants (I’m Cajun) were endogamous. It’s quite common! Thank you for the offer to help. Right now, we’re focusing on people who have tested and are double second cousins to one another. We’d love your help later when we expand our testing to other scenarios.
  
  Reply
Chris Pederson says:

April 21, 2021 at 4:43 pm

Just listened to your two RootsTech talks on endogamy today with great interest. I would love to participate in your study and have at least two and possibly a third set of data to offer. First of all, mine may not qualify in that I have only one grandparent who is Cajun (hi cuz). His wife, Colonial American. My father-in-law, 3 grandparents Spanish Colonial New Mexico. Fourth grandparent a different endogamous population, namely Ireland. Finally my wife, 4 grandparents, 8 great-grandparents Spanish Colonial New Mexico. Do you want all 3 or just the two?

Could you also include links to other endogamy tools you mentioned in your presentation?

Thanks

Reply
1. thednageek says:
  
  April 21, 2021 at 6:26 pm
  
  That would be great! Right now the study is focusing on people who have at least two grandparents from the same endogamous population, so your wife and FIL would be perfect!
  
  Reply
Chris Pederson says:

April 21, 2021 at 7:10 pm

I’ll work on it in next couple of days.

Reply
Phyllis Kaelin says:

April 28, 2021 at 4:39 pm

Would be happy to share My Heritage data — I know I have endogamy, on both maternal and paternal sides, although different populations. What I don’t know is how to separate out the data into populations. The file I got from My Heritage has 13781 lines of data, and I know at least a few of the largest are paternal and many of the rest likely to be maternal. Would the data help? I can provide more detail if you like, or I can just go ahead and send with my best guess about likely groups! Thanks.

Reply
1. thednageek says:
  
  April 28, 2021 at 4:52 pm
  
  I appreciate the offer. For now, the study is focusing on individuals from just one endogamous population. I’ll keep you in mind for future work, though!
  
  Reply
Anna says:

May 30, 2021 at 7:55 pm

Hi, are you still accepting contributions to this? I’m waiting on test results at the moment, but I can send stuff your way later if you’re interested. I’m thoroughly Appalachian on both sides of my family and I have two 5th great-grandfathers that are brothers.

Reply
1. thednageek says:
  
  May 31, 2021 at 12:09 pm
  
  Yes please!
  
  Reply
Emily Garber says:

June 26, 2021 at 5:14 pm

Hi, Leah,

Thoroughly enjoyed your Banyan talks at RootsTech. I manage several kits at MyHeritage for people with 4 Ashkenazic grandparents. Would you like me to download each one?

Reply
1. thednageek says:
  
  June 29, 2021 at 2:25 pm
  
  That would be wonderful! Note that if you process the data as described in the blog, it will be completely anonymous.
  
  Reply
Sandra says:

February 26, 2022 at 11:00 pm

Are you still collecting data for your endogamous study?

Reply
1. thednageek says:
  
  February 28, 2022 at 6:57 pm
  
  Yes, feel free to contact me if you’d like to contribute. https://thednageek.com/contact/
  
  Reply
Mila K says:

March 1, 2026 at 1:38 pm

When you calculate the average segment size do you use the pre-Timber or post-Timber amount?

Reply
1. thednageek says:
  
  March 1, 2026 at 3:26 pm
  
  I use post-Timber because it’s easier to access. If there’s a big difference, you almost definitely have endogamy.
  
  Reply
2. Mila K says:
  
  March 5, 2026 at 1:21 am
  
  The problem with this methodology is that it creates a double counting distortion, in the best case scenario. Furthermore, that kind of advice could lead to people reaching the wrong conclusions about the actual rate of endogamy among their DNA matches.
  
  The endogamy among some populations, e.g. Roma, is so heavy that it makes AJ endogamy look like a cakewalk by comparison. Yet the former are barely grazed by Ancestry’s TIMBER. Meanwhile, those of British descent get pummeled and hammered by TIMBER at a rate disproportionate to their actual level of endogamy, which is broadly agreed to be at the mild end of the spectrum.
  
  Ditto for Ashkenazi Jews, believe it or not, at least in the disproportionate sense, and those who are partially AJ get it even worse. It would help to explain why there is such a sharp drop between 2C and 3C, with the average segment size among Ashkenazim in the Close Family to the 2nd Cousin cM range being similar to populations in the Mild Endogamy category, while the average segment size for the 3rd Cousin to Distant Cousin cM range is similar to populations in the Extensive Endogamy category. The fact that Ashkenazi Jews are awkwardly straddling these two disparate categories should have been a clue that something is off. Although part of it has to do with the historical tempo of AJ endogamy, that’s not the whole story. It’s also an artifact of the ham-fisted way that Ancestry decides if you are TIMBER-worthy, which causes bizarre discontinuities.
  
  Anyone who can guess why these two groups get the brunt of TIMBER deserves a free year of Pro Tools, or a decade. Hint: Ancestry’s TIMBER algorithm has a major base rate problem.
  
  The fact that Ancestry limits TIMBER to cMs below 90 could be a hint that even they don’t have much faith in its validity, and are actually more concerned with keeping our match lists to a “manageable” size, a 23andMe-style truncation by a different name. As opposed to doing it in a less micromanaging way, such as providing us with the ability to order our matches by longest segment or average segment size, or by introducing a chromosome browser or a triangulation tool.
  
  Plus, we know that the 90 cM limit is not some magical safe harbor that makes it immune to the effects of pedigree collapse or endogamy, with the zone of endogamy extending well above it for some populations. And since that’s where the closer matches are, that’s also where a valid TIMBER, as opposed to the current overwrought, sloppy, in-your-face one, would be most helpful. It’s small comfort knowing that you have to wade through hundreds of endogamous IBS matches mixed in with the valid ones to reach the still potentially mis-TIMBERed ones below 90 cM.
  
  The takeaway is that everyone gets shafted by TIMBER, albeit from different ends, with some systemically over-TIMBERed and some systemically under-TIMBERed. Ancestry’s TIMBER algorithm deserves to be viewed in the same light as MyHeritage’s overeager imputation algorithm which so generously gifts us those beloved phantom Frankensegments, and not simply taken at face value.
  
  Ancestry’s TIMBER is also causing knock-on problems for their AutoCluster tool. The other implication is that the data in the Shared cM Project is corrupted in the sub-90 cM range, and isn’t comparable across different populations.
  
  Yes, this issue seriously needs to be brought to the attention of TPTB at Ancestry, and not merely hand-waved away, as it’s already caused much misunderstanding and confusion.
  
  There’s more that I can say on the matter, but I’m trying to keep my comment “manageable” here. I know it would have no hope of making it through at, say GGTT.
  
  Reply
  1. thednageek says:
    
    March 10, 2026 at 6:41 pm
    
    I agree that TIMBER is imperfect, especially for groups that are underrepresented in the database, because they probably don’t have enough of a reference sample to assess the “pileups.” I can’t think of a mechanism that would cause overcorrection for groups like Brits and Ashkenazim. Can you?
    
    (Apologies for the delay in posting. I was at RootsTech.)
    
    Reply