Several years ago, I helped an adoptee I called “Gordon” identify his biological father. (Gordon is a pseudonym, as are all other names in this post.) Gordon already knew that his birth mother was Helene Mills, the daughter of Oscar Mills and Florence Mattieson and the sister of Chris, Tony, and Michael.
Gordon’s case was particularly tricky, because he had long sections of DNA where he’d inherited the exact same genetic information from both parents. These so-called “runs of homozygosity” (ROH for short) indicated that the only candidates for his father were his own grandfather, Oscar, or an uncle (Chris, Tony, or Michael).
The total centimorgan amount of ROH DNA cannot distinguish between father–daughter incest and brother–sister incest, although sometimes the number of ROH segments, mapping those ROH segments, and subtle differences in shared DNA with 2nd cousins (2C) and beyond can point us in the right direction. In 2018, Dr Andrew Millard of Durham University, UK, ran some computer simulations and considered all three factors. Overall, he estimated that it was about 35,000 times more likely that Oscar was Gordon’s father than one of Helene’s brothers. (Read about that analysis here.)
Dr Millard’s conclusion was convincing, but it was well beyond the reach of the average genetic genealogist. I’ve since learned how to do these analyses myself; they involve a lot of time and a lot of—shall we say—choice language.
Now, there’s an easier way!
Distant Cousins Give Us Clues
Two of the three analyses that Dr Millard performed would not be possible in most genetic genealogy databases, because they don’t show ROH segments. The third approach is based solely on shared DNA amounts to 2C and beyond.
Consider the diagram below. If Gordon’s father was Helene’s brother (left), then Gordon is the great-grandson of Mr Mills & Ms Walton twice and the great-grandson of Mr Mattieson and Ms Kennedy twice. In that case, we’d expect him to share, on average, the same amount of DNA with his Mills cousins as with his Mattieson ones.
If, on the other hand, Helene’s father Oscar was also Gordon’s father, then Gordon is both the grandson (through Oscar) and the great-grandson (through Helene) of Mr Mills and Ms Walton, while he’s the great-grandson of Mr Mattieson and Ms Kennedy only once (through Helene). In this case, we’d expect Gordon to share, on average, more DNA with his Mills cousins than with his Mattieson ones.
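To make the two scenarios concrete, here is a minimal sketch that adds up the expected share of Gordon's DNA tracing back to Oscar's line versus Florence's line under each hypothesis. The function name and the tiny pedigree dictionaries are my own illustration, not part of any tool, and the fractions are averages only; real inheritance varies from one person to the next.

```python
from fractions import Fraction

def ancestral_fraction(person, ancestor, parents):
    """Expected fraction of `person`'s DNA that traces back to `ancestor`.

    `parents` maps a person to a (father, mother) tuple. Anyone missing
    from the map is treated as a founder with no recorded parents.
    """
    if person == ancestor:
        return Fraction(1)
    if person not in parents:
        return Fraction(0)
    # Each parent contributes half, summed over every path to the ancestor.
    return sum(
        (Fraction(1, 2) * ancestral_fraction(p, ancestor, parents)
         for p in parents[person]),
        Fraction(0),
    )

# Hypothetical mini-pedigrees using the pseudonyms from the post.
siblings = {"Helene": ("Oscar", "Florence"), "Tony": ("Oscar", "Florence")}
father_is_oscar = {**siblings, "Gordon": ("Oscar", "Helene")}
father_is_uncle = {**siblings, "Gordon": ("Tony", "Helene")}

for label, tree in [("Oscar", father_is_oscar), ("uncle", father_is_uncle)]:
    mills = ancestral_fraction("Gordon", "Oscar", tree)
    mattieson = ancestral_fraction("Gordon", "Florence", tree)
    print(f"Father is {label}: Mills line {mills}, Mattieson line {mattieson}")
```

Under the grandfather hypothesis the split is 3/4 Mills to 1/4 Mattieson; under an uncle hypothesis it is an even 1/2 and 1/2. That imbalance is what matches to more distant cousins on each side can reveal.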
Based solely on DNA matches to 2C and beyond, Dr Millard estimated 85-to-1 odds that Oscar was Gordon’s father.
BanyanDNA Can Analyze Cases Like Gordon’s
Back in 2018 when Dr Millard did his analysis, there were only five 2C-and-beyond DNA matches to consider. One of them, Debbie, was related through both the Mattieson side and the Mills side. In the intervening years, another 11 matches have been mapped onto the tree, including another both-sides cousin, William.
I built this expanded tree in BanyanDNA, a forthcoming tool that can analyze complex relationships, like pedigree collapse, double cousins, and yes, even incest. As you can see, this tree is definitely complex!
To help bring some order to the chaos, I used blue lines for the Mills side, red lines for the Mattieson side, purple lines for the Mills–Mattieson descendants, orange lines for William and Debbie (both sides), and yellow lines for Gordon’s possible parentage. (That part’s a tangle. We’re working on improving the presentation.)
The DNA matches are as follows:
- Mills side (blue lines): Milton 181 cM, Francine 101 cM, Ernesto 267 cM, Leslie 175 cM, Alberto 130 cM, Barry 296 cM, Van 353 cM
- Mattieson side (red lines): Valerie 218 cM, Kirk 112 cM
- Mills–Mattieson descendants (purple lines): Lorenzo 450 cM, Rafael 725 cM, Patty 1749 cM, Nadine 850 cM, Beryl 997 cM
- Both sides (orange lines): William 206 cM, Debbie 325 cM
I had four hypotheses:
- Hypothesis 1: Oscar (grandfather) was Gordon’s father.
- Hypothesis 2: Tony (uncle) was Gordon’s father.
- Hypothesis 3: Chris (uncle) was Gordon’s father.
- Hypothesis 4: Michael (uncle) was Gordon’s father.
The gory details of how BanyanDNA works will be explained elsewhere. For now, suffice to say that it models DNA inheritance down through the tree that you give it. It does this hundreds or thousands of times (called “trials”) to account for randomness in inheritance, then it performs some statistical magic to compare the actual DNA amounts to the expected ones.
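To give a feel for what repeated "trials" accomplish, here is a deliberately simplified toy simulation. It is not BanyanDNA's model: it assumes the genome is made of a fixed number of equal, independent blocks (the `n_blocks` parameter is invented for illustration) and asks how much DNA a grandchild ends up with from one particular grandparent.

```python
import random
from statistics import mean, stdev

def simulate_grandparent_fraction(n_blocks, rng):
    """One trial: fraction of a grandchild's DNA from one chosen grandparent.

    Toy assumption: the parent passes down `n_blocks` independent blocks,
    each of which came from the chosen grandparent with probability 1/2.
    The parent supplies half of the grandchild's DNA overall.
    """
    inherited = sum(1 for _ in range(n_blocks) if rng.random() < 0.5)
    return 0.5 * inherited / n_blocks

def run_trials(n_trials=1000, n_blocks=50, seed=42):
    """Repeat the toy inheritance many times with a reproducible seed."""
    rng = random.Random(seed)
    return [simulate_grandparent_fraction(n_blocks, rng) for _ in range(n_trials)]

trials = run_trials()
print(f"average fraction: {mean(trials):.3f}")
print(f"spread (standard deviation): {stdev(trials):.3f}")
print(f"lowest and highest trial: {min(trials):.3f} to {max(trials):.3f}")
```

The average lands near the textbook 25%, but individual trials scatter around it. That scatter is why a single observed cM value has to be judged against a range rather than a single expected number.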
This is what BanyanDNA has to say about Gordon’s father after 1,000 trials.
There’s a 93% chance that his grandfather Oscar was also his father as compared to one of his three uncles. This is not far off from the prediction that Dr Millard made (85-to-1) based on shared DNA using fewer matches and much more labor-intensive analyses.
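Because one result is stated as odds and the other as a percentage, a quick conversion helps put them on the same scale. This is plain arithmetic, and the helper names are mine.

```python
def odds_to_probability(odds_for, odds_against=1.0):
    """Convert 'odds_for-to-odds_against' odds into a probability."""
    return odds_for / (odds_for + odds_against)

def probability_to_odds(probability):
    """Convert a probability into 'x-to-1' odds in favor."""
    return probability / (1.0 - probability)

millard_probability = odds_to_probability(85)
banyan_odds = probability_to_odds(0.93)

print(f"85-to-1 odds is a probability of {millard_probability:.1%}")
print(f"A 93% probability is odds of about {banyan_odds:.1f}-to-1")
```

On a common scale, 85-to-1 odds works out to roughly 98.8%, and 93% works out to roughly 13-to-1. Both results favor Oscar over the three uncles combined.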
In BanyanDNA, you can even expand each hypothesis section to see the match-by-match analysis, including how much DNA the match actually shares, how much they’d be expected to share given the relationship(s) in the tree, the ideal range of shared DNA for those relationships, and the number of standard deviations between actual and expected (smaller is better when it comes to standard deviations).
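The "number of standard deviations" idea can be sketched in a few lines. This is a generic calculation, not BanyanDNA's code: the simulated values below are made up purely for illustration, and only the 218 cM figure (Valerie's match) comes from the list above.

```python
from statistics import mean, stdev

def deviations_from_expected(actual_cm, simulated_cm):
    """How many standard deviations the actual cM sits from the simulated mean."""
    expected = mean(simulated_cm)
    spread = stdev(simulated_cm)
    return (actual_cm - expected) / spread

# Made-up simulated amounts for one match under one hypothesis.
simulated = [150.0, 200.0, 250.0, 300.0, 350.0]

z = deviations_from_expected(218.0, simulated)
print(f"{z:+.2f} standard deviations from expected")
```

A value close to zero means the observed match fits the hypothesis well. A large value in either direction means the match shares much more or much less DNA than that tree would predict.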
BanyanDNA is going to be a game-changer for all of us with complex trees, whether that means pedigree collapse, double cousins, incest, or (eventually) endogamy! It’s also the only tool purposely built to analyze DNA matches through both sides of a union couple—like Oscar Mills and Florence Mattieson—at the same time.
Who’s Behind BanyanDNA?
BanyanDNA is a collaboration among five partners: two talented developers, Jaren Campbell and Carson Wilde; a statistician, Mike Charleston; and two genetic genealogists, Margaret Press and myself. You can learn more about the team here.
BanyanDNA promises to revolutionize genetic genealogy for complex trees. The official launch will be on 29 February, 2024, at RootsTech, with a live lecture, a hands-on workshop, and booth #622 in the Expo Hall. Early access starts on or about February 10. Anyone can participate in this open beta with the understanding that we will still be ironing out any final glitches, so you may encounter bugs. Sign up for our mailing list or check out our Facebook page for updates.
We can’t wait to see what you accomplish with BanyanDNA!
My question is based on the assumption that there will be a feature that allows users to upload a tree definition like when working with graphs (e.g., Python’s NetworkX). For those working on multiple trees for major projects (in my case, 900 trees), are you providing tools to work with BanyanDNA programmatically? For example, I can use the bonsai tree algorithm programmatically and without having to upload my data to someone else’s website. Putting aside the argument about accuracy for a moment, if I were to use BanyanDNA, would I need to account for building trees individually to use the tool?
We will support GEDCOM imports sometime after RootsTech. For now, users can easily build trees directly in BanyanDNA.
Congratulations Leah and team, this is amazing and will help so many! Can’t wait for the endogamy tool to arrive.
How awful—incest by her father. The DNA part is fascinating, but I guess I am focused on the terrible personal story this uncovered.
You’re right. Her experience was awful. I hope that being open about incest will shed light on past horrors and also destigmatize the victims. Only “Oscar” was to blame here.
This is wonderful news!! When you say “eventually” for endogamy, what will determine when this is available?
Great question, Judy! I can’t put a date on it; I can only say that it’s a top priority after RootsTech.
Thank you!! I am a search angel for a 79 yr old man with cancer. YDNA and autosomal have both told us that his father is Jewish. I have worked this every way I can but his closest matches are also NPEs. Unfortunately, most of his Jewish matches just say “endogamy” and refuse to give up any names or info. This might be just what we need!! I so badly want to solve this one for him!!
How close are his closest Jewish matches?
Will it be possible to import/copy a WATO tree into Banyan?
I have a pretty extensive tree already built in WATO, and I would love to just copy it to Banyan to analyze several issues with cousin relationships (with ensuing offspring).
That’s on our to-do list, and we’re talking with Jonny Perl about how to make that work. It’ll be after RootsTech, though.
But you still need segment data for this, correct? So can’t use it with matches that are on Ancestry (where most of our matches are).
No, this analysis didn’t use segment data at all. Most of the matches were from AncestryDNA.
Really? That’s very interesting. Will take another look.
This question is unrelated to your “Gordon Revisited” article.
In “Frequently Asked Questions” you recommend testing with BOTH Ancestry and 23andMe, and then uploading the Ancestry data file to MyHeritage. But wouldn’t it be better to upload the 23andMe data file? The chip used by 23andMe is nearly identical to that used by MyHeritage (and also FTDNA), but the chip used by Ancestry is very different, and doesn’t cover the same SNPs. So an Ancestry upload will involve a lot of guesswork (or “imputation”) by MyHeritage to fill in the missing SNPs, which could create False Matches.
That largely depends on the chip versions being compared. In truth, they’re all being imputed, so there’s no way around the guesswork. Ultimately, MyHeritage will have to refine their imputation to minimize false matching, and we as genealogists will have to take imputation into consideration.
Okay, granted. But if they tested with 23andMe’s current chip (V5, I think) isn’t it preferable to use that for the upload to MyHeritage, rather than the Ancestry one?
And, of course, there’s no reason why they couldn’t upload BOTH files. (I did that, and the results were different).
In theory, maybe. I’m not confident enough in MyHeritage’s matching algorithm to think it’ll matter much.
The DNA list by country is a nice resource (dated 2017). I am not sure if there is a newer version of this list somewhere: https://thednageek.com/genealogical-dna-testing-around-the-globe/
Many tests claim to be global but are very USA-centric. I am interested in Italy, and it seems they do not do DNA tests.
I am also wondering if the new tool can in some ways help adoptees bring clarity to searches for family. Many times only limited information is available, or only one side is known, and this complicates the ability to find relatives.