Is Your Family Tree Biologically Correct?

You know you’ve wondered; we all have:  Is my family tree biologically correct?  All of it?

I can easily prove that my social parents are also my biological parents because both have done DNA tests and match me as expected.

I can just as easily conclude that all four of my grandparents are who I think they are thanks to DNA matches to an uncle and some closer cousins.

But my confidence level decreases with each generation back as the DNA matches become more distant and the shared DNA amounts become less predictive of specific relationships.  For example, I have a 2nd cousin once removed (2C1R) who shares only 19 cM with me.  According to the Shared cM Tool at DNA Painter, it’s far more likely that they are a distant cousin than a 2C1R.  Is my tree wrong?  Is theirs?  Or is our match just an outlier?

DNA-based confidence in a pedigree declines with each generation.

 

To answer these questions, I need more match data for more people.  I also need a way to analyze all of that data at the same time.

Tree Validation with BanyanDNA

This is where BanyanDNA shines.  BanyanDNA is a new tool for genetic genealogy that is customized to your family tree, including cases of pedigree collapse and double cousins.  It has three main uses:  visualizing complex trees, validating biological relationships, and hypothesis testing for unknown parentage.  This post will showcase the validation features.

(Full disclosure:  I am a partner in the BanyanDNA business.)

BanyanDNA is unique among all genetic genealogy tools in that it can analyze multiple DNA kits at the same time.  This is fabulous news for those of us who manage the DNA results of other relatives.  BanyanDNA can even be used for match data from sites like MyHeritage and 23andMe, which show us how much DNA our matches share with one another, even when we don’t have direct access to their full match lists.   (Heads up:  AncestryDNA recently announced that they will soon offer us this information, as well!)

For example, on one side of my family, I have access to my own kits at AncestryDNA, 23andMe, and MyHeritage; my parent’s at 23andMe and MyHeritage; and a 1C1R’s at AncestryDNA.  I can plug all of that match data into a single analysis to see how it all fits the tree.

The images below show the tree in BanyanDNA from the perspective of each of the three of us.  In the upper image, you can see that I have match data (shown in the small “flags” at the top left of each match) to everyone who has tested (the purple nodes).  The second panel shows a single match for my parent, who is only at 23andMe.  The third panel shows my 1C1R’s matches at AncestryDNA.

Once I have the tree and enter the data, I can ask BanyanDNA to analyze all of these matches at once.  It will perform up to 10,000 computer simulations that model the inheritance of DNA from the ancestral couple down to each of the DNA testers.  Then, it will compare the actual match data that I entered with the expected results from the simulations.

The first few lines of the output look like this.  For each match, BanyanDNA tells me the relationship in the tree, the actual amount of shared DNA, the average amount of DNA expected for the relationship, and a common range (±1 standard deviation for you math nerds).  It also tells me the number of standard deviations (SDs).
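To make those output columns concrete, here is a minimal Python sketch of how an expected average, a ±1 SD common range, and a number of SDs could be derived from a list of simulated values.  This is not BanyanDNA's actual code.  The function names and the three toy values are assumptions for illustration only, and the inheritance simulations themselves are not modeled here.

```python
import statistics


def summarize_simulations(simulated_cm):
    """Summarize simulated shared-cM values for one relationship.

    Returns the expected average, the standard deviation, and the
    +/-1 SD "common range" (floored at 0 cM).
    """
    expected = statistics.fmean(simulated_cm)
    sd = statistics.stdev(simulated_cm)  # sample standard deviation
    return {
        "expected": expected,
        "sd": sd,
        "low": max(0.0, expected - sd),
        "high": expected + sd,
    }


def sds_from_expected(actual_cm, summary):
    """How many SDs the actual shared cM sits from the expected average."""
    return abs(actual_cm - summary["expected"]) / summary["sd"]


if __name__ == "__main__":
    # Toy placeholder values, NOT real simulation output.
    toy_simulations = [80.0, 100.0, 120.0]
    summary = summarize_simulations(toy_simulations)
    print(summary)
    print(sds_from_expected(60.0, summary))
```

A real run would replace the toy list with thousands of simulated values per pair of testers.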

Simply put, the number of SDs is a measure of how far away the actual value is from the expected average.  Generally speaking, if a match is less than 1 SD from the expected value, there’s little to worry about.  If it’s 1–2 SDs from expected, it’s worth a closer look but not a huge concern.  And more than 2 SDs requires careful scrutiny before it can be accepted.

The screenshot above shows a subset of the results.  I had a total of 15 pairwise matches in this analysis.  Most of them were below 1 SD, so I am confident that my 1C1R is really my 1C1R and that these great-grandparents were who I think they were.  However, my troublesome 2C1R, in the third row, is 2.7 SDs from the expected average (actual: 19 cM, expected: 101 cM).  That’s not enough to say for sure that we aren’t really 2C1R, but it’s definitely worth a reappraisal.
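The rule of thumb for reading SDs can be written as a tiny Python sketch.  The function names and labels are my own, not anything BanyanDNA exposes, and treating exactly 1 or exactly 2 SDs as part of the middle band is an assumption.  The second helper back-calculates the SD implied by the figures quoted above (19 cM actual, 101 cM expected, 2.7 SDs), which comes out to roughly 30 cM; because 2.7 is itself rounded, that is only approximate.

```python
def review_level(n_sd):
    """Map a number of SDs to the rule-of-thumb level of concern."""
    if n_sd < 1:
        return "little to worry about"
    if n_sd <= 2:
        return "worth a closer look"
    return "needs careful scrutiny"


def implied_sd(actual_cm, expected_cm, n_sd):
    """Back-calculate the SD (in cM) implied by a reported number of SDs."""
    return abs(actual_cm - expected_cm) / n_sd


if __name__ == "__main__":
    print(review_level(0.3))  # the 1C1R match
    print(review_level(2.7))  # the troublesome 2C1R match
    print(round(implied_sd(19, 101, 2.7), 1))
```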

The best way to reappraise is to throw more data at the problem.  In this case, I can also look at the match between my 1C1R (fourth row in the screenshot above, 399 cM and 0.3 SD to me) and Troublesome.  On paper, they are 1C2R to one another and share 160 cM.  That’s lower than the expected average of 213 cM but still within 1 SD of it.

Overall, this doesn’t definitively prove that Troublesome is my full 2C1R, but it gives me a little more confidence that they could simply be a low-matching cousin rather than an unexpected half relative.

Next Steps

BanyanDNA also supports hypothesis testing, so I can ask it to evaluate directly whether Troublesome is a full relative (one hypothesis) or a half relative (a competing hypothesis) using Troublesome’s matches to myself and my 1C1R.  With only two matches to Troublesome, though, the results aren’t statistically meaningful yet.  That’s a topic for a future post.

Once AncestryDNA starts showing me how much DNA Troublesome shares with the other testers in the family, I can reassess whether Troublesome’s great-grandparent was a half-sibling.  Time (and more data) will tell!

13 thoughts on “Is Your Family Tree Biologically Correct?”

  1. BanyanDNA sounds very exciting! I am curious, however, about how it would work in endogamous populations? My paternal line is from a fishing village in Scotland who all share the same surnames and all intermarried over many generations!

    1. Endogamy is a major motivating force for BanyanDNA! Right now, it can accommodate anything you can build into the tree. That is, if you know the “extra” relationships between people, BanyanDNA can analyze it. To address those unknown extra connections (the ones that pre-date the documented tree), we’ll need to customize the analysis to each population. To that end, we’re collecting data from volunteers. You can contribute here if you like: https://forms.gle/Z7EpTC5WUUcQkmud7 We accept data for any population, as long as all four grandparents of the tester were from that population.

  2. Quote you: ” I have a 2nd cousin once removed (2C1R) who shares only 19 cM with me. According to the Shared cM Tool at DNA Painter, it’s far more likely that they are a distant cousin than 2C1R. Is my tree wrong? Is theirs? Or is our match just an outlier.” Unquote.

    Have you charted your GENEALOGY? The depiction or image of the people in your ancestry will have been arrived at by PAPER research. There will be holes in the descendants of all those many lineages. Genetic tests might indicate a family of interest, but somehow only documentation, [B., M., D.], can show who is related or connected to whom.
    Matches are simply people who have tested who also contain some DNA that you have.
    The nature of DNA recombination of the 22 autosomes over 4 generations (GtGtGP = 16 family lines; [3c]) means that even biological siblings can show no match.
    I have five cases of 3C1R with values of 7 cM, 37 cM, 76 cM, 112 cM, and 200 cM. Go figure. (See . I have paper records of all this: B., M., D., Burial, Census, School. All on FTDNA and confirmed on Ancestry and MyHeritage where those ‘matches’ have also tested.

    1. Yes, of course I’ve charted my genealogy using records. Records can be wrong, though. On paper, Troublesome is my full 2C1R. The question is whether the low DNA match suggests a half-relation that we were not previously aware of.

  3. As you know, my maternal side is a mess and too hard for me to work with. But I am wondering whether I should try this with my paternal side. But there are no first cousins on that side at all (my father’s sister had no children) and as far as I know, no second cousins who have tested on any site. There are some third cousins and even more distant cousins plus my brother and me. Is it even worth trying to use Banyan to test the apparently bio relatives on that side, or do I need closer matches to make it worthwhile? And, of course, endogamy is a factor on my father’s side as well as my mother’s side. (Also, I believe I’ve filled out the survey already with both maternal and paternal matches. If I haven’t, let me know.)

    1. Sure, it’s worth a shot, with the proviso that as the relationships become more distant (e.g., 3C and further), autosomal DNA is less useful.

  4. I have tried the free version, and it’s very time-consuming and clunky. I would love to try the beta version allowing GEDCOM imports, but I will not pay $75 to try it. The beta should be available to try for free. JMHO.

  5. I have tried it but am not on Facebook to go to the group for help. I am trying to figure out my grand uncle’s father and my father-in-law’s, but the example used in the quick setup for double cousins is not helping me figure out how to use it in these scenarios. Both have what I believe are half first cousins that I am trying to work with (369 for one and 305 for the other). Do I just put them where I think they should go or use the hypothesis? I’d pay for anything to help me at this point but I have to understand it better. Is there a video somewhere for this problem? Thanks!

    1. If you don’t have double relationships or pedigree collapse, it works much like WATO: build the tree of the matches, add cM amounts, add a few hypotheses for where your Person of Interest might fit into that tree, then run calculations. You can contact BanyanDNA support at https://forms.gle/RtDL8ykAqT9p8tGC8 if you have additional questions.
