Is Your Family Tree Biologically Correct?

You know you’ve wondered; we all have:  Is my family tree biologically correct?  All of it?

I can easily prove that my social parents are also my biological parents because both have done DNA tests and match me as expected.

I can just as easily conclude that all four of my grandparents are who I think they are thanks to DNA matches to an uncle and some closer cousins.

But my confidence level decreases with each generation back as the DNA matches become more distant and the shared DNA amounts less predictive of specific relationships.  For example, I have a 2nd cousin once removed (2C1R) who shares only 19 cM with me.  According to the Shared cM Tool at DNA Painter, it’s far more likely that they are a more distant cousin than a 2C1R.  Is my tree wrong?  Is theirs?  Or is our match just an outlier?

DNA-based confidence in a pedigree declines with each generation.

 

To answer these questions, I need more match data for more people.  I also need a way to analyze all of that data at the same time.

Tree Validation with BanyanDNA

This is where BanyanDNA shines.  BanyanDNA is a new tool for genetic genealogy that is customized to your family tree, including cases of pedigree collapse and double cousins.  It has three main uses:  visualizing complex trees, validating biological relationships, and hypothesis testing for unknown parentage.  This post will showcase the validation features.

(Full disclosure:  I am a partner in the BanyanDNA business.)

BanyanDNA is unique among all genetic genealogy tools in that it can analyze multiple DNA kits at the same time.  This is fabulous news for those of us who manage the DNA results of other relatives.  BanyanDNA can even be used for match data from sites like MyHeritage and 23andMe, which show us how much DNA our matches share with one another, even when we don’t have direct access to their full match lists.   (Heads up:  AncestryDNA recently announced that they will soon offer us this information, as well!)

For example, on one side of my family, I have access to my own kits at AncestryDNA, 23andMe, and MyHeritage; my parent’s at 23andMe and MyHeritage; and a 1C1R’s at AncestryDNA.  I can plug all of that match data into a single analysis to see how it all fits the tree.

The images below show the tree in BanyanDNA from the perspective of each of the three of us.  In the upper image, you can see that I have match data (shown in the small “flags” at the top left of each match) to everyone who has tested (the purple nodes).  The second panel shows a single match for my parent, who is only at 23andMe.  The third panel shows my 1C1R’s matches at AncestryDNA.

Once I have the tree and enter the data, I can ask BanyanDNA to analyze all of these matches at once.  It will perform up to 10,000 computer simulations that model the inheritance of DNA from the ancestral couple down to each of the DNA testers.  Then, it will compare the actual match data that I entered with the expected results from the simulations.
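To make the idea of simulating inheritance concrete, here is a toy Monte Carlo sketch in Python.  It is not BanyanDNA’s algorithm, which this post does not describe; it is a simplified illustration under loudly stated assumptions: a made-up genome of 22 equal chromosomes of 155 cM each, 1 cM bins, a 1% crossover chance per bin, and no minimum segment size.  All function names are hypothetical.  It simulates a 2C1R pair descending from one ancestral couple and reports the average and spread of the simulated shared cM.

```python
import itertools
import numpy as np

# Toy genome (assumption, not real map lengths): 22 chromosomes x 155 one-cM bins.
N_CHROM, CHROM_BINS = 22, 155
G = N_CHROM * CHROM_BINS  # 3410 cM in total

# Chance that the copied strand switches at each bin: 1% inside a chromosome,
# 50% at the start of each chromosome (independent assortment).
SWITCH_P = np.full(G, 0.01)
SWITCH_P[::CHROM_BINS] = 0.5


def founder(ids):
    """An unrelated person: two haplotypes, each with its own unique label."""
    return np.stack([np.full(G, next(ids)), np.full(G, next(ids))])


def gamete(person, rng):
    """Build one egg/sperm as a mosaic of the person's two haplotypes."""
    switches = rng.random(G) < SWITCH_P
    phase = np.cumsum(switches) % 2          # which haplotype is copied per bin
    return person[phase, np.arange(G)]


def child(parent1, parent2, rng):
    return np.stack([gamete(parent1, rng), gamete(parent2, rng)])


def shared_cm(x, y):
    """Bins where x and y carry at least one identical founder label (1 bin = 1 cM)."""
    ibd = (x[0] == y[0]) | (x[0] == y[1]) | (x[1] == y[0]) | (x[1] == y[1])
    return int(ibd.sum())


def simulate_2c1r(rng):
    """One simulated 2C1R pair descending from a single ancestral couple."""
    ids = itertools.count()
    g1, g2 = founder(ids), founder(ids)

    def descend(person, generations):
        # Each generation marries in a new, unrelated spouse.
        for _ in range(generations):
            person = child(person, founder(ids), rng)
        return person

    sibling_a, sibling_b = child(g1, g2, rng), child(g1, g2, rng)
    me = descend(sibling_a, 2)       # great-grandchild of the couple
    cousin = descend(sibling_b, 3)   # great-great-grandchild of the couple
    return shared_cm(me, cousin)


def run(n_sims=500, seed=0):
    """Return (mean, standard deviation) of simulated shared cM."""
    rng = np.random.default_rng(seed)
    sims = np.array([simulate_2c1r(rng) for _ in range(n_sims)])
    return sims.mean(), sims.std(ddof=1)


if __name__ == "__main__":
    mean_cm, sd_cm = run()
    print(f"simulated 2C1R: mean {mean_cm:.0f} cM, SD {sd_cm:.0f} cM")
```

Because the toy model counts every shared bin, however small, and uses an invented genome, its numbers will not match a real tool exactly.  The point is only the shape of the method: simulate many times, then compare the real match to the simulated average and spread.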

The first few lines of the output look like this.  For each match, BanyanDNA tells me the relationship in the tree, the actual amount of shared DNA, the average amount of DNA expected for the relationship, and a common range (±1 standard deviation for you math nerds).  It also tells me the number of standard deviations (SDs).

Simply put, the number of SDs is a measure of how far away the actual value is from the expected average.  Generally speaking, if a match is less than 1 SD from the expected value, there’s little to worry about.  If it’s 1–2 SDs from expected, it’s worth a closer look but not a huge concern.  And more than 2 SDs requires careful scrutiny before it can be accepted.

The screenshot above shows a subset of the results.  I had a total of 15 pairwise matches in this analysis.  Most of them were below 1 SD, so I am confident that my 1C1R is really my 1C1R and that these great grandparents were who I think they were.  However, my troublesome 2C1R, in the third row, is 1.7 SD from the expected average (actual: 19 cM, expected: 101 cM).  That’s not enough to say for sure that we aren’t really 2C1R, but it’s definitely worth a reappraisal.
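The arithmetic behind the number of SDs and the rule of thumb above can be sketched in a few lines of Python.  The function names are hypothetical, and the standard deviation of 48 cM is an assumption: the post reports only the actual value (19 cM), the expected average (101 cM), and the result (1.7 SD), so 48 cM is simply a value consistent with those figures.

```python
def sds_from_expected(actual_cm, expected_cm, sd_cm):
    """How many standard deviations the actual match is from the expected average."""
    return abs(actual_cm - expected_cm) / sd_cm


def triage(n_sds):
    """Rule of thumb from the text: <1 SD, 1-2 SDs, more than 2 SDs."""
    if n_sds < 1:
        return "little to worry about"
    if n_sds <= 2:
        return "worth a closer look"
    return "requires careful scrutiny"


# Assumed SD (not stated in the post); chosen to be consistent with the reported 1.7.
ASSUMED_SD_CM = 48.0

n_sds = sds_from_expected(actual_cm=19, expected_cm=101, sd_cm=ASSUMED_SD_CM)
print(round(n_sds, 1), "->", triage(n_sds))
```

Under that assumed SD, the 2C1R match lands in the middle category: worth a closer look, but not outside what the simulations allow.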

The best way to reappraise is to throw more data at the problem.  In this case, I can also look at the match between my 1C1R (fourth row in the screenshot above, 399 cM and 0.3 SD to me) and Troublesome.  On paper, they are 1C2R to one another and share 160 cM.  That’s lower than the expected average of 213 cM but still within 1 SD of it.

Overall, this doesn’t definitively prove that Troublesome is my full 2C1R, but it gives me a little more confidence that they could simply be a low-matching cousin rather than an unexpected half relative.

Next Steps

BanyanDNA also supports hypothesis testing, so I can ask it to evaluate directly whether Troublesome is a full relative (one hypothesis) or half relative (a competing hypothesis) using Troublesome’s matches to myself and my 1C1R.  With only two matches to Troublesome, though, the results aren’t statistically meaningful yet, so that’s a topic for a future post.

Once AncestryDNA starts showing me how much DNA Troublesome shares with the other testers in the family, I can reassess whether Troublesome’s great grandparent was a half-sibling.  Time (and more data) will tell!

Updates to This Post

  • 20 Jun 2024: Corrected the number of standard deviations between myself and Troublesome from 2.7 (a typo) to the actual value of 1.7.

22 thoughts on “Is Your Family Tree Biologically Correct?”

  1. BanyanDNA sounds very exciting! I am curious, however, about how it would work in endogamous populations? My paternal line is from a fishing village in Scotland who all share the same surnames and all intermarried over many generations!

    1. Endogamy is a major motivating force for BanyanDNA! Right now, it can accommodate anything you can build into the tree. That is, if you know the “extra” relationships between people, BanyanDNA can analyze it. To address those unknown extra connections (the ones that pre-date the documented tree), we’ll need to customize the analysis to each population. To that end, we’re collecting data from volunteers. You can contribute here if you like: https://forms.gle/Z7EpTC5WUUcQkmud7 We accept data for any population, as long as all four grandparents of the tester were from that population.

    2. This 2C1R Path actually expects a 19 cM match.
      You could research this path to see if it is the path that matches your understanding of your family’s history. When I write “he is your” or “you are his”, take it with a grain of salt – I know DNA could do anything and this might not be how you are related. I am just saying that there is 1 solitary path that actually expects a 2C1R to share anywhere from negative 12.66 – 87.18 because people related on that path can share between 45-90 cM on X that is being deducted from the total amount of DNA they share and it is the total amount that falls within the normal shared cM amount for 2C1R. People related on this path could potentially share less than 7 cM autosomal and so might not match at all at Ancestry but would match at 23 and Me. This would be one other thing you could check into when you have a match that seems like some kind of outlier. It is just something else to try. The tool I made finds these paths in seconds, so it can’t hurt to try.
      You are Tester 1, Female
      He is Tester 2, Male
      He is your 2nd Cousin 1R Type B, (Younger)
      He is your FATHER’s MOTHER’s SISTER’s SON’s DAUGHTER’s SON.
      You’re his MOTHER’s FATHER’s MOTHER’s SISTER’s SON’s DAUGHTER

      The path is index number 16018
      06th degree relative
      average of 1.56% shared DNA,
      low cM range at Ancestry 77.34
      high cM range at Ancestry 154.68
      you share 45 to 90 cM on X that is not being reported by Ancestry mid range on X would be 67.5
      the cM range at Ancestry is (negative) -12.66 to 87.18
      No haplogroups shared
      Your MRCA category is 1GGP, your specific MRCA is FM-F&M
      His MRCA category is 2GGP, his specific MRCA is MFM-F&M

      1. What do you mean by “expects”? While 19 cM is possible for 2C1R, it is not likely. More to the point, 2C1R is not likely for 19 cM. When faced with an unlikely outcome, our duty as scientists is to rule out alternative explanations before accepting a conclusion.

        I am concerned by this statement: “1 solitary path that actually expects a 2C1R to share anywhere from negative 12.66 – 87.18.” What do you mean by –12.66 cM? It is not possible for two people to share negative DNA. Also, what do you mean by X-DNA being deducted from the total amount of shared DNA? X-DNA is not deducted from the expected autosomal share.

  2. To quote you: “I have a 2nd cousin once removed (2C1R) who shares only 19 cM with me. According to the Shared cM Tool at DNA Painter, it’s far more likely that they are a distant cousin than 2C1R. Is my tree wrong? Is theirs? Or is our match just an outlier.” Unquote.

    Have you charted your GENEALOGY? The depiction or image of the people in your ancestry will have been arrived at by PAPER research. There will be holes in the descendants of all those many lineages. Genetic tests might indicate a family of interest, but somehow only documentation, [B., M., D.], can show who is related or connected to whom.
    Matches are simply people who have tested who also contain some DNA that you have.
    The nature of DNA recombination of the 22 autosomes over 4 generations (GtGtGP = 16 family lines; [3c]) means that even biological siblings can show no match.
    I have five cases of 3C1R with values of 7 cM, 37 cM, 76 cM, 112 cM, and 200 cM. Go figure. (See . I have paper records of all this: B., M., D., Burial, Census, School. All on FTDNA and confirmed on Ancestry and MyHeritage where those ‘matches’ have also tested.

    1. Yes, of course I’ve charted my genealogy using records. Records can be wrong, though. On paper, Troublesome is my full 2C1R. The question is whether the low DNA match suggests a half-relation that we were not previously aware of.

  3. As you know, my maternal side is a mess and too hard for me to work with. But I am wondering whether I should try this with my paternal side. But there are no first cousins on that side at all (my father’s sister had no children) and as far as I know, no second cousins who have tested on any site. There are some third cousins and even more distant cousins plus my brother and me. Is it even worth trying to use Banyan to test the apparently bio relatives on that side, or do I need closer matches to make it worthwhile? And, of course, endogamy is a factor on my father’s side as well as my mother’s side. (Also, I believe I’ve filled out the survey already with both maternal and paternal matches. If I haven’t, let me know.)

    1. Sure, it’s worth a shot, with the proviso that as the relationships become more distant (e.g., 3C and further), autosomal DNA is less useful.

  4. I have tried the free version, and it’s very time consuming and clunky. I would love to try the beta version allowing gedcom imports, but I will not pay $75 to try it. The beta should be available to try for free. JMHO.

  5. I have tried it but am not on Facebook to go to the group for help. I am trying to figure out my grand uncle’s father and my father-in-law’s, but the example used in the quick setup for double cousins is not helping me figure out how to use it in these scenarios. Both have what I believe are half first cousins that I am trying to work with (369 for one and 305 for the other). Do I just put them where I think they should go or use the hypothesis? I’d pay for anything to help me at this point but I have to understand it better. Is there a video somewhere for this problem? Thanks!

    1. If you don’t have double relationships or pedigree collapse, it works much like WATO: build the tree of the matches, add cM amounts, add a few hypotheses for where your Person of Interest might fit into that tree, then run calculations. You can contact BanyanDNA support at https://forms.gle/RtDL8ykAqT9p8tGC8 if you have additional questions.

  6. Fantastic! I wish I had this two years ago when I was working on a complex adoption case where I was trying to identify bio mom and dad. I have already included this in my toolbox and can use this tool to essentially affirm my findings. I prefer to confirm my findings with genealogy, and two different DNA tests if possible like an Ancestry.com atDNA test corroborated with a mtDNA for mom or a YDNA test for dad, if I can get them to test. Your work is always innovative and helpful.

  7. I need to go farther back in my tree to find the siblings and parents of my 3 great Hawaiian grandparents. Does Banyan have this capability?

    1. BanyanDNA can handle anything you can build into your tree. It can’t build the tree for you, though, nor can it account for complex relationships that pre-date the tree.

    2. Rebecca. Your quest is Genealogical, not genetic.
      You require a local program on a computer to build a tree that includes your Hawaiian folk. Genealogy means paper records – births, marriage, death & seeing the Dates & Places & relationships mentioned thereon. That is the only way you can positively identify the actual people in your tree – your tree.
      If you have matches with trees that include your folk, follow those up with Contacting them, Genealogy and Gedcom downloads.
      DNA is about living folk and recently deceased tested generations only. No names, dates, places, or generations are indicated in a DNA test.

  8. I disagree with you completely. I have an extensive tree. Mahalo.

    Perhaps you are not aware of the complexities of Hawaiian moʻokūʻauhau (genealogy). If moʻokūʻauhau was only genealogical, there would be absolutely no use for Banyan DNA.

  9. I have a 1/2 2C match with whom I also share 19 cM. I know exactly how we are related. We are also 3rd cousins. 19 cM can be an expected amount (to me, based on my calculations) for a 2C1R match if that 19 cM is autosomal AND the testers also share enough centimorgans on their X chromosome that the total shared amount of DNA reaches at least the low end of the range for 2C1R. There are only 256 2C1R relationship paths and 128 of them start with Tester 1 being a woman. If you are 2C1R you will have to be related to that relative on 1 of 128 specific relationship paths and you are expected to share an average of 1/128th of your DNA with them. 1/128 = an average of 1.56% shared for 2C1R and all other relationship categories in the 6th degree because they all have exactly 128 paths. 1.56% of your centimorgans at Ancestry is different from 1.56% of your centimorgans at 23andMe because Ancestry starts with a lower total amount of cM than 23andMe. The average cM for 2C1R at Ancestry would be about 102 and at 23andMe it would be about 115 (not a lot of difference). The expected range for 2C1R at Ancestry would be 77.22-154.43 and at 23andMe it would be 86.58-173.15; 2C1R are expected to share an amount of DNA captured within those ranges. The new Banyan tool is calculating how far afield the shared amount for a category is from the center of a range that splits the difference between all the major testing companies’ total cM numbers (looks like from the examples you gave above). I have combed through 23andMe’s explanations of how they estimate relationships and they very clearly state that they use the total amount of autosomal DNA centimorgans and the total amount of centimorgans shared on X to arrive at the percentage of DNA testers share, and from there they base their relationship estimate on what average % the testers’ total shared percentage is closest to. A very logical approach.
    Their estimated relationship is not based on autosomal centimorgans alone and they don’t advise customers to exclude centimorgans shared on the X chromosome when doing their own relationship determination exercises. I receive a lot of pushback in Facebook groups when I mention including centimorgans on X to obtain a total shared amount of DNA for use in determining which average % their shared amount is closest to.

    You are a female and will share 1/128 of your total centimorgans with your 2C1R which should be somewhere in the range of 86.58-173.15 at 23andMe. We know you share 19 cM of autosomal DNA, so subtract 19 from 86.58 and you would need at least 67.58 cM shared on X to share enough centimorgans to place you at the bottom of the range for 2C1R. You could share more than that, but you would need at least that many extra centimorgans to be at the low end of the 2C1R range. What are your chances of sharing any X DNA with a 2C1R if you are a woman? 49/128 relationship paths are X paths. You have a 38% chance of sharing 5.62 to 90 cM more DNA than Ancestry is reporting to you. 90 cM is greater than the low end of the 2C1R range; that is where that negative number comes in. It does not really mean that testers could share negative DNA; translate it to mean that there is the potential for some 2C1R to only share X DNA, up to 67.5 and possibly even 90 cM. Reverse engineer the Ancestry results to see if you might be related on a path that shares 58+ cM on X; if you are not related on a path that could add to your total centimorgans then I would say that you are dealing with a number that is way out of bounds for a 2C1R. As it is there is one single path that is perfect and several others that would add enough centimorgans to bring the total into range for the 2C1R range that the Banyan tool called out in one of your examples that 44 cM number, there are a few paths that would add enough DNA to get you to the low end of that range.
    Can I ask why the common range and average are not the same for both 2C1R in your example? Wouldn’t the standard deviation be calculated against a standard amount for all 2C1R relationships?
    Also if you will humor me in responding to a few questions about this 2C1R match:
    Is it correct to assume that it was you who designated this match as a 2C1R based on your family knowledge and were surprised at the 19 cM number? 19 cM at Ancestry would not have generated a 2C1R estimate from them so I am wondering how you came to the conclusion that you have a 2C match that shares 19 cM.
    Also if you do know how this match is related or if you have spoken to them enough and could answer some questions –
    1) Is the match male or female?
    2) Is the match on your father’s or mother’s side of the family?
    3) Is the match on their father or mother’s side of their family?
    4) Are they for sure 1 generation removed or are you guessing based on their age vs yours?
    5) If they are 1 generation away from you are you older or younger than they are?
    6) Can you tell more than just what side of the family the match is on? Like Father’s Mother’s or Mother’s Mother’s etc?
    7) Can they tell more than just the side of their family?
    8) Do you have a MRCA category from your perspective 1GGP or 2GGP? Do they?
    9) Do you have a specific MRCA identified like FF-F&M for a 1GGP?
    Thanks. I hope I answered your questions and also hope you’ll answer mine.
