Power-Up Your DNA Analyses

AncestryDNA recently introduced Enhanced Shared Matches (ESM), a new feature in their Pro Tools add-on subscription that is taking the DNA world by storm.  Pro Tools has some nifty tree-checker tools and reports, but genetic genealogists are most excited by the ability to see how much DNA our matches share with one another.  Other databases, like 23andMe, MyHeritage, and GEDmatch, have had similar features for years, but the sheer size of AncestryDNA’s database makes ESM especially exciting.

Here’s a snippet of what you can see:

This information is proving invaluable for placing those frustrating mystery matches into our trees.  For example, I have a DNA match to my mom’s first cousin James and also to a younger man with the same surname.  Because his shared DNA amount is ambiguous, I haven’t been able to tell whether the younger man was James’ son or grandson, and he hasn’t responded to messages.

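James shares 332 cM with me and 1,994 cM with the younger man.  A son would share roughly 3,400 cM with his father, about half of his DNA, while a grandson shares about a quarter.  With Pro Tools, I can easily see that the younger man is a grandson.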
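As a rough check on that reasoning, here is a small sketch that compares an observed amount against approximate expected averages.  The function name and the round-number averages (about 3,400 cM for parent-child and about 1,700 cM for grandparent-grandchild) are my own illustrative assumptions, not values reported by Pro Tools.  Real ranges overlap and vary by testing company.

```python
# Approximate average shared DNA, in cM, for the two candidate relationships.
# These round numbers are illustrative assumptions, not Pro Tools output.
APPROX_AVERAGE_CM = {
    "parent-child": 3400,
    "grandparent-grandchild": 1700,
}


def closest_relationship(shared_cm, averages=APPROX_AVERAGE_CM):
    """Return the candidate relationship whose average is nearest to shared_cm."""
    return min(averages, key=lambda rel: abs(averages[rel] - shared_cm))


# James and the younger man share 1,994 cM.
print(closest_relationship(1994))
```

Picking the nearest average is a crude shortcut.  It works here only because the two candidate relationships are so far apart.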

Sometimes, we also get lucky and a mystery match has a close relative with a tree, allowing us to piggyback the mystery match into our own tree.

A Match Made in Heaven

While these uses are exciting, they’re just the tip of the DNA iceberg.  The real power-up is in combining Enhanced Shared Matches with BanyanDNA.  This is truly a DNA match made in heaven.  (Pun intended!  Full disclosure:  I am a partner in the BanyanDNA business.)

BanyanDNA is a bit like the What Are the Odds? (WATO) tool in that you can create a tree and test out various relationship scenarios to see which best aligns with the shared DNA amounts.  For example, you might use WATO to identify which of three men was your unknown grandfather based on how much DNA each man’s descendants share with you.

BanyanDNA is different in a few ways.  First, it is not limited to a single descendant lineage.  That is, if you are descended from Cleo and Effie, it can incorporate DNA matches who are related to you through both Cleo’s siblings and cousins as well as Effie’s siblings and cousins.

Second, BanyanDNA can handle multiple relationships, like pedigree collapse and double cousins, whereas WATO cannot.  Basically, if you can map out the relationships and build them into your tree, BanyanDNA can analyze it.

Third, BanyanDNA can analyze shared DNA between any two people in the tree, not just the DNA matches of one target person.  That means we can use the information from Enhanced Shared Matches to validate that our trees are biologically correct.

Validation in BanyanDNA

Let’s look at an example.  This anonymized project shows ten tested descendants of George and Jessie, early settlers in north Texas.  Those are the purple nodes.  The question marks represent places where the tree has not yet been validated.

Thanks to Enhanced Shared Matches, I am able to see how much DNA everyone in this project shares with everyone else and enter that information into BanyanDNA.  When I click on any tester in the project, I can see their shared amount to each other tester inside the small purple “flags” at the top left of the nodes.  To demonstrate, this screenshot shows the shared DNA amounts for three of the ten descendants, #2, #3, and #10.

I can now ask BanyanDNA to analyze all of this information at once.

To perform a calculation, I click the calculator icon in the side panel (see the first BanyanDNA screenshot above), select “Validation,” then tell BanyanDNA how many trials to run.  (BanyanDNA can also do “hypotheses” runs, which work similarly to WATO.)

A trial is a computer simulation of the shared DNA amounts in the tree I built into the tool.  One thousand trials is enough for everyday use, and you can run up to 10,000 for precision work, like a final proof argument or a client project.
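To get a feel for why more trials buy more precision, here is a toy sketch.  It is not BanyanDNA’s simulation.  It simply draws hypothetical shared DNA amounts from a bell curve (the 866 cM average and 170 cM spread are made-up values, and the function name is mine) and shows how the uncertainty in the estimated average shrinks as the number of trials grows.

```python
import random
import statistics


def run_trials(n_trials, mean_cm=866.0, sd_cm=170.0, seed=1):
    """Toy stand-in for a simulation: draw n_trials amounts from a normal curve.

    mean_cm and sd_cm are hypothetical. A real tool would simulate inheritance
    through the tree rather than sample a fixed distribution.
    """
    rng = random.Random(seed)
    draws = [rng.gauss(mean_cm, sd_cm) for _ in range(n_trials)]
    return statistics.mean(draws), statistics.stdev(draws)


for n in (1_000, 10_000):
    avg, sd = run_trials(n)
    # The uncertainty in the estimated average shrinks as 1 / sqrt(n).
    print(f"{n:>6} trials: average {avg:.1f} cM, SD {sd:.1f} cM, "
          f"standard error {sd / n ** 0.5:.1f} cM")
```

Ten times as many trials cuts the standard error by a factor of about 3.2 (the square root of 10), which is why the larger runs are worth the wait only when the extra precision matters.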

BanyanDNA will compare the actual shared DNA amounts to the expected values from the trials and alert me to possible errors in the tree.  For each pairwise match, BanyanDNA reports the relationship in the tree (if there were more than one relationship between two people, it would list all of them), the actual amount of shared DNA, the expected average, and a typical range (±1 standard deviation, for the math nerds; you know who you are).  Finally, the Num. SDs column indicates how many standard deviations the actual value is from the expected one.
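The Num. SDs figure is a z-score style calculation.  Here is a minimal sketch, assuming it is the distance between the actual and expected amounts divided by the standard deviation.  The function name and the sample numbers are mine, not BanyanDNA output.

```python
def num_sds(actual_cm, expected_cm, sd_cm):
    """How many standard deviations the actual amount lies from the expected average."""
    if sd_cm <= 0:
        raise ValueError("sd_cm must be positive")
    return abs(actual_cm - expected_cm) / sd_cm


# Hypothetical pair: 900 cM observed, 866 cM expected, 170 cM standard deviation.
print(round(num_sds(900, 866, 170), 1))  # 0.2
```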

The output looks like this:

Focus on the last column of numbers.  In the screenshot above, the second and last pairs are pretty close to the expected amounts (only 0.2 and 0.4 standard deviations, respectively), so those relationships are probably correct.  The first and third pairs are a little outside the common range (1.2 SDs), but as long as the other matches for those individuals are in range, there’s no cause for concern.  The fourth pair, however, is a red flag.  A match that’s more than 2 SDs from expected is an outlier and may indicate an error in the tree.
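Those rules of thumb can be written down as a small helper.  The thresholds of 1 and 2 SDs come from the paragraph above.  The labels, the function name, and the 2.6 value used for the outlier are placeholders of mine.

```python
def flag_match(num_sds):
    """Label a pairwise match by how far it sits from the expected amount."""
    if num_sds > 2:
        return "outlier: recheck the tree"
    if num_sds > 1:
        return "outside the typical range: check this person's other matches"
    return "in range"


# 0.2, 0.4, and 1.2 appear in the text; 2.6 is a made-up outlier value.
for value in (0.2, 0.4, 1.2, 2.6):
    print(value, "->", flag_match(value))
```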

In validating the tree, I start with the low-hanging fruit.  I can easily see that #6 and #7 are full siblings, and they both match #5 in the first cousin range (≈900 and 850 cM, respectively).  Similarly, #8 and #9 share ≈1950 cM, in range for aunt–nephew.  All five of these people match the others in the tree.  Working through the tree branch by branch, I am able to validate all but two of them.  Here, I use emojis to mark the branches that were successfully validated.

Two branches, however, are problematic:  those for #4 and #10.  All nine of the matches to #4 are more than 1 SD from expected, while only one of #10’s matches is within 1 SD.  It’s time to revisit the tree.

Fortunately, both cases involved easy fixes.  For #4, I inadvertently skipped a generation when building the tree; she is actually the great granddaughter of George and Jessie rather than the granddaughter.  And #10 is descended from George’s second wife rather than Jessie.

Here’s the tree after I’ve corrected those errors.

This tree yields an excellent validation.  Only nine of the 45 pairwise matches (20%) are more than 1 standard deviation from the expected amount, and the highest is 1.4.  It’s normal for roughly a third of the data to be outside one standard deviation and for about 5% to be outside two standard deviations, so this revised tree appears to be biologically correct.
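The arithmetic behind that paragraph can be checked in a few lines.  This sketch assumes the differences behave like a bell curve, which is only an approximation for shared DNA amounts, and the variable and function names are mine.

```python
from math import comb
from statistics import NormalDist

testers = 10
pairs = comb(testers, 2)      # every tester compared with every other tester
outside_one_sd = 9            # count reported for the corrected tree

observed_share = outside_one_sd / pairs


def share_outside(k):
    """Share of a normal distribution more than k SDs from the mean (both tails)."""
    return 2 * (1 - NormalDist().cdf(k))


print(pairs)                        # 45
print(f"{observed_share:.0%}")      # 20%
print(f"{share_outside(1):.0%}")    # 32%
print(f"{share_outside(2):.1%}")    # 4.6%
```

With 20% of pairs beyond one standard deviation and none beyond two, the corrected tree sits comfortably inside what a bell curve would lead us to expect.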

Having a validated tree is essential because this project aims to identify an unknown great grandmother.  I have a candidate, George’s niece Mary, who was known to have given up at least one child for adoption.  However, there are no living candidates to test to confirm that hypothesis, so the argument will rest on matches to 3rd and 4th cousins.  In addition, there’s some pedigree collapse in Mary’s tree, and the analyses will be particularly sensitive to any misplaced matches.

BanyanDNA is freemium software, meaning that its basic features are free to use.  The example shown here can be done in a free account.  Subscribers have more bells and whistles at their disposal.  There is extensive documentation on the website and a Facebook User Group.  I also offer a class on BanyanDNA (and other topics!) through my educational series.

5 thoughts on “Power-Up Your DNA Analyses”

    1. Yes, BanyanDNA can import gedcoms, although it’s currently a beta feature and only available by opt-in to premium users. Once we work out some of the kinks, we’ll make that available to everyone.

  1. Thanks for your quick work illustrating how these two new tools, Ancestry’s Pro Tools and Banyan DNA, can work together. I’ll be passing this post along, and expanding my Banyan tree too!

    I solved a new mystery match just last night – brothers!

  2. I have a situation in my Irish ancestral family (fairly common, from what I’ve seen, given the state of early Irish records), that I don’t think even BanyanDNA is yet able to assist with. I have many matches that appear to trace back to a common ancestor, but we’ll probably never know who that ancestor was. The best we can hope for is to figure out how the earliest ancestors we know about (generally 3rd-4th great-grandparents) are related to each other, given the cross-matches among the descendants.
