A New Coverage Estimator at DNA Painter

Coverage is the biggest concept in genetic genealogy that you’ve probably never heard of.  It refers to how much of an ancestor’s autosomal DNA you can “recover” by testing their descendants.  Each descendant only has a fraction of the ancestor’s DNA:  50% of a parent’s, about 25% of a grandparent’s, and so on.  It adds up, though, when you test multiple descendants of that person, like constructing a mosaic image.  It adds up faster when you test the right relatives.

Coverage matters for genealogy because we won’t match all of the DNA relatives that our parents, grandparents, and so on would have.  A relative you don’t match is a relative you might not know to collaborate with or to use in DNA-based proof arguments.  However, some of our other relatives will have inherited different bits of that ancestor’s DNA and might have key matches we don’t.

Coverage is also important for ancestral reconstruction methods, such as those offered by Borland Genetics.  The more unique bits of ancestral DNA represented in the descendants, the better the reconstruction.

Paul Woodbury of Legacy Tree Genealogists first introduced the idea of coverage to the public in a blog post in April 2018. He’d been teaching about it in genealogy courses for some time before that.  And while it’s a brilliant approach to devising a DNA-based research strategy, it hasn’t gotten the attention it deserves because it’s quite math-y.  Here is Paul’s equation for someone whose five children have tested:

Yikes!

The calculations become even more complex when you add more testers, when they are in different generations, or when they include mixtures of siblings and cousins.

More recently, Nicole Dyer at Family Locket shared a semi-automated coverage calculator based on Paul’s formulas.  It was created in Google Sheets by Laura Clark Murray.  It’s a huge time-saver because it does the math for you, but it is still based on the complex formulas.


A More Intuitive Approach

It turns out there’s a simpler, more intuitive way to evaluate coverage.  Here’s a straightforward example:  A child inherits 50% of their mother’s DNA, so testing the child recovers half of her genome.  A second child also inherits half of her DNA, but 50% plus 50% does not equal 100%.  That’s because siblings share DNA.  Some of Mom’s DNA in the second child has already been recovered by testing the first child.  In fact, testing the second child recovers, on average, only half of what’s missing, or 0.5 × 0.5 = 0.25 = 25%.

The same logic applies to each additional sibling:  they recover about half of what’s still missing from the previous tests.  This is an average rather than a precise figure, of course, because some siblings share more DNA than others.

As a result, there are diminishing returns:  one child recovers 50% of Mom’s DNA, two recover 75%, three 87.5%, four ≈93.8%, and five ≈96.9%.  At some point, testing more siblings might not be worth the expense.  And since most of us aren’t made of money, coverage is an important consideration as we plan our research strategies.
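The halving rule is easy to check in a few lines of code.  This is a minimal Python sketch, assuming (as the text does) that each additional child recovers exactly half of whatever is still missing; `sibling_coverage` is a hypothetical helper name, and real siblings will vary around these averages.

```python
def sibling_coverage(n_siblings: int) -> float:
    """Average fraction of a parent's autosomal DNA recovered by n tested children.

    Assumes each new child recovers exactly half of whatever is still
    missing; real siblings vary around this average.
    """
    covered = 0.0
    for _ in range(n_siblings):
        missing = 1.0 - covered
        covered += 0.5 * missing  # each child adds half of what's still missing
    return covered


previous = 0.0
for n in range(1, 6):
    total = sibling_coverage(n)
    print(f"{n} sibling(s): {total:.2%} of Mom's DNA (+{total - previous:.2%})")
    previous = total
```

The loop is equivalent to the closed form 1 − 0.5ⁿ, which is why each extra sibling adds only half as much as the one before.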

The issue becomes more important the further back in your tree you go.  Consider the siblings above in relation to their grandfather.  Testing five siblings recovered ≈96.9% of their mom’s DNA but just ≈48.4% of Grandpa’s.  That’s because Mom inherited only half of his genome.  In fact, even if Mom herself tested, she would only recover 50% of his DNA.

If your objective were to recover Grandpa’s DNA, and none of his children are available, it turns out you’re better off testing five first cousins than five siblings.  Each grandchild only has ≈25% of his DNA, but it adds up.  The first grandchild contributes ≈25%, the second ≈25% of the missing ≈75% (or ≈18.8%), and so on.  With just three first cousins, you could recover more of Grandpa’s DNA (≈57.8%) than with the five siblings (≈48.4%).  And for the same amount of money, you could recover ≈76.3% of his DNA with the five-cousin strategy.

Let’s consider one more scenario:  five grandchildren, two of whom are siblings.  We can calculate coverage for the first three grandchildren, who are first cousins, as before, but for the last two, we need to account for the fact that they are siblings.  Together they cover ≈75% of their mom, who has 50% of her father’s DNA, so the pair covers only 37.5% (half of 75%) of Grandpa’s genome.  Applied to the missing ≈42.2%, that adds ≈15.8% between the two of them.
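The step-by-step reasoning in these scenarios boils down to one rule:  if separate lines of descent carry shares s1, s2, and so on of the ancestor’s DNA, their combined coverage is 1 − (1 − s1)(1 − s2)….  The Python sketch below applies that rule to the three strategies, assuming each grandchild carries exactly the average 25% and that different lines of descent overlap only by chance.  `combine` and `sibling_coverage` are hypothetical helpers, not part of any tool.

```python
from math import prod


def combine(shares: list[float]) -> float:
    """Each share recovers that fraction of whatever is still missing."""
    return 1.0 - prod(1.0 - share for share in shares)


def sibling_coverage(n: int) -> float:
    """Average coverage of a parent from n of that parent's children."""
    return 1.0 - 0.5 ** n


GRANDCHILD = 0.25  # average share of a grandparent carried by one grandchild

scenarios = {
    # Five siblings cover most of Mom, but Mom carries only half of Grandpa.
    "five siblings": 0.5 * sibling_coverage(5),
    "three first cousins": combine([GRANDCHILD] * 3),
    "five first cousins": combine([GRANDCHILD] * 5),
    # Three first cousins plus a sibling pair from a fourth line of descent:
    # the pair covers 75% of their parent, i.e. 37.5% of Grandpa.
    "three cousins + sibling pair": combine([GRANDCHILD] * 3 + [0.5 * sibling_coverage(2)]),
}

for label, value in scenarios.items():
    print(f"{label}: {value:.1%} of Grandpa's DNA")
```

The printed figures are rounded from exact values (for example, 0.578125 for three first cousins), so they can differ by a tenth of a percent from totals built up out of already-rounded steps.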

Easy peasy, right?

Now you try one:

Ha!  Just kidding.  It’s not easy at all.  In fact, the calculations are a total pain in the patootie.  That’s why we need a calculator that’s both infinitely scalable and intuitive to use for the everyday genealogist.  


Introducing the “Coverage Estimator” at DNA Painter

The Coverage Estimator allows you to quickly build or import a tree, mark who in the tree has tested, and see the calculated coverage for the ancestor you are researching.  (We call that ancestor your “research subject.”)

Here’s our Thomas example from above.  He’s estimated to have 52.5% coverage from his tested descendants.

If you recognize the backbone of the What Are the Odds tool here, the Coverage Estimator should be fairly intuitive to use.

For detailed, step-by-step instructions on how to use the tool, see Jonny Perl’s blog post here.  Briefly, you build or import a descendant tree, starting with the ancestor whose DNA you are trying to recover, like Thomas.  The tree should include each descendant who has taken a DNA test; mark each tester as such using the pull-down menu.

The Estimator does its magic on the fly as you set up the tree.  It’s neat to see the coverage total increase as you add testers.
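How might a tree-based estimate work?  One way to generalize the sibling and cousin arithmetic is a recursion over the descendant tree:  a tested person counts as full coverage of themselves, each child passes up half of its own coverage, and the children’s contributions combine as 1 − product of (1 − contribution).  This is a minimal Python sketch under that assumption, not DNA Painter’s actual implementation; the `Person` class, the `coverage` function, and the family below are all hypothetical.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Person:
    name: str
    tested: bool = False
    children: list["Person"] = field(default_factory=list)


def coverage(person: Person) -> float:
    """Estimated fraction of `person`'s autosomal DNA recovered by tested descendants.

    Assumed model: a tested person covers 100% of themselves; otherwise each
    child passes up half of its own coverage, and the children's contributions
    combine as 1 - product(1 - contribution). An untested person with no
    tested descendants gets 0.
    """
    if person.tested:
        return 1.0
    return 1.0 - prod(1.0 - 0.5 * coverage(child) for child in person.children)


# Hypothetical family: three first cousins from three different children of
# Grandpa, plus a sibling pair from a fourth child.
grandpa = Person("Grandpa", children=[
    Person("Child A", children=[Person("Cousin 1", tested=True)]),
    Person("Child B", children=[Person("Cousin 2", tested=True)]),
    Person("Child C", children=[Person("Cousin 3", tested=True)]),
    Person("Child D", children=[Person("Sibling 1", tested=True),
                                Person("Sibling 2", tested=True)]),
])

print(f"Grandpa: {coverage(grandpa):.1%}")
```

The family above encodes the five-grandchild scenario worked through earlier (three first cousins plus a sibling pair); the recursion returns 0.736…, or about 73.6%.  Because the function only looks at who has tested, the total updates as soon as a tester is added, much like the on-the-fly behavior described above.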

Best of all, the Coverage Estimator recommends the best person (or people) to test next.  This feature works best if your tree includes death years; the tool will not recommend someone it knows to be deceased.  In Thomas’ case, it suggests testing either Joanne or Marianne to bring coverage up to ≈58%.

You can even note whether the suggested people are willing to test.  If someone declines, or if they agree but their results are not available yet, the tool will make more suggestions.  That option is found under Add/Edit details for each living person in the tree.
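The Estimator’s suggestion logic isn’t documented in this post, so the following is only a plausible sketch:  a greedy search that pretends each eligible relative has tested, recomputes coverage, and reports whoever would raise it the most (ties are all reported, much like “Joanne or Marianne”).  It uses a simple recursive coverage model (a tested person counts as 100%, and each child passes up half of its own coverage), skips anyone with a death year or a hypothetical `declined` flag, and uses an invented family; none of these names come from DNA Painter.

```python
from dataclasses import dataclass, field
from math import isclose, prod
from typing import Iterator, Optional


@dataclass
class Person:
    name: str
    tested: bool = False
    death_year: Optional[int] = None  # a death year means: never suggest this person
    declined: bool = False  # hypothetical flag: asked, but said no
    children: list["Person"] = field(default_factory=list)


def coverage(person: Person) -> float:
    """Tested = 100%; otherwise each child passes up half of its own coverage."""
    if person.tested:
        return 1.0
    return 1.0 - prod(1.0 - 0.5 * coverage(child) for child in person.children)


def everyone(person: Person) -> Iterator[Person]:
    """Walk the descendant tree, starting with `person` itself."""
    yield person
    for child in person.children:
        yield from everyone(child)


def suggest_next(subject: Person) -> tuple[list[str], float]:
    """Return the candidate(s) whose test would raise coverage most, and the new coverage."""
    best_names: list[str] = []
    best_value = coverage(subject)
    for person in everyone(subject):
        eligible = (
            person is not subject
            and not person.tested
            and not person.declined
            and person.death_year is None
        )
        if not eligible:
            continue
        person.tested = True  # pretend this person has tested...
        value = coverage(subject)
        person.tested = False  # ...then undo the change
        if isclose(value, best_value):
            if best_names:  # a tie with the current best suggestion
                best_names.append(person.name)
        elif value > best_value:
            best_names, best_value = [person.name], value
    return best_names, best_value


# Invented example family; Grandpa is the research subject.
ann = Person("Ann", death_year=2001, children=[Person("Ann's son", tested=True)])
bob = Person("Bob", declined=True, children=[Person("Bob's daughter")])
cal = Person("Cal", children=[Person("Cal's son", tested=True), Person("Cal's daughter")])
grandpa = Person("Grandpa", death_year=1975, children=[ann, bob, cal])

print(f"Current coverage: {coverage(grandpa):.1%}")
names, value = suggest_next(grandpa)
print(f"Suggest {' or '.join(names)} -> {value:.1%}")

cal.declined = True  # Cal says no, so ask again
names, value = suggest_next(grandpa)
print(f"Suggest {' or '.join(names)} -> {value:.1%}")
```

In this invented family, the sketch first suggests Cal, a living child of the research subject; once Cal is marked as declining, it falls back to Bob’s daughter.  Ann is never suggested because she has a death year.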

When to Use the Coverage Estimator

Coverage is a concern any time you can’t test your research subject and are working with the DNA of their descendants.  Your research subject is the ancestor whose parentage you are investigating.

For example, several years ago I confirmed the biological parents of my great grandfather, Claude Duval LaCoste, using the DNA results of his three grandchildren.  Claude was my research subject.  The tool tells me that the three tests provided about 53% coverage of his DNA.  Fortunately, that was enough!  The tool can’t suggest additional testers in this case because there’s no one else to include.

Paul Woodbury, the originator of the coverage concept, describes five ways to use the Coverage Estimator.  Check it out!


Key Points to Remember

There are a few things to keep in mind when using the Coverage Estimator.

  • Consider coverage for every historical project.  That is, any research project that attempts to identify or confirm the parentage of an ancestor who can’t be tested themself (e.g., grandparent, great grandparent, etc.) should include an analysis of coverage.  Coverage isn’t a concern when you’re identifying your own biological parent(s), because your own DNA test represents 100% coverage of yourself.
  • The coverage percentages are just estimates.  The true amount of an ancestor’s DNA represented by their tested descendants will vary because inheritance is random.  In other words, the existing tests may cover more or less of the ancestor’s DNA than the tool reports, and it can’t guarantee how much a suggested tester will add.
  • Coverage is specific to each database.  If you’ve tested at AncestryDNA and your sister tested at 23andMe, you still only have 50% coverage of your parent in each database.  To accrue the benefits of coverage, the descendants should be in the same DNA database and you should have access to those results.
  • Use the Coverage Estimator to guide your next steps.  An analysis of coverage can help you decide whom to test next, whom to share DNA access with, and whom to recruit to a secondary database like MyHeritage or for ancestral reconstruction.  For example, if you are at AncestryDNA, your sister is at 23andMe, and you both transfer to MyHeritage, you will have 75% coverage of your parent there (but still only 50% in each of the two original databases).
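Because coverage is specific to each database, it can help to tally it database by database.  This is a minimal Python sketch, assuming every tester is a child of the parent in question and that you can see each of their results; the `testers` mapping and the `coverage_by_database` helper are hypothetical and simply mirror the AncestryDNA/23andMe/MyHeritage example above.

```python
from collections import Counter

# Hypothetical example: you and your sister are both children of the parent
# whose coverage we want; each entry lists the databases where that child's
# results are available to you.
testers = {
    "you": {"AncestryDNA", "MyHeritage"},
    "sister": {"23andMe", "MyHeritage"},
}


def coverage_by_database(testers: dict[str, set[str]]) -> dict[str, float]:
    """Average coverage of one parent in each database (all testers are that parent's children)."""
    counts = Counter(db for dbs in testers.values() for db in dbs)
    return {db: 1 - 0.5 ** n for db, n in sorted(counts.items())}


for db, cov in coverage_by_database(testers).items():
    print(f"{db}: {cov:.0%}")
```

Only MyHeritage, where both children’s results sit side by side, reaches the two-sibling figure; each original database still reflects a single child.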


9 thoughts on “A New Coverage Estimator at DNA Painter”

  1. Thanks Jonny. I will try and use this tool to figure out coverage on 2 major brick walls. My maternal great grandfather (need to find his parents) and paternal 2nd great grandfather (need to find his parents).
    I am always willing to try any new tool to break down these long time brick walls.
    Diane

  2. The adoptee or person working on their own often has no other option than to try to sort things out for themselves. Matches can certainly add up segments, but only if they have some DNA at a chromosome browser site. Even then, the picture is patchy: even with the help of a close relative’s DNA I have less than 50% and massive gaps. Using siblings and close cousins can enable most of one’s four grandparental segments to be mapped. Any later segment in that area has a head start in helping identify a CA for that match.
    Either way, coverage is important. And a coverage estimator can help select relatives to persuade to test, or to pay for their test.
    I still have the problem of some matches connecting on more than one of my ancestral lines, or having a cousin marriage. I have been able to calculate the %relationship and hence a likely cM range. But other people might need help. A problem for another day!

  3. Math is scary stuff, but this is quite fascinating!

    Thank you for sharing this with us. I will definitely check out the new tool. I have a great-grandmother with a paternity issue (but a potential paternal surname and matches to go with it) and a 2nd great-grandmother who was apparently dropped off by an alien spaceship to earth. I have no idea what else to do to find the names of her parents. My Ancestry match list certainly hasn’t helped.

  4. Trying to re-create the DNA of an ancestor sounds great if you discover something like a tooth, a hair or a stamp with saliva residue on it, from which her/his full DNA might be recovered, to be really confident about their complete profile & avoid confusing it with someone else’s DNA.
    Other aspects of the DNA record though, such as endeavouring to compensate for gaps (perhaps with generational leap-frogging) where privacy rights have been used to hide DNA matches & effectively remove DNA data out from the public domain, looks at least as interesting; as does the new pan genome effort & the intended refinement iterations of it for improving accuracy.
    Are there any methods (generational leap-frog?) for filling the evidential gaps caused by privacy rights, so that such DNA knowledge could nevertheless be used for genealogical research without revealing the identity of an originator who wishes to remain private?
    Can the pan genome programme’s impact be calculated for improving the accuracy of DNA matching?
    Thank you.

    1. There are methods to infer a genetic profile by using the DNA kits of their descendants, but doing so for someone who is still alive and has chosen not to test is an interesting ethical question. Is that fundamentally different from analyzing their DNA directly without their consent?
