The autosomal tests we do for genealogy examine more than 500,000 SNPs (individual units) of DNA. That’s a lot of data! Have you ever wondered how accurate the results are?
I’m not referring to how precise the ethnicity estimates are or how reliable the relationship predictions are; those are interpretations of the test results. I’m referring to something more fundamental: the raw data itself. After all, if the raw data are inaccurate, the downstream analyses could be affected, as well.
In fact, I was inspired to do this survey because I came across a case where a high error rate in the raw data led to very unusual results, so unusual that another genealogist had misidentified an adoptee’s birth father.
One estimate of the error rate in this type of test is the “no-call” rate. The no-call rate is the percentage of SNPs without a reported value. No-call rates can range from a fraction of one percent to five percent or more.
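To make the definition concrete, here is a minimal Python sketch of the arithmetic. The function names and the kit numbers are made up for illustration; they are not survey results or anything GEDmatch reports.

```python
def no_call_rate(no_calls, total_snps):
    """Percentage of tested SNPs that came back without a reported value."""
    if total_snps <= 0:
        raise ValueError("total_snps must be positive")
    return 100.0 * no_calls / total_snps


def implied_no_calls(rate_percent, total_snps):
    """Roughly how many SNPs are missing at a given no-call rate."""
    return round(total_snps * rate_percent / 100.0)


# Hypothetical kit: 500,000 SNPs tested, 2,500 of them with no value.
print(no_call_rate(2500, 500000))      # 0.5 (percent)

# A 5% no-call rate on the same hypothetical kit.
print(implied_no_calls(5.0, 500000))   # 25000 SNPs missing
```

Even a rate that sounds small translates into thousands of missing SNPs, which is why it is worth checking.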
GEDmatch has a “Diagnostic Utility” tool that can quickly tell you the no-call rate. To use it, log into your account at GEDmatch and click on the tool as shown in the screenshot.
Once you’ve run the tool, please answer the Error Rate Survey to crowd-source detailed information about the no-call rates at the testing companies. You can answer the survey multiple times, once per kit.
Thanks!
Entered my results from 2 kits from 2 different testing companies, both for the same person and taken 2 months apart. One was 0.2xxxx% and the other was 1.9xxxx%! Enlightening, thank you.
For reference, what is a “high rate” that we should be wary of? Or do we know?
The error rate for the kit I mention in the post was about 4.5%. I’m not sure we have a good handle on how high is too high.
My no-call rates are:
23 and Me: .32%
FTDNA: .44%
Ancestry: 1.35%
Thank you!
Interesting. So far my Ancestry kits have a lower no-call rate than my FTDNA kits. I’ll spend the weekend uploading more Ancestry kits for those who have tested at both sites.
I’m seeing lower no-call rates for Ancestry v2 than v1 and for 23andMe v4 than v3, which is pretty interesting.
Hi, I too have uploaded results for 7 kits that I manage, including two tests I have done, with FTDNA and Ancestry. I noticed a few interesting things:
1. The only two kits with large errors and identified problems were from my wife and her mother. I don’t know what causes errors, whether it is only the processing or whether it could be something in the DNA itself, in which case mother and daughter having errors and problems may be interesting.
2. Ancestry tested fewer locations but had lower error rates. I’ll be interested to see if that turns out to be typical.
I look forward to seeing what you make of all this.
Thanks!
The pattern I’m noticing is that more recent FTDNA kits are running at 4.4%.
I’ve submitted my own Ancestry/FTDNA/23andme, all minimal error rate at < 0.5%, all from some time ago, and three recent FTDNA, all of which showed 4.4%.
One of these three I rechecked with a reload of build36 and build37 files to see if there was any difference. Not really. (the surplus kits have since been deleted again)
Also a 2018 Ancestry, again minimal.
I have also asked FTDNA for their comments on two of them (the third I only thought to check afterwards).
How do we tell if a no-call is a GEDmatch processing error or a no-call from FTDNA?
Interesting observation. I didn’t collect dates in the survey, so I can’t tell whether there’s been a change at FTDNA. For AncestryDNA and 23andMe, I can get a date range based on the version number.
And to answer your question about the source of the no-calls, you can unzip the raw data file and open it in Excel to have Excel count the no-calls for you. The formula is =COUNTIF(range,"--"), where range is the column with the base calls and "--" (two hyphens) is how a no-call is written in the file. I only checked one kit with a high no-call rate, but it was pretty much the same in the raw data as at GEDmatch.
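The same count can be done outside Excel. Below is a rough Python sketch, not an official tool. It assumes a comma- or tab-separated raw data file with the SNP ID, chromosome, and position in the first three columns and the base calls after that. It also assumes a no-call is written as "--", or as "00" when the two alleles sit in separate columns. Check your own file's layout before trusting the numbers.

```python
# Assumption: these genotype strings mark a no-call. Adjust to match your file.
NO_CALL_MARKERS = {"--", "00"}


def count_no_calls(lines):
    """Return (no_calls, total_snps) for an iterable of raw-data lines."""
    total = 0
    no_calls = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # Treat tabs like commas, then drop surrounding quotes from each field.
        fields = [f.strip().strip('"') for f in line.replace("\t", ",").split(",")]
        if len(fields) < 4 or not fields[2].isdigit():
            continue  # header row or a line that is not SNP data
        genotype = "".join(fields[3:])  # one column ("AG") or two ("A", "G")
        total += 1
        if genotype in NO_CALL_MARKERS:
            no_calls += 1
    return no_calls, total


# Tiny made-up sample, not real kit data.
sample = [
    "RSID,CHROMOSOME,POSITION,RESULT",
    '"rs1","1","1000","AA"',
    '"rs2","1","2000","--"',
    '"rs3","1","3000","AG"',
    '"rs4","2","4000","--"',
]
no_calls, total = count_no_calls(sample)
print(f"{no_calls} of {total} SNPs are no-calls ({100 * no_calls / total:.2f}%)")
```

To run it on a real kit, pass an open file instead of the sample list, for example `count_no_calls(open("my_raw_data.csv"))`, where the filename is a placeholder for your unzipped download. Comparing the result with the GEDmatch figure shows whether the no-calls were already in the testing company's file.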
Thanks so much for crunching these numbers.
Do you have a survey link for the number and size of segments for the people who are showing possible half-sib, grandparent and nibling relationships? I am very interested in seeing how that will chart with a larger sampling. I would love to see how that wonderful chart is working. It certainly has helped me. I love reading your blog. One of the best on the internet.
Thank you so much for the kind words! Kitty Cooper has been collecting data on 2nd degree relationships (half sib, grands, niblings). Here’s a link: https://blog.kittycooper.com/2017/09/the-25-relationship-a-first-look-at-the-data/
Thank you so much for pointing us to Kitty’s data. I had no idea she was collecting that or I would have made a submission.
And I too will be interested in seeing a chart for the results of this no call survey.
As an adoptee, I am always looking for any “edge” I can get for verifying my DNA matches. Not only do adoptees not have family stories to guide them (sometimes a good thing) but we frequently don’t have the luxury of open communications with our matches until we gather enough confidence to refer to a mutually known family member to “introduce” ourselves in order to elicit trust. And, I’ve learned in the 2 years that I’ve been searching to trust the emerging DNA citizen science above all else.
What you said about open communication in the early stages of a search is so true! Once you’ve found some connections among your matches, you can reach out and say “Hi, we match as Nth cousins. Are you descended from John Smith and Mary Jones of Nebraska?” Before that, you don’t have a “hook” for making a connection.
Exactly!
Now is as good a time as any to also say to never give up searching. I’m 66, only discovered I was adopted at age 64, found my deceased birth Dad right away with a half-sib match, but found my birth Mom the end of this March. Celebrating her 85th birthday! Going to visit her in 2 weeks, voice recorder, camera, notepad, and flowers in hand!
Oh my gosh, I’m so happy for you! Congratulations!
Ancestry Tests before 2016
Gedmatch SNPs for me: 678756. No calls: 1.85%
Gedmatch SNPs for husband: 454389. No calls: .10%
Does endogamy play a role here? My husband is Jewish. His grandparents were 1st cousins from a very inbred area in Hungary. Any other thoughts on the difference in SNPs?
Looks like you tested on Ancestry’s v1 chip and your husband tested on the v2 chip. So far, I’m seeing huge improvements in the no-call rate for both AncestryDNA and 23andMe when they moved to newer chips.
I haven’t figured out how to do GedMatch yet, but I will say that not everything can be accurate, because my identical twin and I have some distant cousins that only one of us matches with on Ancestry. Our ethnicity differs by several points too. It was explained to me on the ethnicity that in any test some segments are unreadable, which causes a variation.
The tests look at 500,000 to 700,000 individual sites (called SNPs) in our DNA, so they’re bound to make some mistakes. You and your identical twin will have mistakes in different spots, which will have slight effects on both matching and ethnicity estimates. Your overall estimates and all but the distant matches should be in agreement, though.
My mother’s and my Ancestry v2 kits had very low no-call rates: 0.0977% and 0.0795%.
Six kits submitted: 2 x Ancestry (v2), FTDNA and MyHeritage. MyHeritage consistently higher but at around 2-3%.
Very interested in what the results look like overall.
Karen
FTDNA and MyHeritage should be similar, as they’re run in the same lab.
Hi, I recently did a 23andMe test and I uploaded my results to GEDmatch. On 23andMe I got no Italian, but on GEDmatch it said 10.03 percent Italian. Does this mean I am really Italian? Thanks
I can’t speak to the accuracy of the admixture calculators at GEDmatch.
Hi, I realise I’m a bit late to the party here, but I just found this page as I was curious about how many errors occur during the initial raw data process, so I had a google.
I’ve clicked on the ‘DNA File Diagnostic Utility’ but it seems they no longer publish a ‘no-call’ rate in the analysis.
The reason I’m checking is because I had myself, both my kids, parents and siblings tested with MyHeritage. On the MyHeritage ‘DNA Matches’ page, it contains a list of all your nearest matches and how much DNA you share with each in terms of percentage. What I noticed was, the amount of shared DNA between my parents and I was very close to expected at 49.9% and 49.8% (I figured 0.2% discrepancy was quite acceptable in terms of error). My siblings also showed the same closeness with my parents. Between my son and I, it was also 49.9% shared DNA, so this is all quite consistent. However my daughter showed only a 47.7% shared DNA match with me. That seems not only inconsistent, but as if some error occurred during the raw data process for her kit. Do you have any thoughts on this?
Much Thanks,
Dan
Did all of you test directly with MyHeritage or did some of you upload data from another testing company?