The genetic genealogy testing companies were out in force at the i4GG conference this past weekend in San Diego! Representatives from Living DNA and MyHeritage gave hour-long talks on Saturday, and people from Family Tree DNA, AncestryDNA, and 23andMe spoke on Sunday. (Note: I earn a small commission if you purchase through the links in this post. The cost is the same for you. Click here for more information.)
They all gave polished and informative presentations, and I made a point of seeing every one so that I could report back to my readers. This is the second in a series of five posts (in the order of the original presentations) on what they had to say.
Oron Navon, Senior Bioinformatics Scientist at MyHeritageDNA
Oron Navon flew all the way from Israel to present to us at the i4GG conference, and I’m very glad he did. In addition to talking about updates that we’ll be seeing soon, he presented some fun analyses showing what we can learn from large-scale datasets.
MyHeritage was founded in 2003 as an online genealogy company. It is now a multi-language platform, with Amharic soon to be added to the 43 languages currently represented. Their DNA product launched in November 2016 and currently has about 700,000 people tested, with thousands being added daily.
Relative Matching at MyHeritage
Thus far, their relative matching system has been inconsistent compared to those of other companies, which has earned them some criticism. I reviewed their matching here. Mr Navon was open about the problems, describing one example of half siblings in detail, and explained how they will be addressed. I respect him, and the company, for being candid. The new matching system is a major overhaul of both the phasing and the imputation processes. (Phasing distinguishes the maternal copy of each chromosome from the paternal one, and imputation makes educated guesses about DNA markers that weren’t actually tested. Imputation, he said, is not perfect, but it’s pretty good.) The revamped matching system at MyHeritage will find more matches more accurately. He showed a side-by-side comparison of the half sibling example with the existing and new systems, and the improvement was substantial. The new matching algorithm is undergoing final testing and validation now. It should be out for all MyHeritageDNA users by year’s end.
Fun with Big Data
The next part of the talk was an exploration of the cool things you can do with large datasets. MyHeritage scientists are compiling massive genealogies from public trees at Geni.com, a MyHeritage subsidiary. After individual trees are screened for consistency and cleaned of obvious errors, they are merged. The largest merged tree has 13 million people in it! (Of course, a tree that large will have errors, but the idea behind massive datasets is that the “signal” swamps the “noise”, so the errors have only a minor effect.)
Once the large family trees had been assembled, the MyHeritage scientists could use them to explore. One thing they looked at was the mismatch rates in mtDNA and yDNA; that is, how often did the expected haplogroup not align with the actual one? The mismatch rates were 0.3% and 2.0%, respectively. The 2% figure accords well with other studies of misattributed paternity.
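For readers who like to see the arithmetic, here is a minimal sketch of how a mismatch rate like this could be computed. It is not MyHeritage's method, which wasn't described in detail. The function name, the toy haplogroup labels, and the exact-label comparison are all my own assumptions (a real analysis would need to treat a subclade and its parent haplogroup as compatible).

```python
def mismatch_rate(pairs):
    """Return the fraction of (expected, observed) haplogroup pairs that differ.

    `expected` is the haplogroup the tree predicts for a person (inherited
    down the direct paternal or maternal line); `observed` is the tested one.
    Assumption: labels are compared exactly, which is a simplification.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (expected, observed) pair")
    mismatches = sum(1 for expected, observed in pairs if expected != observed)
    return mismatches / len(pairs)


# Hypothetical toy data, invented purely for illustration.
y_pairs = [("R1b", "R1b"), ("I1", "I1"), ("R1b", "J2"), ("E1b", "E1b")]
print(f"{mismatch_rate(y_pairs):.1%}")  # one mismatch out of four pairs
```

With millions of people in a merged tree, the same calculation runs over far more pairs, which is what makes a figure as small as 0.3% measurable at all.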
Another use for the massive trees was to examine migration routes in the last millennium. I can’t do justice to this part of the talk without the map visuals, but it was really neat to see how families that originated in Europe spread, first within that continent and then elsewhere. It’s easy to extrapolate those migratory patterns to how our (much) earlier ancestors expanded from Africa.
Another spin on this information is that, as travel got easier, the distance between the birthplaces of spouses increased. In the early 1700s, people tended to marry someone from nearby, but that’s much less true today. The MyHeritage scientists asked which partner was more likely to move following marriage: the husband or the wife. Turns out, it’s the wife!
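The distance between two birthplaces can be estimated from their coordinates with the standard haversine (great-circle) formula. The sketch below shows that calculation, plus one guess at how "who moved" might be judged. The talk didn't spell out the actual method, so the function names, the use of a "settled" location (say, a first child's birthplace) as a proxy, and the sample coordinates are all assumptions of mine.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def moved_farther(husband_birth, wife_birth, settled):
    """Say which spouse was born farther from where the couple settled.

    Each argument is a (latitude, longitude) tuple in degrees.
    Assumption: `settled` is a hypothetical proxy, such as the
    birthplace of the couple's first child.
    """
    husband_km = haversine_km(*husband_birth, *settled)
    wife_km = haversine_km(*wife_birth, *settled)
    if husband_km > wife_km:
        return "husband"
    if wife_km > husband_km:
        return "wife"
    return "tie"


# Hypothetical couple: the household is much closer to his birthplace.
print(moved_farther((0.0, 0.0), (0.0, 10.0), (0.0, 1.0)))
```

Tallying that answer across every couple in a large tree, decade by decade, would give the kind of trend described in the talk.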
New Feature: Surveys
Finally, Mr Navon described a new feature at MyHeritage: Surveys. These are described on the website as “a comprehensive research project to study the relationship between genetics and behavior, personal characteristics, and culture.” The results, when combined with our DNA data, may lead to scientific discoveries and academic publications. The company assures users that “We will never release your individual-level data to any third party without asking for and receiving your explicit authorization to do so.”
These analyses aren’t likely to break down any specific brick wall, but they get the GEEK stamp of approval for creativity. Knowing their scientists have the leeway to innovate on company time suggests to me that MyHeritage will come out with novel tools in the future.
Other Posts in this Series
You can see what the other companies had to say at i4GG by following these links: