The Zoo Problem
Imagine a 5-year-old telling you about their trip to the zoo. They might, in breathless excitement, report that the camels are across from the lions and the tortoises are across from the bug house and the lemurs are across from the otters! But that wouldn’t tell you which animals are on the same side of the path. If we knew which animals were adjacent on one side, we could figure out the other side, but the tyke hasn’t given us enough information. Is it camel–tortoise–lemur? Or camel–bug–lemur? Or camel–tortoise–otter? Or camel–bug–otter?
The zoo problem illustrates the challenge of analyzing our autosomal DNA results, except that the DNA testing companies face it about 700,000 times over. That’s roughly how many positions in our DNA, called SNPs (single-nucleotide polymorphisms), they test.
Recall that we have two copies of each autosomal chromosome, one inherited from mom and one from dad. Thus, at each tested position, we carry either one base or two different bases (A, C, G, or T). The technology used to analyze our samples might report that you have A & G at Position 1, C & T at Position 2, A & T at Position 3, and two Gs at Position 4. Alas, as with our animal-loving kindergartener, it can’t tell whether you inherited A-C-A-G, A-C-T-G, A-T-A-G, or A-T-T-G from the same parent.
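To see how quickly the possibilities multiply, here is a minimal Python sketch that lists every way to split unphased genotypes into two parental copies. The function name `possible_haplotypes` is hypothetical, not any testing company's code. It pins the first heterozygous position so each split is counted once; with k heterozygous positions there are 2^(k-1) splits, which is why nobody can simply try them all across hundreds of thousands of SNPs.

```python
from itertools import product
from typing import List, Tuple

Genotype = Tuple[str, str]  # the two bases reported at one SNP position

def possible_haplotypes(genotypes: List[Genotype]) -> List[Tuple[str, str]]:
    """List every (copy 1, copy 2) split consistent with unphased genotypes.

    The first heterozygous position is pinned to its first base, so each
    split appears once instead of twice (swapping the parents changes nothing).
    """
    het = [i for i, (a, b) in enumerate(genotypes) if a != b]
    pairs = []
    # Every heterozygous position after the first can go either way.
    for choice in product((0, 1), repeat=max(len(het) - 1, 0)):
        flip = dict(zip(het[1:], choice))
        copy1, copy2 = [], []
        for i, (a, b) in enumerate(genotypes):
            if flip.get(i, 0):
                a, b = b, a  # send the bases to the opposite copies
            copy1.append(a)
            copy2.append(b)
        pairs.append(("-".join(copy1), "-".join(copy2)))
    return pairs

# The four positions from the example above.
for pair in possible_haplotypes([("A", "G"), ("C", "T"), ("A", "T"), ("G", "G")]):
    print(pair)
```

The first copy in each printed pair runs through A-C-A-G, A-C-T-G, A-T-A-G, and A-T-T-G; the second copy is whatever is left over for the other parent.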
Sorting our DNA results into the two parental copies is called phasing. For those of us fortunate enough to have a tested parent, phasing is relatively easy. The genealogy companies can tell which chunks of DNA came from the tested parent and, by elimination, which came from the untested one. Some companies even use this information to “side” our DNA matches automatically.
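The elimination step can be sketched in a few lines, assuming we have genotypes for the child and one tested parent at the same positions. The function `phase_with_parent` and the example genotypes are hypothetical. Where the child has two different bases and the parent carries only one of them, that base came from the tested parent and the other from the untested one; where the parent also has both bases, the position stays ambiguous and is marked `?`. This sketch ignores genotyping errors and makes no attempt to resolve ambiguous positions.

```python
from typing import List, Tuple

Genotype = Tuple[str, str]  # the two bases reported at one SNP position

def phase_with_parent(child: List[Genotype], parent: List[Genotype]) -> Tuple[str, str]:
    """Split a child's genotypes into (from tested parent, from untested parent)."""
    tested, other = [], []
    for (c1, c2), p in zip(child, parent):
        if c1 == c2:                   # same base on both copies: nothing to decide
            t, o = c1, c2
        elif c1 in p and c2 not in p:  # only c1 could have come from the tested parent
            t, o = c1, c2
        elif c2 in p and c1 not in p:  # only c2 could have come from the tested parent
            t, o = c2, c1
        else:                          # parent has both bases (or bad data): ambiguous
            t, o = "?", "?"
        tested.append(t)
        other.append(o)
    return "-".join(tested), "-".join(other)

child = [("A", "G"), ("C", "T"), ("A", "T"), ("G", "G")]
parent = [("A", "A"), ("T", "T"), ("A", "T"), ("G", "C")]
print(phase_with_parent(child, parent))  # ('A-T-?-G', 'G-C-?-G')
```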
For the rest of us, it’s not so simple. We can phase individual segments that we share with our DNA relatives, but there are two problems.
- We won’t necessarily have segment matches at every position in our genomes. For example, here are my father’s DNA matches on chromosome 1 from one of the smaller databases. There are large regions that can’t be phased at all because no one matches there.
- We have 22 autosomal chromosomes. Even if we can fully phase each one, how can we determine which phased copy of each chromosome came from the same parent?
Phasing with SideView™
In April 2022, AncestryDNA unveiled a new technology called SideView™, which phases our genomes piece by piece rather than all at once, as happens when a parent has tested. They can do this because their database is so large (more than 22 million people) that most of us will have enough matches to cover most of our genomes. By comparison, the next largest database, at 23andMe, has around 13 million testers.
AncestryDNA’s support article for SideView uses a cartoon image to show how it works. In the cartoon, the segments are only a few SNPs long. In reality, of course, they are hundreds or thousands of SNPs long. The principle is the same, though.
This solves the first problem above: with a large enough database, most of the genome can be phased. But we still need a way to associate phased chromosome 1 with phased chromosome 2, and so on. That’s where closer matches come in.
A first cousin will share segments on most of the chromosomes. For example, in this screenshot from MyHeritage, we can see that these two cousins share at least one segment on 20 of the 22 autosomes. Only chromosomes 14 and 21 are left out. That means we can theoretically label 20 of the phased chromosomes by parent.
Other matches might share DNA on an unlabeled chromosome as well as on already-labeled ones, allowing SideView to phase all 22 chromosomes by parent. Of course, SideView is simultaneously doing the same thing for the other parent, giving an even more robust evaluation.
An algorithm like SideView still can’t tell which chromosome copies are paternal or maternal, but it should know which ones all came from the same parent. Initially, it assigns them to “Parent 1” and “Parent 2”. We can then specify which is maternal and which paternal based on our own family knowledge.
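The chromosome-linking step can be pictured as a grouping problem. Below is a hypothetical illustration, not AncestryDNA’s actual algorithm. It assumes each chromosome has already been phased into two copies with arbitrary labels "a" and "b", and that for each close match we know which copies it shares segments on. Every match that touches several chromosomes ties those copies together and, by elimination, ties the partner copies together as the other parent’s, using a simple union-find structure. Chromosome 1 copy "a" is arbitrarily called Parent 1, mirroring the “Parent 1” and “Parent 2” labels. The names `UnionFind`, `label_parent1`, and the example evidence are all made up, and the sketch doesn’t check for contradictory evidence.

```python
from typing import Dict, List, Optional, Tuple

Copy = Tuple[int, str]  # (chromosome number, arbitrary copy label "a" or "b")

class UnionFind:
    """Minimal disjoint-set structure for grouping chromosome copies."""

    def __init__(self) -> None:
        self.parent: Dict[Copy, Copy] = {}

    def find(self, x: Copy) -> Copy:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x: Copy, y: Copy) -> None:
        self.parent[self.find(x)] = self.find(y)

def other(label: str) -> str:
    return "b" if label == "a" else "a"

def label_parent1(matches: List[List[Copy]], n_chrom: int = 22) -> Dict[int, Optional[str]]:
    """For each chromosome, return which copy ("a"/"b") is Parent 1's, or None."""
    uf = UnionFind()
    for copies in matches:
        first_chrom, first_label = copies[0]
        for chrom, label in copies[1:]:
            # Copies this match shares on came from the same parent...
            uf.union((first_chrom, first_label), (chrom, label))
            # ...so their partner copies both came from the other parent.
            uf.union((first_chrom, other(first_label)), (chrom, other(label)))
    parent1 = uf.find((1, "a"))  # arbitrary choice: chromosome 1 copy "a" is Parent 1
    labels: Dict[int, Optional[str]] = {}
    for chrom in range(1, n_chrom + 1):
        if uf.find((chrom, "a")) == parent1:
            labels[chrom] = "a"
        elif uf.find((chrom, "b")) == parent1:
            labels[chrom] = "b"
        else:
            labels[chrom] = None  # no match links this chromosome to the rest yet
    return labels

# Hypothetical evidence from three close matches.
matches = [
    [(1, "a"), (2, "b")],
    [(2, "b"), (3, "a")],
    [(4, "b"), (2, "a")],
]
labels = label_parent1(matches)
print({c: labels[c] for c in range(1, 6)})  # {1: 'a', 2: 'b', 3: 'a', 4: 'a', 5: None}
```

The third match never touches a Parent 1 copy directly, but because it ties chromosome 4 copy "b" to chromosome 2 copy "a", chromosome 4 copy "a" lands on the Parent 1 side. Chromosomes that no match touches stay `None`, just as chromosomes 14 and 21 would stay unlabeled using only the MyHeritage cousin.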
Not Just for Ethnicity Estimates Anymore
Thus far, AncestryDNA has used SideView to sort our ethnicity estimates by parent. This can be quite handy in determining whether your parents both had similar genetic backgrounds or not. In my case, it accurately determines that my mother is primarily French and my father is German and Irish.
Now, AncestryDNA is extending that technology to our matches. I was fortunate to chat recently with one of their scientists, who explained how it works. (Any errors in the description below are entirely mine. It was a lot of information to take in at once, and my note-taking skills are rusty.)
- SideView will assign a DNA match to a parent side when ≥90% of the shared segments are labeled as coming from that parent.
- A match will be assigned to both sides if any shared segment is labeled “both” (as with fully identical regions or runs of homozygosity).
- A match will be assigned to both sides if two or more segments are labeled from one parent and two or more segments are labeled from the other parent. This will happen with your direct descendants and descendants of your full siblings.
- A match will not be assigned to a side at all if 70–90% of the shared segments are labeled from one parent and one shared segment is labeled from the other parent. In those cases, there isn’t enough evidence to assign the match to either one parent or both.
- A match will be unassigned if none of the previous criteria are met.
- Finally, new matches will remain unassigned until the next SideView update, which will happen periodically.
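The rules above can be written as a short function. This is a sketch of my reading of them, not AncestryDNA’s code. The labels "P1", "P2", and "both" (with `None` for an unphased segment) are assumptions, segments are counted rather than weighted by size, and the order in which the rules are checked is a guess, since the description doesn’t say what happens when more than one applies. The last two rules both leave the match unassigned, so they collapse into the final fall-through. The 90% test uses integer arithmetic to avoid floating-point surprises right at the threshold.

```python
from collections import Counter
from typing import Iterable, Optional

def assign_side(segment_labels: Iterable[Optional[str]]) -> str:
    """Apply the side-assignment rules to one match's shared segments.

    Each label is "P1", "P2", "both", or None (segment not phased).
    Returns "P1", "P2", "both", or "unassigned".
    """
    labels = list(segment_labels)
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        return "unassigned"
    # At least 90% of segments from one parent (integer form of n / total >= 9 / 10).
    for parent in ("P1", "P2"):
        if counts[parent] * 10 >= total * 9:
            return parent
    # Any segment labeled "both".
    if counts["both"] > 0:
        return "both"
    # Two or more segments from each parent.
    if counts["P1"] >= 2 and counts["P2"] >= 2:
        return "both"
    # 70-90% from one parent plus a single segment from the other, or
    # anything else: not enough evidence either way.
    return "unassigned"

print(assign_side(["P1"] * 9 + ["P2"]))  # P1 (exactly 90%)
print(assign_side(["P1"] * 8 + ["P2"]))  # unassigned (about 89%)
```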
Here’s what it will look like in your match list.
The Million Dollar Question
What we really want to know is: How well does SideView work? For that, we need to look at some real examples. Bear in mind that SideView is still in the beta phase, meaning it’s still being tested and improved. My evaluation may well be outdated in a few months.
As an initial assessment, I looked at 25 DNA testers from a variety of backgrounds, with and without endogamy. None of these individuals had a tested parent, so the phasing was based entirely on SideView. For each person, I tallied the number of matches assigned to Parent 1, Parent 2, Both Sides, and Unassigned. When I knew the tree sufficiently well, I also tallied the number of assignments that were wrong.
Here’s what I found:
The main impression right off the bat is how few assignments were obviously wrong. Of the 13 individuals for whom I felt comfortable making that call, only three had a match assigned incorrectly; a fourth probably has one as well. Put another way, for those 13 people, there were 4–5 incorrect calls out of 1,782 matches, or roughly 0.2–0.3%. That’s remarkably accurate! AncestryDNA claims that SideView can offer “95 percent precision for 90 percent of customers”, and they are meeting that expectation.
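For anyone who wants to check the arithmetic, here is the error rate at both ends of the 4–5 range, along with the corresponding precision (the share of calls that were correct). It uses all 1,782 matches as the denominator; if some of those matches were unassigned, the error rate among assigned matches alone would be slightly higher, though still far better than 95% precision.

```python
def error_and_precision(wrong: int, total: int) -> tuple[float, float]:
    """Return (error rate, precision) for `wrong` incorrect calls out of `total`."""
    error = wrong / total
    return error, 1 - error

for wrong in (4, 5):
    error, precision = error_and_precision(wrong, 1_782)
    print(f"{wrong} wrong: error {error:.2%}, precision {precision:.2%}")
# 4 wrong: error 0.22%, precision 99.78%
# 5 wrong: error 0.28%, precision 99.72%
```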
Another observation is that for people without endogamy, roughly 80–90% of their matches were assigned to a parent. This held true even for people who don’t have as many matches as a typical European–American, like African–Americans and Brits. Honestly, I didn’t expect this. Fewer matches means less data with which to perform phasing, so I’d expect those people to have more unassigned matches. SideView is doing a great job here as well.
Finally, there’s endogamy, the practice of marrying within the same community over many generations. People from endogamous populations genuinely are related to their DNA matches in multiple ways. We wouldn’t expect SideView to work very well for those people, and it doesn’t. SideView was able to make a parental call less than 50% of the time. The majority of matches were either unassigned or linked to both sides.
Unassigned matches aren’t bad; they’re just not particularly informative on the surface. And with endogamy, matches may well be related through both parents. This isn’t necessarily a flaw in SideView, just an unfortunate reality of endogamy.
Perhaps, in the future, SideView will include segment size in its calculations. For example, my mother has a paternal first cousin who is unassigned. He probably falls into the category described above in which most of his shared segments are labeled paternal while one is labeled maternal. It is entirely possible that he shares a segment through my grandmother. That segment is probably from a much more distant connection, though, and almost certainly small.
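Here is what a size-aware version of the 90% rule might look like. This is purely speculative: the function `assign_side_by_cm`, its threshold, and the example segment lengths are hypothetical, chosen only to mirror the first-cousin scenario. Instead of counting segments, it weighs each labeled segment by its length in centimorgans (cM), so one small segment from the other side barely moves the needle.

```python
from typing import List, Optional, Tuple

# (label, length in cM); label is "P1", "P2", "both", or None (unphased).
Segment = Tuple[Optional[str], float]

def assign_side_by_cm(segments: List[Segment], threshold: float = 0.9) -> str:
    """Hypothetical size-weighted version of the 90% rule."""
    total_cm = sum(cm for _, cm in segments)
    if total_cm == 0:
        return "unassigned"
    for parent in ("P1", "P2"):
        parent_cm = sum(cm for label, cm in segments if label == parent)
        if parent_cm / total_cm >= threshold:
            return parent
    return "unassigned"

# Hypothetical first cousin: eight large Parent 1 segments, one small Parent 2 segment.
cousin = [("P1", 100.0)] * 8 + [("P2", 8.0)]
print(assign_side_by_cm(cousin))  # P1
```

Counting segments, this cousin has 8 of 9 (about 89%) on one side, just under the 90% bar. Weighting by size, the share is 800 of 808 cM, about 99%, so he would be assigned.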
Interestingly, when only one parent was from an endogamous population, SideView performed quite well.
Are You Thinking What I’m Thinking?
When our parents make the chromosomes that they will pass on to us via eggs and sperm, their cells literally mix and match from the chromosomes they themselves inherited from their parents, our grandparents. Where our chromosome copy “swaps” from grandma to grandpa is called a crossover point.
In theory, SideView should be able to detect crossover points, because the phased matches on a given side should never span one (each shared segment traces back through a single grandparent). Our matches should look like this, but on a much larger scale. The vertical lines represent crossover points: in this case, two on the paternal side and one on the maternal side.
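If phased segments could also be traced to a specific grandparent, crossover points would fall out naturally. The sketch below assumes exactly that: it takes hypothetical segments for one parental copy of one chromosome, each tagged with the grandparent it came from (for example, from tree work on the matches), and reports the gaps where the grandparent label changes. Each reported gap must contain at least one crossover. A gap between two segments from the same grandparent might hide two crossovers (a swap and a swap back), which this approach can’t see. The function name and the positions, in megabases, are made up for illustration.

```python
from typing import List, Tuple

# (start, end, grandparent) for segments on ONE parental copy of ONE chromosome.
Segment = Tuple[float, float, str]

def crossover_intervals(segments: List[Segment]) -> List[Tuple[float, float]]:
    """Return (left, right) gaps in which the grandparent label switches.

    Assumes segments from different grandparents don't overlap, which is
    what we'd expect if every segment traces back to a single grandparent.
    """
    if not segments:
        return []
    ordered = sorted(segments, key=lambda seg: seg[0])
    intervals: List[Tuple[float, float]] = []
    run_label = ordered[0][2]  # grandparent for the current run of segments
    run_end = ordered[0][1]    # rightmost position reached by that run
    for start, end, label in ordered[1:]:
        if label == run_label:
            run_end = max(run_end, end)
        else:
            # Label switched: a crossover lies between the old run's end and this start.
            intervals.append((run_end, start))
            run_label, run_end = label, end
    return intervals

segments = [
    (0, 40, "grandpa"), (10, 55, "grandpa"),
    (70, 120, "grandma"), (130, 200, "grandma"),
    (210, 240, "grandpa"),
]
print(crossover_intervals(segments))  # [(55, 70), (200, 210)]
```

The gap from 120 to 130 isn’t reported because segments from grandma sit on both sides of it.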
Of course, this is probably a long way off. SideView is new technology, and the database may not be large enough yet to accurately call crossover points. But Boy Howdy, wouldn’t that be cool?!?