Struggles with Smithville

The most frustrating unknown father case I’ve ever worked is “Midge” (not her real name).  Midge has great matches!  I even know the town her biological father was from.  That’s not the problem.

The problem is that half the town are Smiths, they’re all related, and “Smithville’s” local pastime appears to be extramarital sex.  Of Midge’s top three paternal matches at AncestryDNA, two have misattributed parentage events (MPEs).  The jury’s still out on the third.  Her three top matches at 23andMe match one another as first cousins, but none of the trees overlap.  For all I know, none of their trees are biologically correct.

There are so many MPEs in Midge’s match lists, I created a custom label to remember who they are.

Oh, and Midge’s father was obviously born outside of wedlock, too!

A silver lining of the endogamy is that Midge usually has secondary matches to her top matches that represent both sides of the matches’ trees, so I can validate some of the pedigrees.  These are people who share a lot of DNA with the primary match but not much with Midge.  This is how I know so many of their trees are not biologically correct.  It’s tedious work, though.

Manual clustering has been a huge help, so I was eager to try out Ancestry’s new custom clusters—available through Ancestry’s Pro Tools package for an added fee—to nudge things along.

Note to self:  Custom clusters are not meant for endogamy.  I know this.  A girl can dream.

Manual Clustering

My manual cluster includes descendants of “Mack” & “Sarah” Smith, a Smithville couple who had 10 children between 1885 and 1915.  One of those children was probably Midge’s grandmother.1

Cluster matrix of some of Midge's paternal matches.

 

The blue-shaded cM column shows shared DNA with Midge.  Matches in the upper left quadrant are all descendants of Teddy, a son of Mack & Sarah.  The group in the bottom right quadrant descend from three of their other children.  Shared centimorgan values between matches are above the diagonal and family relationships are below the diagonal.  (C = cousin, g = great, h = half, nib = nibling (aunt/uncle–niece/nephew), R = removed, sib = sibling, and x2 indicates a double relationship)
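For readers who want to build a similar matrix themselves, here is a minimal sketch of the layout in Python: shared centimorgans go above the diagonal and relationship codes go below it. The helper name `build_matrix` and the match "XX" with its values are hypothetical, made up purely for illustration; only the PS and CS pair (1,472 cM, half-niblings) comes from this case.

```python
def build_matrix(names, pairs):
    """Return a square grid: shared cM above the diagonal, relationship below."""
    index = {name: i for i, name in enumerate(names)}
    grid = [["" for _ in names] for _ in names]
    for (a, b), (cm, rel) in pairs.items():
        i, j = sorted((index[a], index[b]))
        grid[i][j] = str(cm)  # upper triangle: shared centimorgans
        grid[j][i] = rel      # lower triangle: relationship code
    return grid


names = ["PS", "CS", "XX"]
pairs = {
    ("PS", "CS"): (1472, "hnib"),  # the one real pair from this case
    ("PS", "XX"): (250, "2C"),     # hypothetical match and values
    ("CS", "XX"): (180, "2C1R"),   # hypothetical match and values
}

matrix = build_matrix(names, pairs)
for name, row in zip(names, matrix):
    print(name.ljust(4) + "".join(cell.rjust(6) for cell in row))
```

Reading across a row and down a column gives both numbers for the same pair, which is what makes the two-triangle layout compact.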

CS and ML (circled) both have misattributed fathers.  I’m confident that CS is Teddy’s daughter (but not his wife’s) based on circumstantial evidence and hypothesis testing in BanyanDNA.  I’m still working on ML’s father.  JR’s mother is a known descendant of Mack & Sarah, but her father is misidentified, and she seems to have a second, undocumented 2C relationship to GA and a different undocumented relationship to CS.

Confused yet?  This is only the stuff I’ve managed to figure out.  There’s so much more I’m still struggling with.

Custom Clustering at AncestryDNA

To create a custom cluster at Ancestry, I used PS as the match of interest and limited the cluster to matches who share 100–400 cM with Midge.  Because LA, DM, MC, and JH descend from Mack & Sarah but share less than my 100-cM threshold, I added them as “sidekicks” to force their inclusion in the cluster.
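The inclusion rule I used (a centimorgan range plus forced "sidekicks") can be sketched as below. This is not Ancestry's algorithm, which also depends on how matches relate to the match of interest; it only models who is eligible for the cluster. The function name, match names, and centimorgan values are all hypothetical.

```python
def select_cluster_candidates(shared_cm, low=100, high=400, sidekicks=()):
    """Return matches within [low, high] cM, plus any forced sidekicks."""
    in_range = {name for name, cm in shared_cm.items() if low <= cm <= high}
    forced = set(sidekicks)  # included regardless of how much they share
    return sorted(in_range | forced)


# Hypothetical shared-cM values with the tester.
shared_cm = {
    "match_a": 310,
    "match_b": 145,
    "match_c": 92,   # below the range, but a known descendant
    "match_d": 61,   # below the range, not forced
    "match_e": 850,  # above the range
}

candidates = select_cluster_candidates(shared_cm, sidekicks=["match_c"])
print(candidates)  # ['match_a', 'match_b', 'match_c']
```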

Custom cluster generated by AncestryDNA.

The custom cluster actually did a pretty darned good job, despite the endogamy!  The only confirmed match it failed to find was PA.  The cluster included three children of other matches (black strike), whom I’d just as soon ignore but who aren’t really a problem.  And it found two more people (arrows) I hadn’t included in my manual cluster.  I’ve already figured out where HF belongs in this family (via an undocumented grandparent).  I’m not convinced that LP belongs to this cluster at all.

Gift Horse

Not to sound ungrateful—this is a welcome addition to the AncestryDNA toolbox—but it could be so much more!

While the example above was fairly informative and accurate, the custom cluster built around “Trina,” Midge’s top grandpaternal match, using the same centimorgan range is less helpful.

  • First, with 41 matches, it’s too large.  Ideally, a range of 100–400 cM would pull in 2nd cousins from the same great grandparents.  (See Angie Bush’s helpful blog for more guidelines.)  While it’s certainly possible that Midge has 40+ cousins through the same great grandparents as Trina, most won’t have done a DNA test.  Many members of this cluster are endogamous red herrings.
  • Second, some of these matches are parent–child while others share as little as 30 cM with one another.  I can’t tell the difference at a glance.  Applying a color gradient, as I did in my manual cluster, would be extremely helpful.
  • Third, with endogamy, distant cousins can share more than 100 cM because they’re related in multiple ways; the DNA adds up.  Such matches tend to have more but shorter segments than closer relatives.  If we could exclude matches from a cluster based on the longest segment and/or average segment size, we could reduce a lot of the endogamous noise.
  • Being able to exclude matches based on how they match one another (not just Midge) would also be extremely valuable.
  • A minor grievance:  the match of interest (Trina in this case) is buried somewhere in the middle of the list.  I’d prefer them to be at the top so they’re easier to find.
  • And finally, I’d love to be able to name the clusters after the MRCA rather than having it named for the match of interest I used to generate it.
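
To make the segment-size wish concrete, here is a rough sketch of the kind of filter I have in mind. It assumes per-segment lengths are available for each match, which is not something the custom cluster tool exposes today. The function names and the two thresholds are placeholders I picked for illustration, not established cutoffs.

```python
from statistics import mean


def segment_stats(segments_cm):
    """Summarize one match's shared segments (lengths in cM)."""
    return {
        "total": sum(segments_cm),
        "longest": max(segments_cm),
        "average": mean(segments_cm),
        "count": len(segments_cm),
    }


def looks_endogamous(segments_cm, min_longest=30.0, min_average=12.0):
    """Flag a match whose DNA arrives as many short segments.

    Both thresholds are arbitrary placeholders, not validated values.
    """
    stats = segment_stats(segments_cm)
    return stats["longest"] < min_longest or stats["average"] < min_average


# Two hypothetical matches who each share 120 cM in total.
few_long = [60, 40, 20]     # fewer, longer segments
many_short = [15] * 8       # many short segments

print(looks_endogamous(few_long))    # False
print(looks_endogamous(many_short))  # True
```

Both matches would land in the same 100–400 cM range, yet only the first looks like a genuinely close cousin; that difference is what a segment-based exclusion could exploit.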

All things considered, this is a promising addition to Ancestry’s Pro Tools package that I’ll be using regularly from here on out.

–––––

1 I know this is Midge’s grandmaternal side because ML, who also has an unknown father, shares two substantial segments on the X chromosome with Midge at 23andMe.  Women have two copies of the X chromosome, one from each parent, but men inherit only a single copy, from their mothers.  A father therefore passes his mother’s X to his daughters, so for Midge and ML to share X-DNA on their paternal sides, they must be related through their respective paternal grandmothers.

11 thoughts on “Struggles with Smithville”

  1. This is fascinating and clearly a cluster that highlights many interesting ideas. My first question reading this was how many of these people have DNA at MyHeritage, GEDmatch and FamilyTreeDNA?

    1. There are additional matches at MyHeritage and FamilyTreeDNA (some of whom are at Ancestry as well). Midge is not at GEDmatch.

  2. Please define MPE. Please define nib, hnib, hgnib, etc. No need to define sib, but who is sib to whom. This is SO close to being a great article.

    1. Thank you for the suggestions. I’ve added that information to the post.

      The relationships are between the “row” person and the “column” person, same as the centimorgan values. For example, PS and CS share 1,472 cM (in the cell above the diagonal) and are half-niblings (in the cell below the diagonal).

  3. > none of their trees are biologically incorrect.

    Did you mean ‘correct’?

    Food for thought; it appears that “Midge” is looking for more than what most of the family members have: certain knowledge of who their biological father is.

    You said half the town is Smiths; is that an exaggeration, meaning it feels like it? With 41 matches, 100 would be a tiny town.

    It sounds like you should consider the possibility that her father isn’t even in town and possibly never was.

    Sounds like you need a tool that runs all the probabilities for each person to identify the most likely relationship(s) for each pair. Isn’t there such a tool already?

    Also sounds like you should look around town for “blindfold orgy party” flyers. 🙂

    1. Edited, thanks!

      I’m not exaggerating about the number of Smiths in the town, at least not much. I don’t think I’ve found a paternal match yet who doesn’t have at least one Smith great grandparent; some have six or more. All of the trees trace back to Smithville. If Midge’s father wasn’t born there, his parents were, so Smithville is still the path to find him.

      Forty-one matches is only the Trina cluster. Midge has more than 600 paternal matches sharing at least 50 cM.

      There isn’t a tool that can analyze all of the probabilities of every match and also suggest the most likely placement of each, at least not in a single analysis. I’ve been using BanyanDNA iteratively to evaluate different portions of the tree, but it’s slow going. Mostly, I’ve found that every branch has a few people who appear to be misplaced.

  4. > so Smithville is still the path to find him.

    You’re saying that you’re fairly certain that Midge’s mother and father are both from Smithville, based on the DNA?

    I’m thinking if Midge’s father and grandfather are both NPE, you could literally have to test every male in town who has the right features (even if they’re not claimed; that’s likely not a Smith-only party), and starting with the right age range, but excluding those you already know can’t be. And if they’ve passed already or if they decline, at least one of their children (and those might also be NPE, so still looking for a sib match, but even a sib match might not be conclusive; you need to test the genetic father for that).

    Can you ask for a BanyanDNA add-on to study a tree from all points of view, not just one, to do at least semi-automatedly what you’ve been doing manually?

    Does knowing that one person is a closer relative change the odds for another person?

    Spaghetti “tree”? 🙂

    1. Midge’s mother raised her. We know who she is; she’s not from Smithville. I’m fairly certain that both of Midge’s paternal grandparents are from Smithville, though. I don’t necessarily need to identify her grandfather, only her grandmother. If Midge’s father was born out of wedlock as I suspect, he was probably raised by his birth mother or by his grandparents. He was born before anonymous adoption was widespread.

      As for BanyanDNA, I’m one of the founders and I can assure you that we’ve been discussing that for years. It turns out the coding isn’t that simple.

  5. Fascinating! I didn’t know that 23andMe has maybe reinstated chromosome information (re the X match). I may have misread that. It’s nice to read an article about clustering. I can see how the fact so many matches share more than 20cM with each other is pretty powerful evidence they have common ancestors.

    1. 23andMe hasn’t restored the chromosome browser. I’ve been working this case since before the CB went away! I’ve heard rumors that it might be coming back, though. Fingers crossed!

  6. Thank you. This is greatly encouraging.
    I have many matches without trees. Especially the “given a kit for Christmas” crowd.
    Custom matching has really helped place them on a branch and subbranch.
    Am now using it to map out some 8th cousins who are close to each other, to build out their subtrees and validate them. I have some matches with them of 25-40cM at Ancestry although most are smaller and not helpful. Custom clusters involves bigger numbers on their side so helps with relationships. Hopefully the subtree work can help.
    Ultimately the MRCA for all of these is around the limit of the furthest tree – or beyond.
    So a grand alliance of all branches is still a long way off, almost certainly requiring Y-DNA from the few who still retain the ancestral surname.
    My matches’ ancestors also have sisters marrying brothers and similar – not pedigree collapse but what I think of as a “braided tree”. I have a lot of experience with these but nowhere near mastery. They should be easier than your SMITHs.
