The most frustrating unknown father case I’ve ever worked is “Midge” (not her real name). Midge has great matches! I even know the town her biological father was from. That’s not the problem.
The problem is that half the town are Smiths, they’re all related, and “Smithville’s” local pastime appears to be extramarital sex. Of Midge’s top three paternal matches at AncestryDNA, two are MPEs. The jury’s still out on the third. Her three top matches at 23andMe match one another as first cousins, but none of the trees overlap. For all I know, none of their trees are biologically correct.
There are so many MPEs in Midge’s match lists, I created a custom label to remember who they are.
Oh, and Midge’s father was obviously born outside of wedlock, too!
A silver lining of the endogamy is that Midge usually has secondary matches to her top matches that represent both sides of the matches’ trees, so I can validate some of the pedigrees. These are people who share a lot of DNA with the primary match but not much with Midge. This is how I know so many of their trees are not biologically correct. It’s tedious work, though.
Manual clustering has been a huge help, so I was eager to try out Ancestry’s new custom clusters—available through Ancestry’s Pro Tools package for an added fee—to nudge things along.
Note to self: Custom clusters are not meant for endogamy. I know this. A girl can dream.
Manual Clustering
My manual cluster includes descendants of “Mack” & “Sarah” Smith, a Smithville couple who had 10 children between 1885 and 1915. One of those children was probably Midge’s grandmother.1

The blue-shaded cM column shows shared DNA with Midge. Matches in the upper left quadrant are all descendants of Teddy, a son of Mack & Sarah. The group in the bottom right quadrant descend from three of their other children. Shared centimorgan values between matches are above the diagonal and family relationships are below the diagonal.
CS and ML (circled) both have misattributed fathers. I’m confident that CS is Teddy’s daughter (but not his wife’s) based on circumstantial evidence and hypothesis testing in BanyanDNA. I’m still working on ML’s father. JR’s mother is a known descendant of Mack & Sarah, but her father is misidentified, and she seems to have a second, undocumented 2C relationship to GA and a different undocumented relationship to CS.
Confused yet? This is only the stuff I’ve managed to figure out. There’s so much more I’m still struggling with.
Custom Clustering at AncestryDNA
To create a custom cluster at Ancestry, I used PS as the match of interest and limited the cluster to matches who share 100–400 cM with Midge. Because LA, DM, MC, and JH descend from Mack & Sarah but share less than my 100-cM threshold, I added them as “sidekicks” to force their inclusion in the cluster.
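For readers who think in code, here is how I picture those three settings working together. This is only my mental model, not Ancestry's actual implementation, and the match names and centimorgan values are made up for illustration: a match is included if it shares DNA with the match of interest and falls inside the centimorgan range, or if I've named it as a sidekick.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    name: str         # hypothetical match label
    shared_cm: float  # cM shared with the tester (illustrative values only)


def select_cluster_members(matches, shared_with_focus, low_cm=100, high_cm=400, sidekicks=()):
    """Return the names of matches that would land in the cluster.

    matches: every match on the tester's list
    shared_with_focus: names that are shared matches with the match of interest
    sidekicks: names forced into the cluster regardless of the cM range
    """
    forced = set(sidekicks)
    members = []
    for m in matches:
        in_range = low_cm <= m.shared_cm <= high_cm
        if (in_range and m.name in shared_with_focus) or m.name in forced:
            members.append(m.name)
    return members


# Made-up example: W qualifies on its own, X only as a sidekick,
# Y is in range but not a shared match, Z is above the range.
example_matches = [Match("W", 250.0), Match("X", 90.0), Match("Y", 120.0), Match("Z", 450.0)]
example_members = select_cluster_members(
    example_matches, shared_with_focus={"W", "X", "Z"}, sidekicks=["X"]
)
print(example_members)
```

The sidekick check is deliberately independent of the range check, which is the whole point of sidekicks: they let me pull in known relatives who fall below my threshold.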

The custom cluster actually did a pretty darned good job, despite the endogamy! The only confirmed match it failed to find was PA. The cluster included three children of other matches (black strike), whom I’d just as soon ignore but who aren’t really a problem. And it found two more people (arrows) I hadn’t included in my manual cluster. I’ve already figured out where HF belongs in this family (via an undocumented grandparent). I’m not convinced that LP belongs to this cluster at all.
Gift Horse
Not to sound ungrateful—this is a welcome addition to the AncestryDNA toolbox—but it could be so much more!
While the example above was fairly informative and accurate, the custom cluster built around “Trina,” Midge’s top grandpaternal match, using the same centimorgan range as above, is less helpful.
- First, with 41 matches, it’s too large. Ideally, a range of 100–400 cM would pull in 2nd cousins from the same great-grandparents. (See Angie Bush’s helpful blog for more guidelines.) While it’s certainly possible that Midge has 40+ cousins through the same great-grandparents as Trina, most won’t have done a DNA test. Many members of this cluster are endogamous red herrings.
- Second, some of these matches are parent–child while others share as little as 30 cM with one another. I can’t tell the difference at a glance. Applying a color gradient, as I did in my manual cluster, would be extremely helpful.
- Third, with endogamy, distant cousins can share more than 100 cM because they’re related in multiple ways; the DNA adds up. Such matches tend to have more but shorter segments than closer relatives. If we could exclude matches from a cluster based on the longest segment and/or average segment size, we could reduce a lot of the endogamous noise.
- Being able to exclude matches based on how they match one another (not just Midge) would also be extremely valuable.
- A minor grievance: the match of interest (Trina in this case) is buried somewhere in the middle of the list. I’d prefer them to be at the top so they’re easier to find.
- And finally, I’d love to be able to name the clusters after the MRCA rather than having it named for the match of interest I used to generate it.
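To make the segment-based wish concrete, here is a minimal sketch of the kind of filter I have in mind. It is not an Ancestry feature. The names and numbers are invented, and the thresholds are placeholders rather than recommendations. The idea is simply that two matches with the same total can look very different once you consider how that total is divided up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentSummary:
    name: str           # hypothetical match label
    total_cm: float     # total shared cM (illustrative)
    segment_count: int  # number of shared segments
    longest_cm: float   # longest shared segment in cM

    @property
    def average_cm(self):
        # Guard against a zero segment count in hand-entered data.
        return self.total_cm / self.segment_count if self.segment_count else 0.0


def drop_endogamous_noise(matches, min_longest_cm=None, min_average_cm=None):
    """Keep matches that pass whichever segment thresholds are supplied.

    Passing None for a threshold disables that test, so either test
    can be used alone or both together.
    """
    kept = []
    for m in matches:
        if min_longest_cm is not None and m.longest_cm < min_longest_cm:
            continue
        if min_average_cm is not None and m.average_cm < min_average_cm:
            continue
        kept.append(m)
    return kept


# Two made-up matches with the same total but very different segment profiles.
few_long = SegmentSummary("few_long", total_cm=120.0, segment_count=4, longest_cm=55.0)
many_short = SegmentSummary("many_short", total_cm=120.0, segment_count=12, longest_cm=18.0)

kept_names = [
    m.name
    for m in drop_endogamous_noise([few_long, many_short], min_longest_cm=30.0, min_average_cm=15.0)
]
print(kept_names)
```

Both invented matches would sit comfortably inside a 100–400 cM range, but only the one with a few long segments survives the filter.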
All things considered, this is a promising addition to Ancestry’s Pro Tools package that I’ll be using regularly from here on out.
–––––
1 I know this is Midge’s grandmaternal side because ML, who also has an unknown father, shares two substantial segments on the X chromosome with Midge at 23andMe. Women have two copies of the X chromosome, one from each parent, but men only inherit a single copy from their mothers. A father therefore passes his daughter the X he received from his own mother, so for Midge and ML to share X-DNA through their fathers, they must be related through both of their paternal grandmothers.
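The X-inheritance rule in this footnote is easy to check with a few lines of code. The sketch below is just an illustration of the rule (the function name and path notation are my own): it lists every ancestral line that can contribute X-DNA to a person, given that a man gets his X only from his mother while a woman gets one from each parent.

```python
def x_contributors(sex, generations):
    """List ancestor paths (e.g. 'father>mother') that can pass X-DNA to a person.

    sex: 'F' or 'M' for the person being analyzed
    generations: how many generations back to walk
    """
    paths = []
    frontier = [("", sex)]
    for _ in range(generations):
        next_frontier = []
        for path, person_sex in frontier:
            # A man's only X comes from his mother; a woman has one from each parent.
            if person_sex == "M":
                parents = [("mother", "F")]
            else:
                parents = [("father", "M"), ("mother", "F")]
            for label, parent_sex in parents:
                new_path = f"{path}>{label}" if path else label
                paths.append(new_path)
                next_frontier.append((new_path, parent_sex))
        frontier = next_frontier
    return paths


# For a woman, two generations back: the paternal grandfather never appears.
female_paths = x_contributors("F", 2)
print(female_paths)
```

On a woman's paternal side, the only grandparent in the list is her father's mother, which is why shared X-DNA on the paternal side points to the paternal grandmother's line.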