“How can identical twins have different ancestry test results?”
My twins each did a 23andMe DNA test and the results verified that they are identical; however, their ancestry composition is not!
I can see why you’re confused. The same DNA should give the same ancestry results. And yet they haven’t.
This does not mean these companies are doing shoddy work. They aren’t—they are doing outstanding, cutting edge science that brings DNA testing to many, many people.
It is just that the analysis is complicated enough that it is incredibly difficult to get an exact result. There is some wiggle room.
So don’t take percentages as exact numbers. As we talk about in a previous answer, they can vary by 10 or even 20% pretty easily and sometimes even a bit more. So two people might have the same amount of Eastern European even if a DNA test says one has 20% and the other person has 40%.
Despite slightly different ancestry test results, these identical twin boys have the exact same ancestry. (Image: Wikimedia Commons)
A few, often random “unreadable” DNA spots make ancestry predictions vary slightly
But even when the percentages are small, this kind of thing is a bit more unsettling when you get different results from the exact same DNA. These differences mostly come from how the computer algorithm splits up the DNA into thousands of windows, analyzing one window at a time. And how blank spots (or “missing calls”) in the data affect how the DNA is interpreted and/or split up into those pieces.
These “missing calls” are an inevitable consequence of any test like the ones these companies run.
23andMe reports that on average they get good reads on more than 98% of the markers they look at which is really very good. But it still means that each test has thousands and thousands of spots that could not be read.
And importantly, the same markers do not come up as “missing calls” in different tests. Each time someone’s DNA is read, you can end up with a different 2% being uninterpretable.
Small differences in data quality can arise in different DNA samples from the same person (or twins). This can lead to slightly different ancestry predictions from the same DNA.
These spots on the DNA that can’t be read in a particular assay can tip the ancestry scales one way or another. What might look English with a smattering of these missed reads, looks German with a different set of no calls.
So in the first case, a piece of one identical twins’ DNA might look English while the same or an overlapping piece of DNA will look German with the second identical twin. It wouldn’t take too many differences like this to shift enough DNA to make the two not look identical from an ancestry point of view.
Although ancestry predictions are affected by these unreadable spots, the test that figures out if people are related is not affected. Even with the “missing calls” the companies can see that the twins have the same DNA.
DNA landmarks help identify ancestry
These companies use what is called reference DNA to figure out where your DNA came from. This reference DNA is the DNA of people whose families have stayed in a part of the world for a long time. Or who have very detailed family trees.
The company compares your DNA to the DNA landmarks they found in these families. The parts that match landmarks German people have is considered “German”, the parts that match landmarks British people have is considered “English” and so on.
Sounds easy enough but most people are not like the reference group. They have lots of different ancestral DNA scattered in chunks throughout their DNA. This is where things can get tricky.
To get around this mixed DNA, the computer program analyzes small sections of the DNA at a time. These “windows” have around 100 or so DNA markers in them which translates to thousands of such windows.
Remember those “missing calls” we talked about earlier? They can affect how a window is interpreted. If by chance the program interprets it as German for twin 1 and English for twin 2, then the two will appear to be a little bit different.
English or German?
Let’s imagine that we are trying to distinguish whether a phrase is German or English but this is all we can read:
The only full word we have is die. Unfortunately die is a word in both English and German. (In English it means to stop living and in German it is a definite article.)
A computer could probably tell better than the jury on the Simpsons that this is the English die. But without “Bart” in the middle, it would be impossible to know. (Image: Imgur)
This same sort of thing can happen with DNA. If the no calls happen in the right spots it can be hard to tell the ancestry of that DNA.
One way the program can deal with this is by looking at the windows around it. For example, if a bunch of windows to the right and left are German, then it will call our uninterpretable window German too. And if there are a lot of English windows around, it will call it English.
Imagine this result:
In this case the program would conclude that the unknown window is German because of what is around it. This is called smoothing.
Now imagine that it looks like this:
With German windows on one side and English on the other, it has to make its call without any helpful context. Let’s say this is Twin 1’s result and that the program says it is English. Here now is Twin 1:
Now let’s say Twin 2’s read gives this:
Even though this has the same number of “missing calls”, 6, we can see it is probably German because Katze is the German word for cat. So Twin 2 would end up being this:
Now the twins have different ancestries with the same DNA. Twin 1 is 4/7 English and 3/7 German over this stretch while Twin 2 is 3/7 English and 4/7 German.
And the differences can become larger if nearby windows are also hard to identify because of “missing calls”. Let’s say that the third window can’t be read because of these no calls:
Because with Twin 2 this window is in the middle of a German stretch, it will be called German. The same may not be true for Twin 1. It could get called English or German.
If the window is called English, we now have an even bigger difference between the twins. And this kind of thing can go on, here and there, across the thousands of windows in our DNA.
One Piece of the Puzzle
What I just described is certainly part of the reason why the same DNA can end up with different results. But it is by no means the only way it can happen.
Analyzing DNA for ancestry is very complicated for lots of reasons (check out this outstanding blog from 23andMe to get a feel for what they are up against). It is technically very challenging.
Still, if there were a way to reduce the number of “missing calls” it might at least make the results more consistent with the same DNA. The problem is that to reduce the no calls, you’d almost certainly need to spend more money for your test. Potentially a lot more money.
So you have the push and pull of affordability versus perfect consistency. Well, not perfect as that would be impossible, but near perfect.
All of this shows why ancestry DNA results are really just one part of your genealogy puzzle. The part of the test that identifies relatives who share DNA can provide more of the pieces to fill in your puzzle. And of course good old fashioned family tree building using a variety of public documents is an important part too.
All three together can give you a more complete picture of your ancestry compared to just one or two of the others.
For those interested, here are the actual results from these twins: