If I submitted two identical spit samples to an ancestry test, would the results be the same?

August 15, 2023

A curious adult from Minnesota asks:

"If I submit two separate Ancestry DNA kits of my spit samples taken on the same day, will the resulting two raw DNA text files be identical?"

Nope! Even if you submitted two identical spit samples, at exactly the same time, you’d get slightly different results.

The differences would be minor. It would still be very easy to tell that the samples came from the same person (or potentially identical twins!). And the ancestry predictions would probably be more or less the same.

So how can this happen?

It essentially boils down to testing errors. Most companies advertise that they have an accuracy rate of ~99%. Or in other words, they’ll get it wrong around 1% of the time. (That’s just for the DNA letters themselves, not necessarily the ancestry/health predictions.)

1% doesn’t sound like much, but most current ancestry tests look at more than half a million genetic locations. At that scale, a 1% error rate means several thousand mistakes. This will cause your two identical samples to be slightly different.

These differences would definitely show up in the raw DNA text files. And they might also affect the ancestry or health predictions.

Sample collection tube with label “SPIT TO HERE”.
Two identical spit samples would give slightly different results due to testing error (Image via Shutterstock)

Errors in testing the DNA

So exactly how accurate are ancestry tests these days?

AncestryDNA currently looks at over 700,000 locations in a person’s DNA.1 And they report an accuracy rate of >99% for each location tested.2

Putting those numbers together, AncestryDNA would make a correct call at 693,000 or more locations. Or to flip it around, it might make a mistake at up to 7,000 locations.

Similarly, 23andMe looks at 690,000 genetic locations and advertises >99% accuracy.3,4 That means the raw data from 23andMe might have as many as 6,900 errors.
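The arithmetic behind those figures can be written out as a short Python sketch. It uses only the location counts and the 99% accuracy floor quoted above. The function name `max_errors` is made up for illustration, and treating the error rate as exactly 1% gives a worst case, not a measured value.

```python
# Upper bound on miscalled locations implied by the advertised figures.
# Location counts and the 99% accuracy floor are the numbers quoted in the text.
tests = {
    "AncestryDNA": 700_000,
    "23andMe": 690_000,
}
MAX_ERROR_RATE = 0.01  # ">99% accurate" means at most 1% wrong (worst case)


def max_errors(locations, error_rate=MAX_ERROR_RATE):
    """Return the largest number of wrong calls consistent with the error rate."""
    return round(locations * error_rate)


for company, locations in tests.items():
    worst = max_errors(locations)
    print(f"{company}: up to {worst:,} errors, at least {locations - worst:,} correct")
```

For AncestryDNA this gives up to 7,000 errors and at least 693,000 correct calls. For 23andMe it gives up to 6,900 errors.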

Some of those errors will be a wrong DNA letter. Maybe the real answer is “T”, but the raw data says “A”. In other cases, you might get a non-answer like “not determined”. That just means that the quality at that position was so low, it was impossible to tell what the answer was.
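To see how those two kinds of disagreement could be told apart, here is a minimal Python sketch that compares two sets of results. Everything in it is a hypothetical illustration: the marker names are invented, the `--` placeholder for "not determined" is an assumption, and `compare_calls` is not part of any company's software or file format.

```python
NO_CALL = "--"  # assumed placeholder for "not determined"; real files may differ


def normalize(genotype):
    """Sort the letters so that 'CT' and 'TC' count as the same result."""
    return "".join(sorted(genotype))


def compare_calls(sample_a, sample_b):
    """Count matching, mismatching and no-call positions between two samples.

    Each sample is a dict mapping a marker ID to a genotype string.
    Only markers present in both samples are compared.
    """
    counts = {"same": 0, "different_letter": 0, "no_call": 0}
    for marker in sample_a.keys() & sample_b.keys():
        a, b = normalize(sample_a[marker]), normalize(sample_b[marker])
        if NO_CALL in (a, b):
            counts["no_call"] += 1
        elif a == b:
            counts["same"] += 1
        else:
            counts["different_letter"] += 1
    return counts


# Toy data with made-up marker IDs, not real results.
kit_1 = {"marker1": "AA", "marker2": "CT", "marker3": "GG", "marker4": "TT"}
kit_2 = {"marker1": "AA", "marker2": "CC", "marker3": "--", "marker4": "TT"}
print(compare_calls(kit_1, kit_2))
# {'same': 2, 'different_letter': 1, 'no_call': 1}
```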

So how many differences would there be in the two identical spit samples?

Imagine your first sample has around 7,000 errors. And now imagine that the second sample has around 7,000 errors too. While it’s possible that in some cases they made the same error, most of the mistakes would be different.

So you could be looking at up to 14,000 differences in the two samples!

While that sounds like a lot, the two samples would be mostly the same. They’d be identical at 686,000 or more of the 700,000 locations. And they’d probably be even more similar than that, since the advertised accuracy is at least 99%.
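A small simulation can make this reasoning concrete. The Python sketch below is a toy model, not how any company processes samples. It assumes one DNA letter per location (real results have two), a flat 1% chance of a miscall at every location (the worst case implied by ">99% accurate"), and errors that are random and independent between the two kits.

```python
import random

N_SITES = 700_000   # locations tested, the AncestryDNA figure quoted in the text
ERROR_RATE = 0.01   # assumed worst case implied by ">99% accurate"
LETTERS = "ACGT"


def read_sample(truth, rng):
    """Return one simulated readout of the true letters, with random miscalls."""
    calls = []
    for letter in truth:
        if rng.random() < ERROR_RATE:
            # Miscall: report one of the three other letters at random.
            letter = rng.choice([x for x in LETTERS if x != letter])
        calls.append(letter)
    return calls


rng = random.Random(0)  # fixed seed so the run is repeatable
truth = [rng.choice(LETTERS) for _ in range(N_SITES)]
kit_1 = read_sample(truth, rng)
kit_2 = read_sample(truth, rng)

errors_1 = sum(c != t for c, t in zip(kit_1, truth))
errors_2 = sum(c != t for c, t in zip(kit_2, truth))
differences = sum(a != b for a, b in zip(kit_1, kit_2))

print(f"errors in kit 1:      {errors_1:,}")
print(f"errors in kit 2:      {errors_2:,}")
print(f"differences:          {differences:,}")
print(f"identical locations:  {N_SITES - differences:,}")
```

Under these assumptions each kit ends up with roughly 7,000 errors. The number of differences can never exceed the two error counts added together, because the kits only disagree where at least one of them is wrong.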

Cartoon girl standing in front of a mirror, where her reflection has a few differences.
Even a low error rate can lead to thousands of differences between identical samples. Small differences can add up, potentially affecting health or ancestry predictions. (Image modified from Shutterstock)

Use raw data with caution

It’s a good idea to be careful when accessing and using the raw data from any ancestry company. It hasn’t been validated for accuracy and can be misleading.

The error rate makes it especially risky to make any health decisions based on the raw data. There’s increasing concern about the potential harms in people getting incorrect information from their raw data, and the importance of validating any findings with a clinical-grade test.5

Clinical-grade genetic tests have a lot of quality controls in place to ensure patients get accurate information. So if you see something in your raw data that concerns you, it’s a good idea to talk to your doctor (or a genetic counselor) to see whether it’s something you should be worried about. In some cases, they might recommend an additional round of testing to confirm the result.

The raw data can also have some things that are easy to misinterpret. For example, 23andMe’s raw data will always include some results for the Y chromosome... even if the tested person doesn’t have a Y chromosome at all! That’s because 23andMe uses the same test for everyone, which includes the Y chromosome. Since the Y chromosome is very similar to the X chromosome, there will usually be a few “spurious” (false) results.

This kind of incorrect information won’t make it into the final reports. The analysis algorithms are able to tell that those are false positives and ignore them. But someone looking at the raw data might wonder if they have an extra chromosome.

Some other differences could end up in the final reports for your two identical samples. They would be minor, since the vast majority of the raw data would be the same. But this is exactly why identical twins might receive slightly different ancestry results!

Author: Abbey Thompson, PhD

Abbey received her PhD in Genetics in 2018, and now works as the Director of Educational Outreach for the Stanford Genetics Department. As part of this, she acts as “The Tech Geneticist,” where she answers questions and edits all new Ask-a-Geneticist articles. She also leads the Stanford at The Tech program, which brings Stanford scientists to The Tech Interactive to run genetics activities with visitors in the BioTinkering Lab.
