How does DNA store data?

March 12, 2024

A 9 year old visitor at The Tech Interactive asks:

"How is information stored in DNA?"

DNA makes us who we are. It exists in every living organism and works like a blueprint that tells cells whether to become a human or a chimpanzee. But how is this information actually stored in DNA?

Tiny molecule, Big Data

To understand how DNA stores data, we need to first know what DNA is.

DNA is a molecule that exists in every cell in our body. It is a long string made up of small subunits called bases. There are 4 different types of DNA bases: adenine (A), cytosine (C), guanine (G), and thymine (T).

DNA structure
DNA is made up of 4 bases—A, C, G, and T (Image: Wikimedia Commons)

Each of these bases is only 0.34 nanometers long, or about one hundred-thousandth the width of a human hair [1]!

Yet each of our cells carries 6 billion of these super tiny bases strung together into DNA molecules. If you stretched out the DNA from a single cell end-to-end, it would be roughly 6 feet long.
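To check that figure, here is a minimal Python sketch that multiplies the two numbers given above (0.34 nanometers per base and 6 billion bases per cell). The function and variable names are made up for this illustration, and the result is only as precise as those rounded inputs.

```python
# Figures taken from the text above; both are rounded estimates.
BASE_LENGTH_NM = 0.34            # length of one DNA base, in nanometers
BASES_PER_CELL = 6_000_000_000   # number of bases in one human cell


def dna_length_meters(bases: int, base_length_nm: float = BASE_LENGTH_NM) -> float:
    """Return the end-to-end length of a DNA string in meters."""
    return bases * base_length_nm * 1e-9  # 1 nanometer = 1e-9 meters


def meters_to_feet(meters: float) -> float:
    """Convert meters to feet (1 foot = 0.3048 meters)."""
    return meters / 0.3048


length_m = dna_length_meters(BASES_PER_CELL)
length_ft = meters_to_feet(length_m)
print(f"{length_m:.2f} m, or about {length_ft:.1f} feet")
```

The result comes out at about 2 meters, which is in line with the "roughly 6 feet" estimate.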

These 6 billion DNA bases contain information that dictates how a cell will grow, move, and interact with other cells. But to decode this information from DNA, we need to think of DNA as a language and the DNA bases as letters.

An alphabet with 4 letters

The English language has 26 letters in its alphabet. DNA, on the other hand, only has 4 letters: A, C, G, and T. These letters are arranged into 3-letter words, which can form paragraphs called genes.
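A short Python sketch can show how many different 3-letter words a 4-letter alphabet allows, and how a string of letters is read three at a time. The helper names and the example sequence are hypothetical and chosen only for illustration.

```python
from itertools import product

DNA_LETTERS = "ACGT"  # the 4 letters of the DNA alphabet


def all_words(length: int = 3) -> list[str]:
    """List every possible word of the given length made from DNA letters."""
    return ["".join(letters) for letters in product(DNA_LETTERS, repeat=length)]


def split_into_words(sequence: str, length: int = 3) -> list[str]:
    """Read a DNA sequence left to right in non-overlapping words."""
    return [sequence[i:i + length]
            for i in range(0, len(sequence) - length + 1, length)]


words = all_words()
print(len(words))                      # 64, because 4 * 4 * 4 = 64
print(words[:5])                       # ['AAA', 'AAC', 'AAG', 'AAT', 'ACA']
print(split_into_words("ATGGCCTTT"))   # ['ATG', 'GCC', 'TTT']
```

So even with only 4 letters, there are 64 different 3-letter words to work with.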

Each of these genes works like a cookbook recipe. But instead of making delicious baked goods, genes contain instructions to make proteins. Those proteins are the building blocks of our cells and carry out different tasks in our body.

Some proteins help cells take up nutrients and turn them into energy. Others are responsible for launching an attack when the cell encounters a virus.

Imagine a cookbook written in DNA letters. Each one of us would be carrying a cookbook with 6 billion letters stored in every single one of our cells. But only around 1% of that book contains recipes for proteins [2]. What about the other 99%?

A sequence of DNA with the 4 ‘letters’ (Image: Wikimedia Commons)

A choose-your-own-adventure cookbook

While scientists still don’t fully understand the function of the 99% of DNA that does not contain genes, here is what we do know.

Every cell in your body carries the same DNA cookbook, but different types of cells play entirely different roles. For example, a skin cell looks and behaves very differently from a muscle cell or a bone cell.

So how can the exact same cookbook be used to build all the different cell types around our body? The answer lies in the non-gene parts.

Within this 99% of DNA, there are signals that read like “please follow this recipe if you are a skin cell” or “don’t follow this recipe if you are a muscle cell.”

These “Read me” signals guide each type of cell to follow a distinct set of recipes and therefore allow them to behave differently in our body. So it’s really like each cell can pick and choose its own adventure from the same DNA cookbook!
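The cookbook idea can be pictured with a toy Python sketch. This is only an analogy, not a model of real biology: the recipe names and the cell types attached to them are invented for this example.

```python
# Toy cookbook: every cell carries the same recipes, and each recipe has a
# "Read me" note listing which cell types should follow it.
# All recipe names and notes below are hypothetical.
COOKBOOK = {
    "skin_protein_recipe": {"skin"},
    "muscle_protein_recipe": {"muscle"},
    "energy_protein_recipe": {"skin", "muscle", "bone"},
}


def recipes_for(cell_type: str) -> list[str]:
    """Return the recipes whose "Read me" note includes this cell type."""
    return sorted(name for name, readers in COOKBOOK.items()
                  if cell_type in readers)


print(recipes_for("skin"))    # ['energy_protein_recipe', 'skin_protein_recipe']
print(recipes_for("muscle"))  # ['energy_protein_recipe', 'muscle_protein_recipe']
```

The cookbook is identical in both calls. Only the cell type doing the reading changes, and that alone changes which recipes get used.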

DNA as data storage?

The DNA within us encodes messages and instructions for our cells. So can we also use DNA to write our own messages and store other data? Yes, scientists are already doing that!

We live in an age of data explosion. From the YouTube browsing history on your phone to pictures of the universe taken by telescopes, there is an overwhelming amount of information that needs to be stored.

Traditionally, people used pen and paper to record data. With the rise of the digital age, we now rely on flash drives and hard drives to store larger amounts of data. But none of these are perfect. Some can’t survive harsh environmental conditions for very long, and some storage formats quickly become obsolete.

DNA is a potential solution to this problem.

DNA is incredibly efficient at storing data. It is estimated that we can store 433 exabytes of data in just 1 gram of DNA. That’s almost a billion 512GB flash drives [3]! DNA can also be surprisingly long-lasting and resistant to environmental stress. Researchers have figured out ways to preserve DNA so that it could last for over 2 million years [4].
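The flash drive comparison can be checked with a few lines of Python. This sketch assumes decimal units (1 exabyte = 10^18 bytes and 1 gigabyte = 10^9 bytes), and the variable names are made up for the example.

```python
# Assumption: decimal storage units.
EXABYTE = 10**18   # bytes in one exabyte
GIGABYTE = 10**9   # bytes in one gigabyte

dna_capacity_bytes = 433 * EXABYTE    # estimate for 1 gram of DNA, from the text
flash_drive_bytes = 512 * GIGABYTE    # one 512GB flash drive

drives = dna_capacity_bytes // flash_drive_bytes
print(f"{drives:,} flash drives")     # 845,703,125 flash drives
```

That is about 846 million drives, which matches the "almost a billion" figure.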

With the newest technology, we can now write custom DNA molecules and read the information back from them. In fact, a team of scientists managed to convert all of Shakespeare’s sonnets into DNA bases and pack them into molecules. But more research is needed to reduce the cost and make the technology more accessible.

Maybe in a few years, instead of flash drives we’ll all be carrying ‘DNA drives’ around!

Scientists converted our alphabet and punctuation marks into DNA sequences (Adapted from Akram et al., 2018)

DNA   character    DNA   character    DNA   character    DNA   character    DNA   character
AAA   0            AGG   A            CCA   K            CTG   U            GGA   .
AAC   1            AGT   B            CCC   L            CTT   V            GGC   !
AAG   2            ATA   C            CCG   M            GAA   W            GGG   (
AAT   3            ATC   D            CCT   N            GAC   X            GGT   )
ACA   4            ATG   E            CGA   O            GAG   Y            GTA
ACC   5            ATT   F            CGC   P            GAT   Z            GTC
ACG   6            CAA   G            CGG   Q            GCA   []           GTG
ACT   7            CAC   H            CGT   R            GCC   :            GTT
AGA   8            CAG   I            CTA   S            GCG   ,            TAA   ?
AGC   9            CAT   J            CTC   T            GCT   -            TAC   ;
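To see how a table like the one above could be used, here is a minimal Python sketch that turns a short message into DNA letters and back again. It is an illustration under stated assumptions, not the method used in the original study: the function names are hypothetical, and the entries whose character is blank or unclear in the table (GCA, GTA, GTC, GTG, GTT) are left out rather than guessed, so this sketch cannot encode spaces.

```python
# DNA word / character pairs copied from the table above.
# Assumption: GCA, GTA, GTC, GTG and GTT are omitted because their
# characters are blank or unclear in the table.
TABLE = """
AAA 0  AGG A  CCA K  CTG U  GGA .
AAC 1  AGT B  CCC L  CTT V  GGC !
AAG 2  ATA C  CCG M  GAA W  GGG (
AAT 3  ATC D  CCT N  GAC X  GGT )
ACA 4  ATG E  CGA O  GAG Y
ACC 5  ATT F  CGC P  GAT Z
ACG 6  CAA G  CGG Q
ACT 7  CAC H  CGT R  GCC :
AGA 8  CAG I  CTA S  GCG ,  TAA ?
AGC 9  CAT J  CTC T  GCT -  TAC ;
"""

tokens = TABLE.split()  # alternating DNA word, character, DNA word, ...
DNA_TO_CHAR = dict(zip(tokens[0::2], tokens[1::2]))
CHAR_TO_DNA = {char: dna for dna, char in DNA_TO_CHAR.items()}


def encode(text: str) -> str:
    """Replace each character with its 3-letter DNA word."""
    dna_words = []
    for char in text.upper():
        if char not in CHAR_TO_DNA:
            raise ValueError(f"No DNA word for character {char!r}")
        dna_words.append(CHAR_TO_DNA[char])
    return "".join(dna_words)


def decode(dna: str) -> str:
    """Read the DNA three letters at a time and look up each word."""
    if len(dna) % 3 != 0:
        raise ValueError("DNA length must be a multiple of 3")
    words = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "".join(DNA_TO_CHAR[word] for word in words)


dna = encode("HELLO!")
print(dna)          # CACATGCCCCCCCGAGGC
print(decode(dna))  # HELLO!
```

Each character becomes exactly 3 DNA letters, so a 6-character message turns into 18 bases.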

 

Author: Lucy Zhang

When this answer was published in 2024, Lucy was a Ph.D. candidate in the Department of Genetics, studying the roles of immune cells during heart development and congenital heart disease. She wrote this answer while participating in the Stanford at The Tech program.

Ask a Geneticist