Does our DNA really have as much information as an encyclopedia set?

March 21, 2019

A curious adult asks:

"It is said that our DNA contains as much information as a complete set of encyclopedias or more. Is this also true of plants? Like a begonia?"

Yes, that’s true! I’ll spare you the math (it’s at the bottom for anyone who wants it), but I reckon your DNA would fill around 100 encyclopedia volumes.

As for begonias, different types have different amounts of DNA. I actually found a study called ‘Genome size variation in Begonia’ published by Belgian agricultural scientists. They had the same question as you, and looked at 60 different types of begonias to get their answer! 

It turns out that begonias contain anywhere from 7 to 43 encyclopedia volumes of DNA. 

Here’s a visualization of the DNA in a few other assorted plants and animals:

A graphical representation of the amount of information in a genome as encyclopedia volumes.
Image by G. Riesen.

The long and short of it

Why does wheat have almost 40 times as much DNA as rice? And a mosquito 3 times as much as a ladybug? Or a banana 3 times as much as a fruit fly?

This is a real mystery, and an old one. It baffled scientists even before we had discovered DNA’s famous double-helix shape. 

Scientists at the time were pretty sure that DNA told organisms how to develop, but they weren’t sure why it came in such different sizes, even for creatures that seemed really similar. 

Does wheat really need 37 times more information to make itself than rice does?  

Junk science

Let’s back up a little -- what information does DNA actually hold? 

The encyclopedia analogy is a nice visual, but DNA isn’t really read cover-to-cover like a book. It’s more like a choose-your-own-adventure recipe collection. It’s got a bunch of different instructions in it for making molecules a cell might need -- these recipes are what we call genes.

Genes are very important, but only about 2% of your DNA is actually made up of them! That means the other 98% is something else.

This non-recipe stuff is sometimes called ‘junk’ DNA. Wheat and rice actually have a similar number of genes, but wheat has a lot more of this other kind of DNA.

Is ‘junk DNA’ really junk?

Aside from being a little rude, the idea of ‘junk DNA’ raises some questions. Why is there so much of it? Is it really just ‘junk’? 

Definitely not! There is lots of evidence that ‘junk’ DNA is doing something. For example, some of it helps your cells decide how to read the DNA.

Every cell in your body has a copy of the master DNA plan, but they read different parts of it depending on the type of cell they are. For example, your skin cells skip over the genes that help build bones. We say that those genes aren’t expressed in your skin… and that’s a good thing! You wouldn’t want your skin turning into bone, would you?

Some ‘junk’ DNA provides instructions for reading the rest of the genome

Some of our ‘junk’ DNA works like the instructions telling you which page to read next in a choose-your-own-adventure book. It tells cells which recipes to use and how often. It can also tell them to switch recipes in different situations. This lets your body change itself as you grow, encounter new things in your environment, or change your diet. This process is called the regulation of gene expression.

Because of its role in regulation and other evidence, many scientists are pushing back against the term ‘junk DNA’. They’d rather call it ‘non-coding’ DNA. We know that it doesn’t have the code for making a molecule, but that doesn’t necessarily make it junk!

The dark side of DNA

So most of our DNA isn’t made up of genes, and different organisms have different amounts of this non-coding stuff (‘junk’ DNA). The question of how much ‘information’ is in our DNA depends on what that DNA is doing. 

It's possible that at least some of it is just sitting around! Would you count that as a meaningful part of your DNA encyclopedia set? Or is some of it truly junk?

Or what if there are other codes hidden in these dark parts of our DNA, holding secrets about ourselves that we haven’t discovered yet? 

All we can say for now is how much DNA we have in total. I suspect that the question of how much that DNA actually has to say will be a mystery for a long time.


Appendix: The math

The fundamental unit of information computers use is called a ‘bit’ – each one is like a little switch that can be on or off, representing a zero or a one. By combining bits, computers can store information. Two bits can have four possible values – 00, 01, 10 and 11. Since each position in a DNA sequence holds one of four bases (A, C, G or T), each base pair in a sequence can be recorded as two bits of data.
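To make the "two bits per base" idea concrete, here is a minimal Python sketch that packs a short sequence into bits and unpacks it again. The particular letter-to-bits assignment (A=00, C=01, G=10, T=11) is a hypothetical choice made for illustration; the argument above only needs four distinct two-bit codes, and any assignment would work. The functions `pack` and `unpack` are likewise illustrative names, not part of any real tool.

```python
# Hypothetical 2-bit code for each base. Any four distinct two-bit values
# would do; this particular mapping is just an illustrative assumption.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack(sequence: str) -> int:
    """Pack a DNA sequence into one integer, using two bits per base."""
    value = 0
    for base in sequence.upper():
        # Shift the existing bits left by two to make room, then add this base.
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover a sequence of known length from its packed integer."""
    bases = []
    for _ in range(length):
        # The lowest two bits hold the last base that was packed.
        bases.append(BITS_TO_BASE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


seq = "GATTACA"
packed = pack(seq)
print(format(packed, f"0{2 * len(seq)}b"))  # 10001111000100 (14 bits for 7 bases)
print(unpack(packed, len(seq)))             # GATTACA
```

The length has to be stored separately because a run of A's at the start of a sequence packs to leading zeros, which an integer on its own can't remember.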

That means that 4 million DNA base pairs is 8 million bits of information. Eight bits make a byte, so 8 million bits is 1 million bytes. A million bytes is called a megabyte! Our total conversion rate is then 4 million base pairs per megabyte.

Humans have ~3.5 billion base pairs. That translates into 875 megabytes of data, easily fitting onto most thumb drives these days.
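The base-pair-to-megabyte conversion above can be written out as a tiny Python function. This is a sketch of the appendix's arithmetic only; it assumes a decimal megabyte of 1,000,000 bytes, as the text does, and the function name is illustrative.

```python
BITS_PER_BASE_PAIR = 2          # four possible values -> 2 bits
BITS_PER_BYTE = 8
BYTES_PER_MEGABYTE = 1_000_000  # decimal megabyte, as used in the appendix


def base_pairs_to_megabytes(base_pairs: float) -> float:
    """Convert a count of DNA base pairs into megabytes at 2 bits per pair."""
    bits = base_pairs * BITS_PER_BASE_PAIR
    return bits / BITS_PER_BYTE / BYTES_PER_MEGABYTE


print(base_pairs_to_megabytes(4_000_000))      # 1.0
print(base_pairs_to_megabytes(3_500_000_000))  # 875.0
```

If you prefer binary mebibytes (1,048,576 bytes each), the same 3.5 billion base pairs come out to roughly 834 instead of 875.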

To convert from megabytes to encyclopedias, we have to decide how much data an encyclopedia contains. This is a little fuzzier, but I found a Wikipedia page that suggested using 6 bytes/word as an average. Since there are ~1.375 million words per volume of Encyclopedia Britannica, that gives 8.25 megabytes per volume. Continuing with our human genome example, 875 megabytes then fits into 106.06 volumes! I’ve done similar math for the other organisms, rounding to the nearest encyclopedia for visualization.
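Here is the whole chain, from base pairs to encyclopedia volumes, as a short Python sketch. It simply encodes the assumptions stated above (2 bits per base pair, 6 bytes per word, ~1.375 million words per Britannica volume); change those constants and the answer changes with them. The function name is illustrative.

```python
BYTES_PER_WORD = 6            # average suggested in the appendix
WORDS_PER_VOLUME = 1_375_000  # approximate words in one Britannica volume
MEGABYTES_PER_VOLUME = BYTES_PER_WORD * WORDS_PER_VOLUME / 1_000_000  # 8.25
BASE_PAIRS_PER_MEGABYTE = 4_000_000  # 2 bits per base pair, 8 bits per byte


def base_pairs_to_volumes(base_pairs: float) -> float:
    """Estimate how many encyclopedia volumes a genome would fill."""
    megabytes = base_pairs / BASE_PAIRS_PER_MEGABYTE
    return megabytes / MEGABYTES_PER_VOLUME


human = base_pairs_to_volumes(3_500_000_000)
print(f"{human:.2f} volumes")  # 106.06 volumes
print(round(human))            # 106
```

Flipping the conversion around, one volume corresponds to 8.25 × 4 million = 33 million base pairs, which is a handy rule of thumb for eyeballing the other organisms in the figure.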

Author: Guillaume Riesen

When this answer was published in 2019, Guillaume was a Ph.D. candidate in the Department of Neuroscience, studying binocular visual perception in humans in Justin Gardner’s laboratory. He wrote this answer while participating in the Stanford at The Tech program.

Ask a Geneticist