Why does a codon have three letters?
An undergraduate student from India asks:
“Why do we have three-base-codons when it is possible to translate four- or five-base-codons?”
DNA only has 4 different letters: A, T, C, and G. But a protein can have 20 different amino acids. This is where codons come in — they help translate the DNA code into a protein code.
The triplet codon is a “sweet spot” that nature landed upon. It’s just enough to cover all the possible amino acids. Codons of 4 or 5 letters would give way more combinations than we need.
If codons were longer, genes would be too big. It’s just not worth it.
Let's dive into what codons are, and why just a little bit of redundancy is good.
Cracking the Codon
DNA contains all of the instructions for life written in the pattern of A, T, C and G nucleotides. But on its own DNA can’t do anything. Proteins do most of the important things in the cell.
The parts of DNA that contain recipes for proteins are called genes. But a cell can’t go straight from a gene to a protein. First, the information in the gene needs to be copied as RNA. Much like DNA, RNA is a type of nucleic acid. But it’s a little bit more fragile than DNA, and uses a U instead of a T. Copying DNA into RNA is called transcription.
Next, the cell needs to read the RNA code. This is called translation, since the cell needs to translate information from a nucleic acid language into a protein language. This poses a problem since there are 20 different amino acids but only 4 nucleotides. If one nucleotide coded for one amino acid, proteins could only be built from 4 different amino acids.
If you read RNA in nucleotide pairs, with every two nucleotides coding for an amino acid, there would be 16 unique pairs (4 x 4), enough for only 16 different amino acids. That still falls short of 20 amino acids, so it’s clear that a nucleotide doublet won’t work.
This is where codons come in. If we read the RNA in triplets, every unique set of three nucleotides can code for a different amino acid. There are 64 different combinations of triplet nucleotides, which gives more than enough to have one triplet per amino acid. Plus some extra combinations that can code for the beginning and end of a protein.
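The counting argument above can be checked with a short Python sketch. It simply lists every possible codon of a given length and compares the total to the 20 amino acids. The function names here are made up for illustration.

```python
from itertools import product

BASES = "ACGU"      # the four RNA nucleotides
AMINO_ACIDS = 20    # number of amino acids that need a codon

def count_codons(length):
    """Enumerate every possible codon of the given length and count them."""
    return sum(1 for _ in product(BASES, repeat=length))

def shortest_sufficient_length(needed=AMINO_ACIDS):
    """Smallest codon length with at least `needed` distinct codons."""
    length = 1
    while count_codons(length) < needed:
        length += 1
    return length

for n in (1, 2, 3):
    total = count_codons(n)
    print(f"{n} letter(s): {total} codons, enough for 20? {total >= AMINO_ACIDS}")

print("Shortest codon that works:", shortest_sufficient_length())
```

Singlets and doublets come up short (4 and 16), and the triplet is the first length that clears 20.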
64 combinations is actually more than is strictly necessary. If you take a look at the chart above, you’ll notice that there are extra combinations left over, so some amino acids have more than one possible codon.
This redundancy is actually advantageous. It helps minimize the damage from mutations!
For example, alanine has four possible codons: GCU, GCC, GCA, and GCG. If a mutation happens in the last position of this codon, it won’t matter. The codon would still lead to an alanine amino acid. This kind of mutation is called a silent mutation, since it doesn’t change the protein.
The third position, the only one that differs among these four codons, is called the wobble position. Looking closer at the chart, you’ll notice that the identity of the nucleotide in this third position doesn’t matter as much for a lot of amino acids. That means the odds of a random mutation in the DNA sequence changing the protein sequence are much lower.
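To see the wobble effect in action, here is a minimal Python sketch. It builds the standard genetic code from a compact 64-letter string (one-letter amino acid codes, with * for stop), then tries every single-letter change at each codon position. The helper names are invented for this example, and treating a stop-to-stop change as "silent" is a simplifying assumption.

```python
BASES = "UCAG"
# Standard genetic code in U, C, A, G order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(codon):
    """Look up the amino acid (or stop) for one RNA codon."""
    return CODON_TABLE[codon]

def point_mutants(codon, position):
    """All codons that differ from `codon` by one base at `position` (0-based)."""
    return [
        codon[:position] + base + codon[position + 1:]
        for base in BASES
        if base != codon[position]
    ]

def silent_fraction(position):
    """Fraction of single-base changes at `position` that leave the meaning unchanged."""
    silent = total = 0
    for codon, amino_acid in CODON_TABLE.items():
        for mutant in point_mutants(codon, position):
            total += 1
            silent += CODON_TABLE[mutant] == amino_acid
    return silent / total

# Every third-position change to GCU still gives alanine ('A').
print([translate(m) for m in point_mutants("GCU", 2)])

for pos in range(3):
    print(f"position {pos + 1}: {silent_fraction(pos):.0%} of changes are silent")
```

Running it shows that changes in the third position are silent far more often than changes in the first or second position.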
Three is key
If some redundancy in the code is good, why is more redundancy not better? In other words, why aren’t codons a quartet of four nucleotides, or a quintet of five nucleotides?
Let's consider the number of unique combinations that we get from all the possible lengths of codon.
If you look in the table below, you can see that a quartet code would give 4 times the number of combinations that a triplet code would give us. And a quintet code would give 4 times the number of combinations that a quartet code gives.
4² = 16
4³ = 64
4⁴ = 256
4⁵ = 1024
If the sequence of a protein were encoded in quartets instead of triplets, genes would need to be about 33% longer (4/3 the length). And if we used quintets, genes would be about 67% longer (5/3 the length)!
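The trade-off can be worked out in a few lines of Python. This is only a sketch: the 300-amino-acid protein is a made-up example size, and the function names are invented for illustration.

```python
CURRENT_CODON_LENGTH = 3  # the triplet code that cells actually use

def gene_length(num_amino_acids, codon_length):
    """Nucleotides needed to encode a protein (start/stop signals ignored)."""
    return num_amino_acids * codon_length

def extra_length_percent(codon_length):
    """How much longer a gene gets compared with the triplet code, in percent."""
    return (codon_length / CURRENT_CODON_LENGTH - 1) * 100

PROTEIN_SIZE = 300  # hypothetical protein, chosen only for illustration

for k in (3, 4, 5):
    print(
        f"{k}-letter codons: {4 ** k} combinations, "
        f"{gene_length(PROTEIN_SIZE, k)} nucleotides, "
        f"{extra_length_percent(k):.0f}% longer"
    )
```

Each extra letter multiplies the number of combinations by 4, but every gene pays for it with another third of its current length.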
Genes make up only a small percentage of the human genome, so using quartet or quintet codons wouldn’t make our genomes much larger. But in bacteria and viruses, 47-97% of the genome codes for proteins, so increasing the length of every gene would increase the size of the genome by quite a lot. [1]
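A rough way to put numbers on this is to stretch only the protein-coding part of a genome and leave everything else alone. The Python sketch below does that under two loud assumptions: non-coding DNA stays the same size, and the 2% figure for "a small percentage" is just an illustrative stand-in, not a number from this article. The 47% and 97% values are the range quoted above.

```python
def genome_growth_percent(coding_fraction, codon_length, current_length=3):
    """Percent increase in genome size if only coding DNA gets longer."""
    return coding_fraction * (codon_length / current_length - 1) * 100

examples = {
    "mostly non-coding genome (assumed 2% coding)": 0.02,  # illustrative assumption
    "low end for bacteria and viruses (47% coding)": 0.47,
    "high end for bacteria and viruses (97% coding)": 0.97,
}

for label, fraction in examples.items():
    quartet = genome_growth_percent(fraction, 4)
    quintet = genome_growth_percent(fraction, 5)
    print(f"{label}: +{quartet:.1f}% with quartets, +{quintet:.1f}% with quintets")
```

The more of a genome that codes for proteins, the closer its growth gets to the full cost of the longer codons.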