4.3: Gene Expression

Gene expression is the process that takes the information encoded in DNA and converts it into functional biomolecules, primarily proteins. The information in DNA, encoded in the sequence of nitrogenous bases along its length, is used as the instructions for protein production. While there are some important exceptions, nearly all of the coding DNA is used in this way, directing the cell to construct amino acid chains in particular sequences. The exceptions include the production of some biologically active RNA molecules, such as transfer RNA (tRNA) molecules, which are not translated into protein.


The Central Dogma of Molecular Biology describes this general pathway of information flow. A generally true version, which applies to eukaryotic organisms, can be stated briefly:

DNA → RNA → protein

This sequence of events describes in basic terms the information flow that occurs as our genes, inherited in the DNA we get from our biological parents, are expressed in our bodies.


The two-step sequence here includes, first, the transcription of information in DNA to RNA. The DNA molecules contained in a eukaryotic cell nucleus contain portions of nucleic acid sequence that are coding sequences. These sequences, called ‘genes,’ are used as templates to produce RNA that leaves the nucleus.
The RNA carries the information from genes with it in the form of the sequence of its bases. After exiting the nucleus, messenger RNA (mRNA) interacts with cellular components called ribosomes, where the information contained within its sequence is translated into a specific sequence of amino acids. The ribosome reads the mRNA and assembles the polypeptide chain, with the selection of amino acids guided by small transfer RNA molecules (tRNAs) that are able to translate the code. Each sequence of 3 nitrogenous bases on the mRNA is converted to a single, specific amino acid based on the genetic code.


Transcription is the process whereby the information in DNA is copied into the RNA molecules that can take it out of the nucleus. The ‘language’ of the code is unchanged in this process, as each nucleotide in DNA is copied into a nucleotide of RNA.
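The idea that the ‘language’ is unchanged can be illustrated with a minimal Python sketch. It assumes sequences are written as plain strings in the 5'→3' direction, and the function names are hypothetical. The only change of alphabet is that RNA uses uracil (U) where DNA uses thymine (T). Real transcription involves RNA polymerase, promoters, and RNA processing, none of which is modeled here.

```python
# Illustrative sketch only; function names are hypothetical.
# Sequences are plain strings written in the 5'->3' direction.

# Base-pairing partners used when RNA is built against a DNA template.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_coding_strand(coding_dna):
    """Return the RNA that matches the coding strand: same letters, U for T."""
    return coding_dna.upper().replace("T", "U")


def transcribe_template_strand(template_dna):
    """Return the RNA built by base-pairing against the template strand.

    The template is read 3'->5', so it is reversed before pairing.
    """
    return "".join(
        DNA_TO_RNA_COMPLEMENT[base] for base in reversed(template_dna.upper())
    )


coding = "ATGTGGTTT"
template = "AAACCACAT"  # reverse complement of the coding strand

print(transcribe_coding_strand(coding))
print(transcribe_template_strand(template))
```

Both calls describe the same RNA: copying the coding strand letter for letter and base-pairing against the template strand are two views of one process.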


Translation is the process involving a change of ‘language,’ from the nucleotide sequence in the mRNA to the amino acid sequence of the nascent protein.
Transcription occurs within the nucleus, where the coding DNA remains. The translation process is carried out outside of the nucleus, in the cytoplasm of the cell, by ribosomes.


Ribosomes are large protein-nucleic acid complexes that contain ribosomal RNA (rRNA) and proteins. The proteins and rRNA are organized into two subunits, one large and another small. Ribosomes function by binding to mRNAs and holding them in a way that allows the amino acids encoded by the RNA to be joined sequentially to form a polypeptide. Transfer RNAs are the carriers of the appropriate amino acids to the ribosome.
The ribosome is an important example of a non-protein catalyst: its rRNA, rather than its proteins, catalyzes the reactions that form peptide bonds, linking amino acids together.

The Genetic Code

A code can be thought of as a system for storing or communicating information. A familiar example is the use of letters to represent the names of airports (e.g., PDX for Portland, Oregon and ORD for Chicago’s O’Hare). When a tag on your luggage shows PDX as the destination, it conveys information that your bag should be sent to Portland, Oregon. To function well, such a code must have unique identifiers for each airport and people who can decode the identifiers correctly. That is, PDX must stand only for Portland, Oregon and no other airport. Also, luggage handlers must be able to correctly recognize what PDX stands for, so that your luggage doesn’t land in Phoenix, instead.

How does this relate to genes and the proteins they encode?

Genes are first transcribed into mRNA, as we have already discussed. The sequence of an mRNA, copied from a gene, directly specifies the sequence of amino acids in the protein it encodes. Each amino acid in the protein is specified by a sequence of 3 bases called a codon in the mRNA (Figure 7.81). For example, the amino acid tryptophan is encoded by the sequence UGG on an mRNA. All of the twenty amino acids used to build proteins have, likewise, 3-base sequences that encode them.


Given that there are 4 bases in RNA, the number of different 3-base combinations that are possible is 4³ (4 × 4 × 4), or 64. There are, however, only 20 amino acids that are used in building proteins in cells. This discrepancy in the number of possible codons and the actual number of amino acids they specify is explained by the fact that the same amino acid may be specified by more than one codon. In fact, with the exception of the amino acids methionine and tryptophan, all the other amino acids are encoded by multiple codons. Codons for the same amino acid are often related, with the first two bases the same and the third being variable. An example would be the codons for alanine: GCU, GCA, GCC and GCG all stand for alanine. This sort of redundancy in the genetic code is termed degeneracy.
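The counting argument can be checked with a short Python sketch. It enumerates every 3-base combination and groups the codons by the amino acid they specify. The 64-letter string is the standard genetic code in one-letter amino acid symbols, listed with the bases in the order U, C, A, G (‘*’ marks a stop codon); the variable names are chosen for illustration only.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, one-letter amino acid symbols, '*' = stop.
# Order: first base varies slowest, third base fastest, each over U, C, A, G.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Every possible 3-base combination: 4 * 4 * 4 = 64 codons.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
codon_table = dict(zip(codons, AMINO_ACIDS))

# Group codons by the amino acid they encode, leaving out stop codons.
codons_by_amino_acid = defaultdict(list)
for codon, amino_acid in codon_table.items():
    if amino_acid != "*":
        codons_by_amino_acid[amino_acid].append(codon)

# Amino acids that have exactly one codon.
single_codon = sorted(
    aa for aa, group in codons_by_amino_acid.items() if len(group) == 1
)

print(len(codons))                # number of possible codons
print(len(codons_by_amino_acid))  # number of amino acids encoded
print(codons_by_amino_acid["A"])  # codons for alanine (A)
print(single_codon)               # M = methionine, W = tryptophan
```

The four alanine codons share their first two bases (GC) and differ only in the third, which is the pattern of degeneracy described above.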

Additionally, several 3-letter codons are read as ‘start’ or ‘stop’ messages that signal the initiation or the end of the process of translation.
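A simplified Python sketch shows how ‘start’ and ‘stop’ signals frame the reading of an mRNA. It assumes the standard genetic code, in which AUG (methionine) is the usual start codon and UAA, UAG and UGA are stop codons. The function name and the example sequence are hypothetical, and the way real ribosomes locate a start codon is more involved than the simple search used here.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one-letter amino acid symbols, '*' = stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = dict(
    zip(("".join(triplet) for triplet in product(BASES, repeat=3)), AMINO_ACIDS)
)
START_CODON = "AUG"


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    mrna = mrna.upper()
    start = mrna.find(START_CODON)
    if start == -1:
        return ""  # no start codon, so nothing is translated in this sketch
    protein = []
    # Step through the sequence three bases (one codon) at a time.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical mRNA: GG | AUG UGG UUU GCU UAA | GG
print(translate("GGAUGUGGUUUGCUUAAGG"))  # Met-Trp-Phe-Ala -> "MWFA"
```

Bases before the start codon and after the stop codon are present in the mRNA but do not appear in the polypeptide.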

Translating the code

While the ribosomes are literally the factories that join amino acids together using the instructions in mRNAs, another class of RNA molecules, the transfer RNAs (tRNAs), is also needed for translation (Figure 7.83 and Interactive 7.1). Transfer RNAs are small RNA molecules, about 75-90 nucleotides long, that function to ‘interpret’ the instructions in the mRNA during protein synthesis. Transfer RNAs are extensively modified post-transcriptionally and contain a large number of unusual bases. The sequences of tRNAs have several self-complementary regions, where the single-stranded tRNA folds on itself and base-pairs to form what is sometimes described as a cloverleaf structure.

This structure is crucial to the function of the tRNA, providing both the sites for attachment of the appropriate amino acid and for recognition of codons in the mRNA. As an analogy, imagine building a protein as stringing colored beads, with each amino acid a different color: someone or something has to be able to bring a red bead in when the instructions indicate UGG, and a green bead when the instructions say UUU. This, then, is the function of the tRNAs. They must be able to bring the amino acid corresponding to the instructions to the ribosome.

A given transfer RNA is specific for a particular amino acid. Pools of charged tRNAs, loaded with their respective amino acids, await use near the ribosome. The base-pairing of the anticodon on a charged tRNA with the codon on the mRNA is what brings the correct amino acid to the ribosome to be added to the growing protein chain (Figure 7.85).
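Codon-anticodon recognition can be sketched in Python as a base-pairing check. The sketch assumes strict Watson-Crick pairing (A with U, G with C) between antiparallel strands, and it ignores the modified bases and the more flexible third-position pairing found in real tRNAs. The small pool of charged tRNAs and the function names are hypothetical.

```python
# Illustrative sketch only; names and the tRNA pool are hypothetical.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Hypothetical pool of charged tRNAs: anticodon (5'->3') -> amino acid carried.
CHARGED_TRNAS = {"CCA": "Trp", "AAA": "Phe", "AGC": "Ala"}


def pairs_with(codon, anticodon):
    """True if the anticodon base-pairs with the codon.

    Both are written 5'->3'. The strands pair antiparallel, so the
    anticodon is reversed before comparing base by base.
    """
    return len(codon) == len(anticodon) and all(
        RNA_COMPLEMENT[c] == a for c, a in zip(codon, reversed(anticodon))
    )


def deliver(codon):
    """Return the amino acid carried by the tRNA whose anticodon fits the codon."""
    for anticodon, amino_acid in CHARGED_TRNAS.items():
        if pairs_with(codon, anticodon):
            return amino_acid
    return None  # no matching tRNA in this small pool


for codon in ("UGG", "UUU", "GCU"):
    print(codon, "->", deliver(codon))
```

In this sketch the amino acid is never read from the codon directly. It arrives only because the tRNA carrying it has an anticodon that fits, which mirrors the adapter role of tRNAs described above.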


Polypeptide processing

What happens to the newly synthesized polypeptide after it is released from the ribosome? Functional proteins are not simply strings of amino acids. The polypeptide must fold properly in order to perform its function in the cell. It may also undergo a variety of modifications such as the addition of phosphate groups or sugars, etc. Some proteins are produced as inactive precursors that must be cleaved by proteases to be functional.

Proper folding of a protein into its 3-dimensional conformation is necessary for it to function effectively. As described in an earlier chapter, the folding of a protein is largely driven by hydrophobic interactions, which position hydrophobic residues in the interior, or core, of the protein, away from the aqueous environment of the cell.

Proper folding may also involve the interaction of regions of the polypeptide that are distant from each other, so that portions of the N-terminal region of the polypeptide may be in close proximity to parts of the C-terminus of the final folded molecule.

As a polypeptide emerges from the ribosome, protein chaperones bind to and shield regions of polypeptides and keep them from improperly interacting with one another or with other proteins in the vicinity until they can fold into their correct final shape (Figure 7.98). In addition, other chaperones are able to sequester proteins in such a way as to permit unfolding and refolding of misfolded polypeptides. These proteins ensure that the vast majority of proteins in cells are folded into their correct, functional 3-dimensional shapes.

Sorting and Delivery

An additional challenge in eukaryotic cells is the presence of internal, membrane-bounded compartments. Each compartment contains different proteins with different functions. But the vast majority of proteins in eukaryotic cells are made by ribosomes in the cytoplasm of the cell.

Each of the thousands of proteins made in the cytoplasm must, therefore, be delivered to the appropriate cellular compartment in which it functions. Some proteins are delivered to their destinations in an unfolded state, and are folded within the compartment in which they function. Others are fully folded and may be post-translationally modified before they are sent to their cellular (or extracellular) destinations.

All this sorting and delivering (frequently across membrane barriers) is a complex and amazing process. But the information necessary to guide proteins to their final destinations is built into their structure, and recognized by cellular machinery which guides the process.





Introductory Biochemistry by Carol Higginbotham is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
