
Mutations in proteins, genes & genomes

A very short introduction to current knowledge on what mutations do to the structure of genomes, genes and proteins.

Evolution over many generations is inevitable, because organisms pass on their genetic material from one generation to the next, together with small changes. These small changes are called "mutations" and the sum of all genetic material of an organism is called its "genome".

Hereditary information is encoded by the genome

  • The genome is a long sequence of DNA.
    • The importance of DNA sequences for inheritance is well understood.
    • Here we neglect all other potential ways of transmitting hereditary information.
  • DNA is like a text written with the 4 letters A, T, C, G.
    • The structure of DNA is a double helix (picture), where
      • Adenine pairs with Thymine and
      • Cytosine pairs with Guanine.
      • A, T, C, G = the 4 bases = nucleotides; if paired they are called base pairs (bp).
    • This knowledge started the molecular biology revolution in 1953.
    • Today the reading of DNA sequences is a routine activity. It is called "sequencing".
  • Sequences of whole genomes are available today.
    Genomes can be accessed over the web via one of the many genome browsers.
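
The pairing rule above (A with T, C with G) means that one strand of the double helix fully determines the other. The following is a minimal sketch of that idea, not a tool from any particular library; the function names are made up for illustration, and it assumes the well-known fact that the two strands run in opposite directions, which is why the partner strand is reversed.

```python
# Watson-Crick pairing as stated in the text: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own direction.

    The two strands of the helix are antiparallel, so the complement
    is reversed to obtain the sequence as it would normally be written.
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ATCG"))          # TAGC
    print(reverse_complement("ATCG"))  # CGAT
```

Applying `reverse_complement` twice returns the original sequence, which is a convenient sanity check when handling sequencing output.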


There are various forms of genome organization

For almost every rule, there seems to be an exception here. Here are the rules:

  • Genome size does not necessarily reflect complexity.
    • sometimes more complex genomes are longer (see comparison here)
    • but many exceptions exist (e.g. Amoeba dubia genome = 230 x Human genome)
    • genomes cannot be shorter than essential genes allow.
  • Protection and location of the DNA can vary.
    • just a few proteins = viruses that need a host cell to replicate
    • in the cell, but no special nucleus = 'prokaryotes' (microbes like bacteria, archaea)
    • in a special nucleus in the cell = 'eukaryotes' (includes all animals, plants, fungi)
    • in a special compartment of the cell = organellar genomes like mtDNA, cpDNA
  • Copies of the genome per cell can vary.
    • 1 copy = haploid = prokaryotes = 'standard asexual organisms'
    • 2 copies = diploid = typical eukaryotes = 'standard sexual organisms'
    • 3,4 or more copies = triploid, tetraploid or polyploid
  • Mode of inheritance can vary for different chromosomes.
    • autosomal = passed on by both parents (most chromosomes)
    • maternal = passed on only by the mother (e.g. mitochondrial DNA)
    • paternal = passed on only by the father (e.g. Y-chromosome)
    • x-linked = mother has 2 copies, father has 1 copy (e.g. X chromosome)
    • horizontal gene transfer = exchange of genes between unrelated individuals (mostly in bacteria)


Chromosomes are the large units of genome organisation.

  • Chromosomes are molecules
    • very long molecules = an uninterrupted chain of DNA-bases
    • cleverly packed (highly coiled by association with 'histone' proteins)
  • Physical structure of chromosomes can vary.
    • ring chromosomes can be found in
      • some viruses,
      • bacteria and other prokaryotes,
      • plasmids in bacteria,
      • mitochondrial and other organellar DNA in eukaryotes.
    • linear chromosomes like in eukaryotes have special structures:
      • The tips contain special structures (telomeres), composed of repetitive sequences that are replicated by a special enzyme (telomerase) to avoid the problems that arise out of copying a linear DNA molecule with the standard machinery (DNA polymerase).
      • Somewhere in the middle, there is a so-called 'centromere'. Its nature is somewhat of a mystery in most species, except that we know that this is where the chromosome binds to the spindle microtubules that draw the sets of chromosomes apart during cell division.
  • Chromosome numbers are usually invariant within species
    • changes in chromosome number are rare.
    • the diploid human genome has 46 chromosomes (overview) + mitochondrial DNA
    • the diploid Drosophila melanogaster genome has 8 chromosomes (+mtDNA)
  • Genes are located at their corresponding locations (loci) on a chromosome.
    • the genetic distance between two genes can be measured by the frequency of recombination between them (idea of the early geneticist Morgan)
    • Sturtevant showed in 1913 that this can be used to determine the linear order of genes along the chromosome by observing the behaviour of gene combinations in experimental crosses.
    • this is the basis for genetic mapping, a hugely important tool for geneticists
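
The mapping logic described above can be sketched for the simplest case of three genes: the pair with the highest recombination frequency lies furthest apart, so the remaining gene must sit between them. This is only an illustration; the gene names and frequencies below are hypothetical, and real mapping has to deal with sampling error and double crossovers.

```python
def order_three_genes(rf: dict[tuple[str, str], float]) -> tuple[str, str, str]:
    """Infer the linear order of three linked genes.

    `rf` maps each pair of gene names to the observed recombination
    frequency between them (three pairs in total).
    """
    # The most frequently recombining pair flanks the third gene.
    outer = max(rf, key=rf.get)
    genes = {gene for pair in rf for gene in pair}
    (middle,) = genes - set(outer)
    return (outer[0], middle, outer[1])


if __name__ == "__main__":
    # Hypothetical cross data, not taken from any real experiment.
    observed = {("a", "b"): 0.05, ("b", "c"): 0.12, ("a", "c"): 0.16}
    print(order_three_genes(observed))  # ('a', 'b', 'c')
```

Note that the largest frequency (0.16) is slightly less than the sum of the two smaller ones (0.17) in this made-up example; in real crosses such a shortfall is typical, because double crossovers between the outer genes go undetected.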


Types of DNA sequences

  • Genes encode proteins.
    • ca. 500 in the simplest bacteria (with parasitic life-style)
    • ca. 20000 protein-coding genes in humans (older estimates were ca. 30000)
  • Introns break genes up into exons.
    • only exons code for proteins
    • introns have signal sequences for the cutting process (splicing)
    • introns can contain neutral sites or have regulatory functions
  • Regulatory sequences are at least as frequent as genes.
    • promoters + enhancers bind proteins that control the initiation of transcription
  • Spacer regions exist.
    • they have no function except to maintain distance between genes.
  • Transposable elements (TEs) can jump around in the genome.
    • TEs can defy Mendel's laws, if they are active.
    • TE length is up to 10 Kbp; fragments left behind can be short.
    • TEs encode the enzymes that help them move.
    • TE insertion sites can be random or specific.
    • There are several classes of TEs with different modes of operation.
    • If TEs use RNA as an intermediate stage and produce infectious particles for moving around, then they are called 'retroviruses'. Example: HIV.
    • TEs (in man 44%) can make a genome much bigger than appears necessary by accumulating many repetitive sequences.
    • TEs are often found close to centromeres and telomeres in the so-called 'heterochromatin'; there are only few 'normal' genes in these regions.
  • Other repetitive sequences ('Satellite DNA')
    • highly repetitive, very short and often rich in the bases A + T
    • come as small tandem arrays (up to 100 repeats or so) of ...
      • ... minisatellites (arrays of 0.5 - 30 Kbp in total) or
      • ... microsatellites (repeat units of 2-5 bp)
    • Microsatellites are highly variable in repeat number between individuals and thus very useful genetic markers for genetic mapping. In humans there are ca. 30000 microsatellites.
  • Large stretches of DNA have no known function ('Junk DNA').
    Such sequences include
    • pseudogenes = broken genes without current function
    • broken TEs that have inserted into each other
    • unnecessary duplications and repetitive sequences
    • other sequences
    Future research may or may not find a hidden function for these sequences.
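
Because microsatellites are just short units repeated in tandem, they are easy to spot computationally. The sketch below scans a sequence with a regular expression for units of 2-5 bp (the range given above) that occur at least three times in a row. The function name and the threshold of three repeats are assumptions chosen for illustration; dedicated tools use more careful definitions.

```python
import re


def find_microsatellites(seq: str, min_repeats: int = 3) -> list[tuple[int, str, int]]:
    """Return (start, repeat_unit, repeat_count) for each tandem repeat found.

    A hit is a unit of 2-5 bases followed directly by at least
    `min_repeats - 1` further copies of the same unit. Note that this
    simple pattern also reports runs of a single base (unit 'AA' etc.).
    """
    pattern = re.compile(r"([ACGT]{2,5}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        repeats = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, repeats))
    return hits


if __name__ == "__main__":
    # Toy sequence with a (CA)5 repeat starting at index 2.
    print(find_microsatellites("GGCACACACACATT"))  # [(2, 'CA', 5)]
```

Comparing the repeat count at the same locus between individuals is what makes such loci useful as genetic markers.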

Recombination of DNA sequences depends on their location

  • The greater the physical distance between two genes on a chromosome, the more recombination events can be expected between them.
  • Recombination rates are highly variable between different regions of the genome.
  • Recombination hot-spots (usually between genes) have been found in humans and chimps, but it is not yet clear how many species have them.
  • Segregation of chromosomes during meiosis usually leads to free recombination for genes on different chromosomes.
  • Long-distance recombination comes exclusively from 'crossing-over' events.
  • Short-distance recombination has to consider 'gene-conversion' as well (the exchange of short stretches of DNA within a gene).
  • Recombination is greatly reduced in (and often near) regions of 'heterochromatin'.
  • There is usually no recombination over all or most of the length of Y chromosomes (or W chromosomes, if females are the heterogametic sex like in birds, some fishes and some insects).
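
The relation between distance and observable recombination can be made concrete with a map function. The sketch below uses Haldane's map function, which is not mentioned in the text above and rests on the simplifying assumption that crossovers occur independently of each other. It shows that the recombination fraction grows with map distance but levels off at 0.5, the value expected for genes on different chromosomes.

```python
import math


def haldane_recombination_fraction(distance_morgans: float) -> float:
    """Recombination fraction r for a given map distance d (in Morgans).

    Haldane's map function: r = 0.5 * (1 - exp(-2 d)).
    Assumes crossovers are independent (no interference).
    """
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def haldane_map_distance(recombination_fraction: float) -> float:
    """Inverse of the function above; only defined for r < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * recombination_fraction)


if __name__ == "__main__":
    for d in (0.01, 0.1, 0.5, 1.0, 5.0):
        r = haldane_recombination_fraction(d)
        print(f"d = {d:4.2f} Morgan -> r = {r:.4f}")
```

For short distances r is almost equal to d, which is why recombination frequencies between nearby genes can be read directly as map distances.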


Genes encode proteins

The central dogma of Molecular Biology

  • Information normally flows from DNA -> RNA -> Protein
  • The main exception is that sometimes RNA can be copied back to DNA (reverse transcription). This has a clear structural basis in the similarity of the DNA and RNA molecules.


DNA is transcribed (copied) into mRNA (messenger-RNA)

  • This is like making a working copy of the key information before further processing
  • The transcription is done by the 'transcription complex' that includes many proteins (recognize regulatory sequences for start and stop, unwind the DNA, synthesize the mRNA according to the DNA template, correct errors, ...)
  • In eukaryotes, the mRNA needs to mature and has to be transported out of the nucleus. Maturation means cutting out ('splicing') all the 'introns' between the 'exons' to arrive at an mRNA that directly encodes the desired protein. In many more complex organisms the exons of one gene can be joined in several different ways. This is called alternative splicing, and it allows one gene to produce several different proteins.
  • In prokaryotes there is usually no need for mRNA to mature; here also several genes may be regulated at the same time in an 'operon' that can be transcribed to the same mRNA to speed up the activation of a metabolic pathway that is facilitated by a group of genes.
  • In rare cases RNA-editing can also lead to specific changes (like single bases).
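
Transcription and splicing can be mimicked with a few string operations. The sketch below is a toy model under stated assumptions: the gene is given as its coding strand (so the mRNA has the same sequence with U in place of T), the exon and intron sequences are invented, and exon boundaries are supplied by hand rather than recognized from signal sequences as the cell does.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA copy of a gene given as its coding strand (T -> U)."""
    return coding_strand.upper().replace("T", "U")


def exon_coordinates(segments: list[tuple[str, str]]) -> list[tuple[int, int]]:
    """Compute 0-based, end-exclusive exon positions from labelled segments."""
    coords, position = [], 0
    for kind, sequence in segments:
        if kind == "exon":
            coords.append((position, position + len(sequence)))
        position += len(sequence)
    return coords


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the chosen exons in order, dropping everything in between."""
    return "".join(pre_mrna[start:end] for start, end in exons)


if __name__ == "__main__":
    # Hypothetical gene: three exons separated by two introns.
    segments = [
        ("exon", "ATGGCC"), ("intron", "GTAAGTTTTCAG"),
        ("exon", "AAAGGG"), ("intron", "GTACGTCCCTAG"),
        ("exon", "GAATAA"),
    ]
    pre_mrna = transcribe("".join(sequence for _, sequence in segments))
    exons = exon_coordinates(segments)
    print(splice(pre_mrna, exons))                  # all three exons
    print(splice(pre_mrna, [exons[0], exons[2]]))   # alternative form, exon 2 skipped
```

The second call illustrates alternative splicing: the same primary transcript yields a different mature mRNA when one exon is left out.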


mRNA is translated into an amino acid sequence

  • This is like using a cipher to translate an encoded message into readable text.
  • The molecular machine that does the translation is huge. It is called 'ribosome' and consists of many dozens of proteins and special RNA molecules (called 'ribosomal RNA' or rRNA). See a part of a ribosome's molecular structure here.
  • The ribosome uses the genetic code to translate the mRNA into a chain of amino acids. It does so in steps of 3 bases at a time (= 1 triplet = 1 codon = 1 amino acid). The core of the translation is done by 'transfer-RNAs' (tRNAs) that have an 'anti-codon' to match the codon they translate and that carry the corresponding amino acid for linking it to the growing chain.
  • The genetic code is almost universal and translates 61 codons into 20 amino acids (there are 3 stop codons; see table here). Only a few genetic systems, including mitochondrial DNA, have a few deviations from the universal code.
  • The code is 'degenerate', i.e. some amino acids are encoded by more than 1 codon
  • Helpful features of the genetic code in analyses of evolution:
    • non-synonymous mutations are often under strong selection
      • mis-sense mutations just change one amino acid
      • non-sense mutations turn an amino acid codon into a stop codon and thus end the protein prematurely (drastic change)
    • synonymous mutations are often under weak or no selection
      • up to 6 codons can encode the same amino acid
      • depending on other features of the organism (local GC-bias, availability of certain tRNAs, ...) there may be some weak selection for the use of a particular codon. This can be observed as codon bias.
    • In a codon, a change of the
      1. base usually changes the amino acid (a few exceptions exist)
      2. base always changes the amino acid (or creates a stop codon)
      3. base often does not change the amino acid (exceptions exist)
    • Use computer programs to count the number of synonymous and non-synonymous positions in a sequence, as this depends on the actual sequence.
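
The properties of the genetic code listed above can be checked directly by building the standard code table and counting. The sketch below writes codons on the DNA level (T instead of U) and uses the common compact notation of the standard code, in which the 64 codons are listed in the base order T, C, A, G and '*' marks a stop. The function names are made up for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon until the first stop."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def synonymous_changes_by_position() -> list[int]:
    """Count single-base changes in amino acid codons that keep the amino acid.

    Returns one count per codon position (1st, 2nd, 3rd base).
    """
    counts = [0, 0, 0]
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue
                mutant = codon[:position] + base + codon[position + 1:]
                if CODON_TABLE[mutant] == amino_acid:
                    counts[position] += 1
    return counts


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))            # MA
    print(synonymous_changes_by_position())  # few at base 1, none at base 2, many at base 3
```

The counts confirm the rule of thumb for the three codon positions: synonymous changes are rare at the first base, absent at the second and common at the third.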


Proteins facilitate life

Amino acid chains fold into 3D structures to form proteins

Structural biology is concerned with predicting how.

  • Primary structure
    = a specific sequence made out of the 20 universal amino acids that build life.
  • Secondary structure
    = low level structural element of a protein (e.g. alpha-helix, beta-sheet ...)
  • Tertiary structure
    = overall folding structure of the whole amino acid chain
  • Quaternary structure
    = overall fold of all related amino acid chains (structure of a more complex machine)
  • Post-translational modifications
    = sometimes a protein needs chemical processing by other proteins to become active
  • General lessons from protein structures:
    • frequently there is an active center of the protein where amino acid changes are catastrophic
    • general destabilisations of the structure are only tolerable to some degree
    • most proteins have unimportant positions where almost any amino acid will do
  • It is very complicated to predict the structure (let alone the function) just from a sequence and the laws of physics.
  • Comparative analyses of proteins show that similar sequences often have similar structures and similar functions.
    • one can group proteins into families with similar structures and functions
    • often these originate by gene duplication and subsequent divergent evolution
    • below a sequence similarity of ca. 25%, randomness becomes difficult to exclude

Visit the Gallery of the Jena Library of Biological Macromolecules to get an impression of the complexity of the structure of proteins. Here is the protein complex that catches light in the eyes of mammals to allow us to see.


Life relies on a huge metabolic network of proteins

The mechanics of life critically depend on

  • taking up nutrients and generating energy from them (mostly 'ATP')
  • breaking them down into basic building blocks
  • transforming these building blocks into required building blocks, if necessary
  • synthesizing proteins, DNA and other important biomolecules
  • growing properly in order to reproduce and take up nutrients ...

To do this, each cell operates a huge metabolic network of enzymes, and in case of multi-cellular organisms a sometimes very complicated network of developmental genes that determine which organ is built where.

For a taste of the complexity of these networks, just check out a map of important metabolic pathways that includes some 550 enzymes. Studying such maps shows that

  • some parts of the network are essential for survival
  • there is a large degree of redundancy built in to cope with emergencies
  • not every pathway is needed all the time.

This knowledge about the complexity of life becomes important when we want to assess the effects on fitness that various mutational changes of DNA can have.


Mutations change the DNA sequence

  • Mutations provide the ultimate source of the variability used in evolution and in animal and plant breeding.
  • Mutations are random changes and are independent of any 'requirements' of organisms.
  • Mutations are induced by the chemical reactions that a particular DNA molecule experiences. Thus each type of mutation occurs at its own rate.

Types of mutations

  • Point mutations (= 'nucleotide substitutions')
    • at non-coding DNA site (=> nothing happens, unless it's a regulatory sequence)
    • at synonymous site (=> does not change amino acid; minor effect, if at all)
    • at non-synonymous sites (=> changes amino acid; can have a major effect)
  • Insertions or Deletions
    • of non-coding sites (=> nothing happens, unless it's a regulatory sequence)
    • within protein coding DNA:
      • length is a multiple of 3: no frame-shift, affects just those amino acids
      • other length: all subsequent amino acids are affected as well => major damage
  • Chromosome rearrangements
    • Insertions of many genes
    • Deletions of many genes (=> dangerous, if genes were important)
    • Inversions (change the orientation of a large block along the chromosome)
    • Translocations (from one chromosome to the next)
    • Fusion of two chromosomes
    • Breaking up a chromosome
    • Polyploidisation (doubling of the whole genome)
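
The small-scale mutation types above can be told apart mechanically once the reading frame is known. The following sketch classifies a point mutation in a coding sequence as synonymous, missense or nonsense, and shows why an insertion or deletion whose length is not a multiple of 3 is so damaging. The example sequence and function names are hypothetical; the codon table is the standard genetic code on the DNA level.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_all(cds: str) -> str:
    """Translate every complete codon; '*' marks a stop codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def classify_substitution(cds: str, position: int, new_base: str) -> str:
    """Classify a single-base change at a 0-based position of a coding sequence."""
    start = position - position % 3
    offset = position % 3
    old_codon = cds[start:start + 3]
    new_codon = old_codon[:offset] + new_base + old_codon[offset + 1:]
    old_aa, new_aa = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    if new_aa == old_aa:
        return "synonymous"
    if new_aa == "*":
        return "nonsense"
    return "missense"


def indel_effect(length: int) -> str:
    """An indel keeps the reading frame only if its length is a multiple of 3."""
    return "in-frame" if length % 3 == 0 else "frameshift"


if __name__ == "__main__":
    cds = "ATGGCCTGGTAA"                        # hypothetical gene: M A W stop
    print(classify_substitution(cds, 5, "T"))   # synonymous (GCC -> GCT)
    print(classify_substitution(cds, 4, "A"))   # missense   (GCC -> GAC)
    print(classify_substitution(cds, 8, "A"))   # nonsense   (TGG -> TGA)
    print(translate_all(cds))                   # MAW*
    print(translate_all(cds[:3] + cds[4:]))     # MPG: one deleted base shifts the frame
```

In the last line a single deleted base changes every amino acid after the deletion, which is the "major damage" referred to in the list.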


Effects of mutations on fitness

  • Mutations can be advantageous or harmful or neutral in their effect on fitness.
  • Most mutations of any effect are deleterious.
  • The more proteins a mutation affects, the bigger its effect and the larger its probability of being deleterious.
  • DME:  distribution of mutational effects
    The DME is a largely unknown, but important quantity for evolutionary models.
  • DDME: distribution of deleterious mutational effects
    Recent progress has led to estimates of the DDME that are compatible with a log-normal distribution (see estimates in the fruitfly Drosophila)
  • DAME: distribution of advantageous mutational effects
    Mathematical theories predict that the DAME probably follows an exponential law, but estimates of the sizes of these effects are difficult.
  • Epistasis: Interactions between mutations.
    Different mutations can interact in their effects on fitness, e.g. if they both affect the same structures:
    • 'synergistic epistasis': the combined effect is larger than expected from the individual effects
    • 'antagonistic epistasis': the combined effect is smaller than expected from the individual effects
    On average these two types of interactions seem to be equally frequent.
    For many models it is reasonable to assume that no such interactions occur.


Rates of origin of mutations

  • Mutation rates are generally low
    • e.g. ca. 10^-8 per base pair per generation for humans
    • organisms spend much effort to correct any errors
  • Mutation rates vary along the genome and among genomes.
  • Each type of mutation occurs at its own rate.
    • Transitions (A<->G or C<->T) are often much more frequent than
      transversions (all other point mutations)
    • CpG sites in mammals mutate much more frequently than other sites
  • Mutagenic environments can increase the mutation rate substantially
    • Radiation
    • Chemical mutagens
  • Can be observed by sequencing known relatives
    (Raw observation, no selective removal, lots of work)
  • Can be inferred from neutral evolution in the past
    (if the sequences and the divergence date of two species are known)
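
A few lines of arithmetic make these rates tangible. The sketch below classifies differences between two aligned sequences as transitions or transversions, converts a per-site rate into an expected number of new mutations per genome, and infers a rate from neutral divergence. The input numbers are rough or hypothetical: a haploid human genome size of about 3 x 10^9 bp, a rate of 1e-8 per site per generation, and an invented divergence. The divergence formula ignores repeated changes at the same site, which is only reasonable for closely related species.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(base_1: str, base_2: str) -> bool:
    """A<->G and C<->T are transitions; all other changes are transversions."""
    pair = {base_1, base_2}
    return base_1 != base_2 and (pair <= PURINES or pair <= PYRIMIDINES)


def count_transitions_transversions(seq_1: str, seq_2: str) -> tuple[int, int]:
    """Count both kinds of differences between two aligned sequences."""
    transitions = transversions = 0
    for base_1, base_2 in zip(seq_1, seq_2):
        if base_1 == base_2:
            continue
        if is_transition(base_1, base_2):
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


def new_mutations_per_genome(rate_per_site: float, genome_size_bp: float,
                             copies: int = 2) -> float:
    """Expected number of new point mutations per individual and generation."""
    return rate_per_site * genome_size_bp * copies


def rate_from_divergence(differences_per_site: float,
                         generations_since_split: float) -> float:
    """Per-site, per-generation rate from neutral divergence of two species.

    Differences accumulate along both lineages, hence the factor 2.
    """
    return differences_per_site / (2 * generations_since_split)


if __name__ == "__main__":
    print(count_transitions_transversions("AACCGGTT", "GACTGATA"))  # (3, 1)
    print(new_mutations_per_genome(1e-8, 3e9))   # roughly 60 per diploid genome
    print(rate_from_divergence(0.012, 600_000))  # hypothetical inputs, about 1e-8
```

Even a rate that looks tiny per site therefore adds up to dozens of new mutations in every individual, simply because genomes are so large.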