
Practical 1 Coalescent

The goal of this practical is to get a feel for the variability of neutral evolution by simulating a large number of independent neutral histories with SimCoal2. The data generated this way are then analyzed with Arlequin3.


Instructions for WinXP


1. Install

  • Make a folder for the practical
  • Find (e.g. via Google) and download the latest version of SimCoal2 for Windows
  • Find and download the latest version of Arlequin3 for Windows
  • Unzip both packages into the practical folder


2. Familiarize yourself with the documentation of SimCoal & Arlequin

  • There is no need to read everything at this stage, but skim the documentation far enough that you know where to find what you are looking for


3. Test run the SimCoal sample file:

  • Copy the sample SimCoal configuration file given below into a file named "MitoEvol_V1.par"
  • Put MitoEvol_V1.par into the folder that contains the SimCoal binary
  • Double-click the SimCoal executable to start the simulations. You will be asked for:
    • Generic input file name (=> enter "MitoEvol_V1", without ".par")
    • Number of random samples to generate (=> enter 100 for this test)
    • Genotypic (1) or Haplotypic (0) data output for Arlequin (=> enter 0)
  • Wait for the results (only a few seconds for these simple simulations). A new results folder ("MitoEvol_V1") will appear in SimCoal's directory. It contains:
    • Many Arlequin analysis files (*.arp files), one for each sample
    • One Arlequin analysis batch file (*.arb file) for automated analysis of all individual samples
  • Warning: SimCoal will overwrite your results folder without cleaning it, so always increase the version counter in your configuration file name before you start SimCoal (or delete your previous results folder).
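The warning above boils down to always picking a configuration name that has not been used yet. A minimal Python sketch of that bookkeeping, assuming you keep this practical's MitoEvol_V<n> naming scheme; next_version and next_version_in are hypothetical helpers, not part of SimCoal2:

```python
import re
from pathlib import Path


def next_version(names, stem="MitoEvol"):
    """Return the next unused name of the form '<stem>_V<n>'.

    `names` are file or folder names; both 'MitoEvol_V3.par' and a results
    folder called 'MitoEvol_V3' count as version 3 being taken.
    """
    pattern = re.compile(rf"^{re.escape(stem)}_V(\d+)(\.par)?$")
    used = [int(m.group(1)) for name in names if (m := pattern.match(name))]
    return f"{stem}_V{max(used, default=0) + 1}"


def next_version_in(folder, stem="MitoEvol"):
    """Apply next_version to the contents of `folder` (e.g. the SimCoal2 folder)."""
    return next_version((p.name for p in Path(folder).iterdir()), stem)
```

For example, next_version_in(".") run inside the SimCoal2 folder after the first test run would suggest MitoEvol_V2. Checking both .par files and result folders guards against the case where a .par file was deleted but its results folder was not.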

4. Test run an Arlequin analysis of the SimCoal sample file

  • Open Arlequin (go to the Arlequin folder and double-click "WinArl3.exe")
  • Open the batch project file:
    • Click File>OpenProject
    • Navigate to the SimCoal folder and then to the simulation results folder "MitoEvol_V1"
    • Change the type of files displayed in the window from *.arp to *.arb (at the bottom of the window)
    • Select the "MitoEvol_V1.arb" batch file
  • Select the "Settings" tab and select the following:
    • Under "Molecular diversity indices" (3rd from bottom) select "standard diversity indices", "molecular diversity differences", and all 4 "Theta" values.
  • Switch back to the "BatchFile" tab
  • Make sure the batch file settings choice is "Use interface settings"
  • Select the first 4 results to summarize (from Gene diversity to Theta values)
  • Press Start (Arlequin will now work through all your project files, taking less than 1 s per file) and don't worry about a "List index out of bounds (101)" error if you encounter it.
  • When it has finished, Arlequin will have generated:
    • A results folder (*.res) for each simulated sample
    • A few *.sum results files, each containing a table in which each row is a sample and each column is a parameter of interest
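If you prefer a script to Excel for the next step, the *.sum tables first have to be read in. A minimal Python sketch, under the assumption (taken from the description above, not from Arlequin's documentation) that a *.sum file has one header line followed by one row per sample; parse_sum_table and load_sum_file are hypothetical names, so check a real file and adjust:

```python
from pathlib import Path


def parse_sum_table(text, sep=None):
    """Split a summary table into {column name: list of values}.

    ASSUMPTION: one header line, then one line per sample. sep=None splits
    on any whitespace; pass a tab character as `sep` if column names
    contain spaces. Cells that are not numbers (e.g. file names) stay strings.
    """
    lines = [line.split(sep) for line in text.splitlines() if line.strip()]
    header, rows = [h.strip() for h in lines[0]], lines[1:]
    columns = {name: [] for name in header}
    for row in rows:
        for name, cell in zip(header, row):
            cell = cell.strip()
            try:
                columns[name].append(float(cell))
            except ValueError:
                columns[name].append(cell)
    return columns


def load_sum_file(path, sep=None):
    """Read a *.sum file (e.g. theta.sum) from disk and parse it."""
    return parse_sum_table(Path(path).read_text(errors="replace"), sep)
```

Rows shorter than the header are silently truncated by zip, so compare the column lengths with the number of samples you simulated.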

5. Analyze the variation in diversity in your 100 coalescence simulations in Excel
  • Start Excel (or your favourite plotting program)
  • Open the "theta.sum" file in Excel and follow the default import suggestions
  • Make nice plots of your data:
    • Histograms of theta(pi) and theta(s)
    • Test for correlations between the two and plot them against each other
  • Note for Excel users: to generate histograms in Excel, you need a special add-in:
    • Go to Tools > Add-Ins ...
    • Select the "Analysis ToolPak" add-in (it provides the "Data Analysis" menu item) and install it
    • To generate a histogram, go to Tools > Data Analysis, select Histogram and point it to the cells that contain the raw data. If you don't specify the bins, Excel will choose some. The rest should be straightforward.
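The two analyses asked for in this step, binned counts for a histogram and a correlation between theta(pi) and theta(S), can also be scripted. A minimal standard-library Python sketch; equal-width bins and Pearson's r are choices made here (a rank correlation would also be reasonable for skewed theta distributions), and the function names are illustrative:

```python
import math


def histogram(values, n_bins=10):
    """Equal-width histogram: returns (bin edges, counts per bin)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0        # avoid zero width if all values are equal
    counts = [0] * n_bins
    for v in values:
        i = min(int((v - lo) / width), n_bins - 1)  # the maximum goes into the last bin
        counts[i] += 1
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, counts


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Applied to the theta(pi) and theta(S) columns of theta.sum (one value per simulated sample), histogram gives the data for the two histograms and pearson_r the strength of their correlation.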


6. Now that you know how to run such simulations in principle, adjust the SimCoal2 input file to reflect

  • either a 10-fold increase in mutation rate, as observed in recent pedigree studies, or
  • some other change in the biological model or analysis that you are interested in (e.g. demography)
  • make several test runs (don't forget to rename the .par files to avoid data chaos)
  • make several quick-and-dirty analyses until you think you have found something interesting
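For the 10-fold mutation-rate scenario, the only field that changes is the mutation rate in the DNA block line of MitoEvol_V1.par. A minimal Python sketch that rewrites it, assuming block lines keep the layout shown in that file (data type, number of loci, recombination rate, mutation rate, optional parameters); scale_mutation_rate is a hypothetical helper, and the rewritten line uses uniform spacing, which should not matter if SimCoal2 splits fields on whitespace (an assumption worth checking with a test run):

```python
from decimal import Decimal


def scale_mutation_rate(par_text, factor):
    """Return a copy of a .par file with every DNA mutation rate multiplied by `factor`.

    Assumes block lines laid out as in MitoEvol_V1.par:
    data type, number of loci, recombination rate, mutation rate, optional parameters.
    Comment lines (starting with //) and all other lines are left unchanged.
    """
    out = []
    for line in par_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "DNA":
            # Decimal avoids float noise such as 2.0000000000000002e-05, and
            # format(..., "f") keeps plain decimal notation in the file.
            fields[3] = format(Decimal(fields[3]) * Decimal(str(factor)), "f")
            line = "    ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, reading MitoEvol_V1.par, calling scale_mutation_rate(text, 10) and writing the result to a new file such as MitoEvol_V2.par turns the rate 0.000002 into 0.000020 while leaving the rest of the model unchanged.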


7. Once you have an interesting question or analysis, run it at production quality

  • Run 10000 samples (enough in many cases) to get a full distribution of
    • Theta and Tajima's D for the model in MitoEvol_V1
    • the results that you want to use in your final analysis
  • Be careful: estimate CPU time and disk space first, especially if you allow for recombination and a large Ne.
  • For longer runs, watch out for automatic logouts. Don't leave your simulations completely unattended in the lab. Overnight computation is not possible here for many reasons.
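The CPU time and disk space estimate can be extrapolated from a small test run of the same model. A minimal Python sketch, assuming cost grows linearly with the number of samples and taking the "less than 1 s per file" Arlequin figure from step 4 as an upper bound (so 10000 samples may need up to about 2.8 hours of Arlequin time alone); the function names are hypothetical:

```python
from pathlib import Path


def folder_size_bytes(folder):
    """Total size of all files below `folder` (e.g. a test results folder)."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())


def estimate_run(test_samples, test_seconds, test_bytes, target_samples,
                 arlequin_seconds_per_file=1.0):
    """Linearly extrapolate a production run from a small test run.

    Returns (SimCoal seconds, Arlequin seconds, bytes on disk).
    Assumes cost is proportional to the number of samples; this only holds
    if the model (Ne, sample size, sequence length, recombination) is the
    same as in the test run.
    """
    scale = target_samples / test_samples
    simcoal_s = test_seconds * scale
    arlequin_s = arlequin_seconds_per_file * target_samples
    disk = test_bytes * scale
    return simcoal_s, arlequin_s, disk
```

Measure test_bytes with folder_size_bytes after Arlequin has processed the test batch, so that the *.res folders it writes are included in the disk estimate.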


8. If you need a mark for this practical, please write a ca. 1000-word essay (including plots) describing what you did and how you interpret it. Please include

  • Your analyses of the MitoEvol_V1 runs and what they say about the precision with which the date of the most recent common ancestor can be inferred from such a data set
  • What further changes you made to the standard file, and why
  • Your results and their potential significance


Your final hand-in must contain your text with all figures that show your results, and a printout of the SimCoal2 input file that you used to define your model. Hand-in is handled by Judith McQueen, who needs your essays by 12:00 on Monday, 12 March 2007.




SimCoal2 - input file ("MitoEvol_V1.par")


//Parameters for the coalescence simulation program : simcoal.exe
1 samples to simulate : 1 population of mitochondrial DNA tentatively resembling some human parameters
//Population effective sizes (number of genes 2*diploids)
2500
//Samples sizes (number of genes 2*diploids)
25
//Growth rates    : negative growth implies population expansion
0
//Number of migration matrices : 0 implies no migration between demes
0   
//historical event: time, source, sink, migrants, new deme size, new growth rate, migration matrix index
0 historical event //Stop growth
//Number of independent loci [chromosome]
1 1
// Chromosome structure 1 begins with number of loci
1
//per block: data type, number of loci, per generation recombination and mutation rates and optional parameters
DNA    1000        0.000      0.000002    0.33
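Before looking at the simulated distributions, it helps to know what standard coalescent theory predicts for these parameters (N = 2500 genes, n = 25 sampled genes, mutation rate 0.000002 per site per generation, 1000 sites, constant size, no recombination). A minimal Python sketch, assuming each pair of lineages coalesces with probability 1/N per generation and using the infinite-sites approximation; neutral_expectations is an illustrative name, not a SimCoal2 or Arlequin function. It gives theta = 10 (on a per-sequence scale), about 37.8 expected segregating sites, and a time to the most recent common ancestor of 4800 generations on average with a standard deviation of roughly 2700 generations, which is relevant to the dating question in step 8:

```python
import math


def neutral_expectations(N, n, mu_per_site, n_sites):
    """Expectations for a constant-size, non-recombining haploid population.

    N: population size in genes (first number in the .par file)
    n: sample size in genes; mu_per_site: mutation rate per site per generation.
    Assumes each pair of lineages coalesces with probability 1/N per generation
    and uses the infinite-sites approximation (no repeated hits at a site).
    Returns (theta, E[S], mean T_MRCA, SD of T_MRCA), times in generations.
    """
    theta = 2 * N * mu_per_site * n_sites          # = E[pairwise differences]
    a_n = sum(1 / i for i in range(1, n))          # Watterson's constant
    expected_S = theta * a_n                       # E[segregating sites]
    # While k lineages remain, the waiting time to the next coalescence is
    # ~exponential with mean N / (k(k-1)/2) generations; T_MRCA sums these.
    means = [N / (k * (k - 1) / 2) for k in range(2, n + 1)]
    mean_tmrca = sum(means)                        # = 2N(1 - 1/n)
    sd_tmrca = math.sqrt(sum(m ** 2 for m in means))  # exponential: var = mean^2
    return theta, expected_S, mean_tmrca, sd_tmrca


theta, S, t_mean, t_sd = neutral_expectations(N=2500, n=25, mu_per_site=0.000002, n_sites=1000)
print(f"theta = {theta:.2f}, E[S] = {S:.1f}, T_MRCA = {t_mean:.0f} +/- {t_sd:.0f} generations")
```

Because the standard deviation of the genealogy depth is more than half its mean, replicate samples from the same model differ widely even before mutation adds further noise; the theta(pi) and theta(S) histograms from step 5 should centre roughly on these expectations.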

