The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' but 'That's funny...' --Isaac Asimov
That's Funny…

Phylogeny workflow

January 13th, 2009 by eric

This is my general phylogeny workflow, starting with raw FASTA sequences and ending in a maximum parsimony or maximum likelihood phylogenetic tree with distances. Programs used: ARB 07.12.06org, Seaview, PAUP* 4.0 beta 10 (MacOSX), PHYLIP 3.68, ModelTest Server 1.0, PRAP2, Inkscape, XFIG.

  1. get sequences into ARB, e.g. via greengenes, SILVA, or RDP. If you are importing an alignment in FASTA format, you may need the following import filter (IFT). Save it with the rest of ARB's .ift files, e.g. usr/arb/lib/import/fasta_wgap.ift:
    AUTODETECT      ">*"
            #Global settings:
    KEYWIDTH        1
    BEGIN   ">??*"
    MATCH   ">*"
            SRT "* *=*1:*\t*=*1"
            WRITE "name"
    MATCH   ">*"
            SRT "*|*=*1"
            WRITE "full_name"
    SEQUENCEAFTER   "*"
    SEQUENCESRT     ""
    SEQUENCECOLUMN  0
    SEQUENCEEND     ">*"
    # DONT_GEN_NAMES
    CREATE_ACC_FROM_SEQUENCE
    END     "//"
  2. in ARB: prune the tree to the closest isolates, the closest environmental clones, and enough internal branches for context. Make a copy for each tree you work on.
  3. in ARB: mark all species in the tree, look at the alignment, unmark sequences that are not full length (or not long enough), and note the start and end positions of the full-length section.
  4. in ARB: export the sequences to FASTA (File–>Export–>Export to foreign format). You may want to use a simple export filter (EFT) like the following (save it with the others, e.g. usr/arb/lib/import/fasta_simple.eft, or wherever):
    SUFFIX          fasta
    BEGIN
    >*(name)
    *(|export_sequence)

    During the export, use a hypervariable SAI (made by parsimony) to filter by quality (Lane 1991 mask) “-=.0123456”, and limit the exported region using the start and end positions you noted in step 3.

  5. in SEAVIEW, open exported FASTA file and save as NEXUS file (the export filter from ARB to PAUP doesn’t seem to work very well)
  6. PAUP may get grouchy if you have digit-only sequence names, so you can run the following sed script to rename them temporarily (it appends ‘tmpname’ to any quoted name made of four or more digits).
    s/'\([0-9]\{4,\}\)'/'\1tmpname'/g

    Save the script as ‘fromdigits.sed’ (it’s easier as a file because the apostrophes in the regular expression complicate things on the command line) and run

    sed -f fromdigits.sed [filename.nexus] > [filename.converted.nexus]

    After running PAUP you can change the names back in the PHYLIP tree files by using the following sed script:

    sed -e 's/\([0-9]\{4,\}\)tmp[name]*/\1/g' [treefile.phy] > [treefile.converted.phy]

    Or, from NEXUS tree files:

    sed -e 's/\([0-9]\{4,\}\)tmp[name]*/\1/g' [treefile.nexus] > [treefile.converted.nexus]
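    To check that the two renaming scripts undo each other, they can be run on a toy string first. The sketch below uses an invented tree fragment (the taxon names are hypothetical, not real PAUP output) and two made-up wrapper functions around the sed expressions from this step.

```shell
#!/bin/sh
# Round-trip check for the digit-only name workaround.
# The sample taxon names below are invented for illustration.

# forward: append 'tmpname' to quoted names made of 4 or more digits
to_tmpname() {
    sed -e "s/'\([0-9]\{4,\}\)'/'\1tmpname'/g"
}

# reverse: strip the suffix again (works on quoted or unquoted names)
from_tmpname() {
    sed -e 's/\([0-9]\{4,\}\)tmp[name]*/\1/g'
}

original="('12345','Ecoli',('678901','Bsub'));"
converted=$(printf '%s\n' "$original" | to_tmpname)
restored=$(printf '%s\n' "$converted" | from_tmpname)

printf 'converted: %s\n' "$converted"
printf 'restored:  %s\n' "$restored"
```

    Note that the reverse pattern matches ‘tmp’ followed by any run of the letters n, a, m, e, so a suffix that has been cut short (e.g. ‘tmpna’) is stripped as well.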
  7. in a text editor, open the NEXUS file, append a PAUP block to the end, and save it as a new file. For bootstrapping, the PAUP block may come from MODELTEST (NOTE: a bug in PAUP* 4.0b10 means you have to add an extra command to the MODELTEST input block: “default lscores longfmt=yes”). For ratcheting, the PAUP block may come from PRAP2.
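    Appending the PAUP block can also be done from the shell instead of a text editor. This is a minimal sketch: the file names and the helper function are hypothetical, and the toy file contents only stand in for a real alignment and a real PAUP block.

```shell
#!/bin/sh
# Append a PAUP block to a NEXUS alignment and save the result as a new file.
# File names are placeholders; adjust them to your own data.

append_paup_block() {
    # $1 = NEXUS alignment, $2 = file holding the PAUP block, $3 = output file
    [ -r "$1" ] && [ -r "$2" ] || { echo "missing input file" >&2; return 1; }
    cat "$1" "$2" > "$3"
}

# toy inputs so the sketch runs on its own
workdir=$(mktemp -d)
printf '#NEXUS\nbegin data;\nend;\n' > "$workdir/alignment.nexus"
printf 'begin paup;\ndefault lscores longfmt=yes;\nend;\n' > "$workdir/paupblock.txt"

append_paup_block "$workdir/alignment.nexus" "$workdir/paupblock.txt" \
    "$workdir/alignment.paup.nexus"
```

    Writing to a new file rather than appending in place keeps the original alignment untouched, which matches the "save as new file" advice in this step.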
  8. run in PAUP to get parsimony/likelihood trees, and save the trees in PHYLIP format. If you only have a NEXUS tree, open it in a text editor and delete the introductory NEXUS block, keeping only what is between the outermost parentheses.
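    Stripping a NEXUS tree file down to the bare tree can also be scripted. The sketch below assumes each tree sits on a single line of the form "tree NAME = [&U] (...);" and that there is no translate table; the sample tree and the function name are invented. It keeps the text from the first to the last parenthesis and adds the closing semicolon used by the Newick format (drop the ";" in the replacement if your importer prefers the bare parentheses).

```shell
#!/bin/sh
# Pull the parenthesised tree out of NEXUS "tree ..." lines.
# Assumes one tree per line and no translate table (taxon names appear verbatim).

nexus_to_newick() {
    # on lines starting with "tree", keep the text from the first "(" to the last ")"
    sed -n '/^[[:space:]]*[Tt][Rr][Ee][Ee][[:space:]]/s/^[^(]*\((.*)\)[^)]*$/\1;/p' "$@"
}

# toy input so the sketch runs on its own
sample=$(mktemp)
printf '#NEXUS\nbegin trees;\ntree PAUP_1 = [&U] (A:0.1,(B:0.2,C:0.3):0.05);\nend;\n' > "$sample"

nexus_to_newick "$sample"
```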
  9. in ARB: import the consensus tree with bootstraps back into ARB using Tree–>Tree Admin–>Import. Make sure to remove any period (.) from the tree name or it won’t import.
  10. in ARB: save distance matrix to ‘infile’ by doing Tree–>Build Tree–>Distance Methods–>Phylip Distance Matrix, then use the same filter as above for trees. Hit ‘y’ and try to save the file that opens up as ‘/tmp/infile’. If no file pops up, make sure you’ve installed ‘xedit’ or make a soft link from e.g. /usr/bin/xedit to some other text editor e.g. gedit, kedit. If you can’t save the file anywhere, copy it from the ~/.arb_tmp directory in your home folder.
  11. in ARB: save tree to /tmp/arbtree (Tree–>Tree Admin–>Export).
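  12. here is a script that automates the three manual steps listed in step 13 (clean up the tree file, unroot it with retree, add distances with fitch). It reads the tree from ‘arbtree’ and the distance matrix from ‘distmatrix’ in the directory you run it from: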
    #!/bin/sh
    # Script file for automating the process of adding distances to likelihood tree
    # REC

    # turns tree from arb ('arbtree') and distance matrix from dnadist ('distmatrix') into arb tree plus distances ('arbtree.fitch.outtree')

    filedate=`date +%s`

    mkdir tmp$filedate
    cd tmp$filedate

    # clean up tree file
    # the sed '1d' removes the notes line -- if you have zero or more than one you'll need to change this or do it manually
    # the tr -d '[:space:]' gets rid of all the whitespace (quoted so the shell does not expand it as a filename pattern)
    sed '1d' < ./../arbtree | tr -d '[:space:]' > intree

    # unroot tree file
    (echo y; echo w; echo u; echo q)|(retree)
    mv intree arbtree.retree.intree
    mv outtree intree

    # run fitch to add the distances to the tree we've supplied
    cp ./../distmatrix infile
    (echo d; echo u; echo -; echo y)|(fitch)
    mv intree arbtree.fitch.intree
    mv infile arbtree.fitch.infile
    mv outtree arbtree.fitch.outtree
    mv outfile arbtree.fitch.outfile
    cd ..
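    Because the script pipes answers into interactive PHYLIP menus, a missing program or input file tends to fail in confusing ways. A small pre-flight check, sketched below, can be run first. It assumes the same layout the script uses (arbtree and distmatrix in the current directory, retree and fitch on the PATH); the function name is made up for this example.

```shell
#!/bin/sh
# Pre-flight check before adding distances to a tree with retree and fitch.
# check_prereqs is a hypothetical helper, not part of PHYLIP or ARB.

check_prereqs() {
    # $1 = directory holding the input files; remaining args = required programs
    dir=$1
    shift
    status=0
    for f in arbtree distmatrix; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty input file: $dir/$f" >&2
            status=1
        fi
    done
    for prog in "$@"; do
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "program not found on PATH: $prog" >&2
            status=1
        fi
    done
    return $status
}

# typical use: stop early if anything is missing
if check_prereqs . retree fitch; then
    echo "ready to run"
else
    echo "fix the problems listed above first"
fi
```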
  13. to do it manually: in a text editor, delete the first line of the tree file and remove all whitespace so PHYLIP can read it, or do the following on the command line:

    sed '1d' < arbtree | tr -d '[:space:]' > intree

    with PHYLIP: run retree –> write unrooted tree, then mv outtree intree

    with PHYLIP: run fitch, choose options D –> minimum evolution, U –> use the user tree from the input file (intree), - –> no negative lengths
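    To see what the tree clean-up command in this step does, it can be tried on a toy tree. The file below is invented for illustration (a bracketed notes line followed by a tree spread over several lines); real ARB exports may look different.

```shell
#!/bin/sh
# Demonstrate the tree clean-up: drop the first (notes) line, then remove all whitespace.
# The sample tree is made up; it only mimics a one-line note plus a multi-line tree.

workdir=$(mktemp -d)
printf '[notes line written on export]\n(\n  (A:0.10, B:0.20):0.05,\n  C:0.30\n);\n' > "$workdir/arbtree"

# quoting the character class keeps the shell from treating it as a filename pattern
sed '1d' < "$workdir/arbtree" | tr -d '[:space:]' > "$workdir/intree"

cat "$workdir/intree"; echo
```

    The resulting intree is a single line with no trailing newline, because tr removes newlines along with spaces and tabs.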

  14. in ARB: import outtree with Tree–>Tree Admin–>Import
  15. get NDS info organized and displayed correctly, e.g.
    • to replace accession numbers with the accession numbers from greengenes:
      copy full_name to acc

      :*=*(full_name)

      then get rid of extraneous info:

      /[0-9][0-9]* [A-Z][A-Z]*[0-9]*\.*[0-9]* //
      /\..*//
    • to rename full_name by sequence info:
      copy full_name to tmp and to backup (just in case)

      :*=*(full_name)

      then get rid of extraneous info:

      /[0-9][0-9]* [A-Z][A-Z]*[0-9]*\.*[0-9]* //
      then copy tmp to full_name:
      :*=*(tmp)
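    The clean-up patterns in this step can be tried outside ARB before touching real data. The sketch below uses sed as a stand-in, on the assumption that ARB's /search/replace/ form behaves like sed's s command for these simple patterns; the sample field values and function names are invented.

```shell
#!/bin/sh
# Try the NDS clean-up patterns with sed (a stand-in for ARB's own regex handling).
# The sample field values are hypothetical.

strip_id_prefix() {
    # removes the first "digits SPACE letters[digits][.digits] SPACE" run
    sed -e 's/[0-9][0-9]* [A-Z][A-Z]*[0-9]*\.*[0-9]* //'
}

strip_after_dot() {
    # removes everything from the first period onwards
    sed -e 's/\..*//'
}

sample='123 AB12.1 Uncultured bacterium clone X'
printf '%s\n' "$sample" | strip_id_prefix
printf '%s\n' 'AB12.1 extra text' | strip_after_dot
```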
  16. in ARB: export to XFIG (Tree–>Export to xfig) with no handles, no colors, full tree
  17. in XFIG: immediately export to SVG
  18. in INKSCAPE: open svg tree file and edit as necessary
  19. in INKSCAPE: resize page (File –> document properties –> fit page to selection), resize selection to maximize page, export bitmap 300dpi png