This is my general phylogeny workflow, starting with raw FASTA sequences and ending in a maximum parsimony or maximum likelihood phylogenetic tree with distances. Programs used: ARB 07.12.06org, Seaview, PAUP* 4.0 beta 10 (MacOSX), PHYLIP 3.68, ModelTest Server 1.0, PRAP2, Inkscape, XFIG.
- get sequences into ARB, via e.g. greengenes or SILVA or RDP. If importing an alignment in FASTA format you may need to use the following IFT (save with rest of arb .ift files, e.g. usr/arb/lib/import/fasta_wgap.ift):
AUTODETECT ">*"
#Global settings:
KEYWIDTH 1
BEGIN ">??*"
MATCH ">*"
SRT "* *=*1:*\t*=*1"
WRITE "name"
MATCH ">*"
SRT "*|*=*1"
WRITE "full_name"
SEQUENCEAFTER "*"
SEQUENCESRT ""
SEQUENCECOLUMN 0
SEQUENCEEND ">*"
# DONT_GEN_NAMES
CREATE_ACC_FROM_SEQUENCE
END "//"
- in ARB: prune tree to closest isolates, closest clones/from environment, and enough internal branches for context. make a copy for each tree you work on.
- in ARB: mark all in tree, look at alignment, unselect those not full length/long enough, note start and end of full length section by position.
- in ARB: export sequences to FASTA (File–>Export–>Export to foreign format). You may want to use a simple export filter (EFT) like the following (save it with the other .eft files, e.g. usr/arb/lib/export/fasta_simple.eft, or wherever):
SUFFIX fasta
BEGIN
>*(name)
*(|export_sequence)
During the export use a hypervariable SAI (made by parsimony) to filter by quality (Lane 1991 mask) “-=.0123456” and region to export, using the starting and ending positions you determined previously.
- in SEAVIEW, open exported FASTA file and save as NEXUS file (the export filter from ARB to PAUP doesn’t seem to work very well)
- PAUP may get grouchy if you have digit-only sequence names, so you can run the following script to temporarily change the names.
s/'\([0-9]\{4,\}\)'/'\1tmpname'/g
Save the script as ‘fromdigits.sed’ (it’s easier as a file because the apostrophes in the regular expression complicate things on the command line) and run
sed -f fromdigits.sed [filename.nexus] > [filename.converted.nexus]
After running PAUP you can change the names back in the PHYLIP tree files by using the following sed command:
sed -e 's/\([0-9]\{4,\}\)tmp[name]*/\1/g' [treefile.phy] > [treefile.converted.phy]
Or, from NEXUS tree files:
sed -e 's/\([0-9]\{4,\}\)tmp[name]*/\1/g' [treefile.nexus] > [treefile.converted.nexus]
- in a text editor, open NEXUS file and append PAUP block to end, save as new file. for bootstrapping, the PAUP block may come from MODELTEST (NOTE: a bug in PAUP 4.0b10 means you have to add an extra command to the MODELTEST input block: “default lscores longfmt=yes”). for ratcheting, the PAUP block may come from PRAP2
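The digit-renaming round trip described above can be tried on a toy example before touching real files. This is only a sketch: the function names to_tmpnames and from_tmpnames are made up for illustration, and the sample names are invented. The expressions themselves are the two from this post.

```shell
# Forward rename, same expression as fromdigits.sed:
# quoted all-digit names of 4+ digits get a 'tmpname' suffix.
to_tmpnames() { sed "s/'\([0-9]\{4,\}\)'/'\1tmpname'/g" "$@"; }

# Reverse rename: strip the suffix again. [name]* also matches a suffix that
# was cut short (e.g. 'tmpna') when names were truncated.
from_tmpnames() { sed 's/\([0-9]\{4,\}\)tmp[name]*/\1/g' "$@"; }

# Toy input, not real data.
printf "'12345' 'Ecoli'\n" | to_tmpnames                  # '12345tmpname' 'Ecoli'
printf "(12345tmpna:0.1,Ecoli:0.2);\n" | from_tmpnames    # (12345:0.1,Ecoli:0.2);
```

Both functions read stdin when called without arguments and accept file names otherwise, so they can stand in for the sed commands given above.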
- run in PAUP to get parsimony/likelihood trees, save trees in phylip format. if you only have a NEXUS tree, open it in a text editor and delete the introductory NEXUS block, keeping only what is between the outermost parentheses.
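If editing the NEXUS tree file by hand gets tedious, the "keep only what is between the outermost parentheses" step can be approximated with sed. This is a rough sketch, not something PAUP or PHYLIP provides: nexus_to_newick is a made-up name, and it assumes each tree sits on a single line beginning with the word "tree". If the file has a Translate block, the output will still contain the translated numbers rather than the sequence names.

```shell
# nexus_to_newick is a hypothetical helper name, not part of PAUP, PHYLIP or ARB.
# Assumption: each tree is on one line that starts with the word "tree".
nexus_to_newick() {
    # -n suppresses normal output; the trailing p prints only the tree lines.
    # The pattern drops everything before the first "(" and after the last ")",
    # then re-appends the terminating semicolon.
    sed -n '/^[[:space:]]*[Tt][Rr][Ee][Ee][[:space:]]/s/^[^(]*\((.*)\)[^)]*$/\1;/p' "$1"
}

# Example call (file names are placeholders):
# nexus_to_newick treefile.nexus > treefile.phy
```

Anything that follows the last closing parenthesis (for example a root label) is dropped, so check the result before importing it anywhere.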
- in ARB: import the consensus tree with bootstraps back into ARB using Tree–>Tree Admin–>Import. make sure to remove the period (.) from the tree name or it won’t import.
- in ARB: save distance matrix to ‘infile’ by doing Tree–>Build Tree–>Distance Methods–>Phylip Distance Matrix, then use the same filter as above for trees. Hit ‘y’ and try to save the file that opens up as ‘/tmp/infile’. If no file pops up, make sure you’ve installed ‘xedit’ or make a soft link from e.g. /usr/bin/xedit to some other text editor e.g. gedit, kedit. If you can’t save the file anywhere, copy it from the ~/.arb_tmp directory in your home folder.
- in ARB: save tree to /tmp/arbtree (Tree–>Tree Admin–>Export).
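- here is a script that automates the three manual steps listed at the end of this post. It expects the exported tree as 'arbtree' and the distance matrix as 'distmatrix' in the current directory (so copy /tmp/arbtree and /tmp/infile there under those names first):
#!/bin/sh
# Script file for automating the process of adding distances to likelihood tree
# REC
# turns tree from arb ('arbtree') and distance matrix from dnadist ('distmatrix') into arb tree plus distances ('arbtree.fitch.outtree')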
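filedate=$(date +%s)
mkdir "tmp$filedate"
# stop here if the working directory could not be entered, so the mv commands below cannot touch the wrong files
cd "tmp$filedate" || exit 1
# clean up tree file
# the sed '1d' removes the notes line -- if you have zero or more than one you'll need to change this or do it manually
# the tr -d '[:space:]' gets rid of all the whitespace (quoted so the shell cannot expand it as a filename pattern)
sed '1d' < ./../arbtree | tr -d '[:space:]' > intree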
# unroot tree file
(echo y; echo w; echo u; echo q)|(retree)
mv intree arbtree.retree.intree
mv outtree intree
# run fitch to add the distances to the tree we've supplied
cp ./../distmatrix infile
(echo d; echo u; echo -; echo y)|(fitch)
mv intree arbtree.fitch.intree
mv infile arbtree.fitch.infile
mv outtree arbtree.fitch.outtree
mv outfile arbtree.fitch.outfile
cd ..
- in ARB: import the resulting tree (arbtree.fitch.outtree, in the tmp directory the script created) with Tree–>Tree Admin–>Import
- get NDS info organized and displayed correctly, e.g.
– to replace acc with the accession numbers from greengenes, copy full_name to acc:
:*=*(full_name)
then get rid of extraneous info:
/[0-9][0-9]* [A-Z][A-Z]*[0-9]*\.*[0-9]* //
/\..*//
– to rename full_name by sequence info, copy full_name to tmp and to backup (just in case):
:*=*(full_name)
then get rid of extraneous info:
/[0-9][0-9]* [A-Z][A-Z]*[0-9]*\.*[0-9]* //
then copy tmp to full_name:
:*=*(tmp)
- in ARB: export to XFIG (Tree–>Export to xfig) with no handles, no colors, full tree
- in XFIG: immediately export to SVG
- in INKSCAPE: open svg tree file and edit as necessary
- in INKSCAPE: resize page (File –> document properties –> fit page to selection), resize selection to maximize page, export bitmap 300dpi png
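To add the distances manually instead of using the script, these are the three steps it automates:
- in a text editor: delete the first line of the tree file and remove all whitespace for PHYLIP, or do the same on the command line:
sed '1d' arbtree | tr -d '[:space:]' > intree
- with PHYLIP: run retree –> write unrooted tree, then mv outtree intree
- with PHYLIP: run fitch, choose options D –> minimum evolution, U –> use the tree in the input file (intree), - –> no negative lengths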
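Before handing the cleaned tree to retree, it can help to confirm that the clean-up did what was intended. The helpers below are a sketch under the same assumption as the script (exactly one notes line at the top of the ARB export); the names clean_arb_tree and parens_balanced are invented here and are not PHYLIP or ARB commands.

```shell
# clean_arb_tree: hypothetical helper mirroring the clean-up step above.
# Drops the first (notes) line and strips all whitespace, leaving one bare tree string.
clean_arb_tree() {
    sed '1d' "$1" | tr -d '[:space:]'
}

# parens_balanced: hypothetical check; succeeds only if the text on stdin
# contains as many "(" as ")".
parens_balanced() {
    tree=$(cat)
    # tr -cd keeps only the named character; wc -c then counts it.
    open=$(( $(printf '%s' "$tree" | tr -cd '(' | wc -c) ))
    close=$(( $(printf '%s' "$tree" | tr -cd ')' | wc -c) ))
    [ "$open" -eq "$close" ]
}

# Example use (arbtree is the file exported from ARB):
# clean_arb_tree arbtree > intree
# parens_balanced < intree || echo "unbalanced parentheses in intree" >&2
```

An unbalanced result usually means the export had no notes line (so sed '1d' removed part of the tree) or more than one, which is the case the script's comment warns about.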