**PyPHLAWD Analysis of Iguania**

Here, I provide all details, files, and commands used for this PyPHLAWD analysis. The instructions below allow replication of my analysis, and the commands for each module are provided. Note that directory and file paths will differ depending on where these data are downloaded and stored locally. In the module commands, paths to directories are denoted by /PATH/, whereas files end with typical extensions (.txt, .fasta, etc.).

***PyPHLAWD Run***

I performed a PyPHLAWD analysis for Iguania using the pre-compiled database *vrt.05082018.db*, available from the PyPHLAWD website. Running PyPHLAWD produced a table of information for all sequences under the taxonomy label 'Iguania', which totaled 134,028 records. To give PyPHLAWD the best chance of finding all 66 loci, I ran a baited analysis. The 'baits' were the unaligned fasta files produced by the SuperCRUNCH analysis of Iguania with data downloaded directly from NCBI, and they are provided in the Baits folder. The following command was used to run the analysis:

```
python /PyPHLAWD-master/src/setup_clade_bait.py Iguania /pyphlawd_runs/Iguania/Baits /pyphlawd_runs/pre-db/vrt.05082018.db /pyphlawd_runs/Iguania/Baited_Analysis logfile
```

The complete set of output files is available as a zipped directory.

***Post-Analysis***

The main outputs I wanted to examine were the unaligned and aligned fasta files for all loci. However, it was not easy to see how many taxa were included across loci, because PyPHLAWD writes fasta files labeled with accession numbers only. In addition, PyPHLAWD uses all available subspecies in the analysis, which may not be desirable (as in this case). To overcome these obstacles, I used a script I wrote to relabel the fasta files from PyPHLAWD (it works for aligned and unaligned files) using the taxonomy table produced in the analysis.
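
The relabeling idea can be illustrated with a minimal sketch. This is not the *Relabel_Output_Files.py* script itself. It assumes a tab-delimited table in which one column holds the accession number and another holds the replacement label. The column positions (`acc_col`, `label_col`), the function names, and the toy accession `ACC0001` are all hypothetical, and the real PyPHLAWD table layout may differ.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def load_labels(table_text, acc_col=0, label_col=1):
    """Map accession -> label from a tab-delimited table.

    The column positions are assumptions made for this sketch only.
    """
    labels = {}
    for line in table_text.splitlines():
        fields = line.split("\t")
        if len(fields) > max(acc_col, label_col):
            labels[fields[acc_col]] = fields[label_col]
    return labels


def relabel(fasta_text, labels):
    """Replace each accession-only header with its label, if one is known."""
    records = []
    for header, seq in parse_fasta(fasta_text):
        # Headers missing from the table are kept unchanged.
        records.append(f">{labels.get(header, header)}\n{seq}")
    return "\n".join(records) + "\n"


# Toy input; the accession and description are made up for illustration.
toy_table = "ACC0001\tAnolis carolinensis ND2\n"
toy_fasta = ">ACC0001\nACGT\nAC\n>ACC9999\nGG\n"
print(relabel(toy_fasta, load_labels(toy_table)))
```

Records whose accession is missing from the table keep their original header, so nothing is silently dropped.
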
With all the aligned files in the directory '/Relabel/', I ran the following command:

```
python Relabel_Output_Files.py -i /Relabel/ -t /pyphlawd_runs/Iguania/Baited_Analysis/Iguania_8511/Iguania_8511.table -r desciption
```

This relabeled the aligned fasta file records with the original NCBI description lines. From here, I ran the files through the SuperCRUNCH ***Filter_Seqs_and_Species.py*** module to weed out any extra taxa with subspecies labeling (such that all subspecies are elevated to the species level and the best representative sequence is selected). I then used the ***Relabel_Fasta.py*** module of SuperCRUNCH to relabel these fasta file records by species name, and subsequently used the ***Concatenation.py*** module of SuperCRUNCH to concatenate these alignments and determine how many taxa and sequences were available.

This PyPHLAWD analysis of Iguania produced 65 alignment files that contained 1,069 unique taxa, and the final concatenated alignment contained 10,397 sequences and was 66,100 base pairs long. A report of the number of loci for each taxon is contained in the *Taxa_Loci_Count.log* file, and the *Data_Partitions.txt* file contains information about the location of each locus in the concatenated alignment, which can be used for partitioned phylogenetic analyses and other applications. The final phylip alignment was used as input to run a RAxML analysis.
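
To make the relationship between the concatenated alignment, the per-taxon locus counts, and the partition locations concrete, here is a minimal sketch of concatenation with partition tracking. It is not the SuperCRUNCH *Concatenation.py* implementation. The function name `concatenate`, the in-memory dictionary input, and the toy loci and taxa are assumptions for illustration only.

```python
from collections import Counter


def concatenate(alignments, missing="-"):
    """Concatenate per-locus alignments into one supermatrix.

    alignments: dict mapping locus name -> dict of taxon -> aligned sequence.
    Returns (supermatrix, partitions, loci_per_taxon), where partitions is a
    list of (locus, start, end) tuples with 1-based inclusive coordinates.
    """
    taxa = sorted({t for aln in alignments.values() for t in aln})
    pieces = {t: [] for t in taxa}
    partitions = []
    loci_per_taxon = Counter()
    start = 1
    for locus in sorted(alignments):
        aln = alignments[locus]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{locus}: sequences are not aligned to one length")
        length = lengths.pop()
        for taxon in taxa:
            if taxon in aln:
                pieces[taxon].append(aln[taxon])
                loci_per_taxon[taxon] += 1
            else:
                # Taxa lacking this locus are padded with missing-data symbols.
                pieces[taxon].append(missing * length)
        partitions.append((locus, start, start + length - 1))
        start += length
    supermatrix = {t: "".join(p) for t, p in pieces.items()}
    return supermatrix, partitions, dict(loci_per_taxon)


# Toy data: two loci, two taxa (names are illustrative only).
toy = {
    "ND2": {"Taxon_A": "ACGT", "Taxon_B": "AC-T"},
    "RAG1": {"Taxon_A": "GGG"},
}
matrix, parts, counts = concatenate(toy)
print(parts)                 # locus boundaries in the concatenated alignment
print(counts)                # number of loci available per taxon
print(sum(counts.values()))  # total number of sequences across all loci
```

In this sketch the total sequence count is the sum of the per-taxon locus counts, and the final partition's end coordinate equals the length of the concatenated alignment.
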