**PyPHLAWD Analysis of Dipsacales**

Here, I provide all details, files, and commands used for this particular PyPHLAWD analysis. The following instructions allow replication of my analysis, and the commands for each module are provided. Note that directory and file paths will differ depending on where these data are downloaded and stored locally. In the module commands, paths to directories are denoted by /PATH/, whereas files end with typical extensions (.txt, .fasta, etc).

***PyPHLAWD Run***

I performed a PyPHLAWD analysis for Dipsacales using a pre-compiled database `pln.05082018.db` available from the PyPHLAWD website. Running PyPHLAWD produced a table of information for all sequences occurring under the taxonomy label 'Dipsacales', which totaled 12,348 records. To give PyPHLAWD the best chance to find the 4 loci, I ran a baited analysis that relied on the same set of unaligned fasta files used in the PyPHLAWD paper. These were the 'baits' for the analysis, and are provided in the Baits folder. The following command was used to run the analysis:

```
python /PyPHLAWD-master/src/setup_clade_bait.py Dipsacales /pyphlawd_runs/Dipsacales/Baits /pyphlawd_runs/pre-db/pln.05082018.db /pyphlawd_runs/Dipsacales/Baited_Analysis logfile
```

The complete set of output files is available as a zipped directory.

***Post-Analysis***

The main outputs I wanted to see were the unaligned and aligned fasta files of all loci. However, it was not easy to see how many taxa were included across loci, because PyPHLAWD writes fasta files with accession numbers only. To overcome this obstacle, I used a script I wrote to relabel the fasta files from PyPHLAWD (works for aligned and unaligned files) using the taxonomy table produced in the analysis.
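The general idea behind relabeling can be sketched in a few lines of Python. This is not the `Relabel_Output_Files.py` script itself, only a minimal illustration under stated assumptions: the accession is taken to be the first whitespace-delimited token of each fasta header, the accession-to-taxon mapping is assumed to have already been read from the table into a dictionary, and spaces in taxon labels are replaced with underscores. The function name `relabel_fasta` and the example accessions and taxon names are hypothetical.

```python
def relabel_fasta(lines, accession_to_taxon):
    """Replace the accession in each fasta header with its taxon label.

    Headers whose accession is not in the mapping are left unchanged.
    """
    relabeled = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Assumption: the accession is the first token after '>'.
            tokens = line[1:].split()
            accession = tokens[0] if tokens else ""
            taxon = accession_to_taxon.get(accession)
            if taxon is not None:
                # Assumption: underscores keep the label a single token.
                line = ">" + taxon.replace(" ", "_")
        relabeled.append(line)
    return relabeled


# Hypothetical accessions and taxon names, for illustration only.
mapping = {
    "ACC0001": "Genus species",
    "ACC0002": "Genus species subsp. example",
}
records = [">ACC0001", "ACGT", ">ACC0002", "AC-T", ">ACC9999", "ACGA"]
result = relabel_fasta(records, mapping)
```

Leaving unmatched headers untouched makes it easy to spot records that are absent from the table, although a stricter version could raise an error instead.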
With all the aligned files in the directory '/Relabel/', I ran the following command:

```
python Relabel_Output_Files.py -i /Relabel/ -t /pyphlawd_runs/Dipsacales/Baited_Analysis/Dipsacales_4199/Dipsacales_4199.table -r taxon
```

This relabeled the aligned fasta file records with the taxon labels present in the `Dipsacales_4199.table` file. The script uses whatever taxon name is present in that column, whether it is a species label or a subspecies label. In this case, I wanted to use the subspecies names, if present.

I then used the `Concatenation.py` module of SuperCRUNCH to concatenate these alignments and determine how many taxa and sequences were available. This PyPHLAWD analysis of Dipsacales produced 4 alignment files that contained 641 unique taxa, and the final concatenated alignment contained 1,510 sequences and was 6,666 base pairs long. A report of the number of loci for each taxon is contained in the `Taxa_Loci_Count.log` file, and the `Data_Partitions.txt` file contains the location of each locus in the concatenated alignment, which can be used for partitioned phylogenetic analyses and other applications. The final phylip alignment was used as input for a RAxML analysis.
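To illustrate how the taxon count, sequence count, and partition coordinates relate to one another, here is a minimal sketch of concatenation in Python. It is not the SuperCRUNCH `Concatenation.py` module, and it makes several assumptions: each locus is already held in memory as a dictionary of taxon label to aligned sequence, taxa missing from a locus are filled with `-`, and partition coordinates are 1-based and inclusive. The function name `concatenate` and the example loci are hypothetical.

```python
def concatenate(alignments, missing="-"):
    """Concatenate per-locus alignments and record partition coordinates.

    alignments: list of (locus_name, {taxon: aligned_sequence}) pairs.
    Returns (concatenated, partitions, loci_per_taxon).
    """
    # Every taxon seen in at least one locus gets a row in the output.
    taxa = sorted({taxon for _, seqs in alignments for taxon in seqs})
    pieces = {taxon: [] for taxon in taxa}
    loci_per_taxon = {taxon: 0 for taxon in taxa}
    partitions = []
    start = 1
    for name, seqs in alignments:
        lengths = {len(seq) for seq in seqs.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: sequences are not all the same length")
        length = lengths.pop()
        # 1-based, inclusive coordinates of this locus in the full alignment.
        partitions.append((name, start, start + length - 1))
        start += length
        for taxon in taxa:
            if taxon in seqs:
                pieces[taxon].append(seqs[taxon])
                loci_per_taxon[taxon] += 1
            else:
                # Assumption: absent loci are padded with the missing symbol.
                pieces[taxon].append(missing * length)
    concatenated = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return concatenated, partitions, loci_per_taxon


# Hypothetical two-locus example, for illustration only.
loci = [
    ("locusA", {"Taxon_1": "ACGT", "Taxon_2": "AC-T"}),
    ("locusB", {"Taxon_1": "GGA"}),
]
concatenated, partitions, loci_per_taxon = concatenate(loci)
total_sequences = sum(loci_per_taxon.values())
```

In this sketch, the number of keys in `concatenated` corresponds to the number of unique taxa, `total_sequences` to the number of individual sequences across loci, and the end coordinate of the last partition to the alignment length.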