# Phylign

Phylign (Břinda *et al.*, 2023) allows efficient searching and full alignment of query sequences to huge datasets of bacterial assemblies that have been phylogenetically compressed using MiniPhy (https://github.com/karel-brinda/MiniPhy). We have recently adapted Phylign to allow users to align query sequences to AllTheBacteria v0.2 or subsets of this dataset.

Detailed information, including how to run Phylign on computing clusters, can be found in the README on our GitHub page (https://github.com/AllTheBacteria/Phylign/blob/main/README.md). Briefly, the main steps to set this up are as follows:

1. Install conda via `apt-get` on Linux, `brew` on OS X or using the instructions on the Anaconda website (https://conda.io/projects/conda/en/latest/user-guide/install/index.html).
2. Clone the Phylign repository by opening a terminal and running `git clone https://github.com/AllTheBacteria/Phylign && cd Phylign`.
3. Install all of the Phylign dependencies in a conda environment by running `conda env create -f environment.yaml && conda activate phylign`.
4. Download the MiniPhy-compressed batches of assemblies you want to query and place them in `asms/`. The compressed assemblies are available at https://ftp.ebi.ac.uk/pub/databases/AllTheBacteria/Releases/0.2/assembly/.
5. Download the compressed COBS indices that match the batches of assemblies you downloaded and place them in `cobs/`. The COBS indices are available at https://ftp.ebi.ac.uk/pub/databases/AllTheBacteria/Releases/0.2/indexes/phylign/.
6. If you only want to query a subset of the assemblies, modify `data/batches_2m.txt` so that it includes only the batches of assemblies you are interested in.
7. Replace all of the files in `input/` with your query FASTA or FASTQ files.
8. Run `make` to search for the query sequences in the assemblies you downloaded. The results will be saved to `output/`.
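
When only some batches have been downloaded, the edit to `data/batches_2m.txt` described in step 6 could be scripted rather than done by hand. The following is a minimal sketch, not part of Phylign itself: `select_batches` is a hypothetical helper, and it assumes that the batch list holds one batch name per line and that the downloaded files are named `<batch>.tar.xz` in `asms/` and `<batch>.cobs_classic.xz` in `cobs/`. Check both assumptions against your own downloads and the README before relying on it; the suffixes can be overridden with the fourth and fifth arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Phylign): print the batches from a batch
# list for which BOTH the assembly archive and the COBS index are present.
#
# ASSUMPTIONS (verify against your downloads):
#   - the batch list has one batch name per line
#   - assemblies are named <batch>.tar.xz
#   - COBS indices are named <batch>.cobs_classic.xz
select_batches() {
    local batch_list="$1" asm_dir="$2" cobs_dir="$3"
    local asm_suffix="${4:-.tar.xz}"
    local cobs_suffix="${5:-.cobs_classic.xz}"
    local batch

    # Read line by line; the "|| [ -n ... ]" also handles a final line
    # that lacks a trailing newline.
    while IFS= read -r batch || [ -n "$batch" ]; do
        # Skip empty lines.
        if [ -z "$batch" ]; then
            continue
        fi
        # Keep the batch only if both files exist.
        if [ -f "$asm_dir/$batch$asm_suffix" ] && [ -f "$cobs_dir/$batch$cobs_suffix" ]; then
            printf '%s\n' "$batch"
        fi
    done < "$batch_list"
}
```

From the root of the cloned repository, one possible use is `select_batches data/batches_2m.txt asms cobs > batches.tmp && mv batches.tmp data/batches_2m.txt`. Writing to a temporary file first avoids truncating the list while it is still being read; keeping a backup copy of the original `data/batches_2m.txt` is advisable.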