Here we provide the data sets we selected to benchmark the performance of RIBAP.

### Performance benchmark

We used the genomes in the `input-performance-benchmark` folder to test the performance of RIBAP in calculating core genes at the species and genus level. All calculations were run on an HPC cluster using the following command (example input for _Enterococcus_ at the genus level):

```bash
nextflow run hoelzer-lab/ribap -r 1.0.3 -w work --output results/enterococcus-genus-lvl/ --fasta 'Enterococcus/*/*fna' -profile slurm,singularity --chunks 20
```

Please note that the parameter `--chunks` determines how many parallel jobs the ILP corpus is split into (8 chunks by default). Without this splitting, RIBAP can take a long time, especially for many input genomes (a current limitation of the computationally challenging ILP approach).

The output (reduced because of its size) can be found in `output-perofrmance-benchmark`. **Please note** that the FASTA files in the output folders are compressed with `xz` and need to be uncompressed first!

### Resource benchmark

We used the three data sets in the `input-resource-benchmark` folder to test the runtime and disk space requirements of RIBAP with the automatic removal of ILP intermediate results (the default) and without it (`--keepILPs`). The _Chlamydia psittaci_ genomes used for this benchmark were sampled from [https://osf.io/rbca9/](https://osf.io/rbca9/).

We used the following commands for the resource benchmark:

```bash
# laptop
nextflow run hoelzer-lab/ribap -r 1.0.3 --fasta '*.fasta' --cores 8 --max_cores 8 -profile local,docker -w work --output ribap-results --keepILPs

# HPC
nextflow run hoelzer-lab/ribap -r 1.0.3 --fasta '*.fasta' -profile slurm,singularity -w work --output ribap-results --keepILPs
```

We also provide the results of the RIBAP pipeline.

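
Because the FASTA files in the output folders are compressed with `xz`, a small helper like the one below may be convenient for unpacking a whole folder at once. This is only a sketch: the function name `decompress_xz_tree` is hypothetical, and it assumes the compressed files carry the usual `.xz` suffix and that the `xz` command-line tool is installed.

```shell
#!/usr/bin/env bash
# Sketch: decompress every *.xz file below a directory.
# Assumption: compressed files end in ".xz" (not confirmed by this page).

decompress_xz_tree() {
  local dir="$1"
  # --keep leaves the .xz archive next to the decompressed file;
  # drop it if the compressed copies should be removed to save space.
  find "$dir" -type f -name '*.xz' -exec xz --decompress --keep {} +
}

# Example call (folder name as given on this page):
# decompress_xz_tree output-perofrmance-benchmark
```

Note that with `--keep`, `xz` refuses to overwrite a decompressed file that already exists, so rerunning the helper on an already unpacked folder reports errors rather than replacing files.
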
### Percentage of conserved proteins (POCP)

For the genomes selected for each species (see `input-performance-benchmark`), we calculated the percentage of conserved proteins (POCP, [https://github.com/hoelzer/pocp](https://github.com/hoelzer/pocp), v2.3.1) according to [Qin _et al._ (2014)](https://journals.asm.org/doi/full/10.1128/JB.01688-14), but using DIAMOND instead of BLASTP for the protein alignments, to investigate how similar the selected genomes are at the protein level. In the `pocp-results` folder, you can find the resulting matrices for each species. As input, we used the proteins annotated by RIBAP, which uses Prokka internally.

For further details, see our publication and [https://github.com/hoelzer-lab/ribap](https://github.com/hoelzer-lab/ribap).
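
For reference, the POCP between two genomes, as we understand the definition in Qin _et al._ (2014), can be written as below. Here $C_1$ and $C_2$ are the numbers of conserved proteins in genome 1 and genome 2, and $T_1$ and $T_2$ are the total numbers of proteins in each genome. In the original BLASTP-based definition, a protein counts as conserved if it has a hit in the other genome with an E-value below $10^{-5}$, more than 40% sequence identity, and an alignable region covering more than 50% of the query length. Whether the DIAMOND-based calculation used here applies exactly the same thresholds is not stated on this page.

```latex
\mathrm{POCP} = \frac{C_1 + C_2}{T_1 + T_2} \times 100\%
```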