# Calculate percentage of conserved proteins

Originally introduced by [Qin et al. in 2014](https://pubmed.ncbi.nlm.nih.gov/24706738/), the percentage of conserved proteins (POCP) is a valuable metric for assessing prokaryote genus boundaries. Here, I introduce a computational pipeline for automated POCP calculation, aiming to enhance reproducibility and ease of use in taxonomic studies. The POCP pipeline is implemented in Nextflow with Conda and Docker support and is freely available on GitHub at [github.com/hoelzer/pocp](https://github.com/hoelzer/pocp).

To showcase the pipeline's performance and output, I re-analyzed genomic data of 15 species from a study on the genus delineation of _Chlamydiales_ species:

[Pannekoek Y, _et al_. "Genus delineation of _Chlamydiales_ by analysis of the percentage of conserved proteins justifies the reunifying of the genera _Chlamydia_ and _Chlamydophila_ into one single genus _Chlamydia_". Pathog Dis. 2016 Aug 1; 74(6)](https://doi.org/10.1093/femspd/ftw071)

In that study, POCP values were used to justify reunifying the genera _Chlamydia_ and _Chlamydophila_ into the single genus _Chlamydia_. I obtained the genome FASTAs from NCBI based on [Supplementary Table 1](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/femspd/74/6/10.1093_femspd_ftw071/5/ftw071_supplementary_data.zip?Expires=1705169484&Signature=2CefC0YREtGwolB8yajaabV8yfOkjmuR7Hf-K~w8Q-d1vBYf10O5nKDlqQXD~l8ogelIJbeKDjjRpKY2dYUuGaMZw89wXvzAqsxXxtJfTOVqJSiBOofzNt2AY0Jqbm6TKo60G653LVJTZE9ceT3UiuO7uhntgkC6LsGcf0Y9e7-2kn0xPyrLmcwZjI1lToMiJ5oNiA9tLiZese9CFXSh5VoJM~BB2-sG1b2MOaBoRqOaL-6ySwEzYS65wbrAJFZ7q~aXUz7jPYE2~fg3Ld~V6CG7arsjDP5neHDL-vnvTClp73IRXA~UhOxHjAfaQTSuq2tigT9ncljn-0l7717NCA__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA) of that study.
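
For orientation, the POCP of a genome pair is defined in the original publication by Qin et al. as shown below. The symbols follow that paper; how the pipeline filters its `DIAMOND` hits in detail is not restated here and should be checked in the repository.

$$
\mathrm{POCP} = \frac{C_1 + C_2}{T_1 + T_2} \times 100\,\%
$$

Here $C_1$ and $C_2$ are the numbers of conserved proteins in the two genomes, and $T_1$ and $T_2$ are their total protein counts. In the original definition, a protein counts as conserved if it has a match in the other genome with an E-value below $10^{-5}$, a sequence identity above 40%, and an alignable region covering more than 50% of the query protein. A POCP of 50% was proposed there as the genus boundary. As a purely hypothetical illustration (the counts are made up), $C_1 = 600$, $C_2 = 580$, $T_1 = 1000$, and $T_2 = 900$ give $\mathrm{POCP} = 1180 / 1900 \times 100\,\% \approx 62.1\,\%$.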

Version 2.3.1 of the pipeline ran for 26 minutes on a Linux laptop with 8 cores, using less than 4 GB of RAM, with the following commands:

```bash
# install/update the pipeline
nextflow pull hoelzer/pocp
# run
nextflow run hoelzer/pocp -r 2.3.1 --genomes '*.fasta' -profile local,docker
```

This repository provides

* the input FASTA genomes
* `Prokka` gene and protein annotations
* `DIAMOND` ortholog findings
* POCP calculations and final table output

If you use the POCP Nextflow pipeline, please cite both the original POCP study that introduced the metric and the POCP-nf pipeline:

**[Qin, Qi-Long, _et al_. "A proposed genus boundary for the prokaryotes based on genomic insights." Journal of bacteriology 196.12 (2014): 2210-2215.](https://pubmed.ncbi.nlm.nih.gov/24706738/)**

**[Martin Hölzer. "POCP: An automatic Nextflow pipeline for calculating the percentage of conserved proteins in bacterial taxonomy". SOME JOURNAL. Hopefully in 2024.]()**

### Abbreviations used in file names

* _Chlamydia abortus_, cab
* _Chlamydia avium_, cav
* _Chlamydia caviae_, cca
* _Chlamydia felis_, cfe
* _Chlamydia gallinacea_, cga
* _Chlamydia ibidis_, cib
* _Chlamydia muridarum_, cmu
* _Chlamydia pecorum_, cpe
* _Chlamydia pneumoniae_, cpn
* _Chlamydia psittaci_, cps
* _Chlamydia trachomatis_, ctr
* _Parachlamydia acanthamoebae_, pac
* _Simkania negevensis_, sne
* _Waddlia chondrophila_, wch
* _Candidatus Rubidus massiliensis_, cru
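
When labeling results, it can help to translate these abbreviations back into species names. The snippet below is a minimal sketch of that lookup, not part of the pipeline: the helper names `species_name` and `species_for_file` are hypothetical, and it assumes that the abbreviation is the part of the file name before the first dot (for example `cab.fasta`), which may not match every file in this repository.

```bash
# Hypothetical helper: print the species name for a three-letter abbreviation.
species_name() {
  case "$1" in
    cab) echo "Chlamydia abortus" ;;
    cav) echo "Chlamydia avium" ;;
    cca) echo "Chlamydia caviae" ;;
    cfe) echo "Chlamydia felis" ;;
    cga) echo "Chlamydia gallinacea" ;;
    cib) echo "Chlamydia ibidis" ;;
    cmu) echo "Chlamydia muridarum" ;;
    cpe) echo "Chlamydia pecorum" ;;
    cpn) echo "Chlamydia pneumoniae" ;;
    cps) echo "Chlamydia psittaci" ;;
    ctr) echo "Chlamydia trachomatis" ;;
    pac) echo "Parachlamydia acanthamoebae" ;;
    sne) echo "Simkania negevensis" ;;
    wch) echo "Waddlia chondrophila" ;;
    cru) echo "Candidatus Rubidus massiliensis" ;;
    *)   echo "unknown abbreviation: $1" >&2; return 1 ;;
  esac
}

# Assumed naming scheme: the abbreviation is everything before the first dot
# of the file name, e.g. "cab.fasta" -> "cab".
species_for_file() {
  file_name=$(basename "$1")
  species_name "${file_name%%.*}"
}

species_for_file "results/cab.fasta"   # prints: Chlamydia abortus
```

A `case` statement is used instead of an associative array so that the sketch also runs in plain POSIX `sh`.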