***TL;DR for impatient people: If you just need a qiime2 GTDB classifier quickly and don't care how I did it, just download the .qza files and follow the two template bash scripts to slice the classifier to your primers.***

**Goal**: To correlate SSU amplicons with GTDB taxonomy, so that we know how to pick genomes that are representative of organisms identified in amplicon sequencing surveys.

**Method**: I took the SSU sequence files from GTDB ssu_r86.1_20180911, curated them semi-manually to remove obvious incongruences between the GTDB and SILVA132 taxonomies (using the RDP classifier) at the domain and phylum levels, and created taxonomy/fna artifacts. These can be used in your qiime2 pipeline with minimal modification (slice to your primer sequences and then train your classifier).

**NB**: Most scripts/steps are included, but I did some of the work with bash tools like cut/sed and have not included all of those steps.

**Note for oceanographers**: Some taxa that are abundant in amplicon data appear to be missing or under-represented in GTDB at present. Two examples I've seen so far are *Candidatus* Actinomarina and SAR11 Group IV. They don't get classified beyond "Bacteria" using "qiime feature-classifier classify-sklearn" with default parameters in qiime2-2018.8. If you fool around with the settings (i.e. set --p-confidence to -1), you can get qiime2 to give a proper phylum-level classification, but it seems inaccurate beyond that level (e.g. *Candidatus* Actinomarina SSU seqs get classified to some other actinobacterium from a non-marine environment).
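For orientation, here is a rough sketch of what the slice-and-train step and the `--p-confidence` setting mentioned above might look like on the command line. It is not a copy of the template scripts from this project: the function names, file names, and argument order are hypothetical, and only the three `qiime feature-classifier` actions and their options are taken from qiime2 itself.

```bash
#!/usr/bin/env bash
# Sketch only: function names and file-naming scheme are illustrative
# assumptions, not the project's actual template scripts.

# Slice reference sequences to a primer pair, then train a naive Bayes classifier.
# Usage: slice_and_train <seqs.qza> <taxonomy.qza> <fwd_primer> <rev_primer> <out_prefix>
slice_and_train() {
    local seqs="$1" tax="$2" fwd="$3" rev="$4" prefix="$5"

    # Keep only the region of each reference sequence between the two primers.
    qiime feature-classifier extract-reads \
        --i-sequences "$seqs" \
        --p-f-primer "$fwd" \
        --p-r-primer "$rev" \
        --o-reads "${prefix}-sliced.qza" || return 1

    # Train the classifier on the sliced reads and the matching taxonomy.
    qiime feature-classifier fit-classifier-naive-bayes \
        --i-reference-reads "${prefix}-sliced.qza" \
        --i-reference-taxonomy "$tax" \
        --o-classifier "${prefix}-classifier.qza"
}

# Classify reads with a trained classifier.
# Usage: classify_reads <classifier.qza> <reads.qza> <out.qza> [confidence]
# The confidence argument falls back to 0.7, the qiime2 default; pass -1 to
# disable the confidence calculation, as described in the note above.
classify_reads() {
    local classifier="$1" reads="$2" out="$3" confidence="${4:-0.7}"

    qiime feature-classifier classify-sklearn \
        --i-classifier "$classifier" \
        --i-reads "$reads" \
        --p-confidence "$confidence" \
        --o-classification "$out"
}
```

After sourcing the file, a call such as `slice_and_train gtdb-seqs.qza gtdb-tax.qza GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT gtdb-515-806` would slice to the 515F/806R primer pair (used here purely as an example; substitute your own primers and artifact names), and `classify_reads gtdb-515-806-classifier.qza rep-seqs.qza taxonomy.qza -1` would classify with the confidence calculation disabled.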