***TL;DR for impatient people: If you just need a qiime2 GTDB classifier quickly and don't care how I made it, just download the .qza files and follow the two template bash scripts, which show how to slice the classifier to your primers.***

**Goal**: To correlate SSU amplicons with GTDB taxonomy, so we know how to pick genomes that are representative of the organisms identified in amplicon sequencing surveys.

**Method**: I took the SSU sequence files from GTDB (ssu_r86.1_20180911), curated them semi-manually to remove obvious incongruences between the GTDB and SILVA132 taxonomies at the domain and phylum levels (using the RDP classifier), and created taxonomy/fna artifacts. These can be used in your qiime2 pipeline with minimal modification: slice to your primer sequences, then train your classifier.

**NB**: Most scripts/steps are included, but I did some of the work with bash tools like cut/sed and have not included all of those steps.

**Note for oceanographers**: Some taxa that are abundant in amplicon data seem to be either missing from or under-represented in GTDB at the moment. Two examples I've seen so far are *Candidatus* Actinomarina and SAR11 Group IV. They don't get classified beyond "Bacteria" when using `qiime feature-classifier classify-sklearn` with default parameters in qiime2-2018.8. If you fool around with the settings (i.e. set `--p-confidence` to -1), you can get qiime2 to give a proper phylum-level classification, but it seems inaccurate beyond that level (e.g. *Candidatus* Actinomarina SSU seqs get classified as some other actino from a non-marine environment).
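For orientation, here is a minimal sketch of the slice/train/classify steps described above. It is not a copy of the template scripts: the artifact file names are hypothetical placeholders, and the primer pair (515F-Y/926R) is only an example, so substitute your own. The script defaults to a dry run that prints the qiime2 commands; set `DRY_RUN=0` to actually execute them. Passing `-1` to `classify` mirrors the `--p-confidence -1` setting mentioned in the note for oceanographers; newer qiime2 releases may expect a different value to disable the threshold, so check `--help` for your version.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical file names; replace with the .qza files you downloaded/produced.
SEQS="${SEQS:-gtdb_ssu_seqs.qza}"
TAXONOMY="${TAXONOMY:-gtdb_ssu_taxonomy.qza}"
SLICED="${SLICED:-gtdb_ssu_sliced.qza}"
CLASSIFIER="${CLASSIFIER:-gtdb_classifier.qza}"
REP_SEQS="${REP_SEQS:-rep-seqs.qza}"
OUT="${OUT:-taxonomy.qza}"

# Example primers only (515F-Y / 926R); use the pair from your own survey.
FWD_PRIMER="${FWD_PRIMER:-GTGYCAGCMGCCGCGGTAA}"
REV_PRIMER="${REV_PRIMER:-CCGYCAATTYMTTTRAGTTT}"

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: slice the reference sequences to the region your primers amplify.
slice_reads() {
  run qiime feature-classifier extract-reads \
    --i-sequences "$SEQS" \
    --p-f-primer "$FWD_PRIMER" \
    --p-r-primer "$REV_PRIMER" \
    --o-reads "$SLICED"
}

# Step 2: train a naive Bayes classifier on the sliced reads.
train_classifier() {
  run qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads "$SLICED" \
    --i-reference-taxonomy "$TAXONOMY" \
    --o-classifier "$CLASSIFIER"
}

# Step 3: classify your sequences. An optional first argument is passed
# through as --p-confidence (e.g. "classify -1"); without it the qiime2
# default is used.
classify() {
  conf="${1:-}"
  set -- qiime feature-classifier classify-sklearn \
    --i-classifier "$CLASSIFIER" \
    --i-reads "$REP_SEQS" \
    --o-classification "$OUT"
  if [ -n "$conf" ]; then
    set -- "$@" --p-confidence "$conf"
  fi
  run "$@"
}

slice_reads
train_classifier
classify
```

To try the relaxed setting on taxa that otherwise stop at "Bacteria", replace the last line with `classify -1` and compare the two outputs, keeping in mind that assignments below phylum level may not be trustworthy in that mode.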