<p><strong>Abstract</strong></p> <p>Marine bacteria and archaea play key roles in global biogeochemistry. To improve our understanding of this complex microbiome, we employed single cell genomics and a randomized, hypothesis-agnostic cell selection strategy to recover 12,715 partial genomes from the tropical and subtropical euphotic ocean. A substantial fraction of known microbial coding potential was recovered from a single, 0.4 mL ocean sample, which indicates that genomic information disperses effectively across the globe. Yet, we found each genome to be unique, implying limited clonality within prokaryoplankton populations. Light harvesting and secondary metabolite biosynthetic pathways were numerous across lineages, highlighting the value of single cell genomics to advance the identification of ecological roles and biotechnology potential of uncultured microbial groups. This genome collection enabled functional annotation and genus-level taxonomic assignments for &gt;80% of individual metagenome reads from the tropical and subtropical surface ocean, thus offering a model to improve reference genome databases for complex microbiomes.</p> <p><strong>License</strong></p> <p>CC BY-NC. 
See <code>license.txt</code> in OSF Storage for more details.</p> <p><strong>Contents</strong></p> <p>On Drive:</p> <ul> <li><code>gorg-tropics_sags_tableS2.xlsx</code><ul> <li>Table S2 from Pachiadaki et al., describing every SAG in this collection</li> </ul> </li> <li><code>mock-metagenomes.tar.gz</code><ul> <li>SAGs used to construct the mock metagenomes, and the resulting mock metagenomic reads used to test the GORG-classifier</li> </ul> </li> <li><code>gorg-tropics</code><ul> <li><code>annotations</code><ul> <li><code>gbk.tar.gz</code><ul> <li>Prokka-generated GenBank files for each SAG</li> </ul> </li> <li><code>tbl.tar.gz</code><ul> <li>Annotation tables generated with Prokka and an updated Swiss-Prot database</li> </ul> </li> </ul> </li> <li><code>contigs.tar.gz</code><ul> <li>Assembled contigs for all GORG-Tropics SAGs</li> </ul> </li> </ul> </li> </ul> <p>On OSF Storage:</p> <ul> <li><code>gorg-tropics</code><ul> <li><code>classifier-dbs</code><ul> <li>Files required for the GORG-classifier (<a href="https://github.com/BigelowLab/gorg-classifier" rel="nofollow">https://github.com/BigelowLab/gorg-classifier</a>)</li> <li><code>GORG_v1.fasta.gz</code><ul> <li>Concatenated contigs of the entire reference</li> </ul> </li> <li><code>GORG_v1.tsv.gz</code><ul> <li>Tab-delimited annotations that link back to the .faa headers</li> </ul> </li> <li><code>GORG_v1_CREST.faa.gz</code><ul> <li>FAA annotated with CREST taxonomy</li> </ul> </li> <li><code>GORG_v1_CREST.fmi</code><ul> <li><code>kaiju</code> index for CREST taxonomy annotations</li> </ul> </li> <li><code>GORG_v1_NCBI.faa.gz</code><ul> <li>FAA annotated with NCBI taxonomy</li> </ul> </li> <li><code>GORG_v1_NCBI.fmi</code><ul> <li><code>kaiju</code> index for NCBI taxonomy annotations</li> </ul> </li> <li><code>taxonomy</code><ul> <li>These files map the taxon IDs used in their respective indexes to taxonomy names</li> <li><code>CREST</code><ul> <li><code>names.dmp</code></li> <li><code>nodes.dmp</code></li> </ul> </li> <li><code>NCBI</code><ul> 
<li><code>names.dmp</code></li> <li><code>nodes.dmp</code></li> </ul> </li> </ul> </li> </ul> </li> </ul> </li> </ul>
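<p>The <code>names.dmp</code> and <code>nodes.dmp</code> files listed above can be used to turn a taxon ID into a readable lineage. The following is a minimal sketch, not the GORG-classifier's own code. It assumes both the CREST and NCBI files follow the standard NCBI taxdump layout (fields separated by a tab, a pipe, and a tab, with the root node listed as its own parent). The function names <code>split_dmp_line</code>, <code>load_names</code>, <code>load_parents</code>, and <code>lineage</code> are illustrative and do not come from the original project.</p>

```python
def split_dmp_line(line):
    """Split one taxdump row into fields (assumed separator: tab, pipe, tab)."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the row terminator
    return [field.strip() for field in line.split("\t|\t")]


def load_names(lines):
    """Map each taxon ID to its scientific name."""
    names = {}
    for line in lines:
        fields = split_dmp_line(line)
        # Assumed columns: tax_id, name, unique name, name class.
        # Rows without a name class column are accepted as they are.
        name_class = fields[3] if len(fields) >= 4 else "scientific name"
        if len(fields) >= 2 and name_class == "scientific name":
            names[int(fields[0])] = fields[1]
    return names


def load_parents(lines):
    """Map each taxon ID to its parent taxon ID."""
    parents = {}
    for line in lines:
        fields = split_dmp_line(line)
        # Assumed leading columns: tax_id, parent tax_id, rank.
        if len(fields) >= 2:
            parents[int(fields[0])] = int(fields[1])
    return parents


def lineage(tax_id, names, parents):
    """Return the names from the root down to tax_id."""
    path = []
    seen = set()  # guards against cycles in malformed files
    current = tax_id
    while current in parents and current not in seen:
        seen.add(current)
        path.append(names.get(current, str(current)))
        if parents[current] == current:  # the root is its own parent
            break
        current = parents[current]
    return list(reversed(path))
```

<p>Both loaders accept any iterable of lines, so an open file handle can be passed directly, for example <code>load_names(open("taxonomy/NCBI/names.dmp", encoding="utf-8"))</code>, where the path is a placeholder for wherever the files were downloaded. The taxon ID given to <code>lineage</code> would be one reported by a <code>kaiju</code> run against the matching index.</p>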
