The generation and analysis of this collection of MAGs is detailed in https://www.biorxiv.org/content/10.1101/2021.10.13.464073v1. Each <bin>.tar.gz file contains the output from the Anvi'o summary, namely the contigs, gene sequences, reference marker sequences, gene coverages, mean genome coverage, N50, completion / redundancy estimates, and kofamscan annotations (unfiltered). The contig sequences were also uploaded to the ENA under project PRJEB36523. The methods are reproduced below.

**Methods**

Samples used to generate metagenomic libraries were collected in January 2019 during sampling campaign PNK 110. For each sample replicate, approximately 50–100 L of groundwater was filtered sequentially through 0.2 µm and 0.1 µm pore-size PTFE filters (142 mm, Omnipore Membrane, Merck Millipore, Germany; Table S1). With the exception of H32, which did not yield sufficient volume, each well was sampled in triplicate. H32 was instead sampled in duplicate, using a sample previously collected during campaign PNK108 (November 2018). Filters were frozen on dry ice and stored at -80 °C prior to extraction. DNA was extracted using a phenol-chloroform based method, as previously described [59], and the resulting DNA extracts were purified using a Zymo DNA Clean & Concentrator kit.

Metagenome libraries were generated with a NEBNext Ultra II FS DNA library preparation kit, in accordance with the manufacturer’s protocols. DNA fragment sizes were estimated using an Agilent Bioanalyzer DNA 7500 instrument with High Sensitivity kits depending on DNA concentrations and recommendations of protocols (Table S1). Sequencing of the 32 samples was performed at the Core DNA Sequencing Facility of the Fritz Lipmann Institute in Jena, Germany, using an Illumina NextSeq 500 system (2 × 150 bp). The resulting metagenomic library sizes ranged from 16.4 to 22.1 Gbp (mean = 19.6 Gbp; Table S1), and raw data were deposited in the ENA under project PRJEB36523.

*Metagenomic assembly and binning*

Adapters were trimmed and raw sequences subjected to quality control processing using BBduk v38.51. Assembly and binning were performed as previously described. Briefly, all libraries were independently assembled into scaffolds using metaSPAdes v3.12, and all scaffolds were taxonomically classified per Bornemann et al. For individual assemblies, open reading frames (ORFs) were identified using Prodigal v2.6.3 in meta mode. To generate coverage profiles, all quality-assessed and quality-controlled (QAQC) sequences from each of the 32 metagenomic libraries were mapped back to each of the 32 scaffold databases using Bowtie2 v2.3.4.3 in sensitive mode.

Scaffolds were binned using differential coverages and tetranucleotide frequencies with Maxbin. Additionally, ESOM and abawaca were used for manual and automatic binning, respectively, based on tetranucleotide sequence signatures, with minimum scaffold sizes of 3 kbp and 5 kbp for ESOM and 5 kbp and 10 kbp for abawaca. DAS Tool was used with default parameters to reconcile the resulting bin sets. Complete sets of bins from each of the samples were dereplicated using dRep v2.4.0. All scaffolds, bin assignments, ORF predictions, and taxonomic annotations were then imported into Anvi’o v6.0. Each of the resulting 1,275 bins was manually curated in Anvi’o v6, considering both coverage and sequence composition. In the end, 1,224 bins passed the quality thresholds of 30% minimum completeness [median = 61%, IQR = (49%, 73%)] and 10% maximum redundancy [median = 0%, IQR = (0%, 1.4%)].

*Characterizations of the Metagenome-assembled Genomes*

ORFs originating from all of the resulting metagenome-assembled genomes (MAGs) were annotated using kofamscan with the “detail” flag, and KO annotations were filtered using a custom script (https://git.io/JtHVw). This utility preserves hits with scores of at least 80% of the kofamscan-defined threshold, as well as hits with a score > 100 when no threshold is defined.
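
The filtering rule described above can be illustrated with a minimal Python sketch. This is not the custom script linked above; the function name `keep_hit`, the constants, and the example records are hypothetical and only mirror the two stated criteria (score at least 80% of the threshold, or score > 100 when no threshold exists).

```python
from typing import Optional

RELAXED_FRACTION = 0.8    # keep hits scoring at least 80% of the kofamscan threshold
NO_THRESHOLD_MIN = 100.0  # fallback cutoff when a KO has no defined threshold


def keep_hit(score: float, threshold: Optional[float]) -> bool:
    """Return True if a hit passes the relaxed filter described in the text."""
    if threshold is None:
        # No threshold defined for this KO: require a score strictly above 100.
        return score > NO_THRESHOLD_MIN
    return score >= RELAXED_FRACTION * threshold


# Hypothetical (ORF, KO, threshold, score) records, not real kofamscan output.
hits = [
    ("orf_1", "K00001", 200.0, 170.0),  # 170 >= 160, kept
    ("orf_2", "K00002", 200.0, 150.0),  # 150 < 160, dropped
    ("orf_3", "K00003", None, 120.0),   # no threshold, 120 > 100, kept
    ("orf_4", "K00004", None, 100.0),   # no threshold, 100 is not > 100, dropped
]

kept = [orf for orf, _ko, threshold, score in hits if keep_hit(score, threshold)]
print(kept)
```

In this toy input only `orf_1` and `orf_3` survive the filter.
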
We elected to relax the default thresholds since all MAGs representing putatively chemolithoautotrophic microbes were verified manually, and we noticed that the best reciprocal blast hits with known reference sequences routinely scored below the kofamscan thresholds. In other words, we favored false positives over false negatives because a secondary verification step was included.

KEGGDecoder was used to assess the metabolic potential of five of the primary chemolithoautotrophic pathways: the Calvin-Benson-Bassham cycle, the Wood-Ljungdahl pathway, the reverse citric acid cycle, the 4-hydroxybutyrate 3-hydroxypropionate pathway, and the 3-hydroxypropionate bicycle. MAGs were examined in greater depth if a given pathway was > 50% complete. MAGs representing potential chemolithoautotrophs were re-annotated using the online BlastKoala server, with essential steps verified through blast against the RefSeq database. A collection of HMM models was used to determine which form of Rubisco was detected, along with potential hydrogenases. Using blastp, dissimilatory bisulfite reductases (dsrAB) were compared to a database compiled by Pelikan et al. to predict whether the pathway operated in an oxidative or reductive manner. Blast was used to compare gene hits for narGH/nxrAB (nitrate reductase / nitrite oxidoreductase) to a custom database based on sequences presented in Lücker et al.

All QAQC reads were remapped to a database consisting only of contigs from dereplicated MAGs. Normalized coverages for each of the MAGs were determined by scaling the resulting Anvi’o-determined coverages based on the number of RNA polymerase B (rpoB) genes identified in the QAQC-filtered reads. RpoB sequences were identified using ROCker with the precomputed model [76]. Scaling factors were calculated by dividing the maximum number of rpoB identified in the 32 metagenomic libraries by the number of rpoB detected in each sample.
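
The rpoB-based scaling described above can be sketched in a few lines of Python. The library names, read counts, and coverages below are made up for illustration, and multiplying each coverage by its scaling factor is an assumption about how the scaling was applied.

```python
from statistics import mean

# Hypothetical rpoB read counts per metagenomic library (made-up numbers).
rpob_counts = {"well_A_rep1": 5000, "well_A_rep2": 4000, "well_A_rep3": 2500}

# Hypothetical Anvi'o mean coverages of a single MAG in the same libraries.
mag_coverage = {"well_A_rep1": 10.0, "well_A_rep2": 9.0, "well_A_rep3": 4.0}

# Scaling factor = largest rpoB count across libraries / rpoB count in this library.
max_rpob = max(rpob_counts.values())
scaling = {lib: max_rpob / count for lib, count in rpob_counts.items()}

# Assumption: normalized coverage = raw coverage * scaling factor.
normalized = {lib: mag_coverage[lib] * scaling[lib] for lib in mag_coverage}

# Average over the replicates of the well.
well_mean = mean(normalized.values())
print(scaling, normalized, well_mean)
```

The library with the most rpoB reads receives a factor of 1, and libraries with fewer rpoB reads are scaled up proportionally.
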
Reported values represent averages of the triplicates/replicates, unless stated otherwise.

The taxonomy of each MAG was evaluated using the GTDB\_TK tool kit in concert with the Genome Taxonomy Database (release 89) and its associated utilities. Single-copy marker genes were identified and aligned with GTDB\_TK for all bacterial MAGs, and a phylogenetic tree of the concatenated alignment was constructed using FastTree2 v2.1.10 under the JTT+CAT evolutionary model. The resulting phylogenetic tree was then imported into iTOL for visualization, and all MAGs were subjected to growth rate index (GRiD) analysis within each metagenomic library.

Previously generated mRNA-enriched and post-processed metatranscriptomic libraries were procured from project PRJEB28783 [87]. The groundwater source of these metatranscriptomes was collected in August and November 2015. QAQC-filtered reads were mapped to MAGs using Bowtie2 v2.3.5 in sensitive mode, and the total number of rpoB transcripts in each metatranscriptomic library was determined as described above for metagenomes. The transcriptomic coverages for each ORF from each MAG were determined using Anvi’o v6 and normalized via scaling factors based on the total number of rpoB reads from the original metatranscriptome library (i.e., the coverage of each ORF from each MAG was normalized to a community-wide estimate of the transcriptional activity of a house-keeping gene in each sample). Means were determined considering all of the metatranscriptomes generated from a given well, including different sampling timepoints. While well H32 was only sampled once, mean values from all other wells account for three to four metatranscriptome coverages each. Additionally, an average of the resulting normalized coverages for each MAG from each sample (sum of the MAG transcriptional coverage divided by the number of ORFs) was determined to estimate the relative transcriptional activity of the MAGs across the transect.
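
The per-MAG average described above (sum of normalized ORF coverages divided by the number of ORFs, then a mean over the metatranscriptomes of a well) can be sketched as follows. All well, library, MAG, and ORF names and values are hypothetical, and the sketch assumes that ORFs with zero coverage still count toward the number of ORFs.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rpoB-normalized ORF coverages keyed by (well, library, MAG, ORF).
orf_cov = {
    ("well_A", "lib1", "MAG_1", "orf_1"): 2.0,
    ("well_A", "lib1", "MAG_1", "orf_2"): 4.0,
    ("well_A", "lib2", "MAG_1", "orf_1"): 6.0,
    ("well_A", "lib2", "MAG_1", "orf_2"): 2.0,
    ("well_B", "lib3", "MAG_1", "orf_1"): 1.0,
    ("well_B", "lib3", "MAG_1", "orf_2"): 1.0,
}

# Step 1: per library and MAG, sum of ORF coverages divided by the number of ORFs.
per_library = defaultdict(list)
for (well, lib, mag, _orf), cov in orf_cov.items():
    per_library[(well, lib, mag)].append(cov)
mag_activity = {key: sum(covs) / len(covs) for key, covs in per_library.items()}

# Step 2: mean over all metatranscriptomes generated from the same well.
per_well = defaultdict(list)
for (well, _lib, mag), activity in mag_activity.items():
    per_well[(well, mag)].append(activity)
well_activity = {key: mean(values) for key, values in per_well.items()}

print(mag_activity)
print(well_activity)
```

Keeping the two aggregation steps separate makes it easy to inspect per-library values before they are averaged per well.
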

Data were compiled and processed using R v3.5.2 with RStudio v1.1.463 and the tidyverse package, and color schemes were generated using the RColorBrewer utility. All MAGs were deposited under project PRJEB36505.