This project folder contains working data for the resubmission version of the Arctic algal genomes paper. Feel free to add your contributions here, but please:

- update the wiki page with a brief description (name, date, where to look, brief description)
- attach a README file (methodology, key results, take-home messages) with your data upload.

**Contents**

**Fasta files and PFAM data of algal genomes and transcriptomes:** (Richard Dorrell, Alan Kuo, Asaf Salamov, 25 April 2021) linked archives of (1) the peptide sequence libraries for each Arctic genome (full and decontaminated versions) and for all publicly available marine algal genomes and MMETSP reference transcriptomes used for comparative PFAM and HGT analysis in the pan-algal dataset, sorted by taxonomic affiliation; and (2) full and decontaminated PFAM annotation files for each library included within the dataset, separated into genome-only and transcriptome-only datasets.

**Paper resubmission- 2022:** (Richard Dorrell, 4 June 2022) provides the current resubmission draft assembled for *Review Commons*.

- *Main text figures* (Richard Dorrell, 4 June 2022) provides completed main text figures and tables (including legends) for the manuscript submission.
- *Supporting figures* (Richard Dorrell, 4 June 2022) provides completed supporting figures (including legends) for the manuscript submission.
- *Supporting tables* (Richard Dorrell, 4 June 2022) provides completed supporting tables (including legends) for the manuscript submission.
- *Archived paper submissions:* (Richard Dorrell, 16 November 2020) provides paper resubmission drafts assembled for *Nature Microbiology*, *eLife* and *Review Commons* between November 2019 and August 2022.
- *Supporting data folders:* (Richard Dorrell, Connie Lovejoy, 4 October 2022) provides documents submitted to *PLoS* in September 2022.

**Supporting Data:** (Richard Dorrell, 26 November 2020) provides data for all analyses performed in this study that are not otherwise referenced in the main text or in the supporting figures or tables. Data are divided thematically into the folders listed below; each folder contains a **README** file explaining its contents.

- *Multigene tree topologies* (Richard Dorrell, Zoltan Füssy, 1 April 2020) provides details of the BUSCO multigene tree generated for phylo-PCA and CAFE analysis.
- *18S and 16S trees* (Richard Dorrell, Connie Lovejoy, 8 May 2020) contains raw and trimmed alignments of a selected set of NCBI and MMETSP 18S rDNA and chloroplast 16S rDNA sequences, with biogeographical provenance labelled, and consensus MrBayes and RAxML topologies. It also contains corresponding alignments and RAxML trees for 18S alignments enriched with all *Tara* Oceans v4 or v9 sequences, and for 16S alignments enriched with all *Tara* Oceans v4v5 sequences, that are more closely related to one of the four sequenced Arctic species (CCMP2293, CCMP2436, CCMP2298, or CCMP2097) than to their closest non-Arctic relatives.
- *TARA Oceans calculations* (Richard Dorrell, Federico Ibarbalz, 26 November 2020) presents *Tara* abundance calculations for phylogenetically verified 18S v4 and v9, and 16S v4v5, ribotypes. These trees indicate probable ancient colonisations of the Arctic Ocean by the Arctic cryptomonad and pelagophyte, recent colonisations by the haptophyte and chrysophyte, and largely Arctic specificity of each species.
- *Genome quantitative analysis* (Richard Dorrell, Juan Jose Pierella Karlusich, Joel Dacks, 21 February 2022) presents Excel sheets calculating the biogeographical isolation point, temperature range tolerance, and coding properties (numbers of gene models, PFAMs, complete single-copy or duplicated eukaryotic BUSCOs, and amino acid composition) for all genomes and transcriptomes referenced in this study.
  An interactive map of all culture strain isolation sites used in this study is provided at https://www.google.com/maps/d/u/0/edit?mid=1dLO8-_xddGCvsxPUtUAJyu00hqv3KVoJ.
- *phylPCA and CAFE analysis* (Andrei Stecca Steindorff, Alan Kuo, Richard Dorrell, 26 November 2020) provides outputs from phylogenetically aware principal coordinate analysis of PFAM distributions in algal genomes and transcriptomes, used to compute the overall similarity in coding properties of different algal genomes and transcriptomes; and from CAFE analysis, used to identify PFAMs expanded or contracted in different Arctic species within the complete dataset of algal genomes and transcriptomes. Finally, considering the union of these results and the Bray-Curtis distributions below, this folder tabulates PFAMs whose presence or absence is specifically associated with the divergence of Arctic species from non-Arctic relatives.
- *PFAM Bray-Curtis distributions:* (Richard Dorrell, Beth Richardson, 31 January 2021) contains Excel sheets that calculate the relatedness between PFAM libraries from different algal species, demonstrating that *(1)* Arctic species are convergent with one another in PFAM content; *(2)* this convergence is stronger than, and separate from, comparable convergence in Antarctic species; and *(3)* Arctic diatoms do not participate in this convergence.
- *Environmental PFAM distributions* (Richard Dorrell, Juan Jose Pierella Karlusich, Nastasia Freyria, 4 June 2020) provides alignments and RAxML format trees of all sequences for five PFAMs enriched in Arctic algal species (PF11999/IBPs; PF03988/DUF347; PF03831/PhnA; PF06017/ARF; and PF12213/DpnE), identified from a combined library of all sequences from UniRef, MMETSP, JGI algal genomes, v2 *Tara* Oceans metaT libraries, and a dedicated transcriptomic survey of Northwater Inlet.
  These phylogenies reveal evidence for specific subgroups of these proteins unique to polar eukaryotic algae, including evidence for within-Arctic and within-Antarctic horizontal transfers of ice-binding proteins.
- *Within Arctic HGTs* (Richard Dorrell and Nikola Zarevski, 8 May 2020) provides details of the LAST best-hit methodology used to infer the probable size and dynamics of within-Arctic HGTs in each sequenced species, alongside an Excel sheet of candidate genes that may have undergone within-Arctic HGTs.
- *Candidate within-Arctic HGT trees* (Richard Dorrell and Nikola Zarevski, 26 June 2020) details the BLAST and phylogenetic pipeline, curated alignments, and RAxML tree outputs of genes inferred to have been transferred within the Arctic Ocean for each Arctic algal genome.
- *LHCs* (Richard Dorrell, Elisabeth Richardson, 14 June 2020; *deleted 2021, re-added 29 August 2022*) provides tabulated frequencies, alignments and phylogenies of all light-harvesting complex proteins identified in cryptomonad, chrysophyte, chlorophyte, haptophyte, pelagophyte, dictyochophyte, diatom, and dinoflagellate genomes and transcriptomes; and the frequencies with which different functional groups of Lhc proteins have undergone probable duplication and horizontal transfer events in Arctic, Antarctic and non-polar sequence libraries.
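
For contributors who want to reproduce the kind of comparison held in the *PFAM Bray-Curtis distributions* folder outside Excel, the following is a minimal Python sketch of a Bray-Curtis dissimilarity between two PFAM count profiles. It is an illustration under stated assumptions, not the method used for the paper: the wiki does not say whether the sheets use raw domain counts, normalised counts, or presence/absence. The function name `bray_curtis`, the library names, and all counts below are hypothetical; only PF11999 and PF03988 are accessions mentioned on this page.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two PFAM count profiles.

    a, b: dicts mapping a PFAM accession to the number of domains found
    in one library. Returns 0.0 for identical profiles and 1.0 for
    profiles that share no PFAMs.
    """
    pfams = set(a) | set(b)
    # Sum of the smaller count for every PFAM present in either library.
    shared = sum(min(a.get(p, 0), b.get(p, 0)) for p in pfams)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both profiles are empty")
    return 1.0 - 2.0 * shared / total


# Hypothetical toy profiles; these counts are invented for illustration.
profiles = {
    "arctic_A": {"PF11999": 12, "PF03988": 3, "PF00069": 40},
    "arctic_B": {"PF11999": 9, "PF03988": 2, "PF00069": 35},
    "temperate_C": {"PF00069": 50, "PF00076": 20},
}

if __name__ == "__main__":
    names = sorted(profiles)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            d = bray_curtis(profiles[first], profiles[second])
            print(f"{first}\t{second}\t{d:.3f}")
```

Profiles are stored as dictionaries so that libraries with different PFAM sets can be compared without first building a shared matrix. In this toy data the two "Arctic" libraries come out closer to each other than either is to the "temperate" one, which is the kind of pattern the folder's sheets are described as testing.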