These are the notes from the chat during the seminar visit on Oct. 21 2017:

1) Katie will work on getting RNA extracted from dung beetles with Mojgan. Need Trizol (double check amounts); need to talk with OUHSC (https://omrf.org/research-faculty/core-facilities/next-generation-sequencing/ngs-pricing/).

GOAL:

1) Set up basics to get running within the next two weeks
2) Samples to OMRF by the end of November.

Things to check with OMRF:

1) How to do PO/account/etc.
2) How much RNA/concentration
3) How to transport sample (dry ice ok?)

Gearing up to extract RNA:

1) Mojgan will cut beetles longitudinally in half, and extract RNA from one half (the other half will be used to do glycogen, lipid, and protein measures again).

**Current transcriptomic pipeline:**

*1) Generate transcriptome*

1) Trimmomatic on all reads
2) FastQC to assess quality
3) Assemble using Trinity (https://galaxy.inf.ethz.ch/?tool_id=trinity).
4) Cluster using cd-hit-est.
5) Use DIAMOND to BLAST against NR
6) Use MEGAN to sort out non-beetle reads
7) BLAST remaining reads using Blast2GO
8) Annotate using Blast2GO

*2) Differential expression*

1) Map all QC-passed reads to annotated transcriptome using HISAT
2) Generate transcripts using StringTie
3) Count number of transcripts using featureCounts.
4) Differential expression using DESeq2
5) Use differentially expressed transcripts to look for significantly enriched GO terms
6) Use KAAS to get KEGG annotations (http://www.genome.jp/tools/kaas/)
7) Use featureCounts table to look at pathway enrichment.
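The GO enrichment step in the differential expression plan can be illustrated with a small worked example. The notes do not say which statistical test is used, so the sketch below assumes a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), which is one common choice. The function name and all the counts are hypothetical, and the code needs Python 3.8+ for `math.comb`.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated transcripts.

    population: number of transcripts in the background set
    annotated:  number of background transcripts carrying the GO term
    sample:     number of differentially expressed transcripts
    hits:       number of differentially expressed transcripts with the term
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie between 0 and population")
    lower = max(hits, 0)
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the upper tail of the hypergeometric distribution with exact integers.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(lower, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up numbers for illustration only, not results from this project.
    p = hypergeom_enrichment_p(population=1000, annotated=50, sample=100, hits=12)
    print(f"p = {p:.4g}")
```

When many GO terms are tested at once, the resulting p-values would normally need a multiple-testing correction, which this sketch leaves out.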
##Received Sequencing Data April 13 2018##

Notes:

1) Noticed high ribosomal RNA content (10 - 20%)
2) Decided to download the SILVA database of ribosomal sequences - https://www.arb-silva.de/ - SILVA Ref -> Eukaryota -> Opisthokonta -> Holozoa -> Metazoa (Animalia) -> Eumetazoa -> Bilateria -> Arthropoda -> Hexapoda -> Insecta -> Pterygota -> Neoptera -> Coleoptera
3) Will generally be following: https://galaxyproject.org/tutorials/nt_rnaseq/
4) Also will use https://ribogalaxy.ucc.ie/ for rRNA removal
5) Attempted RiboGalaxy, found no reads aligned to rRNA, decided to go ahead with Trinity assembly. (May 7 2018)

**Ok, so the overall plan is:**

1) Ensure all raw data is uploaded to: (DONE)
    - Amazon Drive - done
    - OSF - done
2) Load data to ftp://usegalaxy.org with WinSCP (DONE)
    - Use FTP mode
    - Ensure mode is set to "passive"
    - Regular Galaxy credentials are needed
3) Load all sequences into usegalaxy.org (DONE)
4) Using the guidelines here, organize into collections (DONE): https://galaxyproject.org/tutorials/nt_rnaseq/
5) Run initial FastQC, then Trimmomatic to remove adapters, then FastQC again (DONE)
    - Use Adaptclip first
    - Sliding window: quality score of 26
    - Minimum length of 80
6) Collect data on total reads at beginning and end (produce Excel file) (DONE)
7) Then copy all Trimmomatic collections into a new "Trinity" history. Download these and upload them to Amazon Drive and OSF.
8) "Unhide datasets" to show original datasets. Using the concatenate (cat) command, concatenate all left and all right reads. (DONE) History here: https://usegalaxy.org/u/kmarsh32/h/all-files
9) Using RiboGalaxy, remove ribosomal sequences using Bowtie. Use FASTA file (included on Galaxy) of all beetle rRNA sequences. (Tried; decided it was not helpful, May 7 2018)
10) Run Trinity using default parameters.
(Currently completed, although some questions remain: https://biostar.usegalaxy.org/p/27899/. First draft transcriptome: 223,786 sequences.)
11) Ran cd-hit-est with default parameters (n = 10) at: http://weizhongli-lab.org/metagenomic-analysis/result/?jobid=20180509061344090635018522 (DONE May 9, 2018; a total of 118,301 consensus sequences, N50 = 299). Info: https://github.com/weizhongli/cdhit/wiki/3.-User's-Guide#CDHITEST
12) Decided to rerun cd-hit-est with n = 8, c = 0.90 to increase amount of clustering (DONE May 15).
13) Reran Trinity May 22 on https://galaxy.ncgas-trinity.indiana.edu/ because of concerns about the Galaxy Main assembly. Ended up with 161,002 sequences (MUCH better). Then ran QUAST using the O. taurus genome size (40Gbp) and found N50 = 1833 (MUCH better, DONE).
14) Reran cd-hit-est, n = 8, c = 0.9. Found: 106,020 sequences, N50 = 1764 (DONE)
15) Built custom NCBI protein database of all arthropod protein sequences (DONE May 24 2018). Used this to build a DIAMOND database in Galaxy Main, then conducted a DIAMOND alignment with default settings and produced a BLAST tabular file.
16) Using MEGAN, assigned taxonomy to those reads and extracted all reads that were assigned "arthropod" to form draft transcriptome (41,134 reads, DONE May 28). N50 = 2052.
17) Needed to use R to match those read names with the actual DNA sequences (DONE May 28).
18) Using Blast2GO, do a local BLAST against nr. Export as BLAST XML.

Notes on 18):

- https://www.ncbi.nlm.nih.gov/books/NBK52637/
- Next time try: "perl update_blastdb.pl --decompress --passive base_database_name" (MUCH better)
- To run local BLAST in Blast2GO, use the .pal file

##Differential expression testing

1) Use HISAT2 in Galaxy to align trimmed reads to draft transcriptome (DONE).
2) Use StringTie to assemble transcripts (DONE)
3) Use StringTie to tie all the assemblies together (DONE)
4) Use featureCounts to get count tables (DONE)
5) Use DESeq2 to do differential expression testing (DONE).
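Several of the assembly steps above report an N50 (299, 1833, 1764, 2052). As a reminder of what that number means, here is a minimal sketch of the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function names are hypothetical, and tools such as QUAST may drop short contigs before computing the statistic, so their values can differ from this plain version.

```python
def fasta_lengths(lines):
    """Yield the sequence length of each record in an iterable of FASTA lines."""
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A header starts a new record; emit the previous one first.
            if current is not None:
                yield current
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        yield current


def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that pushes the running sum to half is the N50.
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy records for illustration only, not project data.
    toy = [">t1", "ACGTACGTAC", ">t2", "ACGT", "AC", ">t3", "ACG"]
    lengths = list(fasta_lengths(toy))
    print(lengths)       # [10, 6, 3]
    print(n50(lengths))  # 10
```

For a real assembly, the same functions could be applied to an open FASTA file handle instead of the toy list.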
##Other useful links:

1) Downloading Galaxy data: https://biostar.usegalaxy.org/p/25932/
2) Use Git

##ALTERNATIVE: Try O. taurus genome mapping:

- https://galaxyproject.org/learn/custom-genomes/
- https://data.nal.usda.gov/dataset/onthophagus-taurus-genome-assembly-10