Main content

Home

Menu

Loading wiki pages...

View
Wiki Version:
**Resolving the complex genetic basis of phenotypic variation and variability of cellular growth** *Naomi Ziv, Bentley M. Shuster, Mark L. Siegal, David Gresham, 2017* ---------- For any questions, explanations, raw image data and information on production of figures, please contact Naomi Ziv, `nz375@nyu.edu` ---------- Explanation of files: ---------- **1) QTL mapping** A) HPC.QTL.R - R script used for QTL mapping (one and two dimensional QTL scans), run on high performance computing cluster. Two dimensional scan permutations are computationally intensive. Requires R packages ‘snow’ and ‘QTL’. Uses as input: L.geno.csv and H.geno.csv. Will create output: QTLmapping.Rdata B) Ziv2017QTLAnalysis.R - Main R script used for analysis of QTL (calling loci and fitting multi-QTL models). Requires R packages ‘QTL’ and ‘plyr’. Can be edited slightly to assess different thresholds. Uses as input: QTLmapping.Rdata and RenameQTLMarkers.txt. Will create output hkQTLworkspace0.05.Rdata C) L.geno.csv - Data for QTL mapping. Comma-delimited text file containing phenotype and genotype data for QTL mapping. Compatible with RQTL. Growth limiting glucose concentration (0.22mM glucose). Phenotypes are: GR - mean growth rate, GRsd - growth rate standard deviation, GRcount - number of colonies, lag - median lag duration, lagmed - lag MAD, lag count - number of colonies, GR.PC - plate corrected mean growth rate, GRsd.PC - plate corrected growth rate standard deviation (NOT USED), lag.PC - plate corrected median lag duration, lagmad.PC - plate corrected lag duration MAD (NOT USED), GR.GRsd.RES - loess residuals growth rate standard deviation on mean, lag.lagmad.RES - loess residuals lag duration MAD on median. D) H.geno.csv - Data for QTL mapping. Comma-delimited text file containing phenotype and genotype data for QTL mapping. Compatible with RQTL. Non-growth limiting glucose concentration (4.44mM glucose). Phenotypes are: GR - mean growth rate, GRsd - growth rate standard deviation, GRcount - number of colonies, lag - median lag duration, lagmed - lag MAD, lag count - number of colonies, GR.PC - plate corrected mean growth rate, GRsd.PC - plate corrected growth rate standard deviation (NOT USED), lag.PC - plate corrected median lag duration, lagmad.PC - plate corrected lag duration MAD (NOT USED), GR.GRsd.RES - loess residuals growth rate standard deviation on mean, lag.lagmad.RES - loess residuals lag duration MAD on median. E) RenameQTLMarkers.txt - Tab delimited text file describing the locations of markers used for genotyping, from (Gerke et al. 2009). F) QTLmapping.Rdata - R data file containing R variables created with the script HPC.QTL.R, loaded by Ziv2017QTLAnalysis.R for further QTL analysis. Contains: cross.H - RQTL cross object for high glucose, cross.L - RQTL cross object for low glucose, operm.hk.H - permutations for one dimensional scan for high glucose, operm.hk.L - permutations for one dimensional scan for low glucose, operm2.hk.H - permutations for two dimensional scan for high glucose, operm2.hk.L - permutations for two dimensional scan for low glucose, out.hk.H - one dimensional scan for high glucose, out.hk.L - one dimensional scan for low glucose, out2.hk.H - two dimensional scan for high glucose, out2.hk.L - two dimensional scan for low glucose. G) hkQTLworkspace0.05.Rdata - R data file containing R variables created with the script Ziv2017QTLAnalysis.R. Contains all of QTLmapping.Rdata variables plus: Afits.H - additive loci models for high glucose, Afits.L - additive loci models for low glucose, findunique - function for collapsing close loci, fitadd - function for fitting additive loci models, fitint - function for fitting additive and interaction loci models, getpys - function for getting approximate physical position, getQTL - function for identifying QTL, Ifits.H - additive and interaction loci models for high glucose, Ifits.L - additive and interaction loci models for low glucose, pys - approximate physical positions, qtls.H - QTLs identified for high glucose, qtls.L - QTLs identified for low glucose. H) Segdata.Rdata - R data file containing R variables. Contains: Segdata - microcolony growth data, see also SegregantGrowth.txt in section 3, agg - growth summary statistics. ---------- **2) Other analysis files** A) Ziv2017SEQAnalysis.R - R script used for analysis. Requires R packages ‘QTL’ and ‘plyr’. Uses as input: SEQdata.Rdata, Multipool.Rdata. Also uses QTLmapping.Rdata and RenameQTLMarkers.txt which are found in section 1. B) doMultipool.sh - shell script for running MULTIPOOL on many samples with different parameter settings. C) NZ_align_example.q - example of a file used for aligning fastq read files, script represents an example for a single sample, originally scripts iterated on all samples. D) NZ_snps_example.q - example of a file used for calling SNPs, script represents an example for a single sample, originally scripts iterated on all samples. E) snp.py - python script used for extracting and reorganizing information about SNPs. Creates .snp files. F) Oak.Vineyard.multipool.datagen.R - R script used for organizing sequencing data, in general and for multipool analysis. Uses as input: .snp files, creates SEQdata.Rdata, txt files per chromosome and sample with snp positions and read counts - used for multipool analysis and Multipool.Rdata. G) SEQdata.Rdata - R data file containing R variables. Contains: SEQdata - results from all sequencing analysis, see also SNPsummary.txt in section 3. H) Multipool.Rdata - R data file containing R variables. Contains: Multipool - results from all MULTIPOOL analysis, see also Multipool.txt in section 3. I) HXTdata.Rdata - R data file containing R variables. Contains: AllRepdata - Microcolony growth data for allele replacement strains, RecHemdata - Microcolony growth data for Reciprocol hemizygote strains, see also AlleleReplaceGrowth.txt and ReciprocalHemizygoteGrowth.txt in section 3. ---------- **3) Datasets in .txt format** A) SegregantGrowth.txt - Sergeant growth data, Tab delimited text file containing data for 1,304,734 micro-colonies. Columns are: GR - growth rate, lag - lag duration, colony - unique colony identifier (within plate), Isize - initial size, Genotype - strain, Environment - media, well, plate, field. Strains are: BC241 - Vineyard, BC248 - Oak, BC251 - F1, BC252 - F1 and 477 F2s with unique identifiers (i.e: F2-12-A1). B) AlleleReplaceGrowth.txt - Allele replacement strains growth data, Tab delimited text file containing data for 298,066 micro-colonies. Columns are: GR - growth rate, lag - lag duration, colony - unique colony identifier (within plate), Isize - initial size, Genotype - strain, Environment - media, well, plate, field. Strains are: DGY279 - Oak, DGY1059 - Oak with Vineyard HXT7 whole locus, DGY1150 - Oak with Vineyard HXT7 single aa, DGY278 - Vineyard, DGY1055 - Vineyard with Oak HXT7 whole locus, DGY1077 - Vineyard with Oak HXT7 single aa, DGY419 - F1, DGY1197 - F1 with Oak HXT7 whole locus, DGY1195 - F1 with Vineyard HXT7 whole locus. C) ReciprocalHemizygoteGrowth.txt - Reciprocol hemizygote strains growth data, Tab delimited text file containing data for 92,922 micro-colonies. Columns are: GR - growth rate, lag - lag duration, colony - unique colony identifier (within plate), Isize - initial size, Genotype - strain, Environment - media, well, plate, field. Strains are: F1-O6-KO - F1 with oak HXT6 knocked out, F1-W6-KO - F1 with vineyard HXT6 knocked out, F1-O7-KO - F1 with oak HXT7 knocked out, F1-W7-KO - F1 with vineyard HXT7 knocked out. D) SNPsummary.txt - Summary of all sequence data, Tab delimited text file summarizing SNPs identified by genomic sequencing. Contains 1,895,619 rows and columns: car - chromosome, pos - position, ref - reference allele, alt - alternate allele, qual - quality, RefF - number of reference forward reads, RefR - number of reference reverse reads, AltF - number of alternate forward reads, AltR - number of alternate reverse reads, Rdepth - read depth, Sample - sample identifier, set - identifier that groups samples, dilution - chemostat dilution rate (if applicable_, Id - SNP identifier (0 - not found in parents, 1 - oak, 2 - vineyard, 3 - found in both parents, 1.1/2.2 - positions where each parent has different allele from reference, 1.2/2.2 - alleles with frequency < 0.75 in parents). E) Multipool.txt - Summary of multipool results, Tab delimited file summarizing MULTIPOOL reseals. Contains 5,742,416 rows and columns: bin - bin start position, AF - allele frequency, LOD - LOD score, tag - identifier, car - chromosome, RF - MULTIPOOL recombination rate parameter, N - MULTIPOOL number of individuals parameter.
OSF does not support the use of Internet Explorer. For optimal performance, please switch to another browser.
Accept
This website relies on cookies to help provide a better user experience. By clicking Accept or continuing to use the site, you agree. For more information, see our Privacy Policy and information on cookie use.
Accept
×

Start managing your projects on the OSF today.

Free and easy to use, the Open Science Framework supports the entire research lifecycle: planning, execution, reporting, archiving, and discovery.