This component contains files and scripts of relevance to the analyses of the total proteomics dataset obtained from human induced pluripotent stem cells with the following PIK3CA genotypes: WT/WT, WT/H1047R, H1047R/H1047R.
Protein quantitation was performed using a novel Bayesian approach based on the Markov Chain Monte Carlo (MCMC) method (Robin et al. 2019) and was carried out by Dr Xavier Robin, Linding Lab. For further details, see the directory: Mass_spec_workflow_and_MSMS/
The directory Post_MCMC_analysis_R/ contains the R Notebook and .RData files needed to reproduce all subsequent results from the MCMC-determined protein ratios.
Specific experimental details, as reported in the accompanying publication:
**Label-free total proteomics**
*Sample preparation*
Cells were cultured to subconfluence in Geltrex-coated T175 flasks, and protein was harvested by lysis in 3 ml modified RIPA buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 1% NP-40, 0.5% Na-deoxycholate, 1 mM EDTA) supplemented with phosphatase inhibitors (5 mM β-glycerophosphate, 5 mM NaF, 1 mM Na3VO4) and protease inhibitors (Roche cOmplete ULTRA Tablets, EDTA-free). The lysates were sonicated on ice (4 x 10 s bursts, amplitude = 60%; Bandelin Sonopuls HD2070 sonicator) and spun down for 20 min at 4300 g. Ice-cold acetone was added to the supernatant to achieve a final concentration of 80% acetone, and protein was left to precipitate overnight at -20 °C.
Precipitated protein was pelleted by centrifugation at 2000 g for 5 min and solubilized in 6 M urea, 2 M thiourea, 10 mM HEPES pH 8.0. Protein was quantified using the Bradford assay, and 8 mg of each sample was reduced with 1 mM dithiothreitol, alkylated with 5 mM chloroacetamide and digested with endopeptidase Lys-C (1:200 v/v) for 3 h. Samples were diluted to 1 mg/ml protein using 50 mM ammonium bicarbonate and incubated overnight with trypsin (1:200 v/v). Digested samples were acidified, and urea was removed using SepPak C18 cartridges. Peptides were eluted, and an aliquot of 100 μg was set aside for total proteome analysis. The peptides were quantified using the Pierce quantitative colorimetric peptide assay. The equalized peptide amounts were lyophilized and resolubilized in 2% acetonitrile and 1% trifluoroacetic acid to achieve a final on-column peptide load of 2 μg.
*Mass spectrometry (MS) data acquisition*
All spectra were acquired on an Orbitrap Fusion Tribrid mass spectrometer (Thermo Fisher Scientific) operated in data-dependent mode and coupled to an EASY-nLC 1200 liquid chromatography pump (Thermo Fisher Scientific). Peptides were separated on a 50 cm reversed-phase column (Thermo Fisher Scientific, PepMap RSLC C18, 2 μm, 100 Å, 75 μm x 50 cm). Proteome samples (non-enriched) were eluted over a linear gradient ranging from 0-11% acetonitrile over 70 min, 11-20% acetonitrile for 80 min, 21-30% acetonitrile for 50 min, 31-48% acetonitrile for 30 min, followed by 76% acetonitrile for the final 10 min, at a flow rate of 250 nl/min.
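The gradient above can be tabulated to check the total run time and the steepness of each segment. The following is a minimal Python sketch, not part of the original workflow; the names `GRADIENT`, `total_runtime_min`, `slopes_percent_per_min` and `eluted_volume_ul` are hypothetical, and the final 76% step is assumed to be a constant hold.

```python
# Gradient segments as reported above: (start % acetonitrile, end %, minutes).
# The last segment is ASSUMED to be a constant hold at 76% acetonitrile.
GRADIENT = [
    (0, 11, 70),
    (11, 20, 80),
    (21, 30, 50),
    (31, 48, 30),
    (76, 76, 10),
]


def total_runtime_min(segments):
    """Sum the duration of all gradient segments, in minutes."""
    return sum(duration for _, _, duration in segments)


def slopes_percent_per_min(segments):
    """Change in % acetonitrile per minute for each segment."""
    return [(end - start) / duration for start, end, duration in segments]


def eluted_volume_ul(segments, flow_nl_per_min=250):
    """Total solvent volume delivered over the run, in microlitres."""
    return total_runtime_min(segments) * flow_nl_per_min / 1000


if __name__ == "__main__":
    print("Total run time (min):", total_runtime_min(GRADIENT))
    print("Slopes (%/min):", slopes_percent_per_min(GRADIENT))
    print("Delivered volume (ul):", eluted_volume_ul(GRADIENT))
```

Under these assumptions the segments add up to a 240 min run, which corresponds to 60 μl of solvent at 250 nl/min.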
Survey full-scan MS spectra were acquired in the Orbitrap at a resolution of 120,000 from m/z 350-2000, with an automated gain control (AGC) target of 4 x 10^5 ions and a maximum injection time of 20 ms. Precursors were filtered based on charge state (≥2) and monoisotopic peak assignment, and dynamic exclusion was applied for 45 s. A decision tree method allowed fragmentation for ITMS2 via electron transfer dissociation (ETD) or higher-energy collisional dissociation (HCD), depending on charge state and m/z. Precursor ions were isolated with the quadrupole set to an isolation width of 1.6 m/z. MS2 spectra fragmented by ETD and HCD (35% collision energy) were acquired in the ion trap with an AGC target of 1 x 10^4. Maximum injection time for HCD and ETD was 80 ms for proteome samples.
*Whole-exome sequencing (WES) and FASTA file generation*
WES was performed on a single clone per genotype to generate cell-specific databases for downstream mass spectrometry searches. Genomic DNA was extracted with Qiagen’s QIAamp DNA Micro Kit according to the manufacturer’s instructions, quantified using the Qubit dsDNA High Sensitivity Assay Kit and diluted to 5 ng/μl in the supplied TE buffer. The samples were submitted for library preparation and sequencing by the SMCL Next Generation Sequencing Hub (Academic Laboratory of Medical Genetics, Cambridge). Sequencing was performed on an Illumina HiSeq 4000 with 50x coverage across more than 60% of the exome in each sample. Raw reads were filtered with Trimmomatic (Bolger, Lohse, & Usadel, 2014) using the following parameters: headcrop = 3, minlen = 30, trailing = 3. The trimmed reads were aligned to the human reference genome (hg19 build) with BWA (H. Li & Durbin, 2010), followed by application of GATK base quality score recalibration, indel realignment, duplicate removal and SNP/indel discovery with genotyping (McKenna et al., 2010). GATK Best Practices standard hard filtering parameters were used throughout (DePristo et al., 2011).
To find non-reference, mutated peptides in the MS data, we augmented the search FASTA file with mutations affecting the protein sequence, as detected by WES with a high-sensitivity filter: QD < 1.5, FS > 60, MQ > 40, MQRankSum < -12.5, ReadPosRankSum < -8.0, and average DP > 5 per sample. The Ensembl Variant Effect Predictor (VEP) with Ensembl v88 was used to predict the effect of the mutations on the protein sequence (McLaren et al., 2016). For every variant with an effect on the protein sequence, we added the predicted mutated tryptic peptide at the end of the protein sequence.
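The idea of appending a mutated tryptic peptide to a protein entry can be illustrated with a minimal Python sketch. This is not the authors' script. It assumes single amino acid substitutions only, a simplified trypsin rule (cleavage after every K or R, no missed cleavages, no proline exception) and direct concatenation without a separator. The function names and the toy sequence are hypothetical.

```python
def tryptic_peptide_at(sequence, index):
    """Return the fully cleaved tryptic peptide covering a 0-based index.

    Simplified rule (an assumption): trypsin cleaves after every K or R.
    """
    start = index
    while start > 0 and sequence[start - 1] not in "KR":
        start -= 1
    end = index
    while end < len(sequence) - 1 and sequence[end] not in "KR":
        end += 1
    return sequence[start:end + 1]


def append_variant_peptide(reference, position, ref_aa, alt_aa):
    """Append the mutated tryptic peptide to the reference protein sequence.

    `position` is 1-based, as in the usual variant notation (e.g. H1047R).
    """
    i = position - 1
    if reference[i] != ref_aa:
        raise ValueError(
            f"expected {ref_aa} at position {position}, found {reference[i]}"
        )
    mutated = reference[:i] + alt_aa + reference[i + 1:]
    # Digest the mutated sequence so that gained or lost cleavage sites
    # (substitutions to or from K/R) change the peptide boundaries.
    return reference + tryptic_peptide_at(mutated, i)


if __name__ == "__main__":
    toy_protein = "MAKDEFHGLKTTR"  # toy sequence, not a real protein
    print(append_variant_peptide(toy_protein, 7, "H", "R"))
    print(append_variant_peptide(toy_protein, 4, "D", "N"))
```

In the first toy call the substitution introduces a new cleavage site, so the appended peptide ends at the mutated residue. Digesting the mutated rather than the reference sequence is what captures this.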
*Mass spectrometry searches*
Raw files were processed using MaxQuant 1.5.0.2 (Tyanova, Temu, & Cox, 2016) with all searches conducted using cell-specific databases (see Whole-exome sequencing and FASTA file generation), where all protein sequence variants were included in addition to the reference (Ensembl v68 human FASTA). Methionine oxidation, protein N-terminal acetylation and serine/threonine/tyrosine phosphorylation were set as variable modifications, and cysteine carbamidomethylation was set as a fixed modification. False discovery rates were set to 1% and the “match between runs” functionality was activated. We filtered out peptides that were associated with multiple identifications in the MaxQuant msms.txt file, had a score < 40, were identified in the reverse database or came from known contaminants. Analysis of the observed peptides passing these filters was performed using a Markov Chain Monte Carlo model as described previously (Robin et al., 2019). Briefly, the model predicted the average ratio (sample versus control) of a peptide as a function of the observed protein concentration (obtained from the MaxQuant evidence.txt file). Combined with a noise model, a distribution of likely values for the parameters was obtained. The mean and standard deviation of this resulting distribution were used to calculate a z-score, which was used together with the fold-change (FC) for subsequent filtering for differentially expressed proteins (|z| > 1.2; |ln(FC)| > ln(1.2)).
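The final filtering step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code in Post_MCMC_analysis_R/. It assumes that the posterior mean is the natural-log ratio ln(FC) and that the z-score is the posterior mean divided by the posterior standard deviation. The function and constant names are hypothetical.

```python
import math

Z_CUTOFF = 1.2   # threshold on |z| from the text
FC_CUTOFF = 1.2  # threshold on fold-change, applied as |ln(FC)| > ln(1.2)


def z_score(mean_ln_ratio, sd_ln_ratio):
    """ASSUMED definition: posterior mean divided by posterior SD."""
    if sd_ln_ratio <= 0:
        raise ValueError("standard deviation must be positive")
    return mean_ln_ratio / sd_ln_ratio


def is_differentially_expressed(mean_ln_ratio, sd_ln_ratio):
    """Apply both cutoffs: |z| > 1.2 and |ln(FC)| > ln(1.2)."""
    z = z_score(mean_ln_ratio, sd_ln_ratio)
    return abs(z) > Z_CUTOFF and abs(mean_ln_ratio) > math.log(FC_CUTOFF)


if __name__ == "__main__":
    # (posterior mean of ln ratio, posterior SD); toy values, not real data
    examples = {
        "clear increase": (math.log(1.5), 0.2),
        "tiny but precise change": (math.log(1.1), 0.01),
        "large but noisy change": (-0.5, 1.0),
    }
    for label, (mean, sd) in examples.items():
        print(label, is_differentially_expressed(mean, sd))
```

Requiring both cutoffs excludes proteins whose change is precise but negligible, as well as proteins whose change is large but poorly supported by the posterior distribution.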