4. Output

Output
------

----------

All example images here are the expected output from the Quick Start breast cancer example.

After running SigProfilerClusters, the project folder should have the following file structure:

![Output file structure](https://mfr.osf.io/export?url=https://osf.io/9zsfj/?direct%26mode=render%26action=download%26public_file=False&initialWidth=774&childId=mfrIframe&parentTitle=OSF+%7C+output_fileStructure.png&parentUrl=https://osf.io/tfvq7/&format=2400x2400.jpeg =40%x)

All folders highlighted below contain output from SigProfilerClusters. All other outputs are from [SigProfilerSimulator][1] and [SigProfilerMatrixGenerator][2].

- DBS
- `plots`
- SBS
- `clustered`
- `nonClustered`
- `simulations`
    - `project_intradistance_genome_context`
    - `project_intradistance_original_genome_context`
    - project_simulations_genome_context
    - `project_simulations_genome_simContext_sorted`
- vcf_files
- `vcf_files_corrected`

@[toc](Folders)

### clustered ###

This folder contains the partition of clustered mutations, which have been subclassified into one of four categories (five if using VAFs/CCFs; see categories below). Within each subclass subfolder, a single VCF file is saved for each sample.

When `probability=True`, an additional column is added to each VCF file. This column contains the probability of observing each clustered event (all mutations sharing the same group number value). This probability is calculated by assuming that each mutation in a single clustered event is independently generated, so the probability of observing an entire event equals the product of the probabilities of observing each individual mutation.

### nonClustered ###

This folder contains the partition of non-clustered mutations for either single base substitutions (SBS) or indels (ID). Within each subfolder, a single VCF file is saved for each sample.
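
The event probability described in the clustered section above (the product of the independent per-mutation probabilities within one group) can be illustrated with a minimal sketch. This is not SigProfilerClusters code: the function name `event_probabilities`, the `(group, probability)` input layout, and the example values are all hypothetical and serve only to show the arithmetic.

```python
from collections import defaultdict
from math import prod


def event_probabilities(mutations):
    """Multiply per-mutation probabilities within each clustered event.

    `mutations` is an iterable of (group_number, probability) pairs.
    This input layout is an assumption made for illustration only.
    """
    by_group = defaultdict(list)
    for group_number, probability in mutations:
        by_group[group_number].append(probability)
    # Independence assumption: P(event) = product of P(each mutation).
    return {group: prod(values) for group, values in by_group.items()}


# Made-up values: group 1 has two mutations, group 2 has one.
example = [(1, 0.5), (1, 0.25), (2, 0.125)]
probs = event_probabilities(example)
print(probs)
```

Under the independence assumption, an event with more mutations has a probability no larger than that of any of its individual mutations, since each factor is at most 1.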
### plots ###

This folder contains the following two files:

*project_intradistance_plots_context_corrected.pdf*

This file contains a plot for each sample that displays the mutational spectra (left) of all of the mutations in the real sample (top), the clustered mutations (middle), and the non-clustered mutations (bottom). This visualization also shows the distribution of inter-mutational distances (IMDs) (right) for the different clustered and non-clustered substitutions in the real data (green) compared to the simulated data (red). The shaded red area represents the 95% confidence interval across the background model.

<br>

![intradistance plot](https://mfr.osf.io/export?url=https://osf.io/98s7c/?direct%26mode=render%26action=download%26public_file=False&initialWidth=738&childId=mfrIframe&parentTitle=OSF+%7C+imd_plot.png&parentUrl=https://osf.io/z5wre/&format=2400x2400.jpeg =100%x)

*rainfallPlots_clustered_project_corrected.pdf*

This file contains a rainfall plot for each sample displaying the IMD distributions of substitutions across genomic coordinates. Each dot represents the minimum distance of a given mutation to the nearest adjacent mutation, colored based on its categorized mutational event. The horizontal red line reflects the sample-dependent IMD threshold for each sample. Clustered mutations may occur above this threshold after regional corrections for mutation density have been performed.

<br>

![rainfall plot](https://mfr.osf.io/export?url=https://osf.io/a627r/?direct%26mode=render%26action=download%26public_file=False&initialWidth=738&childId=mfrIframe&parentTitle=OSF+%7C+rainfall_plot.png&parentUrl=https://osf.io/vxwun/&format=2400x2400.jpeg =100%x)

----------

----------

### simulations ###

#### [project]_intradistance_[genome]_[context] ####

Folder containing the IMDs for each sample across all simulations. There is a single folder for each sample and a single file for each simulation within this subfolder.
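
As a point of reference for the IMD values stored here, the quantity shown in the rainfall plot (the minimum distance from a mutation to its nearest adjacent mutation) can be sketched as follows. This is an illustration of that definition only, not the tool's implementation. The function name `min_imd` is hypothetical, and the sketch assumes all positions lie on a single chromosome of a single sample.

```python
def min_imd(positions):
    """Return the distance from each mutation to its nearest neighbour.

    `positions` holds genomic coordinates on one chromosome (an assumption
    of this sketch). The result follows ascending coordinate order.
    A lone mutation has no neighbour, so its IMD is reported as None.
    """
    ordered = sorted(positions)
    result = []
    for i, position in enumerate(ordered):
        distances = []
        if i > 0:
            distances.append(position - ordered[i - 1])  # upstream neighbour
        if i < len(ordered) - 1:
            distances.append(ordered[i + 1] - position)  # downstream neighbour
        result.append(min(distances) if distances else None)
    return result


# Made-up coordinates: two close mutations and two isolated ones.
imds = min_imd([100, 105, 1000, 50000])
print(imds)  # [5, 5, 895, 49000]
```

The two mutations that are 5 bp apart would be candidates for a clustered event, while the isolated ones would fall above any small IMD threshold.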

----------

#### [project]_intradistance_original_[genome]_[context] ####

Folder containing the IMDs for all mutations in each sample found within the real data.

----------

#### [project]_simulations_[genome]_[simContext]_sorted ####

All simulations, ordered alphanumerically by sample, then by chromosome and position.

<br><br>

----------

----------

### vcf_files_corrected ###

This folder contains the partitions of clustered and nonclustered mutations after performing a localized IMD correction as described below.<br><br>

#### [project]_clustered ####

*[project]_clustered_vaf.txt* : Table of all clustered mutations with their corresponding IMD and variant allele frequencies (VAF) or cancer cell fractions (CCF). All header information is contained in the table below.

Deprecated files: *[project]_clusters_of_clusters_imd.txt* and *[project]_clusters_of_clusters.txt*

|Column|Name|
|---|---|
|1|project|
|2|sample|
|3|placeholder (for formatting purposes)|
|4|genome|
|5|mutation type|
|6|chromosome|
|7|start|
|8|end|
|9|reference allele|
|10|alternate allele|
|11|somatic or germline|
|12|IMD plot (for plotting purposes)|
|13|clustered group of mutations|
|14|IMD|
|15|VAF/CCF|
|16|probability (if run with `probability=True`)|

<br>

----------

##### SNV #####

- *[project]_clustered.txt* : Table of all clustered mutations. The same header as *[project]_clustered_vaf.txt* applies, without the VAF/CCF column.
- Remaining folders are matrices generated for all clustered mutations using [SigProfilerMatrixGenerator][3].

<br><br><br>

----------

##### subclasses #####

Each type of mutation cluster has a folder containing:

- *[project]\_clustered_[class].txt* : A file containing a table of all mutations assigned to that class. This file has the same header defined in the table above, with an additional column containing the class assignment of each mutation.
- Folders of matrices generated by [SigProfilerMatrixGenerator][3] for each class of mutations.
<br><br>

The definition of each class is below:

|Class|Definition|
|---|---|
|1|**Smaller clustered events. Class 1 is separated into different types of small clustered events below.**|
|1a|Doublet base substitutions (DBS)|
|1b|Multibase substitutions (MBS)|
|1c|Omikli|
|2|**Larger clustered mutations (i.e. kataegis). Class 2 is separated by processivity below.**|
|2Y|Processivity with respect to pyrimidine bases|
|2K|Processivity with respect to keto bases (i.e. G or T)|
|2N|Not processive|
|2S|Processivity with respect to strong base-pairs (i.e. C:G or G:C)|
|3|All remaining clustered events.|

----------

#### [project]_nonClustered ####

This folder has an SNV subfolder that contains:

- *[project]_nonClustered.txt* containing a table of all nonclustered mutations. This file contains all the information defined in the first table of this page except the IMD columns.
- All remaining folders are output from [SigProfilerMatrixGenerator][3] for these nonclustered mutations.

[1]: https://osf.io/usxjz/wiki/2.%20Simulations/#Output_72
[2]: https://osf.io/s93d5/wiki/home/
[3]: https://osf.io/s93d5/wiki/4.%20Using%20the%20Tool%20-%20Output/
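
The column layout from the first table of this page can be applied when loading *[project]_clustered_vaf.txt* into a script. The sketch below is a hedged illustration rather than an official reader. It assumes the file is tab-delimited and that any header line is skipped by the caller. The short column keys, the function name `read_clustered`, and the example row are hypothetical.

```python
import csv
import io

# Short keys for the 16 columns listed in the table on this page.
# The key names themselves are made up for this sketch.
COLUMNS = [
    "project", "sample", "placeholder", "genome", "mutation_type",
    "chromosome", "start", "end", "ref", "alt", "origin",
    "imd_plot", "group", "imd", "vaf_ccf", "probability",
]


def read_clustered(handle, has_header=False):
    """Parse rows into dicts keyed by COLUMNS (tab-delimited is assumed).

    Rows with fewer fields than COLUMNS (for example, a run without
    probability=True) simply lack the trailing keys.
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # drop the header line if the file has one
    return [dict(zip(COLUMNS, fields)) for fields in reader if fields]


# A single made-up row with 15 fields (no probability column).
toy = io.StringIO(
    "BRCA\tsampleA\t.\tGRCh37\tSNP\t1\t12345\t12345\tC\tT\tSOMATIC\t2.0\t7\t150\t0.42\n"
)
rows = read_clustered(toy)
print(rows[0]["chromosome"], rows[0]["group"], rows[0]["vaf_ccf"])
```

For a real file, replace the `io.StringIO` object with `open(path, newline="")`. The delimiter and header handling should be checked against the actual output before relying on this layout.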