3. Using the Tool - SBS, ID, DBS Input

<h1> Using SigProfilerMatrixGenerator </h1>

SigProfilerMatrixGenerator works in conjunction with [other SigProfiler tools][1] but can also be run alone on input datasets. This section describes the function arguments, the input files, and all output folders and files in detail.

@[toc](Quick Links)

- [**Home**][2]
- [**Installing** SigProfilerMatrixGenerator][3]
- [**Using** SigProfilerMatrixGenerator - **Output**][4]
- [**Quick Start Example** for SigProfilerMatrixGenerator][5]
- [**Currently** Supported Genomes][6]

----------

From within a Python session, you can generate the matrices as follows:

```
$ python3
>>> from SigProfilerMatrixGenerator.scripts import SigProfilerMatrixGeneratorFunc as matGen
>>> matrices = matGen.SigProfilerMatrixGeneratorFunc(project, genome, vcfFiles, exome=False, bed_file=None, chrom_based=False, plot=False, tsb_stat=False, seqInfo=False, cushion=100)
```

From within an R session, you can generate the matrices as follows:

```
$ R
> library("reticulate")
> use_python("path_to_your_python3")
> py_config()
> library("SigProfilerMatrixGeneratorR")
> matrices <- SigProfilerMatrixGeneratorR("BRCA", "GRCh37", "/Users/ebergstr/Desktop/BRCA/", plot=T, exome=F, bed_file=NULL, chrom_based=F, tsb_stat=F, seqInfo=F, cushion=100)
```

### Function Arguments ###
These are the acceptable parameters that can be passed into the function call.<br>

**Required:**<br>

- **project:** Project name for this instance of matrix generation. <br> *Type:* string <br> *Example:* "alexandrov_lab_test_1"
- **genome:** Reference genome to use for the matrix generation. <br> *Type:* string <br> *Example:* "GRCh37"
- **vcfFiles:** Full path to the folder containing the saved input files, which is also the desired output folder. <br> *Type:* string <br> *Example:* "/Users/test/Desktop/alexandrov_lab_test_1"

**Optional:**<br>

- **exome:** Downsamples mutational matrices to the exome regions of the genome. <br> *Type:* boolean <br> *Default:* False <br> *Example:* exome=True
- **bed_file:** Downsamples mutational matrices to custom regions of the genome. Requires the full path to the [BED file][7]. <br> *Type:* string <br> *Default:* None <br> *Example:* bed_file="/Users/test/Desktop/bed_files/sample_1.bed"
- **chrom_based:** Outputs chromosome-based matrices. <br> *Type:* boolean <br> *Default:* False <br> *Example:* chrom_based=True
- **plot:** Integrates with SigProfilerPlotting to output all available visualizations for each matrix. <br> *Type:* boolean <br> *Default:* False <br> *Example:* plot=True
- **tsb_stat:** Outputs the results of a transcriptional strand bias test for the respective matrices. <br> *Type:* boolean <br> *Default:* False <br> *Example:* tsb_stat=True
- **seqInfo:** Outputs the original mutations into a text file that contains the SigProfilerMatrixGenerator classification for each mutation. <br> *Type:* boolean <br> *Default:* False <br> *Example:* seqInfo=True
- **cushion:** Adds a cushion of the given number of base pairs to the exome/bed_file ranges when downsampling the mutations. <br> *Type:* integer <br> *Default:* 100 <br> *Example:* cushion=250

All **string** arguments must be surrounded by quotation marks, e.g. *"test"*, and all **boolean** arguments must be *True* or *False* (*T* or *F* in R).
<br><br>

### Input File ###
This tool currently supports the following formats:

* [MAF][8] <br> Mutation Annotation Format [[example.maf][9]]
* [VCF][10] <br> Variant Call Format [[example.vcf][11]] <br> If files are in .vcf format, each sample must be saved as a separate file.
* [ICGC][12]
* Simple text file [[example.txt][13]]

<br> The user must provide variant data adhering to one of these four formats.
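
The argument types listed under Function Arguments can be checked before the call is made. The following is a minimal sketch, not part of SigProfilerMatrixGenerator: `check_matgen_args` is a hypothetical helper that only mirrors the types and defaults documented on this page and returns the optional arguments as a dictionary.

```python
from typing import Any, Dict, Optional


def check_matgen_args(
    project: str,
    genome: str,
    vcfFiles: str,
    exome: bool = False,
    bed_file: Optional[str] = None,
    chrom_based: bool = False,
    plot: bool = False,
    tsb_stat: bool = False,
    seqInfo: bool = False,
    cushion: int = 100,
) -> Dict[str, Any]:
    """Hypothetical helper: check the documented argument types.

    Returns the optional arguments as a dict; raises TypeError otherwise.
    """
    # Required arguments are documented as strings.
    for name, value in (("project", project), ("genome", genome), ("vcfFiles", vcfFiles)):
        if not isinstance(value, str):
            raise TypeError(f"{name} must be a string, got {value!r}")

    # Boolean flags must be True or False, not strings such as "True".
    flags = {
        "exome": exome,
        "chrom_based": chrom_based,
        "plot": plot,
        "tsb_stat": tsb_stat,
        "seqInfo": seqInfo,
    }
    for name, value in flags.items():
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be True or False, got {value!r}")

    # bed_file is either None or the full path to a BED file.
    if bed_file is not None and not isinstance(bed_file, str):
        raise TypeError(f"bed_file must be a string or None, got {bed_file!r}")

    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(cushion, bool) or not isinstance(cushion, int):
        raise TypeError(f"cushion must be an integer, got {cushion!r}")

    return {**flags, "bed_file": bed_file, "cushion": cushion}


optional = check_matgen_args(
    "alexandrov_lab_test_1",
    "GRCh37",
    "/Users/test/Desktop/alexandrov_lab_test_1",
    exome=True,
    cushion=250,
)
```

The returned dictionary could then be unpacked after the three required positional arguments, for example `matGen.SigProfilerMatrixGeneratorFunc(project, genome, vcfFiles, **optional)`, assuming the installed version accepts these keyword names.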
<br><br>

### Folder Structure ###
![enter image description here](https://files.osf.io/v1/resources/s93d5/providers/osfstorage/5c74312f82a3950017d11ff5?mode=render =50%x)
<br>
The final output is divided into three folders:

* **Input:** Contains copies of the user-provided input files.
![enter image description here](https://files.osf.io/v1/resources/s93d5/providers/osfstorage/5c74320162c82a0019db475f?mode=render =50%x)
* **Logs:** Contains the error and log files for the submitted job.
![enter image description here](https://files.osf.io/v1/resources/s93d5/providers/osfstorage/5c74320f8d5d98001a39c827?mode=render =50%x)
<br> All errors are saved in the *sigProfilerMatrixGenerator_[project]_[genome].err* file and all progress checkpoints are saved in the *sigProfilerMatrixGenerator_[project]_[genome].out* file within the specified output folder.
* **Output:** Contains the DBS, SBS, INDEL, TSB, plots, and vcf_files folders. All matrices are saved in the appropriate folders.
![enter image description here](https://files.osf.io/v1/resources/s93d5/providers/osfstorage/5cc6a29c00a81000175b5de5?mode=render =50%x)

<br><br>

### File Extensions ###
Each output file has a file extension that indicates which arguments were passed in as **True**. By default, files have the *.all* extension. The remaining extensions are explained below.

* *.exome* <br> The exome argument was passed in as **True**. The file contains all mutations that map to the exome.
* *.region* <br> The bed_file argument was passed in as a **string**. The file contains all mutations that map to the regions in the input bed_file.
* *.chrx*, where x denotes the chromosome (e.g. chr1, chrA) <br> The chrom_based argument was passed in as **True**. Each file contains all mutations that map to the named chromosome.
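
The naming rules above can be sketched in a few lines of Python. Both functions below are hypothetical helpers, not part of SigProfilerMatrixGenerator. The log file names follow the pattern given under Folder Structure. The extension logic assumes that *.all* applies only when none of exome, bed_file, or chrom_based is set; this page does not describe how the extensions combine, so treat that part as an assumption.

```python
from typing import List, Optional, Sequence


def expected_log_files(project: str, genome: str) -> List[str]:
    """Build the .err and .out log file names for a project/genome pair."""
    stem = f"sigProfilerMatrixGenerator_{project}_{genome}"
    return [stem + ".err", stem + ".out"]


def expected_extensions(
    exome: bool = False,
    bed_file: Optional[str] = None,
    chrom_based: bool = False,
    chromosomes: Sequence[str] = (),
) -> List[str]:
    """List the file extensions implied by the given arguments.

    Assumption: ".all" is used only when no downsampling or
    chromosome-based option is active.
    """
    if not (exome or bed_file is not None or chrom_based):
        return [".all"]

    extensions: List[str] = []
    if exome:
        extensions.append(".exome")
    if bed_file is not None:
        extensions.append(".region")
    if chrom_based:
        # One extension per chromosome name supplied by the caller.
        extensions.extend(f".chr{name}" for name in chromosomes)
    return extensions


print(expected_log_files("BRCA", "GRCh37"))
print(expected_extensions(exome=True))
print(expected_extensions(chrom_based=True, chromosomes=["1", "2", "X"]))
```

The chromosome names are passed in by the caller because they depend on the reference genome in use.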
<br><br>

[1]: https://osf.io/mc45g/
[2]: https://osf.io/s93d5/wiki/home
[3]: https://osf.io/s93d5/wiki/2.%20Installation/
[4]: https://osf.io/s93d5/wiki/4.%20Using%20the%20Tool%20-%20Output/
[5]: https://osf.io/s93d5/wiki/6.%20Quick%20Start%20Example/
[6]: https://osf.io/s93d5/wiki/7.%20Currently%20Supported%20Genomes/
[7]: https://samtools.github.io/hts-specs/VCFv4.2.pdf
[8]: https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/
[9]: https://osf.io/dkjwr/
[10]: https://samtools.github.io/hts-specs/VCFv4.2.pdf
[11]: https://osf.io/8vm4p/
[12]: https://docs.icgc.org/submission/guide/overview/submission-file-format/
[13]: https://osf.io/xfphr/