## Example data for SLiM Discovery

The attached data can be used for testing the SLiM discovery tools of [SLiMSuite](https://github.com/slimsuite/SLiMSuite/).

```
315B POL30.Scer.apid-hq.acc
697K POL30.Scer.apid-hq.dat
35K  POL30.Scer.apid-hq.fas
74M  uniprot.yeast.147537.2019-05-14.fas.gz
50K  elm2019.motifs
69K  elm2019.split.motifs
39K  elm2019.reduced.motifs
210B LIG_PCNA_PIPBox_1.motif
```

Example usage can be found in the [SLiMSuite CookBook](https://github.com/slimsuite/SLiMSuite/wiki/docs_md/CookBook.md).

### POL30 interactors

The `POL30.Scer.apid-hq.*` files contain the "high quality" interactors of the _Saccharomyces cerevisiae_ protein POL30 (the orthologue of human PCNA) from the [APID](http://cicblade.dep.usal.es:8080/APID/init.action) interaction database, in three different formats:

* `*.acc` = UniProt accession numbers.
* `*.dat` = Full UniProt flat file.
* `*.fas` = Protein fasta file in [SLiMSuite format](http://slimsuite.blogspot.com/2015/10/file-format-fasta-seqfile-fasfile.html).

These represent alternative input formats for the same data.

### Yeast proteomes

The larger `uniprot.yeast.147537.2019-05-14.fas.gz` file is a gzipped fasta file of yeast proteomes downloaded from UniProt on `2019-05-14`. These are all the UniProt proteomes for the [Saccharomycotina (true yeasts)](https://www.uniprot.org/taxonomy/147537) subphylum (TaxID:147537). Two proteomes with non-specific (`9XXXX`) species codes have been filtered out. These data are provided so that [GOPHER](http://rest.slimsuite.unsw.edu.au/gopher) can be used to generate predicted orthologue alignments for conservation masking.

### ELM Data

[ELM](http://elm.eu.org) motif classes (downloaded `2019-05-02`) are provided in the `elm2019.motifs` file. The `elm2019.split.motifs` file contains the same data split into different motif variants. SLiMSuite will generate this file when required if it does not already exist.
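
ELM motif classes are defined as regular expressions, so a rough feel for how a motif definition relates to the sequence files above can be had by scanning fasta-formatted sequences for a pattern. The sketch below is a minimal illustration only: it is not how SLiMSuite searches or masks sequences, the function names `parse_fasta` and `find_motif` are hypothetical, the pattern is a simplified PIP-box-like consensus rather than the ELM `LIG_PCNA_PIPBox_1` definition, and the demo sequences are made up.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# Simplified PIP-box-like consensus (Q-x-x-[ILM]-x-x-[FY]-[FY]).
# Illustrative assumption only: this is NOT the ELM LIG_PCNA_PIPBox_1 definition.
EXAMPLE_PATTERN = r"Q..[ILM]..[FY][FY]"


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from fasta-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            tokens = line[1:].split()
            name, chunks = (tokens[0] if tokens else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_motif(
    pattern: str, records: Iterable[Tuple[str, str]]
) -> Dict[str, List[Tuple[int, str]]]:
    """Map each sequence name to its (1-based start, matched text) hits."""
    # Wrapping the pattern in a lookahead lets overlapping matches be reported.
    regex = re.compile(f"(?=({pattern}))")
    hits: Dict[str, List[Tuple[int, str]]] = {}
    for name, seq in records:
        found = [(m.start() + 1, m.group(1)) for m in regex.finditer(seq.upper())]
        if found:
            hits[name] = found
    return hits


# Made-up demo records; real input would be lines read from a fasta file.
demo_lines = [
    ">seqA_TEST__X00001 made-up example",
    "MKQAAIAAFFGG",
    "QTTLSSYY",
    ">seqB_TEST__X00002 made-up example",
    "MKKKK",
]
demo_hits = find_motif(EXAMPLE_PATTERN, parse_fasta(demo_lines))
for seq_name, matches in demo_hits.items():
    print(seq_name, matches)
```

Because `parse_fasta` accepts any iterable of text lines, a file handle from `open(path)` or, for the gzipped proteome file, `gzip.open(path, "rt")` can be passed to it directly.
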

`elm2019.reduced.motifs` contains a "reduced" set of ELM class definitions, generated using [SLiMMaker](http://rest.slimsuite.unsw.edu.au/slimmaker) as described in the [QSLiMFinder paper](https://www.ncbi.nlm.nih.gov/pubmed/25792551?dopt=Abstract): the ELM instances for each motif are aligned, and a regular expression motif pattern is then extracted from the alignment. Reduced ELM definitions lose a lot of the complexity and curator-derived knowledge. They are primarily useful as a test dataset of "True Positive" motifs that should be recoverable from sets of ELM instance proteins, but their reduced complexity may also make them useful for cleaner [CompariMotif](http://rest.slimsuite.unsw.edu.au/comparimotif) screens for known motifs.

`LIG_PCNA_PIPBox_1.motif` contains a single motif for simplified examples.

### References

**Gouw M et al.** The eukaryotic linear motif resource - 2018 update. [_Nucleic Acids Res._ 46:D428-D434 (2018)](https://www.ncbi.nlm.nih.gov/pubmed/29136216)

**The UniProt Consortium.** UniProt: a worldwide hub of protein knowledge. [_Nucleic Acids Res._ 47:D506-D515 (2019)](https://doi.org/10.1093/nar/gky1049)