This is a data repository for a large study on the analysis and prediction of homotypic (self-self) transmembrane domain interactions.

Publication:

- Yao Xiao, Bo Zeng, Nicola Berner, Dmitrij Frishman, Dieter Langosch, Mark George Teese
- Experimental determination and data-driven prediction of homotypic transmembrane domain interfaces
- Computational and Structural Biotechnology Journal
- Volume 18, 2020, Pages 3230-3242
- ISSN 2001-0370
- https://doi.org/10.1016/j.csbj.2020.09.035

Contributors:

- Yao Xiao
- Bo Zeng
- Mark Teese
- Dieter Langosch
- Dmitrij Frishman

Contact:

- Mark Teese

Affiliation:

- [Technical University of Munich][2]
- [TNG Technology Consulting GmbH][6]

Related website with machine-learning tool:

- www.thoipa.org

Related open-source software repositories:

- [THOIPApy code repository][3]
- [datoxr code repository][4]
- [pytoxr code repository][5]

Open Science Framework Repository Contents:

- data
  - THOIPA_data.7zip
    - homologues (BLAST data files and alignments)
    - interface_predictions (predictions from THOIPA, PREDDIMER, and TMDOCK used for validation)
    - interface_residues (experimental data on TM homodimer interfaces from NMR, ETRA, and crystal structure experiments)
    - residue_properties (data on conservation, polarity, coevolution, etc. calculated for each residue of each TMD in each dataset)
    - THOIPA_validation (raw validation data, such as ROC AUC, for the THOIPA machine-learning predictor; also contains the machine-learning model, training_data, and feature importances)
    - protein_lists (list of proteins in the homotypic TM dataset and in the individual datasets; includes sequences in FASTA format)
- figures
  - DDR2 results and other scanning mutagenesis data
- methods
  - hydrophobicity scales and other data related to methods

Data notes:

- The following sets of proteins are included in the protein_lists folder:
  - set05 : homotypic TMD dataset (combined ETRA, NMR, X-ray)
  - set07 : test data for machine learning
  - set08 : training data for machine learning
- Folders labelled "old, deprecated data" refer to an older machine-learning model, trained on a slightly modified set05.
- The hierarchical data structure in THOIPA_data.7zip should in most cases be self-explanatory. References and code for the processing of each file can be found in the [thoipapy software][3], version 1.1.3. Most data can be recreated using the open-source thoipapy software.

[1]: http://cbp.wzw.tum.de/index.php?id=49
[2]: https://www.tum.de/en/
[3]: https://github.com/bojigu/thoipapy
[4]: https://bitbucket.org/yaoxiaorepos/datoxr
[5]: https://github.com/teese/pytoxr
[6]: https://www.tngtech.com/en/index.html
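
Since the protein_lists folder includes sequences in FASTA format, the following is a minimal sketch of how such a file could be read using only the Python standard library. The function name `parse_fasta` and the example records are hypothetical and are not taken from the repository or from thoipapy; the actual file names and headers inside THOIPA_data.7zip are not assumed here.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; collect and join them later.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical records for illustration only; real sequences come from
# the FASTA files in the protein_lists folder.
example = """>example_tmd_1
LLLAVVGG
ILLAA
>example_tmd_2
AAAVVVLLL
""".splitlines()

sequences = parse_fasta(example)
for name, seq in sequences.items():
    print(name, len(seq), seq)
```

To read a real file, an open file handle can be passed directly, for example `with open(path) as handle: sequences = parse_fasta(handle)`, where `path` points to a FASTA file extracted from the archive.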