This is a data repository for a large study that involved the analysis and prediction of homotypic (self-self) transmembrane domain (TMD) interactions.
Publication:
- Yao Xiao, Bo Zeng, Nicola Berner, Dmitrij Frishman, Dieter Langosch, Mark George Teese
- Experimental determination and data-driven prediction of homotypic transmembrane domain interfaces
- Computational and Structural Biotechnology Journal
- Volume 18, 2020, Pages 3230-3242
- ISSN 2001-0370
- https://doi.org/10.1016/j.csbj.2020.09.035
Contributors:
- Yao Xiao
- Bo Zeng
- Mark Teese
- Dieter Langosch
- Dmitrij Frishman
Contact:
- Mark Teese
Affiliation:
- [Technical University of Munich][2]
- [TNG Technology Consulting GmbH][6]
Related website with machine-learning tool:
- www.thoipa.org
Related open-source software repositories:
- [THOIPApy code repository][3]
- [datoxr code repository][4]
- [pytoxr code repository][5]
Open Science Framework (OSF) repository contents:
- data
    - THOIPA_data.7zip
        - homologues (BLAST data files and alignments)
        - interface_predictions (predictions from THOIPA, PREDDIMER, and TMDOCK used for validation)
        - interface_residues (experimental data on TM homodimer interfaces from NMR, ETRA, and crystal structure experiments)
        - residue_properties (data on conservation, polarity, coevolution, etc., calculated for each residue of each TMD in each dataset)
        - THOIPA_validation (raw validation data, such as ROC AUC, for the THOIPA machine-learning predictor; also contains the machine-learning model, training_data, and feature importances)
- protein_lists
    - lists of proteins in the homotypic TM dataset and in the individual datasets, including sequences in FASTA format
- figures
    - DDR2 results and other scanning mutagenesis data
- methods
    - hydrophobicity scales and other data related to methods
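The protein lists include sequences in FASTA format. As a convenience, here is a minimal Python sketch for reading such a file into a dictionary keyed by sequence identifier. It assumes standard FASTA layout (a `>` header line followed by one or more sequence lines) and takes the identifier to be the first whitespace-separated token of the header. The function name `parse_fasta` is hypothetical and is not part of thoipapy, and the toy sequences below are made up for illustration.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {identifier: sequence} dict.

    Hypothetical helper (not from thoipapy). The identifier is the first
    whitespace-separated token after '>'. Sequence lines following a header
    are concatenated; blank lines are skipped. A later record with a
    duplicate identifier overwrites an earlier one.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if header is not None:
                records[header] = "".join(chunks)
            fields = line[1:].split()
            header = fields[0] if fields else ""
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first FASTA header")
            chunks.append(line)
    # Store the final record.
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Toy example with made-up sequences (not taken from the dataset).
example = """>protein_A first toy entry
LLIGVLLAL
GAVVLLI
>protein_B
ILVAGLLAV
"""
records = parse_fasta(example.splitlines())
print(records)
```

To read a real file, pass an open file handle, e.g. `with open(path) as handle: parse_fasta(handle)`, where `path` points to one of the FASTA files in protein_lists (the exact file names are not listed here). Biopython's `Bio.SeqIO.parse(handle, "fasta")` is a more complete alternative if that dependency is acceptable.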
Data notes:
- The following sets of proteins are included in the protein_lists folder:
    - set05: homotypic TMD dataset (combined ETRA, NMR, and X-ray)
    - set07: test data for machine learning
    - set08: training data for machine learning
- Folders labelled "old, deprecated data" refer to an older machine-learning model, trained on a slightly modified version of set05.
- The hierarchical data structure in THOIPA_data.7zip should in most cases be self-explanatory. References and code for processing each file can be found in [thoipapy software][3] version 1.1.3. Most data can be recreated using the open-source thoipapy software.
[1]: http://cbp.wzw.tum.de/index.php?id=49
[2]: https://www.tum.de/en/
[3]: https://github.com/bojigu/thoipapy
[4]: https://bitbucket.org/yaoxiaorepos/datoxr
[5]: https://github.com/teese/pytoxr
[6]: https://www.tngtech.com/en/index.html