Almost all machine learning (ML) is based on representing examples using intrinsic features. When there are multiple related ML problems (tasks), it is possible to use extrinsic features instead: predictions made about the examples by ML models learnt on other tasks. We call this transformational ML (TML). TML is closely related to, and synergistic with, stacking, multi-task learning, and transfer learning. TML is applicable to improving any non-linear ML method. We tested TML using the most important classes of non-linear ML: random forests, gradient boosting machines, support vector machines, k-nearest neighbour, and neural networks. To ensure the generality and robustness of the evaluation we utilised thousands of ML problems from three scientific domains: drug design, predicting gene expression, and ML algorithm selection. We found that TML significantly improved the predictive performance of all the ML methods in all the domains (2%-50% average improvements), and that TML features generally outperformed intrinsic ones. Use of TML also enhances scientific understanding through explainable ML. In drug design we found that TML gave novel insight into drug target specificity, the relationships between drugs, and the relationships between targets. TML leads to an ecosystem-based approach to ML, where newly encountered learning problems, new examples, new predictions, etc. all synergistically interact to improve performance.

The TML study produced ~50K models. We arrive at this number by counting all the models we built for all the learners. For the random forest models, we treat each tree as an independent model, since each tree is a decision tree. The size of all the random forest models is ~100 GB uncompressed.

The TML study also makes available:

QSAR data: we extracted data for 2,219 protein targets from ChEMBL. We used RDKit to calculate the 1024-bit FCFP4 fingerprint to represent chemical compounds.

Meta-learning data from OpenML Study 7: a filtered set of 10,840 evaluations on 351 tasks (datasets) and 53 machine learning methods.

TML code (QSARs and meta-learning) and gene expression code.
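
The idea of extrinsic features can be illustrated with a minimal sketch in plain Python. It is not the study's code: the k-nearest-neighbour learner is only a stand-in for any of the non-linear methods named above, and the function names (`knn_predict`, `make_model`, `tml_fit`) and the toy tasks are hypothetical, chosen for illustration. Models are first learnt on the other tasks using intrinsic features; each example of the target task is then re-represented by those models' predictions, and a final model is learnt on that representation.

```python
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train_X, train_y, x, k=3):
    """Average the targets of the k training examples nearest to x."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def make_model(train_X, train_y, k=3):
    """Return a prediction function for one task (stand-in for any learner)."""
    return lambda x: knn_predict(train_X, train_y, x, k)


def tml_fit(tasks, target_name, k=3):
    """Learn a model for the target task on extrinsic features.

    tasks maps a task name to (X, y), where X holds intrinsic feature vectors.
    """
    # 1. Learn one model per *other* task, on intrinsic features.
    base_models = [
        make_model(X, y, k) for name, (X, y) in tasks.items() if name != target_name
    ]
    # 2. Re-represent the target task's examples by the other models' predictions.
    X_target, y_target = tasks[target_name]
    Z_target = [[model(x) for model in base_models] for x in X_target]
    # 3. Learn the target model on the extrinsic representation.
    top_model = make_model(Z_target, y_target, k)
    return lambda x: top_model([model(x) for model in base_models])


# Toy data (hypothetical): three related tasks over 2-D intrinsic features.
grid = [(0, 0), (0, 1), (1, 0), (1, 1)]
tasks = {
    "task_a": (grid, [0.0, 1.0, 1.0, 2.0]),
    "task_b": (grid, [0.0, 0.0, 1.0, 1.0]),
    "target": ([(0, 0), (1, 1)], [0.0, 3.0]),
}
predict = tml_fit(tasks, "target", k=1)
print(predict((1, 0)))
```

In this sketch the example `(1, 0)` is never seen in the target task's training data; it is placed among the target examples through how the models of `task_a` and `task_b` respond to it, which is the sense in which the representation is extrinsic.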