Presentation abstract: There is a pressing need to establish best practices for data curation professionals in response to the increasing prevalence and application of machine learning (ML) across disciplines. Broad sharing of ML outputs, which are resource intensive to create (requiring large amounts of training and test data, processing power, and specialized programming knowledge), can make future research more efficient and its outputs easier to reuse. However, formal community-accepted guidelines and recommended practices for documenting and sharing ML objects are sparse within library-centric professions and across data repositories. In this talk, we will discuss an ongoing project to better understand current practices for sharing and reuse of ML components (data, code, workflows, etc.). A core part of this project is an in-depth analysis of ML objects from a selection of repositories that specialize in ML research workflows and outputs, as well as several generalist repositories including Figshare and Zenodo. By analyzing the metadata of ML objects extracted via API and web scraping, we aim to address a variety of questions relevant to reusability, such as: What is the most commonly used license for ML components? How often are training datasets, necessary for measuring ML model performance, included with an ML object? Is the software environment clearly documented? Answering these questions is only a first step toward understanding the landscape of ML objects in the context of reusability. In addition to assessing how ML objects are being shared, we will also leverage the FAIR Principles to identify and classify the minimum viable metadata that makes an ML object and/or project reusable for a standard practitioner. We look forward to feedback from and discussion with the RDAP community.
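To make the kind of metadata analysis described in the abstract concrete, the following is a minimal sketch, not the project's actual code. It assumes each ML object has already been reduced to a dictionary with three hypothetical fields (`license`, `has_training_data`, `has_environment_doc`); the field names, the function name, and the sample records are all invented for illustration and do not come from the project or from any repository's API.

```python
from collections import Counter


def summarize_ml_objects(records):
    """Tally simple reusability indicators over a list of metadata dicts.

    The keys used here ("license", "has_training_data",
    "has_environment_doc") are hypothetical; real repository metadata
    would need to be mapped onto them first.
    """
    total = len(records)
    # Treat a missing or empty license field as "unspecified".
    licenses = Counter((r.get("license") or "unspecified") for r in records)
    with_training = sum(1 for r in records if r.get("has_training_data"))
    with_env = sum(1 for r in records if r.get("has_environment_doc"))
    return {
        "most_common_license": licenses.most_common(1)[0][0] if total else None,
        "share_with_training_data": with_training / total if total else 0.0,
        "share_with_environment_doc": with_env / total if total else 0.0,
    }


# Made-up records for illustration only; not data from the project.
sample = [
    {"license": "CC-BY-4.0", "has_training_data": True, "has_environment_doc": False},
    {"license": "MIT", "has_training_data": False, "has_environment_doc": True},
    {"license": "CC-BY-4.0", "has_training_data": False, "has_environment_doc": False},
    {"license": None, "has_training_data": True, "has_environment_doc": True},
]
summary = summarize_ml_objects(sample)
print(summary)
```

In practice, most of the effort would likely go into the step this sketch skips: harvesting records from each repository and normalizing their differing metadata schemas into a common set of fields.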