Lightning Talk: May 1st, 11:55am <br/>
Breakout Session 1: May 1st, 1pm<br/>
Breakout Session 2: May 2nd, 11am<br/>
Reproducibility and preservation of research products (data, software, models, etc.) have become increasingly difficult in a digital environment. The evolution of tools, libraries, and formats makes it hard to recreate an environment in which research will yield the same results, or work at all. Beyond the technical limitations, there is a lack of documentation for digital research output: the standard response, “it’s in the paper,” fails to take into account valuable information such as the versions of software libraries in use. Recording such details takes a lot of effort, is prone to human error, and isn’t seen as adding value to research. This lack of documentation, coupled with the fast-paced landscape of research technology, only makes reproducibility more difficult.
This talk will present ReproZip, an open source tool developed at New York University. ReproZip allows researchers to create a compendium of their research environment by automatically tracing research processes and identifying all their required dependencies (data files, libraries, configuration files, etc.). After two commands, the researcher ends up with a neat package of their research that they can then share with anyone else, regardless of operating system or configuration. Recipients can unpack the package using ReproUnzip and reproduce the findings despite differences in computational environments.
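As a rough illustration of the workflow described above, the following Python sketch only assembles the command lines involved; it does not run them. The `trace`, `pack`, `setup`, and `run` subcommands follow ReproZip's documented command-line usage as we understand it, while the helper function names, the package name `experiment.rpz`, the target directory, and the script `analysis.py` are hypothetical placeholders.

```python
import shlex


def packing_commands(experiment_cmd, package="experiment.rpz"):
    """Return the two command lines an author would run to create a package.

    `experiment_cmd` is the researcher's own command as a list of arguments;
    `package` is a hypothetical output file name.
    """
    return [
        ["reprozip", "trace"] + list(experiment_cmd),  # step 1: trace the run
        ["reprozip", "pack", package],                 # step 2: bundle it
    ]


def unpacking_commands(package="experiment.rpz", unpacker="docker",
                       target="experiment"):
    """Return the command lines a recipient would run with ReproUnzip.

    `unpacker` names the backend (for example "docker" or "vagrant");
    `target` is a hypothetical directory for the unpacked experiment.
    """
    return [
        ["reprounzip", unpacker, "setup", package, target],
        ["reprounzip", unpacker, "run", target],
    ]


if __name__ == "__main__":
    commands = packing_commands(["python", "analysis.py"]) + unpacking_commands()
    for argv in commands:
        # shlex.join quotes arguments so each line can be pasted into a shell.
        print(shlex.join(argv))
```

Keeping the commands as argument lists means they could later be handed to `subprocess.run` unchanged, assuming ReproZip and the chosen backend are installed.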
ReproZip automates the process of capturing technical and administrative metadata, as well as provenance, for entire research processes. This extremely detailed metadata is stored in the package's configuration file and can be extracted as a JSON file. On its own, this information is extremely valuable for the archival processing of research. Additionally, ReproUnzip is built on a plugin model, which allows for modularity and extensibility in preservation strategies and in the reproducibility of research.
Right now, users can use Docker or Vagrant to reproduce research across different computational environments. If Docker were to disappear, a plugin for another container service could be written to ensure that ReproZip packages can still be unpacked reliably. For these reasons, we believe that ReproZip can be of aid not only to researchers in making their work reproducible, but also to the librarians and archivists making that work preservation-ready.
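To make the plugin idea concrete, here is a minimal, generic sketch of a registry in which each unpacking backend is an interchangeable function. It is an illustration only and does not reproduce ReproUnzip's actual plugin interface: the names `UNPACKERS`, `register`, and `plan`, the step descriptions, and the `podman` backend are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a backend name to a function that describes
# how a package would be unpacked with that backend.
UNPACKERS: Dict[str, Callable[[str, str], List[str]]] = {}


def register(name: str):
    """Decorator that adds a backend to the registry under `name`."""
    def decorator(func):
        UNPACKERS[name] = func
        return func
    return decorator


@register("docker")
def docker_plan(package: str, target: str) -> List[str]:
    return [f"build a container image from {package}",
            f"run the experiment from {target}"]


@register("vagrant")
def vagrant_plan(package: str, target: str) -> List[str]:
    return [f"create a virtual machine from {package}",
            f"run the experiment from {target}"]


def plan(name: str, package: str, target: str) -> List[str]:
    """Look up a backend by name and return its unpacking steps."""
    try:
        backend = UNPACKERS[name]
    except KeyError:
        available = ", ".join(sorted(UNPACKERS))
        raise ValueError(f"no unpacker named {name!r}; available: {available}")
    return backend(package, target)


# Supporting a new container service is one more registration; the rest of
# the code, and the packages themselves, stay unchanged.
@register("podman")
def podman_plan(package: str, target: str) -> List[str]:
    return [f"build a rootless container image from {package}",
            f"run the experiment from {target}"]
```

The design choice this illustrates is that a package does not depend on any single backend: when one tool becomes unavailable, only a new registry entry is needed.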