312.1 Checksums on Modern Filesystems, or: On the virtuous consumption of CPU cycles

Alex Garnett; Justin Simpson; Mike Winter

doi:10.17605/OSF.IO/Y4Z3E

Category: Communication

Description: Computing checksums to guard against bit rot is accepted wisdom in the digital preservation community. Yet other domains approach checksumming quite differently. New hashing algorithms continue to be developed in the cryptography community, typically with very different use cases in mind, focusing on encryption and security rather than integrity or identification. Checksumming is also a key feature of modern filesystems. The implementers of these filesystems concern themselves with block-level integrity, rather than focusing on ‘files’ or objects in the way digital preservation systems do. Cloud-based object storage systems also compute checksums, providing integrity guarantees as part of the service. And there is the blockchain: distributed peer-to-peer systems in which hashing is fundamental.

How do we reconcile these different approaches to bit-level preservation using checksums? Can we compare the costs, in terms of compute resources or time, of the different approaches? Is there a way to verify the accepted wisdom of the digital preservation community and reconcile it with the diverse and expanding approaches to checksum validation?

This paper describes how checksumming functionality is understood and implemented in modern filesystems. A cost analysis is presented, comparing different approaches to data integrity: pure CPU checksumming with tools such as md5sum, the block-level metadata used by filesystems such as ZFS, and the data integrity checks performed by cloud service providers’ object storage services. From this analysis we describe the benefits of developing a new standard for mapping the block-level metadata produced by filesystem checksum reporting tools to the file-centered checksum reporting and validation that current digital preservation best practices expect.

By better understanding different approaches to data integrity, it is possible to make better use of the computer hardware dedicated to digital preservation, taking advantage of the greater computational efficiency of filesystem-level checksumming techniques. This work closes a gap between current best practices in digital preservation and in high-performance computing. Sample code paths for working with and validating block checksums are also demonstrated.
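The contrast between file-centered and block-level checksumming drawn above can be illustrated with a minimal Python sketch. It is not the paper's sample code and does not read any filesystem's real metadata: the function names (`file_digest`, `block_digests`, `changed_blocks`), the 128 KiB block size, and the choice of SHA-256 for blocks are all assumptions made for illustration.

```python
import hashlib
import io

# Assumed, illustrative block size; real filesystems choose their own.
BLOCK_SIZE = 128 * 1024


def file_digest(stream, algorithm="md5", chunk_size=BLOCK_SIZE):
    """File-centered checksum: one digest over the whole byte stream,
    comparable to what a tool such as md5sum reports."""
    h = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def block_digests(stream, algorithm="sha256", block_size=BLOCK_SIZE):
    """Block-level checksums: one independent digest per fixed-size block
    (a hypothetical stand-in for filesystem block metadata)."""
    return [
        hashlib.new(algorithm, block).hexdigest()
        for block in iter(lambda: stream.read(block_size), b"")
    ]


def changed_blocks(old, new):
    """Indices of blocks whose digests differ between two digest lists."""
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


if __name__ == "__main__":
    original = bytes(300 * 1024)          # 300 KiB of zero bytes
    damaged = bytearray(original)
    damaged[BLOCK_SIZE + 10] ^= 0x01      # flip one bit inside the second block

    # The file-level digest only says that something changed...
    print(file_digest(io.BytesIO(original)) != file_digest(io.BytesIO(bytes(damaged))))
    # ...while block-level digests also say where.
    print(changed_blocks(block_digests(io.BytesIO(original)),
                         block_digests(io.BytesIO(bytes(damaged)))))
```

With a single file-level digest, locating the damage means re-reading and re-hashing the whole file. Per-block digests narrow verification and repair to the affected block, which is one reason a mapping between the two views could be useful.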

License: CC-By Attribution 4.0 International

Components

312. Storage Organization and Integrity

Goethals

The two papers in Session 312 explore the issues and topics pertaining to the theme of Storage Organization and Integrity with recent examples of adva...
