Data integrity is important in distributed systems. The same
characteristics that make these systems robust (e.g., fault tolerance) make
maintaining data integrity challenging. For this reason, hash functions
play a central role in the algorithms and technologies that power Usenet,
BitTorrent, and Bitcoin and its blockchain. A hash function maps data of
arbitrary size to a fixed-size value that is ideally smaller, unique, and
non-invertible (the importance of these attributes will be explained). The
MD5 hash of the title of this presentation is
23c1d6085d85ae07378da9861e792c34; if the Oxford commas were removed, the
hash would change to 6eed93a3b7dc829f38065518b346ee72. If you were given
both the title and its hash, you could compute the hash of the title you
received and compare it with the hash you received. If the two differed,
you would know that there was an error in transmission or that an
intermediate editor rejects clarity and civility. This presentation
will introduce hashes and their variants, these distributed and sometimes
dubious systems, and what can be learned and practically applied in today's
digital repositories for purposes of auditing, identifying, recovering, and
sharing data.
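
The verification step described above (hash what you received, then compare
it with the hash that came with it) can be sketched in a few lines of
Python. This is an illustration only: the title string below is a
hypothetical stand-in rather than the actual title of this presentation,
the helper names md5_hex and verify are made up for the example, and UTF-8
encoding with no trailing newline is an assumption, so the digests it
produces are not expected to match the two quoted above.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as 32 hexadecimal characters.

    Assumption: the text is encoded as UTF-8 before hashing.
    """
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def verify(received_text: str, received_hash: str) -> bool:
    """Recompute the hash of the received text and compare it with the hash sent."""
    return md5_hex(received_text) == received_hash.lower()


# Hypothetical title, used only for illustration.
sent_title = "Hashes, Checksums, and Blockchains"
sent_hash = md5_hex(sent_title)

# The same title after an intermediate editor removed the Oxford comma.
altered_title = "Hashes, Checksums and Blockchains"

print(verify(sent_title, sent_hash))     # True: the text arrived unchanged
print(verify(altered_title, sent_hash))  # False: the text was altered in transit
```

MD5 is used here only because the example above uses it. It still detects
accidental changes, but it is no longer considered safe against deliberate
tampering; hashlib.sha256 can be substituted in md5_hex without changing
the rest of the sketch.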