“Cleaning” descriptive metadata is a frequent task in digital library work,
often enabled by scripting or OpenRefine. But what about when the issue at
hand isn’t an odd schema, trailing whitespace, or inconsistent
capitalization, but pervasive racial or gender bias in the descriptive
language? Currently, work to remediate the latter tends to be highly
manual and reliant on individual judgment and prioritization, despite the
systemic nature of the problem.
This talk will explore what using programming to identify and address such
biases might look like, and argue that seriously considering such an
approach is essential to equitably publishing digital collections on a
large scale. I’ll discuss precedents and challenges for such work, and
share two small Python experiments to this end: one aided by Wikidata to
replace LCSH terms for Indigenous peoples in the U.S. with more currently
preferred terminology, and another using natural language processing to
identify records where women are named only as Mrs. [Husband’s First Name]
[Husband’s Last Name].
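To make the two experiments concrete, here is a minimal sketch of the core logic of each. The `PREFERRED_TERMS` mapping is hypothetical: a real pipeline would derive it from Wikidata (for instance, by following an item's Library of Congress authority ID to a currently preferred label) rather than hard-coding it. Likewise, the second step is shown here as a simple regular expression; the actual experiment described in the talk uses natural language processing, which handles more variation than this pattern does.

```python
import re

# Hypothetical mapping from outdated LCSH headings to preferred terms.
# In practice this would be assembled programmatically via Wikidata,
# not hand-written; the entries below are illustrative only.
PREFERRED_TERMS = {
    "Indians of North America": "Indigenous peoples of North America",
}

def update_subject(heading: str) -> str:
    """Return the preferred term for a heading, or the heading unchanged."""
    return PREFERRED_TERMS.get(heading, heading)

# Simplified stand-in for the NLP step: "Mrs." followed by two
# capitalized words (with an optional middle initial), i.e. the
# "Mrs. [Husband's First Name] [Husband's Last Name]" construction.
MRS_PATTERN = re.compile(r"Mrs\.\s+[A-Z][a-z]+(?:\s+[A-Z]\.)?\s+[A-Z][a-z]+")

def find_husband_named(text: str) -> list[str]:
    """Return each 'Mrs. [Husband's Name]' construction found in text."""
    return MRS_PATTERN.findall(text)
```

A regex like this flags candidates for human review; it cannot, on its own, distinguish a woman's own name from her husband's, which is where the NLP and the archivist's judgment come in.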