Electronic Resources & Libraries S54 Confronting the Elephant in the Room: Cleaning and Wrangling Data for Collections and Scholarly Services

doi:10.17605/OSF.IO/A5P3R


**Session Description:** Have you attended a presentation in which data visualizations answered key questions for stakeholders? You plan to replicate it for your library, only to find that you are missing the intermediate steps of how to get the data into a usable state. This presentation will call the elephant out.

It is often stated that 80% of the time in an analysis project is devoted to data cleaning. This is certainly a challenge for libraries, as the data comes from so many different sources. The getting, cleaning, and transforming phase feeds into the visualization and modeling phase, yet we often gloss over this part of our work.

This presentation will break down the challenges of collecting and creating collections datasets and merging them together into interactive visualizations. This includes gathering and cleaning data, fuzzy merging messy text strings, reshaping data from wide to long format, and making decisions on handling duplicate and missing values. Wrangling data together into an interactive visualization with data filters adds immense value by enlarging the context of decision-making.

This presentation will discuss case studies demonstrating ways that data expertise has elevated our work with collections and in the creation and dissemination of scholarship. Discussing the challenges of data wrangling will make assessment more feasible for librarians wanting to review their collections and projects. It will also serve as another call to data providers to provide clean, standardized, and interoperable data.

**Session Examples:**

Database Usage Visualization: View the visualization at: https://tabsoft.co/2C8dXpE - This has been edited for a public audience; therefore, some information, such as the cost of the resources, is not present, and the usage data does not include the most recent usage.

Thesis & Dissertation Visualization: View the visualization at: https://tabsoft.co/2C6BgQr - See the OSF Component at https://osf.io/csq23/
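The wrangling steps named in the session description (handling duplicate and missing values, reshaping from wide to long, and fuzzy merging messy text strings) can be illustrated with a minimal Python sketch. This is not the presenters' workflow or tooling. The sample rows, the vendor names, the helper functions, and the 0.6 similarity cutoff are all hypothetical, and it uses only the standard library's `difflib` for string similarity.

```python
from difflib import get_close_matches

# Hypothetical usage report in wide format: one column per month.
usage_wide = [
    {"database": "Academic Search Complete", "2019-01": 120, "2019-02": None},
    {"database": "JSTOR", "2019-01": 85, "2019-02": 90},
    {"database": "JSTOR", "2019-01": 85, "2019-02": 90},  # exact duplicate row
]

# Hypothetical vendor list whose names do not match the usage report exactly.
vendor_names = ["Academic Search Complete (EBSCO)", "J-STOR", "Web of Science"]


def normalize(name):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(name.lower().split())


def fuzzy_match(name, candidates, cutoff=0.6):
    """Return the most similar candidate, or None if nothing reaches the cutoff."""
    lookup = {normalize(c): c for c in candidates}
    hits = get_close_matches(normalize(name), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


def drop_duplicates(rows):
    """Keep the first copy of each identical row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def wide_to_long(rows, id_key):
    """Turn one row per database into one row per database and month."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_key:
                continue
            if value is None:
                # Assumed decision: a missing month is skipped, not counted as zero.
                continue
            long_rows.append({id_key: row[id_key], "month": column, "uses": value})
    return long_rows


long_rows = wide_to_long(drop_duplicates(usage_wide), "database")
for row in long_rows:
    row["vendor_name"] = fuzzy_match(row["database"], vendor_names)
```

Whether a missing month should be skipped, treated as zero, or flagged for follow-up is exactly the kind of decision the description refers to; the sketch picks one option only to keep the example short. A name with no sufficiently close candidate comes back as `None`, which leaves unmatched titles visible for manual review instead of silently merging them.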
