This fileset contains the cleanup code and the final dataset for the thesis and dissertation visualization project.
View the visualization at: https://public.tableau.com/shared/M8896Y774?:display_count=yes
The dataset tdsFinal.csv includes all master's projects, theses, and dissertations submitted to UHCL as of May 2018. Variables include:
bibrecord: The bibliographic record as cataloged by the Neumann Library
itemrecord: The item record as cataloged by the Neumann Library
title: The title of the work
author: The author of the work
callnumber: The call number of the work
note: The bibliographic notes field
subject: Subjects in Library of Congress classification, as cataloged by the Neumann Library
imprint: The publication year
add.author: The additional author field
tot.chkout: The total number of checkouts
type: The type of work (thesis, graduate project, or dissertation)
college: The college, updated to current college names
link: The hyperlink to the work
format: The format of the work (print or electronic)
chair: The chair/advisor
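
As a quick sanity check before analysis, the variable list above can be compared against the header row of the CSV. The following is a minimal Python sketch, not part of the project's own R Markdown workflow. It assumes the header row uses exactly the variable names listed above; the function name read_theses and the UTF-8 encoding in the usage comment are assumptions made for illustration.

```python
import csv

# Column names as given in the variable list above
# (assumed to match the CSV header row exactly).
EXPECTED_COLUMNS = [
    "bibrecord", "itemrecord", "title", "author", "callnumber", "note",
    "subject", "imprint", "add.author", "tot.chkout", "type", "college",
    "link", "format", "chair",
]


def read_theses(lines):
    """Parse CSV lines into dict rows and list any expected columns that are absent.

    `lines` can be an open file handle or any iterable of text lines.
    """
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    rows = list(reader)
    return rows, missing


# Hypothetical usage (the encoding is an assumption):
# with open("tdsFinal.csv", newline="", encoding="utf-8") as handle:
#     rows, missing = read_theses(handle)
#     print(len(rows), "rows; missing columns:", missing)
```

An empty list of missing columns suggests the file matches the description above. Values such as tot.chkout are read as strings and would need converting before any numeric work.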
This dataset has gone through significant processing, as documented in the thesisCleaning PDF document included in this FigShare fileset. The thesisCleaning.Rmd file is the R Markdown file used to generate the PDF.
Advisor and college names have been extracted from other fields. Chair names have been clustered in OpenRefine to create controlled forms of each name (e.g., consistent use of middle initials).
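
The exact OpenRefine clustering method used for chair names is not stated here. As an illustration of the general idea only, the sketch below groups name variants by a normalized key, in the spirit of key-collision (fingerprint-style) clustering. The functions fingerprint and cluster_names and the sample names are hypothetical, and this is not a reproduction of the project's actual steps.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(name):
    """Build a normalized key: ASCII-fold, lowercase, drop punctuation, sort unique tokens."""
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    cleaned = re.sub(r"[^\w\s]", "", ascii_name.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def cluster_names(names):
    """Group spellings that share a key; return only keys with more than one variant."""
    groups = defaultdict(set)
    for name in names:
        groups[fingerprint(name)].add(name)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Hypothetical sample values, not taken from the dataset.
sample = ["Smith, John Q.", "John Q. Smith", "john q smith", "Jane Doe"]
clusters = cluster_names(sample)
```

A key of this kind catches differences in word order, case, and punctuation. It would not merge "John Smith" with "John Q. Smith", so cases like a missing middle initial still need a fuzzier method or manual review.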