Category: Project

Description: The development of computational social science has enhanced our ability to use databases of digitized culture to quantify the past. Concomitantly, new tools in historical econometrics have allowed socio-economic variables to be estimated further into the past. Together, these historical cultural and socioeconomic data provide an unprecedented capacity to describe the relationship between culture, historical events and socioeconomic dynamics. Here, we focus on the analysis of texts using bag-of-words frequencies, describe potential challenges, and propose a pipeline to improve validity and generalizability. In particular, while the gold-standard bag-of-words approach, the Linguistic Inquiry and Word Count (LIWC), has been validated through psychometric experiments with modern participants, it has two main limitations. First, it limits the number of variables that can be explored; second, because it was validated for modern language users, it might not be valid in other historical contexts. Here we offer a complementary approach that ensures (i) the historical adequacy of the search terms, (ii) the internal coherence of the measurements, and (iii) external validation vis-à-vis other tools. We present the pipeline, examples and scripts that might help junior researchers develop custom bags of words and conduct their own analyses of historical texts.
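As an illustration of what a bag-of-words frequency measure and an internal-coherence check might look like, here is a minimal Python sketch. It is not taken from the project's scripts: the function names and the word list in the usage note are hypothetical, and Cronbach's alpha is used here only as one common, assumed way to quantify how consistently the terms in a bag move together across texts.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(text, terms):
    """Return each term's share of all tokens in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    return {t: (counts[t] / total if total else 0.0) for t in terms}


def bag_score(text, terms):
    """Sum the relative frequencies of all terms in the bag."""
    return sum(term_frequencies(text, terms).values())


def sample_variance(values):
    """Unbiased sample variance (requires at least two values)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def cronbach_alpha(rows):
    """Cronbach's alpha for a table with one row per text and one
    column per term. Assumed here as a coherence measure; the
    project may use a different statistic."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two terms and two texts")
    item_vars = [sample_variance([r[i] for r in rows]) for i in range(k)]
    total_var = sample_variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores do not vary across texts")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

For example, with a hypothetical bag `["joy", "grief"]`, `bag_score("Joy and joy and grief", ["joy", "grief"])` gives the share of tokens that belong to the bag, and `cronbach_alpha` can be applied to the per-term frequencies collected over a corpus. Historical adequacy of the terms and external validation against other tools would still need to be checked separately.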

