Using Machine Learning to Uncover the Semantics of Concepts: How Well Do Typicality Measures Extracted From a BERT Text Classifier Match Human Judgments of Genre Typicality?
Category: Project
Description: Social scientists have long been interested in the extent to which an object's typicality within a concept relates to its valuation by social actors. Answering this question has proven challenging because precise measurement requires a feature-based description of objects, yet such descriptions are frequently unavailable. For example, messages posted by social-media users consist of text expressed in natural language rather than neatly organized lists of features. In this paper, we show how deep learning methods can help social scientists address this challenge. We show that training a text-categorization model with deep learning produces feature representations of text documents that can be used to measure typicality, and that typicality measures produced in this way correspond closely with human judgments. Our categorization model is based on BERT, a deep language-representation model that achieved state-of-the-art performance on a range of natural-language-understanding tasks and initiated a recent paradigm shift in the natural-language-processing community. Direct link to a Colab notebook to compute typicality from text data: https://colab.research.google.com/drive/1nu_pg5ZweFLRR24nrWwNq97SmG24Gdvw?usp=sharing
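The paper's exact typicality measure is defined in terms of the trained classifier's internal representations (see the linked Colab notebook). As a minimal sketch of the general idea, assuming a simple centroid-based formulation (an illustrative assumption, not necessarily the paper's formula), one could score a document's typicality in a genre as the cosine similarity between its feature vector and the centroid of that genre's feature vectors:

```python
import numpy as np

def typicality(doc_vec, category_vecs):
    """Score typicality as the cosine similarity between a document's
    feature vector and the centroid of the category's feature vectors.

    This is an illustrative, centroid-based sketch; in practice the
    feature vectors would come from a trained BERT text classifier."""
    centroid = category_vecs.mean(axis=0)
    return np.dot(doc_vec, centroid) / (
        np.linalg.norm(doc_vec) * np.linalg.norm(centroid))

# Toy example: 4-dimensional vectors standing in for BERT document
# embeddings of texts belonging to one genre.
genre = np.array([[1.0, 0.9, 0.1, 0.0],
                  [0.9, 1.0, 0.0, 0.1],
                  [1.0, 1.0, 0.1, 0.1]])
typical_doc = np.array([1.0, 1.0, 0.0, 0.0])
atypical_doc = np.array([0.0, 0.1, 1.0, 1.0])

# A document resembling the genre centroid scores higher than one that
# does not.
print(typicality(typical_doc, genre) > typicality(atypical_doc, genre))
```

With real data, `doc_vec` and the rows of `category_vecs` would be embeddings extracted from the fine-tuned BERT model rather than hand-made toy vectors; the centroid and cosine steps stay the same.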