Language reflects social beliefs. Yet, until recently, quantifying such
beliefs remained impossible. Advances in machine learning, specifically
word embeddings, now make it possible to transform natural language
features (i.e., word concordance) into quantitative indices of beliefs.
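The passage does not specify which association measure turns embeddings into belief indices. One widely used option is the Word Embedding Association Test (WEAT) effect size. The plain-Python sketch below assumes that measure. The 2-D vectors are invented toy values, and the helper names (`cosine`, `association`, `weat_effect_size`) are hypothetical illustrations, not the authors' code.

```python
import math
from statistics import mean, stdev

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(w, A, B, emb):
    """s(w, A, B): mean similarity of w to attribute set A minus to attribute set B."""
    return (mean(cosine(emb[w], emb[a]) for a in A)
            - mean(cosine(emb[w], emb[b]) for b in B))

def weat_effect_size(X, Y, A, B, emb):
    """WEAT-style effect size: difference in mean association of target sets
    X and Y, scaled by the sample standard deviation over X and Y combined."""
    sx = [association(x, A, B, emb) for x in X]
    sy = [association(y, A, B, emb) for y in Y]
    return (mean(sx) - mean(sy)) / stdev(sx + sy)

# Hypothetical 2-D toy vectors (invented); real embeddings have hundreds of dimensions.
emb = {
    "he": (1.0, 0.1), "man": (0.9, 0.2),
    "she": (0.1, 1.0), "woman": (0.2, 0.9),
    "office": (1.0, 0.0), "career": (0.8, 0.3),
    "home": (0.0, 1.0), "family": (0.3, 0.8),
}
male, female = ["he", "man"], ["she", "woman"]
work, home = ["office", "career"], ["home", "family"]

# Positive d: male terms sit closer to work terms than female terms do.
d = weat_effect_size(male, female, work, home, emb)
print(f"male/female vs. work/home effect size: {d:.2f}")
```

Swapping the two attribute sets flips the sign of the effect size, so the direction of a stereotype is read directly from the sign.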
Using word embeddings derived from 7 corpora (65+ million words), we
provide the first large-scale comparative test of gender stereotypes that
exist within children’s and adults’ natural linguistic corpora, including
child-produced, child-directed (parent speech, TV/movies, and books), and
adult-produced/-directed text (adult-to-adult speech, TV/movies, and
books). We find strong, pervasive associations of male/female with
work/home, science/arts, math/reading, and bad/good consistently across
*all* corpora. Further analyses of 170 trait words and 82 profession labels
reveal that 23–24% of traits and professions are gendered, and that these
gender associations also correlate significantly with real-world gender
representation. This approach illustrates how methodological advances can
provide new insights into the surprisingly early and widespread prevalence
of gendered beliefs in children's and adults' natural language.