**Visualization for International Pronouns Day 2021**
In the [Computational Social Science of Emerging Realities Group], we study identity using social media data. Specifically, we have examined the text of Twitter profile bios to [reveal trends in identity change].
Here we describe the rapidly growing prevalence of "pronoun-slash-lists" within Twitter bios of US users. A pronoun-slash-list (PSL) is a list of two or three pronouns separated by slashes, e.g. "he/him." For simplicity, we have limited our inquiry to three common PSLs: she/her, he/him and they/them.
In a longitudinal sample of over a million Twitter users, we observe that the prevalence of each PSL grew significantly from February 2015 to December 2020. (Specific methodological details for this visualization are below, and our preprints and articles contain further information on similar analyses.)
![Prevalence of pronoun lists in US Twitter bios]
The inset measures the same phenomenon in another way. In this case, we measure prevalence of the word "pronouns" within bios. The use of "pronouns" in bio text frequently (but not always) signals discussion of personal pronouns. Use of this word within bios increased seven-fold (from 2 per 10,000 to 14 per 10,000) from 2015 to 2020.
As the visualization shows, *she/her* has seen the fastest growth in prevalence, followed by *he/him* and *they/them*. The prevalence of the PSLs *she/them* and *he/them* (not shown) is lower than that of *they/them*. As we develop this project, we will estimate the prevalence of other pronouns and PSL combinations and extend the analysis up to the present date.
The sample was a set of 1,353,325 Twitter users. Every user in the sample satisfied both of the following criteria:
- They indicated a US location in the location field of their Twitter bio.
- They tweeted at least once in every year from 2015 through 2020.
The user bios were observed in the 1% random sample of all tweets collected through the Twitter API.
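The two inclusion criteria listed above can be sketched as a simple filter. This is a minimal illustration, not the group's actual pipeline: the function name `is_eligible` and its inputs are hypothetical, and the US-location check is reduced to a precomputed boolean because the text does not say how locations were resolved.

```python
# Years in which a user must have tweeted at least once (2015 through 2020).
REQUIRED_YEARS = frozenset(range(2015, 2021))


def is_eligible(has_us_location, tweet_years):
    """Return True if a user meets both inclusion criteria.

    has_us_location: assumed to be computed elsewhere from the location field.
    tweet_years: iterable of calendar years in which the user was seen tweeting.
    """
    return bool(has_us_location) and REQUIRED_YEARS.issubset(set(tweet_years))


# Hypothetical users, for illustration only.
print(is_eligible(True, [2015, 2016, 2017, 2018, 2019, 2020]))  # meets both criteria
print(is_eligible(True, [2015, 2016, 2018, 2019, 2020]))        # no tweet seen in 2017
```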
Prevalence was calculated by counting the users with matching bios, dividing by the total number of users, and multiplying by 10,000, which expresses prevalence as incidence per 10,000 users. In the inset, we estimate prevalence at annual resolution after sampling the data to exactly one bio per user per year. In the main figure, we estimate prevalence at daily resolution after sampling the data to at most one bio per user per day and inferring presence or absence in unobserved periods.
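The per-10,000 calculation can be written out as a short sketch. The function name `prevalence_per_10k` and the match count below are hypothetical; only the total sample size comes from the description above.

```python
def prevalence_per_10k(matching_users, total_users):
    """Express the share of matching users as incidence per 10,000 users."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    if not 0 <= matching_users <= total_users:
        raise ValueError("matching_users must be between 0 and total_users")
    # Multiply before dividing to keep round numbers exact.
    return matching_users * 10_000 / total_users


TOTAL_USERS = 1_353_325       # sample size reported above
hypothetical_matches = 1_895  # made-up count, for illustration only

print(round(prevalence_per_10k(hypothetical_matches, TOTAL_USERS), 2))
```

Because the numerator counts users rather than bios or tweets, a user contributes at most once to each estimate, however often their bio was observed in the period.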
Matching was performed on the text of the bio using the following regular expressions, applied case-insensitively. Note that the calculation always counted users, not raw matches. A bio that read "he/him he/him he/him" counted as one user who matched the pattern `\bhe\s*/\s*him\b`, even though the pattern matches three times.
For those less familiar with regular expressions, `\b` matches a word boundary, and `\s*` matches any amount of whitespace (including none). Thus, matching text contained a PSL using a slash to separate the indicated pronouns. Any amount of whitespace surrounding the slash was allowed.
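As a rough illustration of this matching rule, the sketch below counts users rather than raw matches. Only the *he/him* pattern is quoted in the text; the *she/her* and *they/them* patterns here are assumed to follow the same template, and the bios and the helper `count_matching_users` are invented for the example.

```python
import re

# The he/him pattern is quoted in the text; the other two are assumed analogues.
PSL_PATTERNS = {
    "he/him": re.compile(r"\bhe\s*/\s*him\b", re.IGNORECASE),
    "she/her": re.compile(r"\bshe\s*/\s*her\b", re.IGNORECASE),
    "they/them": re.compile(r"\bthey\s*/\s*them\b", re.IGNORECASE),
}


def count_matching_users(bios, pattern):
    """Count users whose bio matches at least once (users, not raw matches)."""
    return sum(1 for bio in bios.values() if pattern.search(bio))


# Hypothetical bios keyed by user id.
bios = {
    "user_a": "he/him he/him he/him",
    "user_b": "She / Her. Dog lover.",
    "user_c": "they/them | he/him",
    "user_d": "coffee, code, cats",
}

for name, pattern in PSL_PATTERNS.items():
    print(name, count_matching_users(bios, pattern))
```

The leading `\b` is what keeps the *he/him* pattern from firing inside "she/her" or "they/them": there is no word boundary between the "s" (or "t") and the "h" that follows it.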