# Introduction

This is a data and script repository for a specific study, submitted to eLife for peer review. The repository contains a compiled data file, `data.txt`, with bibliographic data and inferred gender for all papers included in our analysis (preprint: https://arxiv.org/abs/2005.06303). Not all observations are relevant for the analysis, because some journals do not meet the minimum-observation criterion (n >= 5). The cleaned data set, which also contains the time groupings, is `data_final.txt`.

The file `Readme.RMD` is an R Markdown notebook that describes (a) how to get from the first data set to the final one, and (b) how to perform the regressions and produce the figure for our analysis.

## Running the analysis

Download all files to a folder and open it as a project in RStudio. Open `Readme.RMD` and either run the individual chunks or knit the file to produce the output.

## Data sets

### data.txt

Description of the columns in `data.txt`:

- `journal_id`: consecutive ID of the journal.
- `pmid`: PubMed ID of the record.
- `f_first`: Is the first author a woman? (0: no, 1: yes)
- `f_last`: Is the last author a woman? (0: no, 1: yes)
- `f_share`: proportion of authors who are women (1 = all authors are women, 0 = all authors are men).
- `scase`: Is this a COVID-19 paper? (1: yes [treatment], 0: no [control])
- `n_author`: number of authors on the paper.
- `n_women`: number of women on the author list.
- `n_men`: number of men on the author list.
- `n_unknown`: number of names on the author list with unknown gender.
- `us_first`: Is the first author from the USA? (0: no, 1: yes)
- `us_last`: Is the last author from the USA? (0: no, 1: yes)
- `received`: date the paper was received.
- `accepted`: date the paper was accepted.
- `date_pub`: date the paper was published in print.
- `date_epub`: date the paper was published online.

In all four date columns, the value `2000-01-01` stands for a missing (`NA`) value.
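The first cleaning step (mapping the `2000-01-01` placeholder to a missing value and keeping only journals with at least five observations) can be illustrated with the minimal Python sketch below. It is not the R code in `Readme.RMD`: it assumes `data.txt` is tab-separated with a header row (the README does not state the delimiter), reads "observations" as rows per `journal_id`, and uses hypothetical helper names (`parse_date`, `read_rows`, `keep_large_journals`).

```python
import csv
from collections import Counter
from datetime import date
from typing import Iterable, Optional

DATE_COLUMNS = ("received", "accepted", "date_pub", "date_epub")
NA_SENTINEL = "2000-01-01"  # placeholder that the README says stands for NA
MIN_OBS = 5  # minimum number of papers per journal (n >= 5)


def parse_date(value: str) -> Optional[date]:
    """Return None for the sentinel or an empty cell, otherwise a date."""
    if value in ("", NA_SENTINEL):
        return None
    return date.fromisoformat(value)


def read_rows(lines: Iterable[str], delimiter: str = "\t") -> list:
    """Read data.txt-style rows and convert the four date columns.

    Assumption: tab-delimited with a header row; adjust `delimiter`
    if the real file differs.
    """
    rows = []
    for row in csv.DictReader(lines, delimiter=delimiter):
        for col in DATE_COLUMNS:
            row[col] = parse_date(row[col])
        rows.append(row)
    return rows


def keep_large_journals(rows: list, min_obs: int = MIN_OBS) -> list:
    """Keep only rows whose journal_id occurs at least `min_obs` times."""
    counts = Counter(row["journal_id"] for row in rows)
    return [row for row in rows if counts[row["journal_id"]] >= min_obs]
```

Usage, under the same assumptions: `with open("data.txt", newline="") as fh: rows = keep_large_journals(read_rows(fh))`. Counting rows per journal before filtering keeps the filter a single pass over the data.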
### data_final.txt

Contains the same information as `data.txt`, but only for journals with at least five observations. The `2000-01-01` placeholder has been converted to `NA`. Two columns were added:

- `marchapril`: 1 = the paper is a COVID-19 paper and its earliest publication date (print or online) falls in March or April 2020.
- `may`: as above, but for May 2020.

### Sensitivity of data

The PMIDs, bibliographic data and names of all authors are in the public record. The gender of each author is inferred from the author's first name using Gender-API.
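The two time-group columns of `data_final.txt` can be sketched in Python as below. This is an illustration, not the code in `Readme.RMD`: `earliest_pub_date` and `time_group_flags` are hypothetical names, the dates are assumed to be already parsed with missing values as `None`, and assigning 0 to papers outside the window is an assumption (the README only defines the value 1).

```python
from datetime import date
from typing import Optional


def earliest_pub_date(date_pub: Optional[date], date_epub: Optional[date]) -> Optional[date]:
    """Earlier of the print and online dates, ignoring missing ones."""
    known = [d for d in (date_pub, date_epub) if d is not None]
    return min(known) if known else None


def time_group_flags(scase: int, date_pub: Optional[date], date_epub: Optional[date]) -> dict:
    """Return the marchapril/may flags for one paper.

    Only COVID-19 papers (scase == 1) can be flagged; 0 for everything
    else is an assumption.
    """
    first = earliest_pub_date(date_pub, date_epub)
    covid_2020 = scase == 1 and first is not None and first.year == 2020
    return {
        "marchapril": int(covid_2020 and first.month in (3, 4)),
        "may": int(covid_2020 and first.month == 5),
    }
```

For a row read from the text file, the `scase` value would typically need converting first, e.g. `time_group_flags(int(row["scase"]), row["date_pub"], row["date_epub"])`. Taking the earliest known date mirrors the README's "earliest date chosen" rule.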