The literature folder contains processed Lens data. The data consists of:
- antarctic_affiliation_edited.csv.bz2 A bzip2-compressed CSV file containing the available Microsoft Academic Graph (MAG) affiliation data. The table has been edited to improve the coverage of affiliations per paper, using the original author affiliation data where available. The paperid field is the join field.
- antarctic_authors.csv.bz2 A table of author information from Microsoft Academic Graph (January 2019 release). The paperid field is the join field.
- fos.csv Fields of Study table of article labels from Microsoft Academic Graph (January 2019 release). Use the paperid as the join field. A single record may attract more than one label.
- literature.csv The raw literature table plus additional columns for use as filters.
- literature.rda The same table as literature.csv, saved in R data format for R users.
- textfields.csv A CSV file consisting of Lens identifiers (lens_id) and the joined title, abstract, author keywords, fields of study and MeSH fields, converted to lowercase for text mining. The field separator in the joined field is "_". Be aware of junk in the MAG data, such as #R##R, and of strings such as na_na_na that result from uniting text fields with missing values.
- literature_add_filters.R An R script setting out the processing steps used to add the filters.
- query.csv, query.rda and query_string.txt The search query used with the Lens. In query_string.txt the query is adapted for use in R: the terms are separated by the OR operator '|' and phrases are wrapped in word boundaries "\\b".