The HTML versions of the R Markdown scripts are also available in interactive format [here](https://hartmast.github.io/Attack_of_the_snowclones/index.html).
Supplementary material for the paper "Attack of the snowclones: A corpus-based analysis of extravagant formulaic patterns".
The "scripts" folder contains R scripts in .Rmd (R Markdown) format, as well as HTML renderings of those R Markdown files.
The "data" folder contains the datasets that are needed to replicate our analyses:
- COCA2017_total_frequencies.xlsx: total frequencies of all subcorpora in the 2017 update of COCA
- COCA_X_is_are_the_new_Y.xlsx: attestations for *X is/are the new Y* in COCA
- ENCOW_x_is_the_new_y_without_false_hits.xlsx: attestations of *X is/are the new Y* in ENCOW, with false hits removed
- coca_moa_lemma_frequencies.csv: overall COCA frequencies of all lemmas attested in *mother of all*
- lemmatization.csv: manual lemmatization file used for *X BE the new Y*
- mother_of_all.xml: query results for *mother of all* in COCA (2017)
- mother_of_all_ENCOW.xlsx: query results for *mother of all* in ENCOW
- mother_of_all_with_encow_frequencies.csv: query results for *mother of all* in ENCOW, enriched with the corpus frequency of each lemma
- motherofall_COCA.xlsx: Excel version of the *mother of all* COCA query results
- x_is_the_new_y_encow_frequencies.csv: query results for *X BE the new Y* in ENCOW, enriched with the corpus frequency of each lemma

The "word2vec" folder contains the word2vec model trained on the first batch of the downloadable ENCOW data; see the .Rmd documents (or their HTML versions) for details on how the model was trained.