Repository for the paper **Gualdoni, E., Brochhagen, T., Mädebach, A., & Boleda, G. (2022). Woman or tennis player? Visual typicality and lexical frequency affect variation in object naming. In *Proceedings of the 44th Annual Conference of the Cognitive Science Society***.
----------------------------
---------------------------
**data**:
- **manynames.tsv**: dataframe of the ManyNames dataset, which can be downloaded from [here][1]
- **have_context.txt**: list of ids of ManyNames objects depicted with some background (for which it makes sense to assess context typicality). It is the result of the script *scripts/3-label_ctx_objects.py*
- **data models.csv**: this is the result of the script *scripts/6-prepare_data_model.py*. It is the dataframe used to fit the statistical model with the script *model/Rscripts/get-fits.R*
---------
**prototypes**: contains the visual prototypes used to run the analyses. Prototypes are in pickle dictionaries with prototype name as key and feature vector as value.
- **object_prototypes.pkl**
- **context_prototypes.pkl**
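
The pickle dictionaries described above can be read with the Python standard library. The following is a minimal sketch, not code from this repository: the helper name `load_vector_dict` is hypothetical, and the example path in the comments assumes you run from the repository root. If the stored vectors are NumPy arrays (an assumption, the README does not say), NumPy must be installed for unpickling to succeed. As with any pickle file, only load files you trust.

```python
import pickle
from pathlib import Path


def load_vector_dict(path):
    """Load a pickled dictionary mapping a key to a feature vector.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    with Path(path).open("rb") as handle:
        data = pickle.load(handle)
    # Fail early if the file does not contain the expected structure.
    if not isinstance(data, dict):
        raise TypeError(f"expected a dict in {path}, got {type(data).__name__}")
    return data


# Example usage (assumes the file exists relative to the working directory):
# prototypes = load_vector_dict("prototypes/object_prototypes.pkl")
# name, vector = next(iter(prototypes.items()))
# print(name, len(vector))
```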
-----------
**visual_features**: contains object and context visual features used to run the analyses. The features are in pickle dictionaries with *vg_image_id* corresponding to the object as key and visual features as value.
- **object_features.pkl**
- **context_features.pkl**
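
Once a feature vector (keyed by *vg_image_id*) and a prototype (keyed by name) are loaded, they can be compared with a similarity measure. The sketch below uses cosine similarity as an assumption: this README does not state which measure the analysis scripts use, so check the scripts before relying on it. The function name `cosine_similarity` is illustrative and written in pure Python so that it works on any sequence of numbers.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors.

    Illustrative only; the measure used in the paper's scripts may differ.
    """
    u = [float(x) for x in u]
    v = [float(x) for x in v]
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Example usage with dictionaries loaded from the pickle files
# (the keys below are placeholders, not real ids or names):
# similarity = cosine_similarity(object_features[some_id], object_prototypes[some_name])
```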
----------
**model**: contains the files related to the statistical model presented in the paper.
- **Rscripts**:
    - **get-fits.R**: script to fit the model
    - **inspect-fits.R**: script to perform leave-one-out evaluation of the model
    - **diagnose-fits.R**: script to run model diagnostics
- **Rfits/brms-model.rds**: fits of our model, output of the script *Rscripts/get-fits.R*.
- **loos/model.RData**: file of leave-one-out evaluation of our model, output of the script *Rscripts/inspect-fits.R*.
- **analysis-evaluation-log.txt**: logs of model evaluation, output of *inspect-fits.R*
- **diagnostics-log.txt**: logs of model diagnostics, output of *diagnose-fits.R*
----------------
**scripts**: python scripts to run the analyses.
- **1-select_altname.py**: selects, for each object in ManyNames, the most frequent alternative name listed in the responses (if applicable), and saves it in the *altname* column
- **2-select_VG_objects.py**: starting from the names used in ManyNames, saves in a json file a list of VisualGenome objects labeled with each name. From the resulting object lists, we computed object and context prototypes. To run it, you have to download the [VisualGenome dataset][2] (v1.4 - files *objects.json* and *image_data.json*) and specify the directories of the downloaded files as arguments. You also have to specify a saving directory for the json file resulting from the script.
- **3-label_ctx_objects.py**: produces *have_context.txt*, a list of ids of ManyNames objects depicted with some background (for which it makes sense to assess context typicality); takes as argument the directory of [VisualGenome][3] downloaded images
- **4a-object_prototypes.py**, **4b-object_features.py**, **4c-context_prototypes.py**, **4c-context_features.py**: compute prototypes and visual features for objects and contexts. We used Hao Tan's [implementation][4] of Anderson et al.'s (2018) bottom-up attention model. To run the scripts, you have to clone the [github repository of the model][5] and move the scripts into the directory *demo/*. The scripts take as arguments:
    - the directory of the json file output by *2-select_VG_objects.py*
    - the directory of the model weights (to download following the instructions on the [github page][6])
    - the directory of [VisualGenome][7] downloaded images
    - a saving directory for prototypes and features

  Running these scripts may take hours. We provide the resulting prototypes and features in the folders *prototypes/* and *visual_features/*.
- **5-make_frequency_dictionary.py**: produces a pickle dictionary of name frequencies, for the names that appear in ManyNames; takes as arguments:
    - directory of pre-processed [SUBTLEX word frequencies][8]
    - directory of [SUBTLEX raw data][9]
    - a saving directory for the resulting frequency dictionary.
- **6-prepare_data_model.py**: produces the dataframe used to fit the statistical model. Takes as argument the path to the dictionary of frequencies output by *5-make_frequency_dictionary.py*
- **7-hstogram_proto_similarities**: produces the histogram shown in the paper (Figure 6); takes as argument a saving directory for the image.
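
The selection step described for *1-select_altname.py* (most frequent alternative name, if any) can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: it assumes the responses for an object are available as a mapping from name to count and that the object's top name is known. The function name `select_altname` and its parameters are hypothetical. When several alternatives are tied, this sketch returns the one that appears first in the mapping.

```python
def select_altname(responses, topname):
    """Return the most frequent name other than `topname`, or None.

    `responses` is assumed to map each produced name to its count;
    this layout is an assumption about the data, not a documented format.
    """
    # Drop the top name so that only alternative names remain.
    alternatives = {name: count for name, count in responses.items() if name != topname}
    if not alternatives:
        # No alternative name was produced for this object.
        return None
    return max(alternatives, key=alternatives.get)


# Example usage with made-up counts:
# select_altname({"woman": 20, "tennis player": 9, "person": 3}, "woman")
```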
[1]: https://github.com/amore-upf/manynames
[2]: https://visualgenome.org/api/v0/api_home.html
[3]: https://visualgenome.org/api/v0/api_home.html
[4]: https://github.com/airsplay/py-bottom-up-attention
[5]: https://github.com/airsplay/py-bottom-up-attention
[6]: https://github.com/airsplay/py-bottom-up-attention
[7]: https://visualgenome.org/api/v0/api_home.html
[8]: https://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexus/subtlexus2.zip
[9]: https://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexus/subtlexus5.zip