What's in a name? A large-scale computational study on how competition between names affects naming variation


Repository for the paper **What's in a name? A large-scale computational study on how competition between names affects naming variation**.

----------------------------
---------------------------

**DEMO versions** of our analyses in Google Colab (recommended):

- Objects: https://colab.research.google.com/drive/1o63vmYlVNTcp7RIvzP4cpijKwLkhJdZs?usp=sharing
- Contexts: https://colab.research.google.com/drive/18r7UuVTGMZJ1ay4bmM0HaVe3WICwctuY?usp=sharing

We also upload the Jupyter Notebook (offline) versions.

----------------------------
---------------------------

**Data**:

- **manynames.tsv**: dataframe of the ManyNames dataset, which can be downloaded from [here][1]. The column "altname" is added with the script *General_scripts/1-select_altname.py*.
- **have_context.txt**: list of ids of ManyNames objects depicted with some background (for which it makes sense to assess context typicality). It is the result of the script *General_scripts/3-label_ctx_objects.py*.
- **data_analysis1.csv**: data used in Analysis I to fit the regression model. It is the result of the script *Analysis1/1-prepare_data_model.py*. The model was fitted with the script *Analysis1/model/Rscripts/get-fits.R*.
- **data_analysis1_imput2.csv**: data used in Analysis I to fit the regression model with an alternative imputation method. It is the result of the script *Analysis1/1-prepare_data_model.py*. The model was fitted with the script *Analysis1/model/Rscripts/get-fits_imput2.R*.
- **data_analysis2.csv**: data used in Analysis II to fit the regression models. It is the result of the script *Analysis2/1-index_of_crowdedness.py*. The models were fitted with the scripts *Analysis2/model_unifactorial/Rscripts/get-fits.R* and *Analysis2/model_multifactorial/Rscripts/get-fits.R*.

---------

**Prototypes**: contains the visual prototypes used to run the analyses. Prototypes are stored in pickle dictionaries with the prototype name as key and the feature vector as value.
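As a minimal sketch of how such a pickle dictionary can be read, the snippet below writes a toy dictionary to a temporary file and loads it back. The toy names and vectors, the helper names `load_vectors` and `cosine_similarity`, and the choice of cosine similarity as a comparison measure are assumptions made for illustration only; the provided files may store array types that need additional libraries (such as NumPy) to unpickle.

```python
import math
import pickle
import tempfile
from pathlib import Path


def load_vectors(path):
    """Load a pickle dictionary mapping a key (e.g. a prototype name) to a feature vector."""
    with open(path, "rb") as f:
        return pickle.load(f)


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (illustrative measure only)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy stand-in for a prototype file: hypothetical names and 3-dimensional vectors.
toy = {"dog": [1.0, 0.0, 1.0], "cat": [1.0, 1.0, 0.0]}

with tempfile.TemporaryDirectory() as tmp:
    toy_path = Path(tmp) / "toy_prototypes.pkl"
    with open(toy_path, "wb") as f:
        pickle.dump(toy, f)
    prototypes = load_vectors(toy_path)

# Compare two prototypes by name.
similarity = cosine_similarity(prototypes["dog"], prototypes["cat"])
print(sorted(prototypes), round(similarity, 3))
```

To read one of the provided files instead of the toy data, pass its path (for example *Prototypes/object_prototypes.pkl*) to `load_vectors`.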
- **object_prototypes.pkl**
- **context_prototypes.pkl**

-----------

**Features**: contains the object and context visual features used to run the analyses. The features are stored in pickle dictionaries with the *vg_image_id* of the ManyNames image as key and the visual features (either object or context) as value.

- **object_features.pkl**
- **context_features.pkl**

----------

**General_scripts**: Python scripts to run the analyses.

- **1-select_altname.py**: selects, for each object in ManyNames, the most frequent alternative name listed in the responses (if applicable), and saves it in the *altname* column. If several names are tied in frequency, the choice is random.
- **2-select_VG_objects.py**: starting from the names used in ManyNames, saves in a JSON file a list of the VisualGenome objects labeled with each name. From the resulting object lists, we computed object and context prototypes. To run it, you have to download the [VisualGenome dataset][2] (v1.4, files *objects.json* and *image_data.json*) and specify the directories of the downloaded files as arguments. You also have to specify a saving directory for the JSON file produced by the script.
- **3-label_ctx_objects.py**: produces *have_context.txt*, a list of ids of ManyNames objects depicted with some background (for which it makes sense to assess context typicality); takes as argument the directory of the [VisualGenome][3] downloaded images.
- **4a-object_prototypes.py**, **4b-object_features.py**, **4c-context_prototypes.py**, **4c-context_features.py**: compute prototypes and visual features for objects and contexts. We used Hao Tan's [implementation][4] of Anderson et al.'s (2018) bottom-up attention model. To run the scripts, you have to clone the [github repository of the model][5], set up the model, and move these scripts into the directory *demo/*. The scripts take as arguments:
    - the directory of the JSON file output by *2-select_VG_objects.py* (i.e. the list of VG objects)
    - the directory of the model weights, to be downloaded following the instructions on the [github page][6] (we chose the weights obtained after training with names and attributes)
    - the directory of the [VisualGenome][7] downloaded images
    - a saving directory for prototypes and features

    Running these scripts may take hours. We provide the resulting prototypes and features in the folders *Prototypes/* and *Features/*.
- **5-cluster_spread**: for the names in the domain "people", computes the cluster spread (in object and context visual space) and assigns levels of name specificity. The output of the script is the file *Files_for_plots/people_cluster_spread.csv*. The plot *Plots/people_cluster_spread.pdf* is produced by the script *Rscripts_plots/plot-cluster_spread.R*.
- **6-plot_space**: performs 2D reduction on the prototypes. The output is the file *Files_for_plots/2D_space.csv*. The plots *Plots/object_space.pdf*, *Plots/context_space.pdf*, and *Plots/spaces_compared.pdf* are produced by the script *Rscripts_plots/plot-spaces.R*.
- **7-plot_example_images**: for the 5 most frequently attested names in ManyNames, plots grids of images with very high and very low computationally-derived typicality scores.
- **8-closest_prototypes.py**: runs analyses on the visual space, e.g. the percentage of objects that have their top name as closest prototype.
- **9-correlation_analyses.py**: runs analyses of the correlation between the object typicality for the categories in the space and the proportion of speakers producing those names (Appendix D in the paper).

----------------

**Analysis1**: contains the files related to Analysis I.

- **1-prepare_data_model.py**: produces the files *Data/data_analysis1.csv* and *Data/data_analysis1_imput2.csv*, used to fit the regression models.
- **2-analysis_prototypes_informativity.py**: computes the similarities between topname and altname prototypes and produces the plot *Analysis1/Plots/hist_proto_similarities.png*.
- **model**: contains the files related to the Analysis I regression model.
    - **Rscripts**:
        - **get-fits.R**: script to fit the model
        - **get-fits_imput2.R**: script to fit the model with an alternative imputation method for the objects without context
        - **inspect-fits.R**: script to perform leave-one-out evaluation of the model
        - **diagnose-fits.R**: script to run model diagnostics
        - **variance.R**: script to compute the model's R2
    - **brms-model.rds**: fits of our model, output of the script *Analysis1/Rscripts/get-fits.R*.
    - **brms-model_imput2.rds**: fits of our model, output of the script *Analysis1/Rscripts/get-fits_imput2.R* (that is, with a second imputation method for the images without context).
    - **indiv-loos/model.RData**: file of leave-one-out evaluation of our model, output of the script *Analysis1/Rscripts/inspect-fits.R*.
    - **analysis-evaluation-log.txt**: logs of model evaluation, output of the script *Analysis1/Rscripts/inspect-fits.R*.
    - **diagnostics-log.txt**: logs of model diagnostics, output of the script *Analysis1/Rscripts/diagnose-fits.R*.
- **Plots**: contains a plot of the model estimates (*Analysis1/Plots/model_estimates.png*) and the histogram of topname-altname prototype similarities (*Analysis1/Plots/hist_proto_similarities.png*).
- **Rsctipts_plots**: contains the script *plot-fits.R* to obtain the image *Analysis1/Plots/model_estimates.png*.

----------------

**Analysis2**: contains the files related to Analysis II.

- **1-index_of_crowdedness.py**: produces the file *Data/data_analysis2.csv*, used to fit the regression models.
- **2-plot_index.py**: used to produce the visualization of the index of crowdedness *Analysis2/Plots/example_index.png*.
- **model_unifactorial**: contains the files related to the Analysis II regression models fitted to assess the best gamma parameter for the object and context indices of crowdedness.
    - **Rscripts**:
        - **get-fits.R**: script to fit the model
        - **inspect-fits.R**: script to perform leave-one-out evaluation of the model
        - **diagnose-fits.R**: script to run model diagnostics
        - **compare-fits.R**: script to run model comparisons
        - **variance.R**: script to compute the model's R2
    - **brms-idx_obj1.rds** (and all the files with the same format): fits of the models fitted with indices with different gammas. Output of the script *Analysis2/model_unifactorial/Rscripts/get-fits.R*.
    - **indiv-loos/idx_obj1.RData** (and all the files with the same format): files of leave-one-out evaluation of the corresponding model, output of the script *Analysis2/model_unifactorial/Rscripts/inspect-fits.R*.
    - **analysis-evaluation-log.txt**: logs of model evaluation, output of the script *Analysis2/model_unifactorial/Rscripts/inspect-fits.R*.
    - **diagnostics-log.txt**: logs of model diagnostics, output of the script *Analysis2/model_unifactorial/Rscripts/diagnose-fits.R*.
    - **loo-comparison-obj.txt** and **loo-comparison-ctx.txt**: ranking of the models fitted with the index of object/context crowdedness with different gammas. Output of the script *Analysis2/model_unifactorial/Rscripts/compare-fits.R*.
- **model_multifactorial**: contains the files related to the Analysis II regression models fitted with the best index of object and context crowdedness.
    - **Rscripts**:
        - **get-fits.R**: script to fit the model
        - **inspect-fits.R**: script to perform leave-one-out evaluation of the model
        - **diagnose-fits.R**: script to run model diagnostics
        - **compare-fits.R**: script to run model comparisons
    - **brms-multi.rds**: model fits. Output of the script *Analysis2/Rscripts/model_multifactorial/get-fits.R*.
    - **indiv-loos/idx_multi.RData**: files of leave-one-out evaluation of the model, output of the script *Analysis2/model_multifactorial/Rscripts/inspect-fits.R*. The folder also contains the corresponding files for the unifactorial models fitted with the best index of object/context crowdedness, to allow model comparison.
    - **analysis-evaluation-log.txt**: logs of model evaluation, output of the script *Analysis2/model_multifactorial/Rscripts/inspect-fits.R*.
    - **diagnostics-log.txt**: logs of model diagnostics, output of the script *Analysis2/model_multifactorial/Rscripts/diagnose-fits.R*.
    - **loo-comparison.txt**: ranking of the multifactorial and best unifactorial models. Output of the script *Analysis2/model_multifactorial/Rscripts/compare-fits.R*.
- **Plots**: contains a plot of the model estimates (*Analysis2/Plots/model_estimates.png*) and the visualization of the index of crowdedness (*Analysis2/Plots/example_index.png*).
- **Rsctipts_plots**: contains the script *plot-fits.R* to obtain the image *Analysis2/Plots/model_estimates.png*, and the script *plot-index.R* to obtain the image *Analysis2/Plots/example_index.png*.

----------------

**Norming_study**: contains the files related to our data collection of typicality judgments.

- **select_stimuli.py**: script to select the objects and contexts to annotate. Outputs *Norming_study/Stimuli/object_stimuli.csv* and *Norming_study/Stimuli/context_stimuli.csv*.
- **select_checks.py**: script to select the attention checks. Outputs *Norming_study/Stimuli/object_checks.csv* and *Norming_study/Stimuli/context_checks.csv*.
- **Stimuli**: contains the csv files output by the Python scripts.
- **ResultsNormingObj**:
    - **object_annotations.csv**: our collected object typicality judgments
    - **reliability.R**: script to compute the Cronbach's alpha of our data collection
    - **Plots/object_corr.png**: plot of human vs computationally-derived object typicality scores
- **ResultsNormingCtx**:
    - **context_annotations.csv**: our collected context typicality judgments
    - **reliability.R**: script to compute the Cronbach's alpha of our data collection
    - **Plots/context_corr.png**: plot of human vs computationally-derived context typicality scores

----------------

**Files_for_plots**: contains the files used to make the plots stored in *Plots*.

- **2D_space.csv**: the dimensionality reduction of our visual space. It is the output of the script *General_scripts/6-plot_spaces.py* and is used for the plots *Plots/object_space.pdf* and *Plots/context_space.pdf*.
- **people_cluster_spread**: contains information about the cluster spread of the "people" names. It is the output of the script *General_scripts/5-cluster_spread.py* and is used for the plot *Plots/people_cluster_spread.pdf*.

----------------

**Rscripts_plots**: contains R scripts to produce plots.

- **plot_cluster_spread.R**: produces the plot *Plots/people_cluster_spread.pdf*
- **plot_cluster_spaces.R**: produces the plots *Plots/object_space.pdf* and *Plots/context_space.pdf*

----------------

**Plots**: contains all the plots resulting from the scripts.

[1]: https://github.com/amore-upf/manynames
[2]: https://visualgenome.org/api/v0/api_home.html
[3]: https://visualgenome.org/api/v0/api_home.html
[4]: https://github.com/airsplay/py-bottom-up-attention
[5]: https://github.com/airsplay/py-bottom-up-attention
[6]: https://github.com/airsplay/py-bottom-up-attention
[7]: https://visualgenome.org/api/v0/api_home.html
[8]: https://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexus/subtlexus2.zip
[9]: https://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexus/subtlexus5.zip
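
The selection rule described for *General_scripts/1-select_altname.py* (most frequent alternative name, random choice on ties) can be sketched as follows. This is not the script itself: the function name `select_altname`, its arguments, and the representation of the responses as a name-to-count dictionary are hypothetical and chosen only to illustrate the rule.

```python
import random


def select_altname(responses, topname, rng=random):
    """Return the most frequent name other than `topname`, or None if there is none.

    `responses` is assumed to map each produced name to its count (hypothetical format).
    Ties in frequency are broken at random, as described for the altname column.
    """
    alternatives = {
        name: count
        for name, count in responses.items()
        if name != topname and count > 0
    }
    if not alternatives:
        return None  # the object has no alternative name
    best = max(alternatives.values())
    # Sort the tied names so the result depends only on the random generator state.
    tied = sorted(name for name, count in alternatives.items() if count == best)
    return rng.choice(tied)


# Toy examples with made-up counts.
clear_winner = select_altname({"dog": 20, "puppy": 5, "animal": 3}, "dog")
no_alternative = select_altname({"dog": 10}, "dog")
tie_broken = select_altname({"dog": 20, "puppy": 4, "animal": 4}, "dog", random.Random(0))
print(clear_winner, no_alternative, tie_broken in {"puppy", "animal"})
```

Passing a seeded `random.Random` instance, as in the last call, makes the tie-breaking reproducible across runs.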
