# Quantifying the emergence of moral foundational lexicon in child language development

Code and data to reproduce the results of the paper "Quantifying the emergence of moral foundational lexicon in child language development".

## Requirements

```
gensim==4.0.1
matplotlib==3.4.2
nltk==3.6.2
numpy==1.20.3
pandas==1.2.4
pylangacq==0.13.3
scipy==1.6.3
seaborn==0.11.1
spacy==3.0.6
statsmodels~=0.13.2
wordcloud~=1.6.0
scikit-learn~=0.24.2
glove~=1.0.0
torch~=1.6.0
claucy~=0.0.1.991
lemminflect~=0.2.2
```

## Data

1. Download ``GoogleNews-vectors-negative300.bin.gz`` from [https://code.google.com/archive/p/word2vec/](https://code.google.com/archive/p/word2vec/) and place it in the directory ``data/w2v``.
2. Download ``glove.6B.zip`` from [https://nlp.stanford.edu/projects/glove/](https://nlp.stanford.edu/projects/glove/) and place it in the directory ``data/glove``.
3. Download the SOCIAL CHEM 101 dataset from [https://github.com/mbforbes/social-chemistry-101](https://github.com/mbforbes/social-chemistry-101) and place it in ``data/social-chem``.
4. Run ``python3 src/data/childes_init.py`` to download and store the CHILDES database.

If you want access to the intermediary datasets directly, instead of creating them using the code, please send an email with the subject "MFD emergence data" to armzn@cs.toronto.edu.

## Implementation

To reproduce the clusters, run ``python3 src/data/cluster_childes.py``. The cluster results to run the rest of the experiments are already provided in the ``GMM_sentences`` folder.
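Because the scripts in this section read from the ``data`` directory, it can help to confirm that the downloads listed under Data are in place before running them. The following is a minimal sketch, assuming the repository root is the working directory; ``missing_inputs`` and ``EXPECTED_INPUTS`` are hypothetical names and not part of the repository, and the check covers only the paths named above (whether the scripts expect the archives to be unpacked is not stated here).

```python
from pathlib import Path

# Paths copied from the Data section. Assumption: only their existence is
# checked; nothing here verifies archive contents or unpacked files.
EXPECTED_INPUTS = [
    "data/w2v/GoogleNews-vectors-negative300.bin.gz",
    "data/glove/glove.6B.zip",
    "data/social-chem",
]


def missing_inputs(root="."):
    """Return the expected paths that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED_INPUTS if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing inputs:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All inputs listed in the Data section are present.")
```

An empty list from ``missing_inputs()`` means the three download steps have been completed; the CHILDES database from step 4 is not covered because its storage location is not given.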

To reproduce the frequency of moral utterances, run ``python3 src/data/experiments/sentence_category.py``.

To store the CHILDES (W2V) and CHILDES (GloVe) embedding models, run ``python3 src/data/experiments/childes_embeddings.py``.

To reproduce the generalizability experiment, first run the following scripts to store the test datasets:

```
python3 src/data/experiments/store_data_size.py --method w2v
python3 src/data/experiments/store_data_size.py --method glove
python3 src/data/experiments/store_data_size.py --method childes_w2v
python3 src/data/experiments/store_data_size.py --method childes_glove
```

Next, run ``regression_model``, adjusting the ``method``, ``identity``, and ``classification type`` as in the following example:

```
python3 src/data/experiments/regression_model.py --method w2v --identity child --binary --childes_train
python3 src/data/experiments/regression_mfrc.py --method w2v --identity child --binary --childes_train
```

To reproduce the control experiment, adjust the ``method``, ``identity``, ``classification type``, and ``control set``, then run the following:

```
python3 experiments/control_regression.py --method w2v --identity child --binary
python3 experiments/control_regression_mfrc.py --method w2v --identity child --binary
```

Store the results for visualization by running:

```
python3 experiments/analyize_results.py
python3 experiments/analyize_predictions.py --method w2v --identity child --binary
```

## Visualization

To generate the word cloud plot, run ``python3 experiments/mfd_wordcloud.py``. The rest of the display items can be regenerated by running the notebooks in ``notebooks/display_items``.
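
The four ``store_data_size.py`` invocations used for the generalizability experiment differ only in their ``--method`` value, so they can be generated in a loop. The sketch below is one possible driver, assuming it is run from the repository root; ``build_commands`` and ``run_all`` are hypothetical helpers and not part of the repository. By default it only prints the commands.

```python
import subprocess

# Method names and script path copied from the commands in this README.
METHODS = ["w2v", "glove", "childes_w2v", "childes_glove"]
SCRIPT = "src/data/experiments/store_data_size.py"


def build_commands(python="python3"):
    """Build one argument list per embedding method."""
    return [[python, SCRIPT, "--method", method] for method in METHODS]


def run_all(dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first script that exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling ``run_all(dry_run=False)`` executes the scripts in sequence instead of only listing them.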