This wiki details the code and data supporting the manuscript "Derivation and validation of the ACT-FAST action concept taxonomy". The folders in this project approximately map onto the studies in the paper, although some data contributes to multiple different studies within the paper. Data and code in this repository also support [this][1] related project - see its Wiki for details.
Data and code dictionary:
* /Text analysis: derivation - Study 1
* actions_60_1.csv - first set of actions selected for Study 4 (confirmatory)
* actions_60_2.csv - second set of actions selected for Study 4 (exploratory)
* actnouns.R - filter nouns down to set used in PCA
* AskReddit.zip - reddit comments
* fullactions1.csv - actions_60_1.csv with definitions
* fullactions2.csv - actions_60_2.csv with definitions
* init_nouns.csv - initial set of nouns from frequency norms
* init_verbs.csv - initial set of verbs from frequency norms
* loadings_8pcs.csv - noun loadings from PCA
* measure_basis.py - count the frequency of verb-noun co-occurrences in reddit comments
* noun_wordnet.py - cross-reference nouns with wordnet lemmas
* nounset.csv - wordnet synsets for nouns
* occur_final.csv - verb-noun co-occurrence table over which PCA was performed
* scores_8pcs.csv - verb scores from PCA
* SUBTLEX-US.csv - subtitle-based word frequency norms
* verb_pca.R - PCA on verb-noun co-occurrence
* verbset.csv - wordnet synsets for verbs
* wordnet.py - cross-reference verbs with wordnet lemmas
* wordnet_noun_synsets.csv - see nounset
* wordnet_synset.csv - see verbset
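The core of the Study 1 pipeline (tokenized reddit comments → verb-noun co-occurrence counts → PCA) can be sketched as follows. This is an illustrative reimplementation with hypothetical names, not the actual code in measure_basis.py; the real pipeline also lemmatizes tokens via WordNet before counting.

```python
from collections import Counter
from itertools import product

def count_cooccurrences(comments, verbs, nouns):
    """Count how often each (verb, noun) pair co-occurs within a comment.

    `comments` is an iterable of token lists; `verbs` and `nouns` are sets.
    Returns a Counter keyed by (verb, noun) tuples, analogous to the table
    in occur_final.csv.
    """
    counts = Counter()
    for tokens in comments:
        toks = set(tokens)
        # every verb present paired with every noun present in this comment
        for v, n in product(verbs & toks, nouns & toks):
            counts[(v, n)] += 1
    return counts

comments = [["eat", "food", "table"], ["drive", "car"], ["eat", "car", "food"]]
verbs = {"eat", "drive"}
nouns = {"food", "car", "table"}
counts = count_cooccurrences(comments, verbs, nouns)
```

The resulting verb-by-noun count matrix is what the PCA in verb_pca.R operates over.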
* /Text analysis: generalization - Study 2
* cooccur_log.txt - log from co-occurrence script
* script_cooccur.py - count verb-noun co-occurrences
* script_log.txt - log for script scraping
* script_scrape.py - scrapes scripts from IMSDb
* scripdata_final.csv - scraped script data
* text_validation.R - analyze generalization from Study 1 reddit PCA to movie scripts
* /Text analysis: classification - Study 3
* classification.R - perform subreddit-of-origin classification of comments using PCA dimensions and GloVe dimensions.
* (food/music/personalfinance/programming/relationships/sport).csv - raw comments from respective subreddits
* (food/music/personalfinance/programming/relationships/sport)_score.csv - comments scored on PCA dimensions
* (food/music/personalfinance/programming/relationships/sport)_score_glove.csv - comments scored on GloVe dimensions
* score_text.py - use PCA scores (see Study 1) to score reddit comments
* score_text_glove.py - score text based on pre-trained GloVe dimensions (not included due to size, but available here: http://nlp.stanford.edu/data/glove.840B.300d.zip)
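The scoring step in score_text.py amounts to projecting each comment onto the Study 1 PCA space by averaging the per-verb PC scores (scores_8pcs.csv) over the verbs the comment contains. A minimal sketch, with hypothetical names and a 2-PC example for brevity:

```python
def score_comment(tokens, verb_scores):
    """Score a tokenized comment on PCA dimensions.

    `verb_scores` maps verb -> list of PC scores. Returns the mean score
    vector over the verbs found in the comment, or None if no verb matches.
    """
    hits = [verb_scores[t] for t in tokens if t in verb_scores]
    if not hits:
        return None
    n_dims = len(hits[0])
    # average each dimension across the matched verbs
    return [sum(h[d] for h in hits) / len(hits) for d in range(n_dims)]

verb_scores = {"eat": [1.0, 0.0], "run": [0.0, 2.0]}  # 2 PCs for brevity
score = score_comment(["i", "eat", "then", "run"], verb_scores)
```

score_text_glove.py does the analogous averaging over pre-trained GloVe vectors instead of PC scores.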
* /Exploratory ratings - Study 4 (part 1)
* analyze_exploration.R - correlate ratings of dimension definitions with PCA scores to find matches
* dimension_summaries.csv - summary statistics (N/reliability) for each of the rated dimension definitions
* exp_avgs.csv - averaged (across participant) ratings of 2nd set of actions (see study 1) on dimension definitions
* exp_cmat.csv - correlation matrix between rated dimensions and PCA scores
* exp_savgs.csv - averaged ratings for just the selected set of matched definitions
* exp_scmat.csv - correlations between PCA scores and rated definitions for just selected definitions
* exploratory_ratings.csv - raw ratings of dimension definitions
* /Confirmatory ratings - Study 4 (part 2)
* analyze_confirmation.R - analyze associations between rated dimensions and PCA scores among the first set of verbs (see study 1)
* conf_dimensions.csv - average ratings of the verbs on the dimensions
* conf_dimensions_modified.csv - same as previous with edited text names
* confirmatory_cmat.csv - correlation matrix between average rated dimensions and PCA scores
* confirmatory_dimension_summaries.csv - summary statistics (N/reliability) for confirmatory rating dimensions
* Confirmatory_ratings.csv - raw rating data from survey
* confirmatory_results.csv - mixed effects (and other statistical) modeling results
* /Behavioral validation - Study 5
* analyze_behavioralvalidation.R - main analysis script for predicting similarity judgments based on dimension ratings
* behavioral.csv - long form behavioral data consisting of judgments about action similarity
* behavioral_cv.Rdata - Rdata file containing cross-validation results
* behavioral_rt.Rdata - Rdata file containing reaction time power analysis results
* conf_dimensions.csv - average dimension ratings from the confirmatory study (see Study 4)
* /power analysis
* socsim_(pairs/ratings)_1-25.csv - behavioral data from separate study (https://osf.io/gxhkr/) used to conduct power analysis for this study
* /Pairwise similarity - Study 6
* 3pc_ratings.csv - raw ratings of actions on three dimensions of person perception
* action_pairs_modified.csv - pairs of actions for similarity rating task
* Action_ratings_-_3pc.qsf - qualtrics survey for person perception ratings
* Action_ratings_-_frequent.qsf - qualtrics survey for 8 defined PCs for frequent verbs
* Action_similarity.qsf - qualtrics survey for pairwise similarity ratings between actions
* analyze_pairwise_similarity.R - main analysis script for predicting similarity ratings based on the proximity between actions on rated dimensions
* avgtaxfreq.csv - average rated (action) dimensions
* avgtraits.csv - average rated (person perception) dimensions
* fverbs.csv - set of frequent verbs
* pairwise_lmer.Rdata - mixed effects modeling results
* pairwise_similarity.csv - raw similarity ratings
* /power analysis
* ratings of similarity between famous people
* ratings of famous people on 13 trait dimensions
* pairwise_power.R - script for using the two files above for determining statistical power of the present study
* taxonomy_ratings.csv - raw ratings on 8 action dimensions
* taxstats.csv - summary statistics on taxonomy ratings
* traitstats.csv - summary statistics on trait ratings
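The central idea in analyze_pairwise_similarity.R is that actions closer together in the rated dimension space should be judged more similar. A minimal sketch of the distance computation (illustrative names and toy values; the actual analysis uses mixed-effects models in R):

```python
import math

def dimension_distance(ratings_a, ratings_b):
    """Euclidean distance between two actions' mean dimension ratings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ratings_a, ratings_b)))

# toy mean ratings on two dimensions (the real data has eight)
avg_dims = {"run": [3.0, 1.0], "jog": [2.0, 1.0], "read": [0.0, 5.0]}
d_run_jog = dimension_distance(avg_dims["run"], avg_dims["jog"])    # close pair
d_run_read = dimension_distance(avg_dims["run"], avg_dims["read"])  # distant pair
```

These pairwise distances are then related to the raw similarity ratings (pairwise_similarity.csv), with smaller distances predicting higher rated similarity.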
* /Action phrase similarity - Study 7
* Action_phrase_similarity.qsf - qualtrics survey for action phrase ratings
* analyze_phrase_similarity.R - main analysis script for predicting the rated similarity between action phrases based on the rated position of those action phrases on the ACT-FAST dimensions
* (break/build/drive/fight/play/run/see/talk).csv - stimulus files containing action phrases based around each of these verbs
* /pairs - contains all pairwise combinations of the action phrases within each verb condition
* phrase_power.R - power analysis to determine sample size in the current study
* phrase_similarity.csv - raw ratings of the similarity between action phrases
* /Action feature prediction - Study 8
* Action_phrase_feature_ratings.qsf - Qualtrics survey for eliciting ratings of 80 action phrases (see study 7) on 16 feature dimensions
* analyze_features.R - predict feature ratings based on the positions of action phrases on the ACT-FAST dimensions
* average_features.csv - average ratings of action phrases on feature dimensions
* feature_correlations.csv - correlations between feature dimensions
* feature_ratings.csv - raw ratings of action phrases on feature dimensions
* feature_reliability.csv - reliability of ratings of each feature dimension
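The prediction step in analyze_features.R is, at its core, a regression of each feature rating onto the action phrases' positions on the ACT-FAST dimensions. A toy sketch of that idea (assumed names and values; the real script works on the csv files above in R):

```python
import numpy as np

# positions of 4 action phrases on 2 (of the real 8) ACT-FAST dimensions
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
# a perfectly predictable toy "feature" rating: 2*dim1 + 1*dim2
y = 2.0 * X[:, 0] + 1.0 * X[:, 1]

X1 = np.column_stack([np.ones(len(X)), X])  # add intercept column
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)  # least-squares fit
pred = X1 @ coef  # predicted feature ratings
```

In the actual analysis the fit is cross-validated and repeated for each of the 16 feature dimensions.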
* /fMRI study - Study 9
* /ratings
* aggregated_raw_dimension_ratings.csv - average ratings of actions on the ACT-FAST dimensions from all previous studies
* category_momementsv1_rating.txt - action categories from the Moments in Time pretrained temporal relation network (automated annotation), edited to exclude different agents performing the same activity.
* compile_ratings.R - processes and averages ratings of the 280 new actions contained in Sherlock_ratings.csv
* new_actions_sherlock.csv - list of the new actions being rated specifically for sherlock
* processed_annotation_332.csv - reprocessed annotations: takes the raw annotations.csv file produced by the temporal relation network and averages where needed to reduce the actions down to the 332 we consider.
* rating_regressors.R - produces regressors for the annotations and annotation+rating combinations.
* Sherlock_allratings.csv - combination of all ratings of 332 action classes in Sherlock on ACT-FAST dimensions
* Sherlock_existing_verb.csv - list of the existing actions already rated in previous studies.
* Sherlock_ratings.csv - raw data from online rating study for new actions specific to Sherlock
* Sherlock_ratings_280new.csv - across-participant averages for Sherlock_ratings.csv produced by compile_ratings.R
* /encoding models
* /video annotation
* annotate_video.py - calls on pretrained temporal relation network to annotate the actions present in sets of frames created by make_frames.py
* annotations.csv - output of annotate_video.py, consisting of probabilities for each action class across all 3s clips of the movie.
* make_frames.py - takes the 3s video clips from parse_video.py and turns them into a set of still frames (jpgs) using ffmpeg.
* parse_video.py - takes raw Sherlock video file and parses it into 3s long sections. The video file is not shared publicly due to copyright restrictions. Please contact us directly if you require it.
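The frame-extraction step in make_frames.py can be sketched as building an ffmpeg invocation per 3 s clip. This is a hypothetical reconstruction: the function name, frame rate, and flag choices are assumptions, not the actual script's settings.

```python
def ffmpeg_frames_cmd(clip_path, out_dir, fps=8):
    """Build an ffmpeg command extracting `fps` frames per second as jpgs."""
    return [
        "ffmpeg", "-i", clip_path,
        "-vf", f"fps={fps}",          # sample frames at a fixed rate
        f"{out_dir}/frame_%04d.jpg",  # numbered jpg outputs
    ]

cmd = ffmpeg_frames_cmd("clips/clip_0001.mp4", "frames/clip_0001")
# the command would then be run via subprocess.run(cmd, check=True)
```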
* /voxel selection
* alpha_mask_allbutseg(1-5).nii - voxel masks produced in feature_selection.R
* alpha_seg(1-5).nii - whole brain maps of voxelwise reliability of activity across participants with respect to actions, in terms of Cronbach's alphas. Produced by make_alphamaps_cv.m
* correlate_annotations_cv.m - independently correlates each of 332 action regressors (TR_hrf_regressors_annotation_332.csv) derived from the automated annotation with voxelwise activity in the brain of each of the 17 participants (raw bold data available [here][2]). Produces maps of correlations for each action/participant, which are then used as input for make_alphamaps_cv.m. These maps are not included due to their large size and derivative nature.
* correlate_annotations_sbatch - slurm script for submitting correlate_annotations_cv.m to cluster
* feature_selection.R - performs feature selection in a cross-validation fashion by taking the alpha maps produced by make_alphamaps_cv.m and applying mixture modeling to cluster the voxels. Outputs the mask files.
* make_alphamaps_cv.m - takes the annotations correlation maps produced by correlate_annotations_cv.m and computes the inter-participant Cronbach's alpha at each voxel, then writes the results into the alpha_seg files.
* TR_hrf_regressors_annotation_332.csv - 332 action regressors derived from the automated annotation.
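The per-voxel reliability computed in make_alphamaps_cv.m is an inter-participant Cronbach's alpha: each participant's profile of action correlations at a voxel is treated as an "item". A Python sketch of that computation on simulated data (illustrative only; the actual implementation is the MATLAB script above):

```python
import numpy as np

def cronbach_alpha(x):
    """Cronbach's alpha for x: (n_participants, n_actions) at one voxel."""
    k = x.shape[0]
    item_vars = x.var(axis=1, ddof=1).sum()  # sum of per-participant variances
    total_var = x.sum(axis=0).var(ddof=1)    # variance of the summed profile
    return k / (k - 1) * (1 - item_vars / total_var)

# simulate 5 participants sharing a strong common action profile
rng = np.random.default_rng(0)
signal = rng.normal(size=100)
x = signal + 0.1 * rng.normal(size=(5, 100))
alpha = cronbach_alpha(x)  # high alpha: a reliable voxel
```

Voxels with high alpha across participants are the ones retained by the mixture-model feature selection in feature_selection.R.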
* /studyforrest - analogous files for replicating the Sherlock results in the [Forrest Gump dataset][3]
* /Explanatory figure
* Data and code for the explanatory UMAP figure (Figure 8)
[1]: https://osf.io/5xykq/
[2]: https://dataspace.princeton.edu/jspui/handle/88435/dsp01nz8062179
[3]: https://www.studyforrest.org/