### **This is the landing page describing how the data are organized. Read this first!**
The multi-angle extended three-dimensional activity (META) stimulus set is an instrumented set of recordings of everyday activity. The total duration of the performed activities is approximately 25 hours. Four types of high-level activity sequences were performed by 5 actors in different settings. The activities were recorded with three video cameras and an infrared time-of-flight depth sensor. The archive includes the raw recordings and also a number of derived features, including features of body movement, object identity and location, and data from a large sample of observers segmenting the video recordings into meaningful events.
The META corpus is described in detail in the manuscript, "[The multi-angle extended three-dimensional activity stimulus set: A tool for studying event cognition.](https://psyarxiv.com/r5tju/)"
# [Captured Data](https://osf.io/rpgmx/)
## [Video](https://osf.io/t39cv/)
Each chapter was recorded from three camera angles (C1, C2, Kinect). The original resolution of the video files is 1920x1080 pixels for each of the three cameras. The files shared on OSF have been resized to 960x540 pixels for ease of sharing. Videos are synchronized between camera angles and have been trimmed to begin shortly before the actor enters and end shortly after the actor exits. Videos are organized by actor, chapter type, and camera angle. The naming convention is {actor}.{chapter type}.{chapter}_{camera angle}_trim.mp4, so for example, video *1.2.3_C1_trim.mp4* is actor 1 performing chapter type 2 for the third time from angle C1. (NB: The actors are numbered 1, 2, 3, 4, and 6; actor number 5 left the project early in recording and was replaced with actor number 6.)
| Chapter number | Type|
| --- | ---|
| 1 | Making breakfast |
| 2 | Exercising |
| 3 | Cleaning a room |
| 4 | Bathroom grooming |
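The filename convention described above can be unpacked programmatically. Here is a minimal sketch (the regular expression and field names are ours, not part of the dataset):

```python
import re

# Minimal parser for {actor}.{chapter type}.{chapter}_{camera angle}_trim.mp4
def parse_video_name(filename):
    m = re.match(r"(\d+)\.(\d+)\.(\d+)_(.+)_trim\.mp4$", filename)
    if m is None:
        raise ValueError(f"unexpected filename: {filename}")
    actor, chapter_type, chapter, camera = m.groups()
    return {"actor": int(actor), "chapter_type": int(chapter_type),
            "chapter": int(chapter), "camera": camera}

print(parse_video_name("1.2.3_C1_trim.mp4"))
# {'actor': 1, 'chapter_type': 2, 'chapter': 3, 'camera': 'C1'}
```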
## [Skeleton](https://osf.io/zea5k/)
Depth maps over time were recorded with a Microsoft Kinect v2 device, and the positions of 25 joints on the body were extracted from the depth image using the native Kinect algorithm. We refer to these joint positions as a "skeleton." We preprocessed the raw skeleton data with the following steps: timings were synchronized to the onset of the video file, tracking of phantom skeletons was removed, and when tracking suffered a momentary lapse and generated a new body ID for the same actor, the data from the two body IDs were merged. These data include 3D coordinates for each of 25 skeletal joints, tracked vs. inferred status of each joint (note that 'tracked' joints may still contain errors), and 2D coordinates of each joint mapped to the pixels of the Kinect color camera. Note that the 2D coordinates are based on the original resolution of the Kinect color camera (1920x1080), so they must be scaled if used with the Kinect videos shared here at a resolution of 960x540. There are joint-specific columns for each of the 25 joint indices (0 to 24):
| Column | Definition |
| --- | ---|
| sync_time | Time in seconds from onset of video |
| raw_time | Time in seconds from start of Kinect recording |
| body | The ID of the actor's body, which will always be 0 in the preprocessed data |
| J{joint index}_ID | name of joint |
| J{joint index}_Tracked | whether the algorithm thought the joint was tracked or inferred |
| J{joint index}_3D_X | x coordinate of joint in meters relative to depth camera |
| J{joint index}_3D_Y | y coordinate of joint in meters relative to depth camera |
| J{joint index}_3D_Z | z coordinate of joint in meters relative to depth camera |
| J{joint index}_2D_X | x coordinate of joint in pixels relative to original Kinect color camera resolution |
| J{joint index}_2D_Y | y coordinate of joint in pixels relative to original Kinect color camera resolution |
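To overlay the 2D joint coordinates on the 960x540 Kinect videos shared here, the coordinates need to be halved. A minimal pandas sketch, assuming one preprocessed skeleton CSV with the columns above (the file path is hypothetical):

```python
import pandas as pd

skel = pd.read_csv("1.2.3_skel.csv")  # hypothetical path to one skeleton file

# The 2D columns are in the original 1920x1080 Kinect color frame;
# scale them down to the 960x540 videos shared on OSF.
scale = 960 / 1920  # = 540 / 1080 = 0.5
for j in range(25):
    skel[f"J{j}_2D_X"] *= scale
    skel[f"J{j}_2D_Y"] *= scale

# Example: the 2D trajectory of joint 3 (its name is given in J3_ID)
print(skel[["sync_time", "J3_ID", "J3_2D_X", "J3_2D_Y"]].head())
```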
# [Annotations](https://osf.io/3f9d2/)
## [High-level events](https://osf.io/kxdg9/)
[*Event_annotation_timing.csv*](https://osf.io/cxpgh/)
This file contains a judge's ratings of the start and stop times, in seconds, at which the actor performed each scripted action.
| Column | Definition |
| --- | ---|
| run | Specific chapter, in the format {actor}.{chapter type}.{chapter} |
| evnum | position of event within the chapter |
| evname | name of the event |
| startsec | time in seconds at which event started, relative to start of video |
| endsec | time in seconds at which event ended, relative to start of video |
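For example, event durations can be computed directly from these columns; a minimal pandas sketch:

```python
import pandas as pd

events = pd.read_csv("Event_annotation_timing.csv")
events["duration"] = events["endsec"] - events["startsec"]

# Mean scripted-event duration within each chapter (run)
print(events.groupby("run")["duration"].mean().head())
```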
## [Object labels](https://osf.io/sm3fg/)
**object bounding box annotations**
These files contain manually annotated bounding boxes giving object class and location for a subset of frames from each of the three camera angles (C1, C2, Kinect). The annotation was performed on frames taken from videos at the resolution shared here (960x540) rather than the original resolution (1920x1080). As described in the paper, frames were extracted starting at 10 seconds and every 20 seconds thereafter (except for the breakfast and exercise chapters of Actor 1 (20 chapters), for which frames were extracted starting at 5 seconds and every 10 seconds thereafter). These files contain one row for each labelled instance of an object.
| Column | Definition |
| --- | ---|
| filename | Name of the image file for the labelled frame (frame images not included on OSF) |
| width | pixel width of labelled frame |
| height | pixel height of labelled frame |
| class | name of labelled object class |
| xmin | pixel left edge of bounding box |
| ymin | pixel bottom edge of bounding box |
| xmax | pixel right edge of bounding box |
| ymax | pixel top edge of bounding box |
| index | time in seconds from start of video for labelled frame |
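A minimal sketch for summarizing one of these annotation files (the file path is hypothetical; column names follow the table above):

```python
import pandas as pd

boxes = pd.read_csv("1.2.3_C1_labels.csv")  # hypothetical path to one annotation file

# Box sizes in pixels at the annotation resolution (960x540)
boxes["box_w"] = boxes["xmax"] - boxes["xmin"]
boxes["box_h"] = boxes["ymax"] - boxes["ymin"]

# How often each object class was labelled, and the annotated frame times
print(boxes["class"].value_counts().head())
print(sorted(boxes["index"].unique())[:5])  # seconds from start of video
```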
## [Object tracking](https://osf.io/5gm4u/)
This component contains csv files with the inferred tracking of each object over time between the hand annotations described above, using the [Siamese Region Proposal Network (SiamRPN) algorithm](https://openaccess.thecvf.com/content_cvpr_2018/papers/Li_High_Performance_Visual_CVPR_2018_paper.pdf) (Li et al., 2018). Currently, the shared data for tracked objects are limited to a subset of chapters and camera angles.
Object tracking csv files:
| Column | Definition |
| --- | ---|
| frame | frame of the video at its native frame rate (25 fps for Kinect videos and 29.97 fps for C1 and C2 videos) |
| name | The class of the object, with a number assigned for each instance |
| x | left edge pixel of bounding box |
| y | bottom edge pixel of bounding box |
| w | width of bounding box |
| h | height of bounding box |
| confidence | a measure of the tracking model's confidence, from 0 (min) to 1 (max). |
| ground_truth | this column is not informative |
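A minimal sketch for converting the frame column to seconds and filtering on confidence (the file path and the 0.5 threshold are our choices, not part of the dataset):

```python
import pandas as pd

track = pd.read_csv("1.2.3_kinect_tracking.csv")  # hypothetical path to one tracking file

# Convert frame index to seconds using the native frame rate
# (25 fps for Kinect videos, 29.97 fps for C1 and C2 videos)
fps = 25.0
track["time_sec"] = track["frame"] / fps

# Keep only reasonably confident detections (threshold is arbitrary)
confident = track[track["confidence"] >= 0.5]
print(confident[["time_sec", "name", "x", "y", "w", "h"]].head())
```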
# [Segmentation Norms](https://osf.io/v562e/)
[raw_data.csv](https://osf.io/8srdx)
Human segmentation data for the activity videos were collected using Amazon Mechanical Turk. A total of 2284 unique workers segmented the movies, with 30 workers segmenting each movie at a fine temporal grain and 30 at a coarse grain, for 9000 viewings in all.
## [Analysis](https://osf.io/teygu/)
[*corpus_segmentation_exclusions.Rmd*](https://osf.io/dhqw6/)
[*corpus_segmentation_exclusions.html*](https://osf.io/cqe95/)
This script reads in the raw data, calculates exclusion criteria, and outputs data with columns to filter based on exclusion criteria.
---
[*corpus_segmentation_analysis.Rmd*](https://osf.io/nfj8v/)
[*corpus_segmentation_analysis.html*](https://osf.io/hsvrb/)
This script performs the analyses described in the paper.
---
[*parse.mturk.seg.R*](https://osf.io/grxja/)
[*make.bins.s*](https://osf.io/3mgb6/)
[*compare.bins.to.group.minus.sub.s*](https://osf.io/d6jgw/)
Helper functions for completing the analysis.
---
[*all_raw_data.csv*](https://osf.io/8srdx/)
Raw data, with personal IDs removed.
---
[*e148_META_RawSegmentation_clean.csv*](https://osf.io/ghe4t/)
Raw data filtered to sessions included in analysis.
---
[*e148_META_RawSegmentation_with_Exclusions.csv*](https://osf.io/dpyqz/)
Raw data including columns for all exclusion criteria.
| Column | DataType | Definition |
| --- | ---|---|
| comcode | character | a random code that was used to assign credit to participants, not used for analysis |
| startTime | POSIXct | date/time viewing began |
| pracmovie | character | name of the practice movie |
| condition | character | "coarse" or "fine"; type of segmentation that participants were asked to complete |
| repeat | character | "new": participant's first viewing; "repeat": second or later viewing within a week of the last session (does NOT re-answer demographics, catch question, segmentation practice, etc.); or "repeat_old": second or later viewing more than a week after the last session (does re-answer demographics, catch question, segmentation practice, etc.) |
| movie1 | character | movie watched for this viewing |
| segmentprac | character | segmentation data for practice film if new/repeat_old |
| segment1 | character | segmentation data for movie1 (three columns: label, movie time in s, JavaScript ms time stamp) |
| age | numeric | self report participant age |
| gender | character | self report participant gender |
| location | character | self report participant state |
| ethAmIndian, ethAsian, ethHawaii, ethBlack, ethWhite | logical | TRUE if participant identifies with that ethnicity |
| Hispanic | logical | TRUE if participant identifies as Hispanic |
| feeling | character | catch question; if asked should be "None of the Above", if not asked, is "NA" |
| gradelevel | character | highest degree obtained, free text and not cleaned |
| degree | character | subject of highest degree obtained, free text and not cleaned |
| dateOfDegree | character | date highest degree obtained, free text and not cleaned |
| SubjSource | character | "MTURK" or "WUSTL"; place of participant recruitment Amazon Mechanical Turk or Washington University in St. Louis Department of Psychological and Brain Sciences participant pool |
| workerId | character | unique identifier for each participant |
| viewing | character | paste(workerId, movie1, startTime); unique to each of our observations |
| catch_failure_count | integer | number of times participant failed catch question |
| BOOL_catch | logical | TRUE if viewing excluded for failing catch question |
| Obs_vs_mov | numeric | difference in ms between the length of movie and time spent viewing |
| BOOL_Obs_vs_mv | logical | TRUE if viewing excluded for exceeding the Obs_vs_mov threshold |
| nboundries | integer | number of boundaries participant marked in this viewing |
| BOOL_nboundries | logical | TRUE if viewing excluded for boundary count |
| BOOL_nboundries_pre | logical | TRUE if viewing excluded for preregistered threshold (0 boundaries) |
| duplicate_count | integer | number of times participant saw the same film |
| BOOL_duplicate_count | logical | TRUE if viewing excluded for watching same film two or more times |
| bad_grain | character | "Original grain" (this viewing matches the grain of the participant's first viewing) or "Changed grain" (this viewing does NOT match the grain of the participant's first viewing) |
| BOOL_bad_grain | logical | TRUE if viewing excluded for changing segmentation grain |
| Exclude.Analysis | logical | TRUE if viewing is excluded for analysis |
| Exclude.Preregistration | logical | TRUE if viewing is excluded based on preregistration criteria |
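The exclusion flags above can be combined to reproduce the analysis and preregistration subsets; a minimal pandas sketch:

```python
import pandas as pd

raw = pd.read_csv("e148_META_RawSegmentation_with_Exclusions.csv")

# R-style logical columns may be read as booleans or as the strings "TRUE"/"FALSE"
def is_true(col):
    return col.isin([True, "TRUE", "True"])

analysis = raw[~is_true(raw["Exclude.Analysis"])]        # viewings used in the paper's analyses
prereg = raw[~is_true(raw["Exclude.Preregistration"])]   # viewings under preregistered criteria
print(len(raw), len(analysis), len(prereg))
```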
---
[seg_data_analysis_clean.csv](https://osf.io/ctva4)
This file contains the cleaned data, with one row for each segmentation time point, in csv format.
| Column | Definition |
| --- | --- |
| MS | JavaScript time stamp of number of milliseconds that have passed since Jan. 1, 1970, logged at each boundary button press. |
| Sec | Current frame of the movie at button press, in seconds from start of movie. |
| workerId | An ID unique to each participant, generated to not contain identifying information. |
| Condition | Coarse or fine, type of segmentation that participants were asked to complete |
| Count | For each segmentation condition of each movie, there were 30 rows in the dataframe sampled to choose the movie for the participant. This is the identifier of the row selected. |
| Movie | Name of the movie file that was segmented |
| startTime | "yyyy-mm-dd hh:mm:ss" of session |
| OnsetMS | JavaScript time stamp of number of milliseconds that have passed since Jan. 1, 1970, logged at onset of movie playing. |
| FinishMS | JavaScript time stamp of number of milliseconds that have passed since Jan. 1, 1970, logged at finish of movie playing. |
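A minimal sketch for counting boundaries per viewing from the cleaned file:

```python
import pandas as pd

seg = pd.read_csv("seg_data_analysis_clean.csv")

# Number of boundaries marked in each viewing (participant x movie x session)
counts = (seg.groupby(["workerId", "Movie", "Condition", "startTime"])
             .size()
             .rename("n_boundaries")
             .reset_index())

# Average boundary count per movie and segmentation grain
print(counts.groupby(["Movie", "Condition"])["n_boundaries"].mean().head())
```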
## Known Missing Data
The Kinect recording for chapter 3.3.7 was lost.
The C2 video for chapter 2.3.7 ended early.
# [Features for Modeling](https://osf.io/bc9t5/)
**This OSF component contains features that were used to construct [SEM models](https://psyarxiv.com/pt6hx)**
## Features before resampling to 3Hz
This directory contains the raw features for each activity (video) before resampling. Those features are:
- object appearance/disappearance (binary)
- optical flow features (continuous)
- names and distance-weighted average (1/distance^2) semantic embeddings of the 3 objects nearest to the actor's right hand
- average semantic embeddings of all objects in the current frame
- pose and motion features
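As an illustration of the 1/distance^2 weighting used for the nearest-object embeddings, a minimal NumPy sketch (the embedding dimension and values are placeholders):

```python
import numpy as np

# Semantic embeddings of the 3 objects nearest the actor's right hand (rows)
# and their distances to the hand; the numbers here are illustrative only.
embeddings = np.random.rand(3, 50)      # 3 objects x embedding dimension
distances = np.array([0.4, 0.9, 1.6])   # distance of each object from the right hand

weights = 1.0 / distances**2
weighted_avg = (weights[:, None] * embeddings).sum(axis=0) / weights.sum()
print(weighted_avg.shape)  # (50,)
```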
## Resampled Data: before and after PCA
This directory contains resampled data (e.g. *1.1.1_kinect_sep_09_all_features_resampled.csv*). Roughly, the processing script horizontally concatenates all features, drops rows in which any feature has NAs, and then resamples to 3 Hz. [Code](https://github.com/mbezdek/extended-event-modeling/blob/main/src/preprocess_features/preprocess_indv_run.py#L220C12-L220C12)
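A rough pandas sketch of that pipeline, not a substitute for the linked script (file names and the time index are placeholders):

```python
import pandas as pd

# Hypothetical per-feature tables, each indexed by a shared time column in seconds
appear = pd.read_csv("appear_features.csv", index_col="sync_time")
optical = pd.read_csv("optical_flow_features.csv", index_col="sync_time")
skel = pd.read_csv("skel_features.csv", index_col="sync_time")
objhand = pd.read_csv("objhand_features.csv", index_col="sync_time")

# Concatenate horizontally and drop rows where any feature is missing
combined = pd.concat([appear, optical, skel, objhand], axis=1).dropna()

# Resample to 3 Hz using a time-based index
combined.index = pd.to_timedelta(combined.index, unit="s")
resampled = combined.resample(pd.Timedelta(seconds=1 / 3)).mean().dropna()
```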
This directory also contains PCA-transformed resampled data (e.g. *1.1.1_kinect_sep_09_all_pcs_input_vector.csv*). The resampled data contain 253 features in total (2 object appearance/disappearance, 2 optical flow, 149 pose and motion, 100 nearest-object and all-object embeddings), which are projected into a 30-dimensional space. Whitening PCA was applied independently to each type of feature (hence 4 PCA component matrices and 4 means). [Code](https://github.com/mbezdek/extended-event-modeling/blob/main/src/train_eval_inference/run_sem_pretrain.py#L236)
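A minimal scikit-learn sketch of whitening PCA applied separately to each feature block (the column ordering and the per-block component counts are our assumptions, not taken from the linked script):

```python
import numpy as np
from sklearn.decomposition import PCA

# resampled: (n_timepoints, 253) feature matrix; random placeholder values here,
# assumed to be ordered as 2 appear/disappear, 2 optical flow, 149 pose/motion,
# and 100 object-embedding columns.
resampled = np.random.rand(1000, 253)

blocks = {"appear": slice(0, 2), "optical": slice(2, 4),
          "skel": slice(4, 153), "objhand": slice(153, 253)}
# How the 30 output dimensions are divided across blocks is an assumption
n_components = {"appear": 1, "optical": 1, "skel": 14, "objhand": 14}

projected = []
for name, cols in blocks.items():
    pca = PCA(n_components=n_components[name], whiten=True)
    projected.append(pca.fit_transform(resampled[:, cols]))
reduced = np.hstack(projected)
print(reduced.shape)  # (1000, 30)
```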
## PCA components and means
This directory contains the principal components (derived from all videos) and means (derived from all videos). These components and means are kept for storage purposes; the PCA-transformed data are in **Resampled Data: before and after PCA**.
# [SEM Modeling Results](https://osf.io/39qwz/)
Please refer to this OSF component for the modeling input and output (results).