## What's this
A behavioral dataset collected from 36 participants performing a word-arrangement task that we developed. In this task, each participant arranged 60 nouns and, in a separate session, 60 adjectives according to the semantic relationships between them. You can use this dataset to obtain the structure of human-perceived semantic dissimilarity between words.
The quickest way to obtain the perceived semantic dissimilarity of nouns or adjectives is to read the "estimate_dissimMat_ltv" variable from each participant's experimental data file ("P{%03d}_{Nouns|Adjectives}.h5").
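Below is a minimal Python sketch of that read-out. It makes several assumptions:

* The `h5py` and NumPy packages are installed.
* `estimate_dissimMat_ltv` is stored as a dataset at the root of each HDF5 file.
* The vector lists word pairs in the condensed order used by SciPy's `squareform`: (0, 1), (0, 2), ..., (1, 2), and so on. Check this ordering against the files before relying on it.

The helper names are hypothetical.

```python
import numpy as np


def n_items_from_pairs(n_pairs: int) -> int:
    """Recover n from the length n*(n-1)/2 of a pairwise vector."""
    n = int(round((1 + np.sqrt(1 + 8 * n_pairs)) / 2))
    if n * (n - 1) // 2 != n_pairs:
        raise ValueError(f"{n_pairs} is not a valid number of pairs")
    return n


def ltv_to_square(vec) -> np.ndarray:
    """Expand a pairwise vector into a symmetric matrix with a zero diagonal.

    ASSUMPTION: pairs are ordered like scipy.spatial.distance.squareform,
    i.e. row-major over the upper triangle (equivalently, column-major
    over the lower triangle).
    """
    vec = np.asarray(vec, dtype=float).ravel()
    n = n_items_from_pairs(vec.size)
    mat = np.zeros((n, n))
    rows, cols = np.triu_indices(n, k=1)  # row-major upper-triangle indices
    mat[rows, cols] = vec
    mat[cols, rows] = vec  # mirror to make the matrix symmetric
    return mat


def load_dissimilarity(path: str) -> np.ndarray:
    """Read estimate_dissimMat_ltv from one participant's file (hypothetical helper)."""
    import h5py  # imported lazily so the helpers above work without h5py

    with h5py.File(path, "r") as f:
        # ASSUMPTION: the variable is a dataset at the root of the file.
        vec = f["estimate_dissimMat_ltv"][()]
    return ltv_to_square(vec)
```

If these assumptions hold, applying `load_dissimilarity` to a `P{%03d}_Nouns.h5` file should return a 60 x 60 matrix, since 60 x 59 / 2 = 1770 pairs.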
## Citing the dataset
If you use this dataset in your work, please cite the following publication:
**Nishida S, Blanc A, Maeda N, Kado M, Nishimoto S. Behavioral correlates of cortical semantic representations modeled by word vectors. PLOS Computational Biology, 17(6): e1009138.** https://doi.org/10.1371/journal.pcbi.1009138
----------
## Experiments
### Participants
36 healthy Japanese people
* 20 females and 16 males
* Age 18–58, mean ± SD = 25.9 ± 10.1 years
* Written informed consent was obtained from all of the participants
### Word arrangement task
The participants performed a word-arrangement task on a PC. This task is a modified version of a previously introduced psychological task [1].
In each trial, participants arranged up to 60 words (nouns or adjectives) in a two-dimensional space on a computer screen according to their semantic relationships, using mouse drag-and-drop operations. This paradigm allows the perceived semantic dissimilarities between words to be collected efficiently [1].
The words used in this task comprised 60 nouns and 60 adjectives in Japanese (see Nouns.csv and Adjectives.csv), selected from the vocabulary of the Japanese Wikipedia corpus. Nouns were drawn from six semantic categories (humans, non-human animals, non-animal natural things, constructs, vehicles, and other artifacts), with ten words per category. For adjectives, such category-based selection was deemed difficult because the vocabulary contained only 473 adjectives. Nouns and adjectives were used in two separate sessions of the task, conducted over 2 days.
The words were arranged within a designated circular area on the computer screen (the “arena”). At the start of each trial, the words were displayed outside the arena, and participants used mouse drag-and-drop operations to move each word into the arena and arrange them. The arrangement followed the participants’ own judgment of semantic similarity: more similar word pairs were to be placed closer together, and more dissimilar pairs farther apart. Participants could move any word within the arena as many times as they wished. Once all words were in the arena, they could click a button labeled “Next” at any time to proceed to the next trial.
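Each arrangement can be summarized as on-screen distances between word pairs. The NumPy sketch below does this for one trial. It computes Euclidean distances between the (x, y) positions of the words and sets pairs involving a word not used in that trial to NaN. It assumes SciPy-`squareform`-style pair ordering and is meant only to illustrate the idea behind the `distances` variable listed under Data organization. The exact units, pair ordering and handling of unused pairs in the files are not specified in this README. The function name is hypothetical.

```python
import numpy as np


def arrangement_distances(positions, used) -> np.ndarray:
    """Pairwise Euclidean distances for one arrangement, as a pairwise vector.

    positions: (n, 2) array of (x, y) screen coordinates.
    used: length-n boolean array, True for words placed in this trial.
    Pairs involving an unused word are set to NaN (an assumption of this sketch).
    """
    positions = np.asarray(positions, dtype=float)
    used = np.asarray(used, dtype=bool)
    n = positions.shape[0]
    rows, cols = np.triu_indices(n, k=1)  # squareform-style pair order (assumed)
    diffs = positions[rows] - positions[cols]
    dist = np.hypot(diffs[:, 0], diffs[:, 1])
    dist[~(used[rows] & used[cols])] = np.nan  # pair not shown in this trial
    return dist
```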
On the first trial of each session, participants arranged the entire set of 60 words (nouns or adjectives). On subsequent trials, they arranged subsets of those 60 words. After each trial, a rough estimate of the word dissimilarity matrix and the evidence (0–1) for each pairwise dissimilarity were computed as described previously [1]. The words for each subsequent trial were chosen to increase the evidence for the pairwise dissimilarities whose evidence, as estimated on the previous trial, was weakest. The session continued until the evidence for every word-pair dissimilarity exceeded a threshold (0.75) or until the total duration of the session approached 1 h.
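The trial loop can be summarized as a stopping check plus a choice of words to present next. The sketch below is a simplified illustration, not the task's actual subset-selection algorithm. It assumes NumPy and SciPy-`squareform`-style pair ordering. It stops when every pair's evidence exceeds 0.75 or the elapsed time reaches a 1-h budget. Otherwise it collects the words involved in the weakest-evidence pairs. The function names and the `max_words` parameter are hypothetical.

```python
import numpy as np

EVIDENCE_THRESHOLD = 0.75  # threshold stated in this README
TIME_LIMIT_S = 60 * 60     # 1-h session budget


def should_stop(evidence, elapsed_s: float) -> bool:
    """Stop when every pair's evidence exceeds the threshold or time runs out."""
    evidence = np.asarray(evidence, dtype=float)
    return bool(np.all(evidence > EVIDENCE_THRESHOLD) or elapsed_s >= TIME_LIMIT_S)


def pick_next_subset(evidence, n_items: int, max_words: int = 20) -> list:
    """Simplified selection: gather words from the weakest-evidence pairs.

    HYPOTHETICAL heuristic for illustration; the real task followed [1].
    """
    evidence = np.asarray(evidence, dtype=float)
    rows, cols = np.triu_indices(n_items, k=1)  # squareform-style order (assumed)
    chosen = []
    for pair in np.argsort(evidence, kind="stable"):  # weakest pairs first
        for word in (int(rows[pair]), int(cols[pair])):
            if word not in chosen and len(chosen) < max_words:
                chosen.append(word)
        if len(chosen) >= max_words:
            break
    return sorted(chosen)
```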
### Word dissimilarity
Word dissimilarities were estimated from the multiple word-subset arrangements collected in the word-arrangement task. The dissimilarity for a given word pair was estimated as a weighted average of the scale-adjusted dissimilarity estimates from individual arrangements, as described previously [1]. This procedure yields a dissimilarity for every word pair in the full word set. Dissimilarities were estimated separately for nouns and adjectives.
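To make the estimation step concrete, here is a simplified, NumPy-only sketch in the spirit of [1]; it is not a reproduction of that method. The sketch works as follows:

1. Rescale each arrangement's distances so that their root-mean-square (RMS), over the pairs that arrangement contains, matches the current estimate.
2. Average the rescaled values with per-pair weights.
3. Repeat for a fixed number of iterations.

Several choices are assumptions of this sketch: the uniform default `weights`, initialization from the first trial, the fixed iteration count and the final RMS normalization. [1] specifies its own evidence weighting and convergence criterion. The function name is hypothetical.

```python
import numpy as np


def estimate_dissimilarity(distances, weights=None, n_iter: int = 20) -> np.ndarray:
    """Iterative weighted average of scale-adjusted arrangement distances.

    distances: (n_pairs, t) array; NaN marks pairs absent from a trial.
    weights: optional (n_pairs, t) evidence weights (uniform if None; assumed).
    Assumes the first trial contains every pair, as the first trial of each
    session in this dataset included all 60 words.
    """
    d = np.asarray(distances, dtype=float)
    present = ~np.isnan(d)
    w = np.ones_like(d) if weights is None else np.asarray(weights, dtype=float)
    w = np.where(present, w, 0.0)   # absent pairs contribute nothing
    d0 = np.where(present, d, 0.0)

    estimate = d0[:, 0] / np.sqrt(np.mean(d0[:, 0] ** 2))  # start from trial 1
    for _ in range(n_iter):
        scaled = np.zeros_like(d0)
        for k in range(d0.shape[1]):
            m = present[:, k]
            rms_trial = np.sqrt(np.mean(d0[m, k] ** 2))
            rms_est = np.sqrt(np.mean(estimate[m] ** 2))
            # Match this trial's scale to the current estimate on its own pairs.
            scaled[m, k] = d0[m, k] * (rms_est / rms_trial)
        estimate = (scaled * w).sum(axis=1) / w.sum(axis=1)
        estimate /= np.sqrt(np.mean(estimate ** 2))  # fix the overall scale
    return estimate
```

Because only relative dissimilarities are recoverable from on-screen arrangements, the sketch normalizes the result to unit RMS rather than keeping screen units.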
### References
1. Kriegeskorte N, Mur M. Inverse MDS: Inferring dissimilarity structure from multiple item arrangements. Front Psychol. 2012;3: 1–13. doi:10.3389/fpsyg.2012.00245
----------
## Data organization
### Files
* **participants.csv**: List of participants with their attributes.
* **nouns.csv**: Dictionary of nouns used in the experiment.
* **adjectives.csv**: Dictionary of adjectives used in the experiment.
* **P{%03d}_Nouns.h5**: Experimental data from each participant's noun-arrangement session (HDF5 format)
* **P{%03d}_Adjectives.h5**: Experimental data from each participant's adjective-arrangement session (HDF5 format)
### Variables in experimental data
Dimensions used below: n = number of items in the dictionary (= 60); t = number of trials.

* ***position (n x 2)***: Last position (x, y) of each item on the screen
* ***usedItems (n x t)***: Whether each item was used in each trial (1: used / 0: not used)
* ***distances ([n\*(n-1)/2] x t)***: Pairwise distances between items in each trial, stored as vectors
* ***estimate_dissimMat_ltv ([n\*(n-1)/2] x 1)***: Estimated dissimilarity matrix, stored as a vector
* ***evidenceWeight_ltv ([n\*(n-1)/2] x 1)***: Evidence weight matrix, stored as a vector
* ***pats_mds_2D (n x 2)***: Two-dimensional MDS embedding ("fit and transform") of the square form of estimate_dissimMat_ltv
* ***disparities ([n\*(n-1)/2] x 1)***: Lower-triangular form of the dissimilarity matrix from MDS
* ***stress (1)***: Scalar stress value from MDS
* ***timeTrial_Start (1 x t)***: Start time of each trial (elapsed time since the beginning of the experiment)
* ***timeTrial_Stop (1 x t)***: End time of each trial (elapsed time since the beginning of the experiment)
* ***timeTrial_Duration (1 x t)***: Duration of each trial in seconds
* ***cTrial_objectIs (n)***: Item list of the current trial
* ***minEvidenceWeight (t)***: Minimum evidence weight for each trial
* ***parameters***
  * **dictionary**: Dictionary used for the experiment (JSON format; does not contain the images used in the experiment)
  * **time_remaining**: Time remaining before the time limit
  * **time_elapsed**: Time elapsed since the beginning of the experiment
  * **nbTrial**: Number of trials completed
  * **isFinish**: Whether the experiment is finished (1: yes / 0: no)
  * **displayMode**: Display mode of the experiment ('Image'/'Text')
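A few relations between the variables above can serve as sanity checks after loading a file. The NumPy sketch below makes three assumptions. First, the arrays have already been read (for example with `h5py`) into the listed shapes. Second, pair ordering follows SciPy's `squareform` pairwise-vector convention. Third, `timeTrial_Start`, `timeTrial_Stop` and `timeTrial_Duration` share one unit; this README states seconds only for the duration.

The sketch derives, from `usedItems`, which word pairs were presented in each trial. It then checks that the duration equals stop minus start. The function names are hypothetical.

```python
import numpy as np


def pair_usage(used_items) -> np.ndarray:
    """(n_pairs, t) boolean mask: True where both words of a pair were used.

    used_items: (n, t) array of 1/0 flags, as in usedItems.
    ASSUMPTION: pairs follow squareform-style (row-major upper-triangle) order.
    """
    used = np.asarray(used_items).astype(bool)
    n = used.shape[0]
    rows, cols = np.triu_indices(n, k=1)
    return used[rows, :] & used[cols, :]


def durations_consistent(start, stop, duration, atol: float = 1e-3) -> bool:
    """Check duration == stop - start (assumes all three share the same unit)."""
    start = np.ravel(np.asarray(start, dtype=float))
    stop = np.ravel(np.asarray(stop, dtype=float))
    duration = np.ravel(np.asarray(duration, dtype=float))
    return bool(np.allclose(stop - start, duration, atol=atol))
```

With n = 60, `pair_usage` yields 60 x 59 / 2 = 1770 rows, matching the first dimension of `distances`.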