# A data-driven analysis of the perceptual and neural responses to natural objects reveals organising principles of human visual cognition
This repository contains data and scripts for the neural encoding analysis of
the stimulus model and fMRI data from the THINGS database.
> **Citation:** Watson, D. M., & Andrews, T. J. (2025). A Data-Driven Analysis
> of the Perceptual and Neural Responses to Natural Objects Reveals Organizing
> Principles of Human Visual Cognition. The Journal of Neuroscience, 45(2),
> e1318242024. https://doi.org/10.1523/JNEUROSCI.1318-24.2024
## Obtaining the dataset
All data were obtained from the [THINGS
database](https://things-initiative.org/):
* The stimulus model can be obtained from their
[THINGS-odd-one-out](https://osf.io/f5rn6/) OSF repository.
* The stimulus set and THINGS+ metadata can be obtained from their [THINGS
object concept and object image database](https://osf.io/jum2f) OSF
repository.
* The MRI data can be obtained from their
[THINGS-data](https://doi.org/10.25452/figshare.plus.c.6161151.v1) figshare
repository:
* We obtained MRI parameter estimates from the *THINGS-data: fMRI Single
Trial Responses* in NIFTI format.
* We obtained anatomical images from the *THINGS-data: fMRI BIDS raw
dataset*. We also reconstructed cortical surfaces using Freesurfer 6.0.
## Repository data
> Note: All surface overlays are provided for the *fsaverage5* surface.
* **MRI_embedding_66d.csv** : Spreadsheet containing the 66-dimensional
stimulus model scores for the 720 object concepts included in the MRI dataset
(a Python loading sketch follows this list).
* **betas_surf** : Contains the group average MRI parameter estimates for each
of the 720 object concepts mapped to the *fsaverage5* surface.
* **encoding** : Contains the outputs of the partial least squares regression
(PLSR) neural encoding model.
* **Top-level** : Outputs of the PLSR model itself.
* **permtest** : Results of the permutation test on the cross-validated
R<sup>2</sup> prediction accuracies. Includes the critical value, null
distribution, and p-statistics (expressed as -log<sub>10</sub>(p)
values). All results follow a FWER correction (via a maximum statistic
permutation approach).
* **fsaverage_sym_upsample** : Results of the intra- and interhemispheric
correlations of the neural component loadings.
* **metadata_corrs** : Analysis of 12 object property ratings from THINGS+
metadata for object concepts in the training set. Includes pairwise
correlations between the object properties themselves (with the corresponding
hierarchical clustering and multidimensional scaling solutions), and
correlations between object properties and latent scores along each PLSR
component.
* **encoding-linear** : Contains the outputs of the linear regression encoding
model, which maps all 66 stimulus features directly to the neural responses.
* **GLM** : Contains the outputs of the GLM analysis, mapping the predicted
latent scores to the measured neural responses for the 240 samples in the
test set.
* **Top-level** : Includes the MRI parameter estimates (see *betas_surf*)
restricted to the 240 test samples, and the results of the correlations
between the GLM and PLSR neural loadings.
* **design_mats** : Contains design and contrast matrices to be used for
the GLM analysis. The design matrix is simply the demeaned latent space
scores.
* **glm.?h** : Results of the GLM analysis for each hemisphere.
* **modelling** : Contains outputs of representational similarity analyses
between PLSR components and DCNNs trained for object recognition. Includes:
* **image_info.csv** : Lists all THINGS images included in MRI experiment.
* **alexnet, vgg16** : Contain representational dissimilarity matrices and
results of representational similarity analyses for the Alexnet and VGG16
DCNNs. The DCNN activations themselves are omitted due to their file size, but
can be recalculated from the THINGS images using the scripts provided (see
below).
* **MVPA** : Outputs of the MVPA searchlight. Includes:
* The mid-thickness surfaces.
* The decoding accuracy maps from the searchlight.
* The outputs of the permutation test on decoding accuracies. Includes the
critical value, null distribution, and p-statistics (expressed as
-log<sub>10</sub>(p) values). All results follow a FWER correction (via a
maximum statistic permutation approach).
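
For orientation, here is a minimal Python sketch of loading the repository
data, assuming `pandas` and `nibabel` are installed. The overlay filename
under `betas_surf` is a hypothetical example, not a guaranteed path.

```python
import pandas as pd
import nibabel as nib

basedir = "/path/to/data"  # top-level data directory

# 66-dimensional stimulus model scores for the 720 object concepts
embedding = pd.read_csv(f"{basedir}/MRI_embedding_66d.csv")
print(embedding.shape)  # expect 720 rows: 66 score columns plus any labels

# Group-average parameter estimates on the fsaverage5 surface;
# the filename below is illustrative only
betas = nib.load(f"{basedir}/betas_surf/lh.betas.mgh")
print(betas.get_fdata().shape)  # fsaverage5 has 10242 vertices per hemisphere
```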
## Repository scripts
Below we detail the analysis scripts in the (approximate) order they need to be
run.
> Many of the scripts reference a `basedir` variable; this should be set to
> the top-level data directory.
### setup
> These scripts represent processing stages before and including generating the
> group-level parameter estimates contained in the `data/betas_surf` directory.
> We include these scripts for completeness, but they won't work directly as
> they require the individual-level MRI data and stimulus model obtained from
> the THINGS database (see [Obtaining the dataset](#obtaining-the-dataset)
> section).
* **extract_MRI_stimulus_encoding.py** : Extracts 66-dimensional stimulus model
scores for the 720 object concepts included in the MRI data (see
`data/MRI_embedding_66d.csv`).
* **xfm_betas2surf.sh** : Transforms MRI parameter estimates from each
individual's volume to the *fsaverage5* surface and applies surface-based
spatial smoothing.
* **average_ind_surf_betas.py** : Averages surface-transformed individual
parameter estimates over image repeats within each of the 720 object
concepts.
* **group_average_surf_betas.py** : Takes the averaged individual parameter
estimates within each object concept and further averages them over subjects.
These form the inputs to the neural encoding analyses (see `data/betas_surf`);
the averaging logic is sketched below.
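
The two averaging stages reduce to nested means, first over image repeats
within each concept and then over subjects. A minimal sketch of that logic
with illustrative shapes and random stand-in data (the actual scripts operate
on the surface-transformed betas):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-subject input: trials x vertices single-trial betas,
# with the object concept label of each trial
betas = rng.standard_normal((7200, 10242))  # e.g. 720 concepts x 10 repeats
concepts = np.repeat(np.arange(720), 10)

# Stage 1 (average_ind_surf_betas.py): average over repeats within each concept
ind_means = np.stack([betas[concepts == c].mean(axis=0) for c in range(720)])

# Stage 2 (group_average_surf_betas.py): average the per-subject concept
# means over subjects (a single subject is duplicated here as a stand-in)
group_mean = np.mean([ind_means, ind_means], axis=0)  # 720 x 10242
```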
### encoding
Scripts for running the partial least squares regression (PLSR) neural encoding
model.
* **do_plsr.py** : Runs the PLSR neural encoding model (the PLSR fit and
permutation test are sketched after this list).
* **do_permtest.py** : Runs the permutation test of the cross-validated
R<sup>2</sup> prediction accuracies.
* **flip_encoding_RL.sh** and **corr_encodings_interhemi.py** : These scripts
perform the intra- and interhemispheric comparisons of the neural component
loadings. The bash script mirrors the right hemisphere onto the left, and the
Python script then runs the correlations.
* **corr_plsr-metadata.py** : Runs the analysis of object property ratings
from THINGS+ metadata for object concepts in the training set. Calculates
pairwise correlations between object properties (plus hierarchical clustering
and multidimensional scaling analyses of these), and correlations between
object properties and PLSR latent scores. Requires access to the
[THINGS+ metadata](https://osf.io/jum2f).
* **do_linear_encoding.py** : Runs the linear encoding model. Maps all 66
stimulus features directly to the neural responses.
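
For orientation, here is a minimal sketch of the PLSR fit and the
maximum-statistic permutation test using scikit-learn, with random stand-in
data. The component count, train/test split, and permutation scheme are
illustrative and may not match `do_plsr.py` and `do_permtest.py` exactly.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.standard_normal((720, 66))     # stimulus model scores per concept
Y = rng.standard_normal((720, 10242))  # group-average betas per concept
train, test = np.arange(480), np.arange(480, 720)

# Fit the PLSR encoding model on the training concepts
pls = PLSRegression(n_components=8, scale=False)  # component count is illustrative
pls.fit(X[train], Y[train])

# Cross-validated prediction accuracy: R^2 at each vertex on held-out concepts
r2 = r2_score(Y[test], pls.predict(X[test]), multioutput="raw_values")

# Maximum-statistic permutation test: refit with shuffled training labels and
# keep the largest R^2 across vertices, which controls the FWER
n_perm = 1000
null_max = np.empty(n_perm)
for i in range(n_perm):
    perm = rng.permutation(len(train))
    pls.fit(X[train][perm], Y[train])
    r2_null = r2_score(Y[test], pls.predict(X[test]), multioutput="raw_values")
    null_max[i] = r2_null.max()

critical = np.quantile(null_max, 0.95)
p = (1 + (null_max[None, :] >= r2[:, None]).sum(axis=1)) / (1 + n_perm)
neglog10p = -np.log10(p)  # p-statistics as stored in the permtest outputs
```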
### GLM
Scripts for running the GLM analysis, mapping the predicted latent scores to
the measured neural responses for the 240 object concepts in the test set.
* **setup_files.py** : Creates input files for the GLM analysis (the design
matrix step is sketched after this list):
1. Uses PLSR (fit to training set) to project stimulus model scores for test
set into latent space. Mean centres these, then saves them out to a design
matrix.
2. Creates contrast matrices.
3. Extracts MRI parameter estimates for samples in test set.
* **do_glm.sh** : Uses Freesurfer's `mri_glmfit` command to run GLM analysis.
* **corr_betas_with_encoding.py** : Correlates the PLSR neural loadings (from
the training set) with the GLM loadings (from the test set).
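
A minimal sketch of the design-matrix step in `setup_files.py`, using random
stand-in data: `transform` projects the held-out model scores into the latent
space learned on the training set, which is then mean-centred and saved. The
filenames and the exact format expected by `mri_glmfit` may differ from what
is shown.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((720, 66))     # stimulus model scores per concept
Y = rng.standard_normal((480, 10242))  # training-set betas

# Fit on the 480 training concepts only
pls = PLSRegression(n_components=8, scale=False).fit(X[:480], Y)

# Project the 240 test concepts into the trained latent space
latent = pls.transform(X[480:])

# Mean-centre each component to form the GLM design matrix
design = latent - latent.mean(axis=0)
np.savetxt("design_mat.txt", design)  # filename and format are illustrative

# One contrast per latent component
contrasts = np.eye(design.shape[1])
```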
### modelling
Scripts for running representational similarity analyses between PLSR
components and DCNN activations to each object concept.
* **extract_image_list.py** : Extracts the list of all images in the THINGS
stimulus set that were included in the MRI experiment (see
`data/modelling/image_info.csv`).
* **calc_DCNN.m** : Calculates DCNN layer activations for each image, then
averages the activations within each object concept. Either Alexnet or VGG16
can be selected; both use MATLAB's implementations pre-trained for object
recognition. Requires access to the images in the [THINGS stimulus
set](https://osf.io/jum2f).
* **calc_RSA.py** : Performs the representational similarity analysis between
DCNN activations and PLSR components (sketched below).
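
A minimal sketch of the RSA logic in `calc_RSA.py`, with random stand-in data:
representational dissimilarity matrices (RDMs) are built as 1 minus the
pairwise Pearson correlation, and their upper triangles are compared. The
comparison statistic (Spearman here) and the way individual PLSR components
enter the analysis are assumptions, not necessarily the scripts' exact
choices.

```python
import numpy as np
from scipy.stats import spearmanr

def rdm(features):
    """RDM as 1 minus the pairwise Pearson correlation between row vectors."""
    return 1 - np.corrcoef(features)

rng = np.random.default_rng(0)
dcnn_acts = rng.standard_normal((720, 4096))  # concept-averaged layer activations
plsr_scores = rng.standard_normal((720, 8))   # latent scores per concept

rdm_dcnn = rdm(dcnn_acts)
rdm_plsr = rdm(plsr_scores)

# Compare the upper triangles of the two RDMs
iu = np.triu_indices(720, k=1)
rho, pval = spearmanr(rdm_dcnn[iu], rdm_plsr[iu])
```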
### MVPA
Scripts for performing MVPA decoding searchlight analysis. Requires
[CoSMoMVPA](https://www.cosmomvpa.org/) and
[surfing](https://github.com/nno/surfing) MATLAB toolboxes.
* **create_midthickness_surfaces.sh** : Uses Freesurfer tools to create a
mid-thickness surface for the *fsaverage5* brain and saves it in an ASCII
format compatible with CoSMoMVPA.
* **do_MVPA_searchlight.m** : Runs searchlight analysis, including permutation
testing of decoding accuracies.
### utils
General utility scripts used by various other scripts.
* **freesurfer.py** : Python tools for loading and saving surface data from
Freesurfer files.
* **freesurfer_dataset.m** : MATLAB function for loading surface data from
Freesurfer files into a dataset format compatible with CoSMoMVPA.
* **map2freesurfer.m** : MATLAB function that takes a CoSMoMVPA surface
dataset and saves it out to a Freesurfer file.
## References
* [Hebart et al. (2019, PLoS
ONE)](https://doi.org/10.1371/journal.pone.0223792) : Main reference for the
THINGS database.
* [Hebart et al. (2020, Nature Human
Behaviour)](https://doi.org/10.1038/s41562-020-00951-3) : Reference for
behavioural stimulus model.
* [Hebart et al. (2023, eLife)](https://doi.org/10.7554/eLife.82580) :
Reference for neuroimaging data.
* [Stoinski et al. (2023, Behavior Research
Methods)](https://doi.org/10.3758/s13428-023-02110-8) : Reference for THINGS+
metadata.
* [Oosterhof et al. (2016, Frontiers in
Neuroinformatics)](https://doi.org/10.3389/fninf.2016.00027) : Reference for
*CoSMoMVPA* toolbox.
* [Oosterhof et al. (2011,
NeuroImage)](https://doi.org/10.1016/j.neuroimage.2010.04.270) : Reference
for *surfing* toolbox.