This dataset was used in a study comparing the object recognition performance of human observers and convolutional neural networks (CNNs) under challenging noisy viewing conditions. The data were collected from two behavioral experiments and one fMRI experiment. CNN performance was evaluated primarily with MatConvNet (https://www.vlfeat.org/matconvnet/); pretrained CNNs are available from the official website. Three additional CNNs used in the study are provided here in MatConvNet format:

1. VGG19 initialized from MatConvNet pretrained weights and further trained on 16 categories of noise-free images
2. VGG19 initialized from MatConvNet pretrained weights and further trained on 16 categories of noisy images (corrupted by Gaussian and Fourier noise) as well as noise-free images
3. VGG19 initialized from MatConvNet pretrained weights and further trained on 1000 categories of noisy images (corrupted by Gaussian and Fourier noise) as well as noise-free images

For those interested in replicating the results or evaluating their own deep neural networks, the data used to create each figure are provided; the data formats are described below. We cannot share the photographs or images used in the study due to copyright restrictions, but please contact the authors with any requests. Further details of the study are described in the published paper by Jang et al. (2021).

**Figures_2_4.mat**
- Each variable contains the recognition performance of 20 human observers, 8 pretrained CNNs, and the noise-trained VGG19 across a range of signal-to-signal-plus-noise ratios (SSNRs), for objects in pixelated Gaussian noise or Fourier phase-scrambled noise. Note that human accuracy was measured at 10 SSNR levels (0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.5, and 0.75), whereas CNN accuracy was measured at 20 SSNR levels from 0.05 to 1.0 in increments of 0.05.
- For the pretrained CNNs and noise-trained VGG19, 'category_id' contains the indices of the 16 categories (i.e., bear, bison, elephant, hamster, hare, lion, owl, tabby cat, airliner, jeep, couch, speedboat, schooner, sports car, table lamp, teapot) based on the category order provided by MatConvNet. The order of the 1000 ImageNet categories can be found in meta.classes of any MatConvNet model. 'image_id' contains the indices of individual ImageNet images, which appear in their file names (e.g., ILSVRC2012_val_00000001.JPEG).

**Figure_3.mat**
- Accuracies of all networks trained on either pixelated Gaussian noise or Fourier phase-scrambled noise at different SSNR levels.

**Figure_5.mat**
- 'human' contains the behavioral performance of 20 individual human observers, including their reported SSNR thresholds, image file names, category choices, and painting maps across 800 images.
- 'cnn' and 'noise_trained_cnn' contain the recognition performance of the noise-free and noise-trained CNNs across the full range of SSNRs, as well as their SSNR thresholds for individual images. Note that NaN was assigned to any image that exhibited a distorted accuracy-by-SSNR curve and was therefore excluded when calculating SSNR thresholds; the criteria are detailed in the manuscript. These variables also contain the saliency maps of the 800 images across the full range of SSNRs for the noise-free and noise-trained CNNs, obtained using the layer-wise relevance propagation technique (Bach et al., 2015). Also note that the image order differs from that of 'human', so it must be adjusted for direct comparison between humans and CNNs.

**Figure_6.mat**
- Correlation-based and classification-based SSNR thresholds for both pixelated Gaussian noise and Fourier phase-scrambled noise.
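The human and CNN accuracy curves in Figures_2_4.mat are sampled on different SSNR grids (10 levels for humans, 20 for CNNs), so the CNN curves must be subsampled before a direct comparison. A minimal sketch in Python (the .mat files themselves can be read with `scipy.io.loadmat`; the `cnn_accuracy` array named in the final comment is hypothetical):

```python
import numpy as np

# SSNR levels as described for Figures_2_4.mat:
# human accuracy was measured at 10 levels, CNN accuracy at 20 levels.
human_ssnr = np.array([0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.5, 0.75])
cnn_ssnr = np.arange(0.05, 1.0 + 1e-9, 0.05)  # 0.05 ... 1.0, step 0.05

# Every human SSNR level also appears in the CNN grid, so the CNN
# curves can be subsampled at these indices for a direct comparison.
idx = np.array([int(np.argmin(np.abs(cnn_ssnr - s))) for s in human_ssnr])
assert np.allclose(cnn_ssnr[idx], human_ssnr)

# e.g., cnn_accuracy[:, idx] would align a hypothetical CNN accuracy
# array of shape [n_networks, 20] with the 10 human SSNR levels.
```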
**Figure_7.mat**
- Contains 48 x 48 representational similarity analysis (RSA) matrices from 8 human observers, the standard VGG19, and the noise-trained VGG19. The stimulus order is 16 clear images, followed by the same 16 images corrupted by Gaussian noise, followed by the same 16 images corrupted by Fourier noise. The order of the 16 images is given in 'image_order'.
- An RSA matrix was obtained for each brain region (V1, V2, V3, V4, LOC, FFA, PPA, early visual areas, and high visual areas) from each individual human observer. Correspondingly, an RSA matrix was obtained from each of the 19 layers of each CNN.
- The fMRI responses of the 8 individual human observers from the 9 ROIs are also provided. For each of the 48 conditions (3 stimulus types x 16 images; the order matches the RSA matrices above), the brain responses are given as a 2-D array (number of voxels x number of runs). Additionally, the SVM decoding performance of individual human observers is provided for each stimulus type (i.e., clear, Gaussian noise, and Fourier noise, in that order).

**Figure_8.mat**
- Accuracies of four networks (standard, Gaussian noise-trained, Fourier noise-trained, and Gaussian-and-Fourier noise-trained) across 4 conditions: salt-and-pepper noise at 100% contrast and 30% contrast, low-pass filtering, and high-pass filtering. Refer to Geirhos et al. (2018) for details of the noise-generation method.

**Figure_9.mat**
- Correct/wrong responses to noise-free and noisy images from the standard and noise-trained networks.

**SFigure_1.mat**
- All confusion matrices from the 20 human observers and 8 pretrained CNNs at tested SSNR levels of 0.2, 0.3, 0.5, and 0.75.

**SFigure_2.mat**
- Correlational similarities between response patterns to noise-free objects and the same objects presented at varying SSNR levels, across all layers of both the standard and noise-trained CNNs.
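The 48-condition ordering in Figure_7.mat (16 clear, then 16 Gaussian-noise, then 16 Fourier-noise images) maps straightforwardly onto row/column indices of the RSA matrices. A small sketch of that indexing; the `rsa` array here is a placeholder, not data from the file:

```python
import numpy as np

# Stimulus ordering described for Figure_7.mat: 16 clear images,
# then the same 16 in Gaussian noise, then the same 16 in Fourier noise.
STIM_TYPES = ["clear", "gaussian", "fourier"]

def condition_index(stim_type: str, image_index: int) -> int:
    """Row/column index into the 48 x 48 RSA matrix for one condition.

    image_index is 0-based and follows the order given in 'image_order'.
    """
    assert 0 <= image_index < 16
    return STIM_TYPES.index(stim_type) * 16 + image_index

# e.g., extract the 16 x 16 clear-vs-Fourier-noise block of a
# (placeholder) 48 x 48 RSA matrix:
rsa = np.zeros((48, 48))
block = rsa[0:16, 32:48]  # clear rows x Fourier-noise columns
```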
**SFigure_3.mat**
- Classification accuracies of SVMs tested on response patterns across all layers of both the standard and noise-trained CNNs.

**SFigure_4.mat**
- Three networks (Gaussian noise-trained CNN, Fourier noise-trained CNN, and pretrained CNN), each with accuracy scores tested on both Gaussian and Fourier noise.

**SFigure_5.mat**
- Three networks (AlexNet, VGG-19, and ResNet-152), each with accuracy scores across the full range of SSNRs as well as SSNR thresholds for reaching 50% accuracy.

**SFigure_6.mat**
- Top-1 and top-5 accuracies of the pretrained VGG-19 and noise-trained VGG-19 on the 1000-category ImageNet classification task.