Improved Video Emotion Recognition with Alignment of CNN and Human Brain Representations
Description: The ability to perceive emotions is an important criterion for judging whether a machine is intelligent. To this end, a large number of emotion recognition algorithms have been developed, especially for visual information such as video. Most previous studies rely on hand-crafted features or convolutional neural networks (CNNs): the former fail to extract expressive features, while the latter still face the undesired affective gap. This motivates us to ask whether the human capability for emotional perception can be incorporated into CNNs. In this paper, we attempt to address this question by exploring the alignment between the representations of neural networks and human brain activity. In particular, we employ a dataset of visually evoked emotional brain activity to devise a joint training strategy for the CNN. In the training phase, we introduce representational similarity analysis (RSA) to align the CNN with the human brain and obtain more brain-like features. Specifically, the representational similarity matrices (RSMs) of multiple convolutional layers are averaged with learnable weights and related to the RSM of human brain activity. To obtain emotion-related brain activity, we perform voxel selection and denoising with a banded ridge model before computing the RSM. Extensive experiments on two challenging video emotion recognition datasets and multiple popular CNN architectures suggest that human brain activity can provide a promising inductive bias that steers CNNs toward better emotion recognition performance. If you need the video data, please contact fukaicheng2019@ia.ac.cn.
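The core of the joint training strategy is relating a learnable weighted average of layer-wise RSMs to the brain RSM. Below is a minimal PyTorch sketch of that idea; the function names, the softmax parameterization of the layer weights, and the use of mean squared error over off-diagonal RSM entries are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of the RSA alignment term (assumptions noted in comments;
# this is not the paper's exact code).
import torch
import torch.nn.functional as F

def rsm(features: torch.Tensor) -> torch.Tensor:
    """Representational similarity matrix for a batch of stimuli.

    features: (n_stimuli, n_features) activations, flattened per stimulus.
    Returns an (n_stimuli, n_stimuli) matrix of pairwise Pearson correlations.
    """
    z = features - features.mean(dim=1, keepdim=True)
    z = F.normalize(z, dim=1)  # unit-norm rows, so z @ z.T is correlation
    return z @ z.T

def rsa_alignment_loss(layer_feats, brain_rsm, layer_logits):
    """Align a learnable weighted average of layer RSMs with the brain RSM.

    layer_feats:  list of (n_stimuli, n_features) tensors, one per conv layer.
    brain_rsm:    (n_stimuli, n_stimuli) RSM of denoised brain activity.
    layer_logits: learnable vector, softmaxed into per-layer weights
                  (the softmax parameterization is an assumption).
    """
    weights = torch.softmax(layer_logits, dim=0)
    mixed = sum(w * rsm(f) for w, f in zip(weights, layer_feats))
    # Compare only off-diagonal entries; the diagonal is trivially 1.
    off_diag = ~torch.eye(brain_rsm.shape[0], dtype=torch.bool,
                          device=brain_rsm.device)
    # MSE between the two RSMs is an assumed choice of alignment measure.
    return F.mse_loss(mixed[off_diag], brain_rsm[off_diag])
```

In joint training, this term would be added to the usual classification loss, e.g. `loss = ce_loss + lam * rsa_alignment_loss(...)`, with `layer_logits = torch.nn.Parameter(torch.zeros(num_layers))` optimized alongside the CNN; the trade-off weight `lam` is a hyperparameter we assume here.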
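The description also mentions denoising brain responses and selecting emotion-related voxels with a banded ridge model before computing the brain RSM. The NumPy sketch below shows the basic mechanics of banded ridge regression (one penalty per feature band) together with a simple R²-based voxel selection; the band definitions, the closed-form solver, and the selection threshold are illustrative assumptions. In practice, per-band penalties are usually tuned by cross-validation (e.g., with the himalaya library).

```python
# Minimal NumPy sketch of banded ridge + voxel selection (assumptions noted).
import numpy as np

def banded_ridge(bands, y, alphas):
    """Ridge regression with a separate penalty for each feature band.

    bands:  list of (n_samples, n_features_i) design matrices.
    y:      (n_samples, n_voxels) brain responses.
    alphas: one regularization strength per band (assumed given here;
            normally tuned per band by cross-validation).
    Returns stacked weights of shape (sum_i n_features_i, n_voxels).
    """
    X = np.hstack(bands)
    # Diagonal penalty: each band's columns get their own alpha.
    penalty = np.concatenate(
        [np.full(b.shape[1], a) for b, a in zip(bands, alphas)])
    return np.linalg.solve(X.T @ X + np.diag(penalty), X.T @ y)

def voxel_r2(X, y, W):
    """Per-voxel R^2 of the ridge predictions, used here for voxel selection."""
    resid = y - X @ W
    ss_res = (resid ** 2).sum(axis=0)
    ss_tot = ((y - y.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

# Keep voxels whose responses are well explained by the (hypothetical)
# emotion-related feature band; the 0.1 threshold is an illustrative choice.
# keep = voxel_r2(np.hstack(bands), y, W) > 0.1
```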