Abstract: Continuous emotion recognition based on multimodal physiological data plays an important role in many fields. However, training emotion recognition models requires large amounts of physiological data, which are difficult to obtain because subjects' data are scarce and emotion is subjective, and recognition performance is strongly affected by whether the training and validation data come from the same subjects. In this study, we propose multiple emotion recognition methods based on facial expressions and EEG. For the facial-image modality, we propose a multi-task convolutional neural network trained with transfer learning to avoid the over-fitting induced by small facial-image datasets. For the EEG modality, we propose two emotion recognition models. The first is a subject-dependent model based on a support vector machine (SVM), which achieves high accuracy when the validation and training data are homogeneous. The second is a cross-subject model based on a long short-term memory (LSTM) network, designed to reduce the impact of individual variation and the non-stationarity of EEG; it performs stably when the validation and training data are heterogeneous. To improve recognition accuracy on homogeneous data, we propose two methods for decision-level fusion of the multimodal emotion predictions: weight enumeration and adaptive boosting (AdaBoost). In the experiments, when the validation and training data were homogeneous, the multimodal emotion recognition models achieved, in the best case, average accuracies of 74.23% in the arousal dimension and 80.30% in the valence dimension; when the validation and training data were heterogeneous, the cross-subject model achieved accuracies of 58.65% and 51.70% in the arousal and valence dimensions, respectively.
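To make the decision-level fusion by weight enumeration mentioned above concrete, the following is a minimal sketch, not the paper's implementation: the function name `fuse_by_weight_enumeration`, the grid step, and the binary-class toy data are all assumptions for illustration. It enumerates a fusion weight on a grid, combines the class probabilities produced by the facial-expression and EEG models, and keeps the weight that maximizes accuracy on a held-out set.

```python
import numpy as np

def fuse_by_weight_enumeration(p_face, p_eeg, labels, step=0.01):
    """Grid-search a decision-level fusion weight (illustrative sketch).

    p_face, p_eeg : (n_samples, n_classes) class probabilities from the
                    facial-expression and EEG models on a validation set.
    labels        : (n_samples,) ground-truth class indices.
    """
    best_w, best_acc = 0.0, -1.0
    for w in np.arange(0.0, 1.0 + 1e-9, step):
        fused = w * p_face + (1.0 - w) * p_eeg      # weighted decision-level fusion
        acc = np.mean(np.argmax(fused, axis=1) == labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc

if __name__ == "__main__":
    # Toy stand-ins for the two models' probability outputs (binary valence).
    rng = np.random.default_rng(0)
    p_face = rng.dirichlet([2, 2], size=200)
    p_eeg = rng.dirichlet([2, 2], size=200)
    y = rng.integers(0, 2, size=200)
    w, acc = fuse_by_weight_enumeration(p_face, p_eeg, y)
    print(f"best facial-modality weight: {w:.2f}, fused accuracy: {acc:.3f}")
```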