Abstract: Intelligent recognition of infant facial expressions can help caregivers monitor the physical and mental health of infants more closely. Because infants have smooth facial contours and weakly defined facial features, the inter-class similarity of their facial expressions is higher than that of adults. To address this high inter-class similarity, this study proposes a multi-scale information fusion network that operates in two stages. In the first stage, a fusion module fuses local features with global features in both the spatial and channel domains to enhance the representational ability of the features. In the second stage, a self-adaptive deep center loss uses an attention mechanism to estimate the weights of the fused features; these weights guide the center loss and promote intra-class compactness and inter-class separation of infant expression features. Experimental results show that the multi-scale information fusion network achieves a recognition accuracy of 95.46% on the infant facial expression dataset, with an AUC, recall, and F1 score of 99.07%, 95.88%, and 95.89%, respectively. Its recognition performance is the best among the existing facial expression recognition networks compared. In generalization experiments on a public facial expression dataset, the network achieves an accuracy of 89.87%.
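As an illustration of the second stage, the following is a minimal sketch of a center loss in which each feature dimension's squared distance to its class center is scaled by an attention weight. It is not the implementation used in this study: the function name `weighted_center_loss`, the data layout, and the example values are hypothetical, and the weights are passed in directly rather than estimated by an attention module.

```python
def weighted_center_loss(features, labels, centers, weights):
    """Attention-weighted center loss (illustrative sketch only).

    features: list of N feature vectors, each a list of D floats
    labels:   list of N integer class ids
    centers:  dict mapping class id -> class center (list of D floats)
    weights:  list of N weight vectors in [0, 1], each of length D;
              a hypothetical stand-in for an attention module's output
    """
    if not features:
        raise ValueError("features must not be empty")
    total = 0.0
    for x, y, w in zip(features, labels, weights):
        c = centers[y]
        # Weighted squared distance between the sample and its class center.
        total += sum(wd * (xd - cd) ** 2 for xd, cd, wd in zip(x, c, w))
    # Average over the batch, with the conventional factor of 1/2.
    return total / (2 * len(features))


# Toy example: two samples, two classes, two feature dimensions.
features = [[1.0, 2.0], [3.0, 0.0]]
labels = [0, 1]
centers = {0: [0.0, 2.0], 1: [3.0, 2.0]}

uniform = [[1.0, 1.0], [1.0, 1.0]]    # all weights 1: plain center loss
attended = [[1.0, 1.0], [1.0, 0.5]]   # second dimension of sample 2 down-weighted

loss_uniform = weighted_center_loss(features, labels, centers, uniform)
loss_attended = weighted_center_loss(features, labels, centers, attended)
```

With all weights equal to 1 the function reduces to the standard center loss. Lowering a weight reduces the pull of that feature dimension toward its class center, which is how estimated weights can guide the loss toward the more informative dimensions.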