Abstract: Sleep staging is essential for sleep monitoring and sleep quality assessment. High-precision sleep staging can assist physicians in correctly evaluating sleep quality during clinical diagnosis. Although existing studies on automatic sleep staging have achieved relatively reliable accuracy, several problems remain to be solved: (1) How can sleep features be extracted from patients more comprehensively? (2) How can effective sleep-state transition rules be learned from the captured sleep features? (3) How can multimodal data be exploited effectively to improve classification accuracy? To address these problems, this study proposes an automatic sleep staging network based on multi-head self-attention. To extract the modal characteristics of EEG and EOG during sleep, the network uses a parallel two-stream convolutional neural network to process the raw EEG and EOG signals separately. In addition, the model employs a contextual learning module, consisting of a multi-head self-attention block and a residual network, to capture multifaceted features of the sequences and to learn the correlations and relative importance among them. Finally, the model uses a unidirectional LSTM to learn the transition rules between sleep stages. Sleep staging experiments show that the proposed model achieves an overall accuracy of 85.7% and an MF1 score of 80.6% on the Sleep-EDF dataset, outperforming existing automatic sleep staging methods in both accuracy and robustness. These results indicate that the proposed model is valuable for automatic sleep staging research.
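As an illustrative sketch only (not the paper's implementation), the core operation of the contextual learning module described above, multi-head self-attention followed by a residual connection, can be written in plain Python. For simplicity this sketch assumes identity projections for the query, key, and value matrices and omits the learned output projection:

```python
import math

def matmul(A, B):
    """Multiply two matrices represented as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    e = [math.exp(x - m) for x in row]
    s = sum(e)
    return [x / s for x in e]

def self_attention(X, d_k):
    """Scaled dot-product self-attention for one head.

    X is (seq_len, d_k); Q = K = V = X (identity projections assumed).
    """
    X_T = list(map(list, zip(*X)))
    scores = matmul(X, X_T)                              # X X^T
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]           # attention weights
    return matmul(weights, X)

def multi_head_self_attention(X, num_heads):
    """Split features across heads, attend per head, concatenate,
    then add the residual connection (d_model must divide evenly)."""
    d_model = len(X[0])
    d_k = d_model // num_heads
    heads = []
    for h in range(num_heads):
        # Each head sees its own slice of the feature dimension.
        Xh = [row[h * d_k:(h + 1) * d_k] for row in X]
        heads.append(self_attention(Xh, d_k))
    # Concatenate head outputs back to (seq_len, d_model).
    out = [sum((heads[h][i] for h in range(num_heads)), [])
           for i in range(len(X))]
    # Residual connection: add the original input.
    return [[o + x for o, x in zip(orow, xrow)]
            for orow, xrow in zip(out, X)]
```

In the full model, learned linear projections per head and layer normalization would surround this operation; the sketch shows only why each output position becomes a weighted mixture of all input positions plus its own original features.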