Abstract: Traditional sentiment analysis methods based on single-modal data suffer from a limited analytical perspective and low classification accuracy. Methods based on temporal multimodal data offer a way to address these problems. Building on the temporal multimodal data across utterances, this study improves on existing multimodal sentiment analysis methods by combining a bidirectional gated recurrent unit (Bi-GRU) with intra-modal and cross-modal context attention mechanisms. The method is evaluated on the MOSI and MOSEI datasets. Experiments show that using temporal multimodal data across utterances and fully integrating intra-modal and cross-modal context information makes it possible to analyze sentiment from both the multimodal and the temporal perspective, and effectively improves classification accuracy.