Abstract: Most current methods for recognizing students’ classroom behavior operate on single-frame images and ignore the temporal coherence of behavior, so they cannot make full use of video information to describe that behavior accurately. This study therefore proposes an improved YOWO model that exploits video information to identify students’ classroom behavior. First, teaching videos are collected from real classroom teaching at a university and used to build an AVA-format video dataset covering five types of student classroom behavior. Second, the temporal shift module (TSM) is adopted to strengthen the model’s ability to capture temporal context information. Finally, a non-local operation module is used to improve the model’s ability to extract key location information. Experimental results show that these modifications improve the recognition performance of the network. On the classroom behavior dataset, the improved algorithm achieves an mAP of 95.7%, 4.6% higher than that of the original YOWO algorithm. The number of model parameters is reduced by 32.3% to 81.97×10⁶, and the computational cost is reduced by 9.6% to 22.6 GFLOPs. The detection speed is 24.03 f/s, an increase of about 3 f/s.
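
To illustrate the temporal shift idea mentioned above, the following is a minimal sketch in plain Python and not the implementation used in this study. It assumes each frame is represented by a flat list of per-channel features; the function name `temporal_shift`, the `fold_div` parameter, and the zero padding at the clip boundaries are illustrative assumptions. One fraction of the channels is taken from the next frame, an equal fraction from the previous frame, and the remaining channels are left unchanged, so each frame's features mix in information from its temporal neighbors without adding parameters.

```python
def temporal_shift(frames, fold_div=8, pad=0):
    """Shift a fraction of channels along the time axis (illustrative sketch).

    frames   : list of T frames, each a list of C per-channel features
    fold_div : 1/fold_div of the channels is shifted in each direction (assumed value)
    pad      : value used where no neighboring frame exists (assumed zero padding)
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    fold = num_channels // fold_div

    shifted = []
    for t in range(num_frames):
        row = list(frames[t])  # copy so the input is not modified
        # First `fold` channels: take the value from the next frame (t + 1).
        for c in range(fold):
            row[c] = frames[t + 1][c] if t + 1 < num_frames else pad
        # Next `fold` channels: take the value from the previous frame (t - 1).
        for c in range(fold, 2 * fold):
            row[c] = frames[t - 1][c] if t >= 1 else pad
        # Remaining channels keep the current frame's own features.
        shifted.append(row)
    return shifted


# Toy clip: 3 frames, 4 channels; the value 10*t + c encodes (frame, channel).
clip = [[10 * t + c for c in range(4)] for t in range(3)]
result = temporal_shift(clip, fold_div=4)
```

In this toy clip, channel 0 of each frame is replaced by channel 0 of the following frame and channel 1 by channel 1 of the preceding frame, while channels 2 and 3 are untouched. In an actual network the same shift would be applied to feature maps inside the backbone rather than to flat lists.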