Abstract: Recognition of human actions in videos has become an important research field in computer vision in recent years. However, existing methods represent video insufficiently and cannot focus on the significant regions within an image. We propose a deep convolutional neural network based on visual attention, which weights the video representation features so that the model attends to the most informative regions and achieves more accurate action recognition. We conducted experiments on HMDB51 and on our own Oilfield-7 dataset to verify the validity of the proposed model for human actions in oilfield settings. The experimental results show that the proposed method offers certain advantages over two-stream architectures, which have themselves achieved excellent performance.
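The idea of weighting representation features by attention can be illustrated with a minimal sketch. It is not the network proposed here: it assumes a generic soft-attention scheme in which each spatial region's feature vector receives a softmax-normalized weight and the weighted vectors are summed. The names `attend`, `features`, and `w` are hypothetical, and the scoring vector `w` stands in for parameters that a real model would learn.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(features, w):
    """Weight each region's feature vector by a soft attention weight.

    features: list of K feature vectors (one per spatial region), each of length C.
    w: hypothetical scoring vector of length C (learned in a real model).
    Returns (weights, pooled), where pooled is the attention-weighted sum.
    """
    # Score each region with a dot product against the scoring vector.
    scores = [sum(f_c * w_c for f_c, w_c in zip(f, w)) for f in features]
    # Normalize scores so the weights are positive and sum to 1.
    weights = softmax(scores)
    channels = len(features[0])
    # Pool the regions into one vector, emphasizing high-weight regions.
    pooled = [sum(a * f[c] for a, f in zip(weights, features))
              for c in range(channels)]
    return weights, pooled

# Toy input: four regions with two channels each (illustrative values only).
features = [[1.0, 0.0], [0.0, 1.0], [3.0, 1.0], [0.5, 0.5]]
w = [1.0, 0.5]
weights, pooled = attend(features, w)
```

In this toy input the third region has the highest score, so it receives the largest weight and dominates the pooled representation, which is the sense in which attention focuses on beneficial regions.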