Abstract: Human behavior detection is difficult because the same behavior can look very different across instances while different behaviors can look very similar, and because of practical problems such as viewing angle, occlusion, and the inability to monitor continuously in real time. To address these difficulties, this study proposes a neural network named Hierarchical Bilinear-YOLOv3. YOLOv3 first makes predictions on three scales. Certain layers of its feature pyramid network are then used as inputs to the Hierarchical Bilinear module, which captures local feature relationships between layers of the feature maps and also predicts on three scales. Finally, the results of YOLOv3 and Hierarchical Bilinear are integrated. The improved network adds only a few parameters compared with the original one. It improves the detection accuracy of the original algorithm without lowering its detection efficiency, and is thus superior to current behavior detection algorithms.
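The cross-layer interaction mentioned above can be illustrated with a minimal NumPy sketch. It is not the network described in this study. It assumes that two feature maps with the same spatial size are each projected to a common dimension and multiplied element-wise at every position, as in commonly used factorized bilinear pooling. The function name `cross_layer_bilinear`, the projection matrices `u` and `v`, the tensor shapes, and the signed square-root and L2 normalization steps are all illustrative assumptions.

```python
import numpy as np


def cross_layer_bilinear(x, y, u, v, eps=1e-12):
    """Hypothetical cross-layer bilinear interaction (illustrative only).

    x: (C1, H, W) feature map from one layer
    y: (C2, H, W) feature map from another layer, same H and W
    u: (D, C1) and v: (D, C2) projection matrices (act like 1x1 convolutions)
    Returns a (D, H, W) map of per-position interaction features.
    """
    # Project both feature maps into a shared D-dimensional space.
    px = np.einsum("dc,chw->dhw", u, x)
    py = np.einsum("dc,chw->dhw", v, y)
    # The element-wise product models pairwise interactions between the layers.
    z = px * py
    # Signed square root followed by L2 normalization over channels (assumed).
    z = np.sign(z) * np.sqrt(np.abs(z))
    norm = np.linalg.norm(z, axis=0, keepdims=True)
    return z / np.maximum(norm, eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 13, 13))  # assumed shapes, for illustration
    y = rng.standard_normal((32, 13, 13))
    u = rng.standard_normal((8, 16))
    v = rng.standard_normal((8, 32))
    print(cross_layer_bilinear(x, y, u, v).shape)
```

Because the spatial dimensions are preserved, an interaction map of this kind could in principle feed a per-scale prediction head. How the proposed network actually does this is not specified in the abstract.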