Abstract: In mouth-based human-computer interaction, changes in lighting, the difficulty of small-target detection, and the lack of a detection method that generalizes across scenarios make the mouth hard to detect. In this study, we use face images captured in different scenarios as the data source and propose a face recognition algorithm based on Faster R-CNN, in which multi-scale feature maps are combined within the Faster R-CNN framework for detection. First, we introduce a modified multi-scale feature map to make effective use of multi-resolution information. Because the feature maps must share the same size before an element-wise sum can be performed, the output feature map is up-sampled, which yields features with higher resolution and stronger expressive ability and improves detection performance on small targets. During training, multi-scale training and a larger number of anchors are used to make the network more robust when detecting targets of different sizes. Experiments show that, compared with the original Faster R-CNN, the detection accuracy for the mouth is improved by 8% and the method adapts better to the environment.
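The fusion step described in the abstract (up-sample a coarser feature map until it matches a finer one, then add the two element-wise) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes single-channel maps stored as nested lists, an integer scale factor, and nearest-neighbour up-sampling, none of which the abstract specifies. The names `upsample_nearest` and `fuse_by_sum` are hypothetical.

```python
def upsample_nearest(feature_map, factor):
    """Nearest-neighbour up-sampling of a 2-D map (list of rows) by an integer factor.

    Assumption: the abstract does not name the interpolation method;
    nearest-neighbour is used here only because it is the simplest choice.
    """
    upsampled = []
    for row in feature_map:
        # Repeat every value `factor` times along the width...
        wide_row = [value for value in row for _ in range(factor)]
        # ...and repeat the widened row `factor` times along the height.
        for _ in range(factor):
            upsampled.append(list(wide_row))
    return upsampled


def fuse_by_sum(fine_map, coarse_map):
    """Up-sample the coarse map to the fine map's size, then add element-wise."""
    factor = len(fine_map) // len(coarse_map)
    coarse_up = upsample_nearest(coarse_map, factor)
    # The element-wise sum is only defined when both maps share the same size.
    if len(coarse_up) != len(fine_map) or len(coarse_up[0]) != len(fine_map[0]):
        raise ValueError("feature maps must share the same size before summation")
    return [
        [f + c for f, c in zip(fine_row, coarse_row)]
        for fine_row, coarse_row in zip(fine_map, coarse_up)
    ]


# Hypothetical toy inputs: a 4x4 high-resolution map and a 2x2 low-resolution map.
fine = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
coarse = [
    [10, 20],
    [30, 40],
]

fused = fuse_by_sum(fine, coarse)
for row in fused:
    print(row)
```

Each cell of the fused map combines the fine spatial detail of the high-resolution map with the value of the coarse cell that covers it, which is the intuition behind using the fused map for small targets such as the mouth.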