Abstract: This study proposes a new method for text detection and recognition in complex scenes that addresses three shortcomings of existing approaches: a complicated recognition pipeline, poor adaptability, and low accuracy. The method consists of a text area detection network and a text recognition network. The detection network is an improved PSENet: its backbone is replaced with ResNeXt-101, and a differentiable binarization operation is added to optimize the segmentation network during feature extraction, which both simplifies post-processing and improves text detection. The recognition network combines a convolutional neural network with a long short-term memory network and is trained with an aggregate cross-entropy loss, which improves recognition accuracy. Experiments on two data sets show that the new method reaches an accuracy of up to 95.6%, higher than that of previous methods. The method can effectively detect and recognize arbitrary text instances and is practical to apply.
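The two components named above, differentiable binarization and aggregate cross-entropy, can be illustrated with a minimal sketch. The abstract does not give their formulas, so the code below assumes the commonly published forms: binarization as a steep sigmoid of the difference between a probability map and a threshold map, and aggregate cross-entropy as the cross-entropy between normalized per-class label counts and time-averaged per-class predictions. The function names, the amplification factor `k = 50`, and the convention that class 0 is the blank are assumptions made for illustration, not details taken from this study.

```python
import math
from collections import Counter


def differentiable_binarization(prob, thresh, k=50.0):
    """Soft binarization of a 2-D map: 1 / (1 + exp(-k * (P - T))) per pixel.

    prob and thresh are equally sized lists of rows with values in [0, 1].
    k is an assumed amplification factor; larger k approaches a hard step.
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob, thresh)
    ]


def ace_loss(probs, label, blank=0, eps=1e-10):
    """Aggregate cross-entropy for one sequence (assumed formulation).

    probs: T x C list of per-time-step class probabilities.
    label: list of class indices for the characters, without blanks.
    The blank class is assigned the remaining T - len(label) steps.
    """
    steps = len(probs)
    classes = len(probs[0])
    counts = Counter(label)
    counts[blank] = steps - len(label)
    loss = 0.0
    for c in range(classes):
        n_bar = counts.get(c, 0) / steps          # normalized label count
        if n_bar > 0:
            y_bar = sum(step[c] for step in probs) / steps  # averaged prediction
            loss -= n_bar * math.log(y_bar + eps)
    return loss


# Pixels above the threshold map go toward 1, pixels below toward 0.
binary = differentiable_binarization([[0.9, 0.1]], [[0.5, 0.5]])

# Three time steps, three classes (0 = blank), target characters 1 and 2.
good = ace_loss([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [1, 2])
bad = ace_loss([[1, 0, 0], [1, 0, 0], [1, 0, 0]], [1, 2])
```

Because the loss compares only how often each character occurs, not where it occurs, it needs no alignment between time steps and characters. In this sketch, `good` is lower than `bad` because the per-class counts of the first prediction match the label.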