Abstract: As highly condensed, high-level semantic information, Wujin style Tibetan text in natural scenes has considerable research and practical value and can help researchers understand text in Tibetan scenes. At present, few studies address the detection and recognition of Wujin style Tibetan scripts in natural scenes. Using a manually collected image data set of Wujin style Tibetan scripts in natural scenes, this study compares the detection performance of common text detection algorithms on such scripts. It also compares the recognition accuracy of the sequence-based text recognition algorithm CRNN with different feature extraction networks on the same data set, and analyzes examples of recognition failure on Wujin style Tibetan scripts in 314 real natural scenes. Experiments show that, in the text detection stage, the differentiable binarization network DBNet performs better on the test set than the other detection algorithms compared, with precision, recall, and F1 score of 0.89, 0.59, and 0.71, respectively. In the text recognition stage, CRNN achieves its highest recognition accuracy on the test set, 0.4365, when MobileNetV3 Large is used as the feature extraction network.
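The reported detection scores can be cross-checked against each other. The following is a minimal sketch, assuming that the first two reported values (0.89 and 0.59) are precision and recall and that the F1 value is their harmonic mean, as is standard; the function name `f1_score` is illustrative and is not taken from the study's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Detection scores as reported in the abstract (assumed to be precision and recall).
detection_f1 = f1_score(0.89, 0.59)
print(round(detection_f1, 2))  # prints 0.71
```

Under these assumptions the computed value rounds to the reported F1 of 0.71, so the three detection figures are mutually consistent.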