Abstract: Fine-grained image classification is an important branch of deep-learning-based image classification. Because images from different classes often share very similar features and no single distinctive feature separates them, fine-grained classification is more difficult than general image classification, and traditional classification methods need to be adapted. General image classification is usually trained on visual, pixel-level features. Applying this approach directly to fine-grained classification is not well suited to the task and leaves room for improvement, whereas non-pixel-level features, such as text appearing in the image, can help distinguish between classes. We therefore propose combining text and visual information for image classification: text detection and recognition algorithms are combined with general image classification methods to make full use of the features present in the images, and the combined approach is applied to fine-grained image classification. Experimental results on the Con-text dataset show that the proposed algorithm significantly improves accuracy.
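
As a rough illustration of what combining text and visual information can look like, the following minimal sketch fuses a visual feature vector with a feature built from words recognized in the image, then scores the result with a linear classifier. The abstract does not specify the fusion scheme, so everything here is an assumption made for illustration: the bag-of-words text feature, fusion by concatenation, the linear classifier, and all function names (`text_feature`, `fuse`, `classify`) are hypothetical. The visual vector and the recognized words are taken as given inputs, standing in for the outputs of an image backbone and a text detection and recognition stage.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def text_feature(words, vocabulary):
    """Bag-of-words vector over a fixed vocabulary (assumed text representation)."""
    index = {w: i for i, w in enumerate(vocabulary)}
    counts = [0.0] * len(vocabulary)
    for word in words:
        i = index.get(word.lower())
        if i is not None:  # words outside the vocabulary are ignored
            counts[i] += 1.0
    return l2_normalize(counts)


def fuse(visual_vec, text_vec):
    """Fuse by concatenating the normalized visual vector with the text vector."""
    return l2_normalize(visual_vec) + list(text_vec)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, biases):
    """Linear classifier on the fused vector; returns class probabilities."""
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


# Toy example with made-up numbers: two classes, a 2-D visual vector,
# and a two-word vocabulary.
vocabulary = ["bakery", "pharmacy"]
visual = [3.0, 4.0]            # stand-in for a backbone's output
recognized = ["Bakery", "OPEN"]  # stand-in for recognized scene text
fused = fuse(visual, text_feature(recognized, vocabulary))
weights = [[0.0, 0.0, 2.0, 0.0],  # class 0 responds to "bakery"
           [0.0, 0.0, 0.0, 2.0]]  # class 1 responds to "pharmacy"
probs = classify(fused, weights, [0.0, 0.0])
```

In this toy setup the visual weights are zero, so the decision rests entirely on the recognized word; a trained model would learn weights over both parts of the fused vector.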