Abstract: Fine-grained image classification is challenging because the discriminative object regions in images are difficult to learn effectively. This study therefore proposes a weakly supervised fine-grained image classification algorithm based on the attention mechanism, which can accurately locate and identify semantically sensitive features in fine-grained images. First, building on a classic convolutional neural network, the overall information of an object is expressed through the linear fusion of features. Then, the discriminative details of the features are further extracted through a visual attention mechanism to obtain a more complete fine-grained feature representation. The proposed algorithm combines linear fusion with the attention mechanism and can be regarded as a network model in which multiple network branches are trained cooperatively and optimized jointly, so that the model better expresses both overall and local information. Experiments on three publicly available fine-grained recognition datasets show that the proposed method outperforms the baseline method and achieves an advanced level of classification accuracy.
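The two-branch idea summarized above can be illustrated with a minimal NumPy sketch. It is not the network proposed in this study: the fusion weight `alpha`, the channel-mean attention scores, the random feature maps standing in for backbone outputs, and all function names are assumptions made for illustration only. The sketch shows a global branch that linearly fuses two feature maps, a local branch that pools features under a spatial attention map, and a joint loss that sums the cross-entropy of both branches using only an image-level label (the weakly supervised setting).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def global_branch(feat_a, feat_b, alpha=0.5):
    # Assumed form of linear fusion: a convex combination of two
    # (C, H, W) feature maps, followed by global average pooling.
    fused = alpha * feat_a + (1.0 - alpha) * feat_b
    return fused.mean(axis=(1, 2))  # shape (C,)

def attention_branch(feat):
    # Assumed attention: softmax over spatial positions of the
    # channel-mean activation, then attention-weighted pooling.
    c, h, w = feat.shape
    scores = feat.mean(axis=0).reshape(-1)   # shape (H*W,)
    attn = softmax(scores)                   # sums to 1 over positions
    pooled = feat.reshape(c, -1) @ attn      # shape (C,)
    return pooled, attn.reshape(h, w)

def cross_entropy(logits, label):
    return -np.log(softmax(logits)[label])

def joint_loss(feat_a, feat_b, w_global, w_local, label):
    # Joint optimization sketch: both branches are supervised by the
    # same image-level label and their losses are summed.
    g = global_branch(feat_a, feat_b)
    l, _ = attention_branch(feat_b)
    return cross_entropy(w_global @ g, label) + cross_entropy(w_local @ l, label)

# Random tensors stand in for CNN feature maps and classifier weights.
rng = np.random.default_rng(0)
C, H, W, K = 8, 4, 4, 3
feat_a = rng.standard_normal((C, H, W))
feat_b = rng.standard_normal((C, H, W))
w_global = rng.standard_normal((K, C))
w_local = rng.standard_normal((K, C))

loss = joint_loss(feat_a, feat_b, w_global, w_local, label=1)
pooled, attn = attention_branch(feat_b)
```

In this sketch the attention map `attn` needs no part or bounding-box annotation, which is what makes the setup weakly supervised. A trained model would learn the attention scores rather than derive them from the channel mean.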