Abstract: Few-shot semantic segmentation is a computer vision task that segments novel object categories in query images given only a few annotated samples. Existing methods still face two challenges. The first is prototype bias: prototypes carry too little foreground object information, making it difficult to approximate the true category statistics. The second is feature degradation: the model focuses only on the current category and ignores potential categories. This study proposes a new network based on contrastive prototypes and background mining, whose main idea is to learn more representative prototypes and to identify potential categories from the background. Specifically, a class-specific learning branch constructs a large and consistent prototype dictionary and applies an InfoNCE loss to make the prototypes more discriminative, while a background mining branch initializes background prototypes and applies an attention mechanism between these prototypes and the dictionary to mine potential categories. Experimental results on the PASCAL-5i and COCO-20i datasets demonstrate the strong performance of the model: under the 1-shot setting with a ResNet-50 backbone, it achieves 64.9% and 44.2%, improvements of 4.0% and 1.9%, respectively, over the baseline model.
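The InfoNCE loss mentioned above contrasts each query prototype against one positive and many negative entries from the dictionary. As a minimal sketch (not the paper's implementation; the cosine similarity, temperature value, and function names here are illustrative assumptions):

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors represented as lists of floats.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(query, positive, negatives, tau=0.07):
    # InfoNCE: negative log-probability of the positive under a softmax
    # over the positive and negative similarities, scaled by temperature tau.
    sims = [cosine(query, positive)] + [cosine(query, n) for n in negatives]
    logits = [s / tau for s in sims]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))
```

Pulling the positive prototype close while pushing dictionary negatives apart is what makes the learned prototypes more discriminative; a lower temperature sharpens the penalty on hard negatives.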