Abstract: Interactive image segmentation is an important tool for pixel-level annotation and image editing. Most existing methods adopt a two-stage prediction scheme: a rough result is predicted first and then refined in the second stage to obtain a more accurate prediction. To keep the model viable under limited hardware resources, the same network is shared across the two stages. To better propagate labeled information to unlabeled regions, a similarity constraint propagation module is designed. Meanwhile, a simple prototype extraction module is used during training to make the foreground-click vectors highly cohesive and to accelerate network convergence; this module is removed during inference. At the inference stage, an intention perception module is introduced to capture details and further improve prediction quality. Extensive experiments show that the method performs comparably to state-of-the-art methods on all popular benchmarks, demonstrating its effectiveness.
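The two-stage coarse-to-fine prediction with a shared network described above can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; all names (SharedSegNet, two_stage_predict, the 3+1+1 input layout) are illustrative assumptions showing how a single set of weights can serve both the rough pass and the refinement pass, conditioned on the user clicks and the previous mask.

```python
# Minimal sketch (assumed, not the paper's code): one shared network used for
# both the coarse prediction and the refinement stage of click-based segmentation.
import torch
import torch.nn as nn


class SharedSegNet(nn.Module):
    """A single network reused for the coarse pass and the refinement pass."""

    def __init__(self, in_channels: int = 5, hidden: int = 32):
        super().__init__()
        # in_channels = 3 (RGB) + 1 (click map) + 1 (previous/coarse mask)
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, 1),
        )

    def forward(self, image, click_map, prev_mask):
        x = torch.cat([image, click_map, prev_mask], dim=1)
        return torch.sigmoid(self.body(x))


def two_stage_predict(net: SharedSegNet, image, click_map):
    """Stage 1: rough mask from an empty prior.
    Stage 2: refine by feeding the rough mask back through the same network."""
    empty = torch.zeros_like(click_map)
    coarse = net(image, click_map, empty)    # first stage: rough prediction
    refined = net(image, click_map, coarse)  # second stage: shared-weight refinement
    return coarse, refined


if __name__ == "__main__":
    net = SharedSegNet()
    img = torch.rand(1, 3, 64, 64)
    clicks = torch.zeros(1, 1, 64, 64)
    clicks[0, 0, 32, 32] = 1.0               # a single positive (foreground) click
    coarse, refined = two_stage_predict(net, img, clicks)
    print(coarse.shape, refined.shape)        # both: torch.Size([1, 1, 64, 64])
```

Sharing weights across the two passes keeps the parameter count of a two-stage pipeline at that of a single network, which is the hardware constraint the abstract refers to.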