Abstract: To address the limited detection and localization performance and weak robustness of existing image tampering detection methods, a multi-scale perceptual learning network (MsPL-Net) is proposed. First, to counter the weak feature robustness caused by diverse image post-processing and tampering operation types, a hierarchical dense-linked multi-scale dilated convolution module (MSDCM) is introduced. This module expands the receptive field to capture multi-scale feature information while preserving the high-resolution representation of the input image, so that fine image details and edge information are retained. Second, to address the blurred localization of tampered edges caused by sensitivity to the size of the tampered region, an information complementary perception attention module (ICPAM) is proposed, consisting of global attention, local attention, and a gated feature modulator. Global attention captures the overall shape, structure, and background information of the image, while local attention focuses on local regions and specific details. The two mechanisms operate in parallel and complement each other: through feature interaction and fusion, they enhance the model’s representational capacity and improve localization accuracy. The gated feature modulator employs fine embeddings to filter out irrelevant features and noise responses from the global and local feature maps, which facilitates downstream recognition and learning of abnormal textures, edge changes, and other traces left by different tampering techniques. Finally, a novel joint loss function is designed to further enhance the detection performance and localization accuracy of the network. Compared with the latest works, the detection accuracy of the proposed method is improved by 2.3%. In addition, the proposed method demonstrates excellent robustness and generalization and produces more accurate and clearer localization results.
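
The claim that dilated convolution enlarges the receptive field while keeping the input resolution can be checked with simple arithmetic. The sketch below is only an illustration of that general property, not the MSDCM itself: the 3x3 kernel, the dilation rates 1, 2, 4, the 256-pixel input, and all function names are assumptions, since the abstract does not give these values. It also models a plain sequential stack rather than the dense links of the proposed module.

```python
from typing import Sequence


def same_padding(kernel_size: int, dilation: int) -> int:
    """Padding that keeps the spatial size unchanged (stride 1, odd kernel)."""
    return dilation * (kernel_size - 1) // 2


def output_size(size: int, kernel_size: int, dilation: int, padding: int) -> int:
    """Output length of a stride-1 convolution along one spatial axis."""
    return size + 2 * padding - dilation * (kernel_size - 1)


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of stride-1 dilated convolutions applied in sequence."""
    return 1 + sum(d * (kernel_size - 1) for d in dilations)


# Hypothetical settings; the source does not state the rates used in MSDCM.
KERNEL = 3
RATES = [1, 2, 4]
INPUT_SIZE = 256

for d in RATES:
    p = same_padding(KERNEL, d)
    # Each layer keeps the 256-pixel resolution.
    print(f"dilation={d} padding={p} output={output_size(INPUT_SIZE, KERNEL, d, p)}")

# Three dilated layers see 15 pixels per axis; three plain layers see 7.
print("dilated stack:", receptive_field(KERNEL, RATES))
print("plain stack:  ", receptive_field(KERNEL, [1, 1, 1]))
```

Under these assumed settings the stack covers a 15x15 neighborhood with the parameter count of three 3x3 layers and no downsampling, which is the trade-off the abstract attributes to multi-scale dilated convolution.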