Abstract: Adding carefully crafted perturbations to images produces adversarial samples that mislead deep neural networks into giving incorrect outputs. Stronger attack methods therefore support research on the security and robustness of network models. Attacks are divided into white-box and black-box settings, and the transferability of adversarial samples allows examples generated on a known (white-box) model to attack unseen black-box models. Attacks based on linear integrated gradients (TAIG-S) can generate highly transferable adversarial samples, but they are affected by noise along the straight-line integration path: they accumulate pixel gradients that are irrelevant to the prediction, which limits the attack success rate. Building on guided integrated gradients, the proposed Guided-TAIG method adaptively adjusts the pixels with low absolute values on each segment of the path-integral computation and selects the starting point of the next segment within a bounded interval, thereby avoiding the accumulation of meaningless gradient noise. Experiments on the ImageNet dataset show that Guided-TAIG outperforms FGSM, C&W, and TAIG-S in white-box attacks on both CNN and Transformer architectures, produces smaller perturbations, and achieves stronger transfer-based attacks in the black-box setting, demonstrating the effectiveness of the proposed method.
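To make the path-selection idea in the abstract concrete, the following is a minimal, hedged PyTorch sketch of a guided integration path: at each segment only the coordinates with the smallest absolute gradient are moved toward the input, so gradients irrelevant to the prediction are not accumulated. All names, parameters (e.g., `steps`, `frac`), and the specific selection rule are illustrative assumptions, not the paper's exact Guided-TAIG algorithm.

```python
import torch

def guided_ig_direction(model, x, baseline, label, steps=20, frac=0.1):
    """Hedged sketch of a guided integrated-gradients attribution.

    On each path segment, only the coordinates whose current gradient has the
    smallest absolute value are moved toward the input x, and their
    gradient-times-displacement is accumulated. This avoids summing large,
    prediction-irrelevant gradients along a straight line, which is the issue
    the abstract attributes to TAIG-S. Parameters here are illustrative.
    """
    point = baseline.clone()
    attribution = torch.zeros_like(x)
    for _ in range(steps):
        point = point.detach().requires_grad_(True)
        score = model(point)[0, label]              # logit of the target class
        grad, = torch.autograd.grad(score, point)
        diff = (x - point).detach()
        movable = diff.abs() > 1e-8                 # coordinates not yet at x
        if not movable.any():
            break
        # Rank movable coordinates by |gradient| and pick the lowest fraction.
        scores = grad.detach().abs().masked_fill(~movable, float("inf"))
        k = max(1, int(frac * movable.sum().item()))
        idx = scores.flatten().argsort()[:k]
        mask = torch.zeros_like(scores, dtype=torch.bool).flatten()
        mask[idx] = True
        mask = mask.view_as(scores)
        # Move the selected coordinates onto x and accumulate their contribution.
        attribution[mask] += grad.detach()[mask] * diff[mask]
        point = point.detach()
        point[mask] = x[mask]
    return attribution
```

The sign of the returned attribution could then drive an FGSM-style perturbation update, analogous to how integrated gradients serve as the attack direction in TAIG-S; this usage is likewise a sketch under the stated assumptions.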