Abstract: In this study, we propose a multi-branch network that integrates multi-scale frequency features with depth map features trained by a generative adversarial network (GAN). Specifically, the edge texture information in high-frequency features helps capture moiré patterns, while low-frequency features are more sensitive to color distortion. As auxiliary information, depth maps are visually more discriminative than RGB images. Supervised multi-view contrastive learning is employed to further enhance multi-view feature learning. Moreover, a two-stage bilinear feature fusion method is proposed to effectively integrate the multi-branch features from different views. To evaluate the model, ablation experiments, feature fusion comparison experiments, intra-set experiments, and inter-set experiments are conducted on four widely used public face anti-spoofing datasets: CASIA-FASD, Replay-Attack, MSU-MFSD, and OULU-NPU. In the inter-set evaluation, the proposed model achieves an average HTER of 15.0% over the four tested protocols, 5.3 percentage points lower than the 20.3% achieved by the DFA method.
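The headline result is reported as HTER (Half Total Error Rate), the usual metric on these benchmarks: the mean of the false acceptance rate (FAR, attacks accepted as live) and the false rejection rate (FRR, live faces rejected). The minimal Python sketch below shows the metric and the arithmetic behind the reported gap. The function names, the score convention (higher score means more likely live), and the fixed threshold are illustrative assumptions, not the authors' evaluation code; in inter-set protocols the threshold is commonly fixed on training-side data rather than tuned on the test set.

```python
from typing import Sequence, Tuple


def error_rates(scores: Sequence[float], labels: Sequence[int],
                threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) at a fixed decision threshold.

    Assumed conventions (illustrative, not from the paper):
    label 1 = bona fide (live) face, label 0 = presentation attack;
    a sample is accepted as live when score >= threshold.
    """
    attack = [s for s, y in zip(scores, labels) if y == 0]
    live = [s for s, y in zip(scores, labels) if y == 1]
    far = sum(s >= threshold for s in attack) / len(attack)  # attacks wrongly accepted
    frr = sum(s < threshold for s in live) / len(live)       # live faces wrongly rejected
    return far, frr


def hter(far: float, frr: float) -> float:
    """Half Total Error Rate: the mean of FAR and FRR."""
    return (far + frr) / 2.0


if __name__ == "__main__":
    # Toy scores: three live faces followed by three attacks (made-up values).
    scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    far, frr = error_rates(scores, labels, threshold=0.5)
    print(f"FAR={far:.3f} FRR={frr:.3f} HTER={hter(far, frr):.3f}")

    # Gap reported in the abstract (average inter-set HTER, in percent).
    dfa, proposed = 20.3, 15.0
    absolute = dfa - proposed   # difference in percentage points
    relative = absolute / dfa   # fraction of the baseline error removed
    print(f"absolute: {absolute:.1f} pp, relative: {relative:.1%}")
```

Running it prints `FAR=0.333 FRR=0.333 HTER=0.333`, then `absolute: 5.3 pp, relative: 26.1%`. The distinction matters when reading the result: the drop from 20.3% to 15.0% is 5.3 percentage points in absolute terms, which corresponds to roughly a 26% relative reduction of the DFA baseline's error.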