Kinship Recognition Based on Self-attention Mechanism
(1. School of Automation and Information Engineering, Sichuan University of Science & Engineering, Yibin 643002, China; 2. Sichuan Key Laboratory of Artificial Intelligence, Yibin 644002, China)
Received: December 21, 2022    Revised: January 9, 2023
Abstract: Convolutional neural networks (CNNs) with local attention mechanisms currently achieve good results in feature extraction for kinship recognition. However, further gains from CNN-based backbone models have been marginal, and few researchers have used self-attention mechanisms, which can capture global information. This paper therefore proposes S-ViT, a model built on a convolution-free backbone: a Vision Transformer with a global self-attention mechanism serves as the basic backbone feature extraction network. A Siamese network is constructed and combined with a CNN that has a local attention mechanism, extending the traditional classification network to kinship recognition problems. Experimental results show that, compared with the leading methods of the RFIW2020 challenge, the proposed method performs well on all three kinship recognition tasks: it ranks second on the first task with a verification accuracy of 76.8%, and third on the second and third tasks. These results demonstrate the feasibility and effectiveness of the method, which offers a new solution for kinship recognition.
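The two ideas the abstract combines, a Siamese (weight-sharing) arrangement and global self-attention over image patches, can be illustrated with a minimal sketch. It is not the authors' S-ViT implementation. It assumes a single attention head, random vectors standing in for patch tokens, mean pooling, and cosine similarity as the pair score. It omits patch embedding, positional encoding, the CNN branch with local attention, and training. All function names and parameters are hypothetical.

```python
import math
import random


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention.

    Every token attends to every other token, which is what gives the
    mechanism its global receptive field.
    """
    q = [matvec(wq, t) for t in tokens]
    k = [matvec(wk, t) for t in tokens]
    v = [matvec(wv, t) for t in tokens]
    scale = math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v))
                    for d in range(len(v[0]))])
    return out


def embed(tokens, params):
    """Encode one face: self-attention followed by mean pooling (assumed)."""
    attended = self_attention(tokens, *params)
    n = len(attended)
    return [sum(tok[d] for tok in attended) / n
            for d in range(len(attended[0]))]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def kinship_score(face_a, face_b, params):
    """Siamese scoring: both faces pass through the SAME encoder weights."""
    return cosine(embed(face_a, params), embed(face_b, params))


# Hypothetical toy data: 6 patch tokens of dimension 4 per face.
rng = random.Random(0)
DIM, N_TOKENS = 4, 6


def random_matrix():
    return [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]


params = (random_matrix(), random_matrix(), random_matrix())  # Wq, Wk, Wv
face_a = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_TOKENS)]
face_b = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_TOKENS)]

score = kinship_score(face_a, face_b, params)
```

Because both inputs share one set of weights, the score is symmetric in the two faces, and a face compared with itself scores 1. In a real verification system the score would be thresholded or fed to a classifier trained on labeled kin and non-kin pairs.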
