Abstract: With the widespread application of attention mechanisms in object detection, further enhancing feature extraction ability has become a focus of research. This paper proposes a novel attention mechanism that optimizes the feature interaction process and improves detection performance. The mechanism eliminates the query operation of traditional self-attention. Instead, it employs depthwise separable convolution to efficiently extract both local and global information, and it aggregates features through a weighted fusion of keys and values. This design reduces computational complexity while strengthening the model’s ability to capture important features. Experiments on five different types of datasets show that the mechanism performs well on small-target detection, occluded objects, and complex scenes, significantly improving detection accuracy and efficiency. Visual analysis further verifies its effectiveness in feature extraction.
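
To make the idea of query-free attention more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes 1D feature maps stored as lists of channels. It also assumes that keys and values each come from a depthwise convolution followed by a pointwise (1x1) convolution, and that the fusion is a softmax over the keys that weights the values into one global context term per channel. All function and parameter names (`depthwise_conv1d`, `pointwise_conv`, `query_free_attention`, `k_dw`, `k_pw`, `v_dw`, `v_pw`) are hypothetical.

```python
import math


def depthwise_conv1d(x, kernels):
    """Apply one odd-length kernel per channel (zero padding, same length).

    x: list of C channels, each a list of N floats.
    kernels: list of C kernels, one per channel.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        half = len(kernel) // 2
        n = len(channel)
        row = []
        for i in range(n):
            acc = 0.0
            for j, w in enumerate(kernel):
                idx = i + j - half
                if 0 <= idx < n:  # positions outside the map count as zero
                    acc += w * channel[idx]
            row.append(acc)
        out.append(row)
    return out


def pointwise_conv(x, weight):
    """1x1 convolution: mix channels at each position.

    weight: C_out x C_in matrix given as a list of rows.
    """
    n = len(x[0])
    return [
        [sum(w * x[c][i] for c, w in enumerate(w_row)) for i in range(n)]
        for w_row in weight
    ]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def query_free_attention(x, k_dw, k_pw, v_dw, v_pw):
    """Assumed key-value fusion with no query branch.

    Keys and values are both produced by depthwise separable convolutions
    and must have the same number of output channels.
    """
    keys = pointwise_conv(depthwise_conv1d(x, k_dw), k_pw)
    values = pointwise_conv(depthwise_conv1d(x, v_dw), v_pw)
    out = []
    for k_row, v_row in zip(keys, values):
        weights = softmax(k_row)  # per-channel weights over positions
        context = sum(w * v for w, v in zip(weights, v_row))  # global summary
        out.append([v + context for v in v_row])  # local values + global term
    return out
```

In this sketch the cost grows linearly with the number of positions, because no pairwise query-key similarity matrix is formed. That is one plausible reading of the reduced computational complexity described above, not a claim about the paper's exact formulation.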