Abstract: YOLOx-Darknet53 is an improved detection network that builds on you only look once version 3 (YOLOv3) and adds various tricks. However, it still uses Darknet53 as the backbone network for feature extraction, so its feature extraction capability remains insufficient. In this study, we derive a contextual attention (CoA) module by improving the attention mechanism in CoTNet. We then replace the 3×3 convolution in the residual block of the YOLOx backbone network with this module, which yields a new attention-fused residual block and strengthens the feature extraction capability of the backbone. A comparison experiment is conducted on the Pascal VOC2007 dataset. The mean average precision AP@[.5:.95] and the AP@0.5 of the network integrating the CoA module are both 1.4 points higher than those of the original network. After the backbone network is improved, a parameter-free 3D attention module is added in front of the YOLOx detection head to obtain the final improved detection network. Repeating the same comparison experiment shows that the AP@[.5:.95] and the AP@0.5 of the final network are 1.6 and 1.5 points higher, respectively, than those of the original network. The improved network therefore detects more accurately than the original network and can deliver better detection performance in industrial applications.
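To illustrate what a 3D attention module without learnable parameters might compute, the following is a minimal sketch. The abstract does not name the module, so the code assumes a SimAM-style formulation, in which each activation is scaled by a sigmoid of its squared deviation from the channel mean. The function name `parameter_free_attention`, the default `eps`, and the nested-list layout (channels × height × width) are illustrative assumptions, not the authors' implementation.

```python
import math


def parameter_free_attention(feature_map, eps=1e-4):
    """Reweight a C x H x W feature map without any learnable parameters.

    Assumption: a SimAM-style energy function. The abstract only states
    that the module is a 3D attention module with no parameters.
    """
    output = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        n = len(values) - 1  # number of "other" positions in the channel
        if n < 1:
            raise ValueError("each channel needs at least two positions")
        mean = sum(values) / len(values)

        # Squared deviation of every position from the channel mean.
        sq_dev = [[(v - mean) ** 2 for v in row] for row in channel]
        variance = sum(d for row in sq_dev for d in row) / n

        weighted_channel = []
        for row, dev_row in zip(channel, sq_dev):
            weighted_row = []
            for v, d in zip(row, dev_row):
                # Positions that stand out from the channel mean get a
                # larger score, hence a larger sigmoid weight.
                score = d / (4.0 * (variance + eps)) + 0.5
                weight = 1.0 / (1.0 + math.exp(-score))
                weighted_row.append(v * weight)
            weighted_channel.append(weighted_row)
        output.append(weighted_channel)
    return output
```

Because the weights depend only on per-channel statistics of the input, a module of this kind adds no trainable parameters in front of the detection head, and the output keeps the shape of the input feature map.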