Abstract: In 6D object pose estimation, existing algorithms often struggle to estimate the pose of target objects precisely and robustly. To address this challenge, this study introduces a 6D object pose refinement network that incorporates residual attention, hybrid dilated convolution, and standard deviation information. First, in the Gen6D image feature extraction network, the standard convolution modules are replaced with hybrid dilated convolution modules, which expand the receptive field and improve the capture of global features. Second, a residual attention module is integrated into the 3D convolutional neural network; it weights feature channels by importance, so that key features are extracted while the loss of shallow-layer features is minimized. Finally, standard deviation information is introduced into the average distance loss function, enabling the model to learn more pose information about the object. Experimental results show that the proposed network achieves ADD scores of 68.79% and 56.03% on the LINEMOD and GenMOP datasets, respectively. These are improvements of 1.78 and 5.64 percentage points over the Gen6D network, indicating that the proposed network improves the accuracy of 6D pose estimation.
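For readers unfamiliar with the ADD (average distance) metric behind the reported scores, the following is a minimal NumPy sketch, not the authors' implementation. It computes the mean distance between model points transformed by the ground-truth pose and by the predicted pose. The function names, the argument layout, and the 10%-of-diameter acceptance threshold are assumptions for illustration; the abstract does not state the threshold used, and the standard deviation term added to the loss is not reproduced here because its exact form is not given.

```python
import numpy as np

def add_distance(points, R_gt, t_gt, R_pred, t_pred):
    """Mean distance between model points under the two poses.

    points: (N, 3) object model points; R_*: (3, 3) rotations; t_*: (3,) translations.
    """
    gt = points @ R_gt.T + t_gt        # points under the ground-truth pose
    pred = points @ R_pred.T + t_pred  # points under the predicted pose
    return float(np.linalg.norm(gt - pred, axis=1).mean())

def add_correct(points, R_gt, t_gt, R_pred, t_pred, diameter, threshold=0.1):
    """True if the ADD distance is below threshold * object diameter.

    threshold=0.1 is an assumed, commonly used value, not taken from the abstract.
    """
    dist = add_distance(points, R_gt, t_gt, R_pred, t_pred)
    return bool(dist < threshold * diameter)

# Toy example: three model points and a prediction that is off by a small translation.
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
identity = np.eye(3)
zero = np.zeros(3)
offset = np.array([0.03, 0.0, 0.0])

print(add_distance(points, identity, zero, identity, offset))
print(add_correct(points, identity, zero, identity, offset, diameter=1.0))
```

Under this convention, a dataset-level ADD score is the percentage of test poses for which `add_correct` returns true.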