Abstract: GSNet uses graspness to identify graspable regions in cluttered scenes, which significantly improves the accuracy of robotic grasp pose detection. However, GSNet determines the grasp pose parameters from a single fixed-size cylinder and ignores the influence of features at different scales on grasp pose estimation. To address this problem, this study proposes a multi-scale cylinder attention feature fusion module (Ms-CAFF), which consists of two core components: an attention fusion module and a gating unit. Ms-CAFF replaces the original feature extraction method in GSNet and uses an attention mechanism to integrate the geometric features inside four cylinders of different sizes, thereby enhancing the network’s ability to perceive geometric features at different scales. Experimental results on GraspNet-1Billion, a grasp pose detection dataset for large-scale cluttered scenes, show that introducing these modules increases the accuracy of the network’s grasp poses by up to 10.30% and 6.65%. The network is also deployed in real-world experiments to verify the effectiveness of the method in real scenes.
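To make the fusion idea concrete, the following is a minimal sketch of attention-weighted fusion over features from four cylinder scales, followed by a gate. It is an illustration under stated assumptions, not the Ms-CAFF implementation: the abstract does not specify how the attention scores or the gating unit are computed, so the dot-product scoring, the element-wise sigmoid gate, and the names `fuse_scales`, `w_att`, `w_gate`, and `b_gate` are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(feats, w_att, w_gate, b_gate):
    """Fuse per-scale feature vectors for one seed point.

    feats:  list of S feature vectors (one per cylinder size), each of length C.
    w_att:  length-C scoring vector (assumed form of the attention scoring).
    w_gate, b_gate: length-C gate parameters (assumed element-wise sigmoid gate).
    Returns the gated fused feature (length C) and the S attention weights.
    """
    # One scalar score per scale: dot product of the feature with w_att.
    scores = [sum(w * f for w, f in zip(w_att, feat)) for feat in feats]
    alpha = softmax(scores)  # attention weights over scales, summing to 1

    channels = len(feats[0])
    # Attention-weighted sum of the per-scale features, channel by channel.
    fused = [sum(a * feat[c] for a, feat in zip(alpha, feats))
             for c in range(channels)]
    # Sigmoid gate in (0, 1) scales each fused channel.
    gated = [fused[c] / (1.0 + math.exp(-(w_gate[c] * fused[c] + b_gate[c])))
             for c in range(channels)]
    return gated, alpha


# Toy input: four cylinder scales, three feature channels (made-up values).
feats = [[1.0, 0.0, 2.0], [0.5, 1.0, 1.5], [0.0, 2.0, 1.0], [1.5, 0.5, 0.0]]
gated, alpha = fuse_scales(feats, [0.2, -0.1, 0.3], [1.0] * 3, [0.0] * 3)
```

In this sketch the attention weights let the network favor whichever cylinder size is most informative for a given seed point, while the gate suppresses or passes individual channels of the fused feature. In a trained network the parameters would be learned rather than fixed as in this toy example.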