Abstract: Clothing key point detection plays an important role in clothing classification, recommendation, and retrieval. However, clothing databases contain many images with deformed garments and complex backgrounds, which lowers the recognition rate of existing clothing classification models and degrades clothing recommendation and retrieval. To address this, this study proposes the Cascaded Stacked Pyramid Network (CSPN), a model that combines object detection with regression. First, Faster R-CNN identifies the clothing target region; then a Cascaded Pyramid Network (CPN) is constructed on the multi-level feature maps generated by ResNet-101. The model integrates multi-scale clothing image features from different layers and improves the low key point recognition accuracy on images with deformation and complex backgrounds. Experimental results on the DeepFashion dataset show that CSPN achieves a higher clothing key point recognition rate than the three other models compared.
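
The two-stage flow described in the abstract (detect the clothing region first, then regress key points inside it) can be illustrated with a minimal sketch. It is not the authors' implementation: the names `crop_box`, `to_image_coords`, and `detect_keypoints` are hypothetical, the detector and regressor are passed in as plain callables standing in for Faster R-CNN and the CPN, and the sketch assumes the regressor returns key points normalized to [0, 1] within the crop.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels
Point = Tuple[float, float]              # (x, y)


def crop_box(image: Sequence[Sequence], box: Box) -> List[Sequence]:
    """Crop a row-major image (a sequence of pixel rows) to the detected box."""
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    return [row[x1:x2] for row in image[y1:y2]]


def to_image_coords(points_norm: Sequence[Point], box: Box) -> List[Point]:
    """Map crop-normalized key points (assumed in [0, 1]) back to image pixels."""
    x1, y1, x2, y2 = box
    width, height = x2 - x1, y2 - y1
    return [(x1 + px * width, y1 + py * height) for px, py in points_norm]


def detect_keypoints(
    image: Sequence[Sequence],
    detector: Callable[[Sequence[Sequence]], Box],
    regressor: Callable[[List[Sequence]], Sequence[Point]],
) -> List[Point]:
    """Stage 1: locate the clothing region. Stage 2: regress key points in it."""
    box = detector(image)          # stand-in for the Faster R-CNN stage
    crop = crop_box(image, box)    # restrict the input to the clothing region
    points_norm = regressor(crop)  # stand-in for the key point network
    return to_image_coords(points_norm, box)
```

Cropping before regression is what lets the second stage ignore most of a complex background; the final coordinate mapping is needed only because the key points are predicted relative to the crop rather than the full image.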