Abstract: Object detection is widely used in computer vision, and different application scenarios require models trained on different datasets. Manually labeling these datasets, however, is very time-consuming. This study proposes a semi-automatic method that first generates labels for a dataset, then automatically filters the labeled images against a threshold on image similarity, and finally retains the required images and their corresponding labels as the final dataset. Experiments show that the method speeds up label generation while maintaining label accuracy.
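The filtering step can be illustrated with a minimal sketch. The abstract does not name the similarity measure, the image it is compared against, or which side of the threshold is kept, so everything here is an assumption made for illustration: images are flat lists of grayscale pixel values, similarity is cosine similarity against a single reference image, and pairs at or above the threshold are retained. The names `cosine_similarity` and `filter_by_similarity` are hypothetical, not taken from the paper.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length pixel vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero image is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def filter_by_similarity(samples, reference, threshold):
    """Keep (image, label) pairs whose image scores at least `threshold`
    against `reference`. The keep-if-above direction is an assumption."""
    kept = []
    for image, label in samples:
        if cosine_similarity(image, reference) >= threshold:
            kept.append((image, label))
    return kept


# Toy data: two auto-labeled 2x2 "images" flattened to four pixels each.
reference = [10, 200, 30, 40]
samples = [
    ([12, 198, 29, 41], {"class": "car", "bbox": (0, 0, 2, 2)}),
    ([200, 10, 220, 5], {"class": "tree", "bbox": (0, 0, 2, 2)}),
]

dataset = filter_by_similarity(samples, reference, threshold=0.95)
print([label["class"] for _, label in dataset])
```

In this toy run only the first pair resembles the reference closely enough to be retained. A real pipeline would likely use a more robust similarity measure and a threshold tuned on the data.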