Abstract: Classification on imbalanced datasets has drawn attention for the past two decades, and various solutions have been proposed. Mixup is a data synthesis method that has become popular in recent years and has inspired many variants, yet few of these variants target imbalanced datasets. This study proposes one such variant, Borderline-mixup, which uses a support vector machine (SVM) to select boundary samples and increases the probability that these samples are drawn by the sampler. Two boundary samplers are constructed to replace the original random sampler. Extensive experiments were conducted on 14 UCI datasets and CIFAR10 long-tail datasets. The results show that Borderline-mixup consistently outperforms Mixup, by up to 49.3% on the UCI datasets and by about 3%–3.6% on the CIFAR10 long-tail datasets. The proposed Borderline-mixup is therefore effective for classifying imbalanced datasets.
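
The idea of boundary-biased sampling can be illustrated with a minimal sketch. It is not the implementation evaluated in this study: the function names, the `boost` weighting factor, and the choice to bias only the second sample of each pair are assumptions made for illustration, and the two boundary samplers described in the paper may differ. The boundary indices are taken as given; in practice they could come from a fitted SVM, for example the `support_` attribute of scikit-learn's `SVC`. The interpolation step is standard Mixup with a coefficient drawn from a Beta(alpha, alpha) distribution.

```python
import numpy as np


def boundary_sampling_probs(n_samples, boundary_idx, boost=3.0):
    """Return sampling probabilities that up-weight boundary samples.

    `boost` is a hypothetical knob: each boundary sample is `boost` times
    as likely to be drawn as a non-boundary sample.
    """
    weights = np.ones(n_samples, dtype=float)
    weights[np.asarray(boundary_idx, dtype=int)] = boost
    return weights / weights.sum()


def borderline_mixup_batch(X, Y, probs, batch_size, alpha=1.0, rng=None):
    """Build one mixed batch from features X and one-hot labels Y.

    The first sample of each pair is drawn uniformly; the second is drawn
    with the boundary-biased probabilities (an assumed design choice).
    """
    rng = np.random.default_rng() if rng is None else rng
    i = rng.integers(0, len(X), size=batch_size)       # uniform draw
    j = rng.choice(len(X), size=batch_size, p=probs)   # boundary-biased draw
    lam = rng.beta(alpha, alpha, size=(batch_size, 1)) # Mixup coefficient
    X_mix = lam * X[i] + (1.0 - lam) * X[j]
    Y_mix = lam * Y[i] + (1.0 - lam) * Y[j]
    return X_mix, Y_mix
```

With `boost=1.0` the biased draw reduces to uniform sampling, so the sketch falls back to plain Mixup with a random sampler.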