Abstract: In semi-supervised learning, the performance of the classifier is often degraded and unstable because unlabeled samples are selected at random. At the same time, traditional semi-supervised learning algorithms perform inadequately when classifying high-dimensional data that contain only a small number of labeled samples. To solve these problems, this study proposes a safe semi-supervised learning algorithm, S3LSE, which combines the random subspace technique with ensemble learning to explore both the sample space and the feature space of the data. First, S3LSE decomposes the high-dimensional data set into B feature subsets using the random subspace technique and optimizes each feature subset according to the implicit information among the samples, forming B optimal feature subsets. Then, each optimal feature subset is sampled to form G sample subsets, and a safe sample labeling method is applied within each sample subset. The learning algorithm trains G classifiers on these sample subsets and integrates them into one classifier per optimal feature subset; the B classifiers obtained from the B optimal feature subsets are then integrated to classify the high-dimensional data. Finally, semi-supervised learning is simulated on a high-dimensional data set, and the experimental results show that the algorithm achieves better performance.
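The two-level structure described above (B feature subsets, G sample subsets within each, then two rounds of integration) can be illustrated with a minimal sketch. This is not the S3LSE implementation: the feature subset optimization and the safe sample labeling step are not specified in the abstract and are omitted here, so only the labeled samples are used. The nearest-centroid base learner, the stratified bootstrap, majority voting as the integration rule, and all function and parameter names (`subspace_ensemble_predict`, `subspace_frac`, and so on) are assumptions made for illustration.

```python
import random
from collections import Counter


def fit_centroids(rows, labels):
    """Nearest-centroid base learner (an assumed stand-in): {label: mean vector}."""
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(lab, []).append(row)
    return {lab: [sum(col) / len(members) for col in zip(*members)]
            for lab, members in groups.items()}


def predict_centroid(centroids, row):
    """Return the label whose centroid is closest (squared Euclidean) to row."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2
                                   for a, b in zip(row, centroids[lab])))


def majority(votes):
    """Most common vote; ties go to the vote encountered first."""
    return Counter(votes).most_common(1)[0][0]


def subspace_ensemble_predict(X_lab, y_lab, X_test, B=5, G=5,
                              subspace_frac=0.5, seed=0):
    """Two-level ensemble: B random feature subsets x G bootstrap samples."""
    rng = random.Random(seed)
    n_features = len(X_lab[0])
    k = max(1, int(subspace_frac * n_features))

    # Group labeled sample indices by class for stratified resampling.
    by_class = {}
    for i, lab in enumerate(y_lab):
        by_class.setdefault(lab, []).append(i)

    subspace_votes = []  # one list of predictions per feature subset
    for _ in range(B):
        feats = rng.sample(range(n_features), k)  # random feature subset

        def project(row, feats=feats):
            return [row[j] for j in feats]

        inner_votes = []  # one list of predictions per sample subset
        for _ in range(G):
            # Stratified bootstrap: resample within each class so that
            # no class disappears from a sample subset.
            idx = [i for members in by_class.values()
                   for i in rng.choices(members, k=len(members))]
            centroids = fit_centroids([project(X_lab[i]) for i in idx],
                                      [y_lab[i] for i in idx])
            inner_votes.append([predict_centroid(centroids, project(r))
                                for r in X_test])

        # First-level integration: vote over the G classifiers of this subset.
        subspace_votes.append([majority(v) for v in zip(*inner_votes)])

    # Second-level integration: vote over the B feature-subset classifiers.
    return [majority(v) for v in zip(*subspace_votes)]
```

Calling `subspace_ensemble_predict(X_lab, y_lab, X_test)` with lists of feature vectors returns one predicted label per test row. Majority voting is used at both levels only because it is the simplest integration rule; the abstract does not state which rule S3LSE uses.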