Received: December 02, 2018 Revised: December 25, 2018
Abstract: In semi-supervised learning, randomly selecting unlabeled samples often degrades classifier performance and makes it unstable. In addition, traditional semi-supervised learning algorithms perform poorly on classification problems involving high-dimensional data with only a few labeled samples. To address these problems, this study explores both the sample space and the feature space of the data and proposes a safe semi-supervised learning algorithm combining stochastic subspace technology and ensemble technology (S3LSE) for classifying high-dimensional data with very few labeled samples. First, S3LSE uses the random subspace technique to decompose the high-dimensional data set into B feature subsets and optimizes each feature subset according to the implicit information among the samples, forming B optimal feature subsets. Next, each optimal feature subset is sampled to form G sample subsets; in each sample subset, a safe sample labeling method is used to expand the labeled samples, G classifiers are generated, and the G classifiers are integrated into an ensemble. Then, the B ensemble classifiers generated from the B optimal feature subsets are integrated again to classify the high-dimensional data. Finally, experiments that simulate the semi-supervised learning process on high-dimensional data sets show that S3LSE performs well.
Keywords: high-dimensional data; semi-supervised learning; stochastic subspace; ensemble technology; classification
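The two-level structure described in the abstract (B feature subsets, G classifiers per subset, voting within and then across subsets) can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid base learner, the margin-based rule that keeps only the most confident pseudo-labels (a stand-in for the paper's safe labeling method), the majority vote, and all function and parameter names (`fit_ensemble`, `keep`, `n_feats`, and so on) are assumptions made for illustration. The feature-subset optimization step is omitted, and the unlabeled set is assumed to be non-empty.

```python
import random
from collections import Counter


def fit_centroids(X, y):
    """Nearest-centroid base learner (assumed): one mean vector per class."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict_with_margin(model, row):
    """Return (label, margin); margin is the distance gap to the runner-up class."""
    dists = sorted((sum((a - b) ** 2 for a, b in zip(row, mu)), c)
                   for c, mu in model.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def train_member(Xl, yl, Xu, keep, rng):
    """One classifier: draw a sample subset of the unlabeled data, pseudo-label it,
    and keep only the most confident fraction (placeholder for safe labeling)."""
    model = fit_centroids(Xl, yl)
    subset = rng.sample(Xu, max(1, len(Xu) // 2))
    scored = [predict_with_margin(model, row) + (row,) for row in subset]
    scored.sort(key=lambda t: t[1], reverse=True)  # most confident first
    chosen = scored[:int(keep * len(scored))]
    X_aug = Xl + [row for _, _, row in chosen]
    y_aug = yl + [label for label, _, _ in chosen]
    return fit_centroids(X_aug, y_aug)


def majority(votes):
    """Most frequent label among the votes."""
    return Counter(votes).most_common(1)[0][0]


def fit_ensemble(Xl, yl, Xu, B=5, G=3, n_feats=2, keep=0.5, seed=0):
    """Build B random feature subsets, each with G member classifiers."""
    rng = random.Random(seed)
    d = len(Xl[0])
    ensemble = []
    for _ in range(B):
        feats = rng.sample(range(d), n_feats)  # random subspace
        proj = lambda X, f=feats: [[row[j] for j in f] for row in X]
        members = [train_member(proj(Xl), yl, proj(Xu), keep, rng)
                   for _ in range(G)]
        ensemble.append((feats, members))
    return ensemble


def predict(ensemble, row):
    """Vote among the G members of each subset, then vote across the B subsets."""
    outer = []
    for feats, members in ensemble:
        sub = [row[j] for j in feats]
        outer.append(majority([predict_with_margin(m, sub)[0] for m in members]))
    return majority(outer)
```

Here `Xl` and `Xu` are lists of equal-length feature lists and `yl` holds the labels for `Xl`. Keeping only a fraction of the pseudo-labels is one simple way to limit the damage from wrong labels; the paper's own criterion may differ.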
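Funding: Natural Science Basic Research Plan of Shaanxi Province (2015JM6347); Scientific Research Project of Shangluo University (14SKY026); Science and Technology Innovation Team Building Project of Shangluo University (18SCX002); Key Discipline Construction Project of Shangluo University (discipline: Mathematics)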
Citation:
ZHAO Jian-Hua, LIU Ning. Safe Semi-supervised Classification Algorithm for High Dimensional Data. COMPUTER SYSTEMS APPLICATIONS, 2019, 28(5): 178-184