Received:May 17, 2017 Revised:June 16, 2017
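Abstract (translated from Chinese): As Internet technology continues to develop, Sina Weibo, a public online social platform, has accumulated a very large number of active users. However, because the user base is so large and users' personal information is not necessarily genuine, labeling training samples is difficult. This paper adopts a multi-view tri-training method: it constructs three different views and uses the small number of labeled samples and the unlabeled samples in these views to repeatedly train three different classifiers against one another. Finally, it combines the three classifiers to determine user gender. Experiments on real user data show that, compared with single-view classifiers, classifiers trained with multi-view tri-training are more accurate and require fewer labeled samples.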
Keywords: gender recognition; multi-view learning; tri-training algorithm; data mining
Abstract: With the rapid development of Internet technology, Sina Weibo (microblog), a public social networking platform, has attracted a very large number of active users. However, because the user base is so large and users' personal information is not always truthful, it is difficult to label users' gender for training. This study applies a multi-view tri-training learning method to this problem. First, three different views are constructed, and three classifiers are trained on a small number of labeled samples. The three classifiers are then retrained repeatedly, training one another with unlabeled samples. Finally, the three classifiers are combined into one ensemble that determines the user's gender. Experiments on real user data show that the classifier trained with multi-view tri-training is more accurate than a single-view classifier and requires fewer labeled samples.
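The training loop summarized in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation, and it rests on several assumptions. Each user is assumed to be already represented by three feature vectors, one per view. `NearestCentroid` is a hypothetical stand-in base learner, and the labels "F" and "M" are hypothetical. The sketch also simplifies standard tri-training by omitting the error-rate checks that normally decide whether pseudo-labeled samples are accepted. In each round, an unlabeled sample is added, with a pseudo-label, to classifier i's training set when the other two classifiers agree on its label. The final decision is a majority vote.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


class NearestCentroid:
    """Hypothetical stand-in base learner: predicts the class with the closest mean vector."""

    def __init__(self) -> None:
        self.centroids: Dict[str, List[float]] = {}

    def fit(self, xs: Sequence[Vector], ys: Sequence[str]) -> "NearestCentroid":
        sums: Dict[str, List[float]] = {}
        counts: Counter = Counter()
        for x, y in zip(xs, ys):
            acc = sums.setdefault(y, [0.0] * len(x))
            for d, value in enumerate(x):
                acc[d] += value
            counts[y] += 1
        self.centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
        return self

    def predict(self, x: Vector) -> str:
        # Squared Euclidean distance to each class centroid; the smallest wins.
        return min(
            self.centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, self.centroids[y])),
        )


def tri_train(
    lab_views: List[List[Vector]],  # lab_views[v][n]: view v of labeled sample n
    labels: List[str],
    unl_views: List[List[Vector]],  # unl_views[v][m]: view v of unlabeled sample m
    max_rounds: int = 10,
) -> List[NearestCentroid]:
    # Initial classifiers: one per view, trained on the small labeled set only.
    clfs = [NearestCentroid().fit(lab_views[v], labels) for v in range(3)]
    prev: Optional[List[List[Tuple[int, str]]]] = None
    for _ in range(max_rounds):
        # For classifier i, pseudo-label unlabeled samples on which the other two agree.
        pseudo: List[List[Tuple[int, str]]] = []
        for i in range(3):
            j, k = (v for v in range(3) if v != i)
            picked = []
            for m in range(len(unl_views[i])):
                y_j = clfs[j].predict(unl_views[j][m])
                if y_j == clfs[k].predict(unl_views[k][m]):
                    picked.append((m, y_j))
            pseudo.append(picked)
        if pseudo == prev:
            break  # pseudo-labels unchanged, so retraining would give the same classifiers
        # Retrain each classifier on its own view: labeled data plus its pseudo-labeled samples.
        for i in range(3):
            xs = list(lab_views[i]) + [unl_views[i][m] for m, _ in pseudo[i]]
            ys = list(labels) + [y for _, y in pseudo[i]]
            clfs[i] = NearestCentroid().fit(xs, ys)
        prev = pseudo
    return clfs


def predict_gender(clfs: List[NearestCentroid], views: Sequence[Vector]) -> str:
    # Ensemble step: majority vote of the three view-specific classifiers.
    votes = Counter(clf.predict(x) for clf, x in zip(clfs, views))
    return votes.most_common(1)[0][0]
```

With three classifiers and two labels, the majority vote can never tie, because one label must receive at least two of the three votes.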
Citation:
孙启蕴.基于多视图Tri-Training的微博用户性别判断.计算机系统应用,2018,27(2):240-244
SUN Qi-Yun.Microblog User Gender Recognition with Multi-View and Tri-Training Learning.COMPUTER SYSTEMS APPLICATIONS,2018,27(2):240-244