Robust Text Classification Using Covariate Adjustment to Control Confounding Factors

doi:10.15888/j.cnki.csa.007161


2020, Vol. 29, No. 3, pp. 155-160. DOI:10.15888/j.cnki.csa.007161

Funding: Shandong Province Social Science Planning Research Project (17CTYJ03)

Robust Text Categorization Using Covariates to Control Confounding Factors



Abstract:

Many existing text classification methods rarely control for confounding variables, and their accuracy is not robust to shifts in the data distribution. To address this, a text classification method based on covariate adjustment is proposed. First, the confounding factors (variables) in text classification are assumed to be observable during training but not during testing. Then, the classifier conditions on the confounding factors observed in training and sums over them at prediction time. Finally, based on Pearl's covariate adjustment, the influence of the text features and the class variable on classifier accuracy is examined while the confounding factors are controlled. The method is evaluated on a microblog data set and the IMDB data set. The experimental results show that, compared with other methods, the proposed method achieves higher classification accuracy when confounding is present and is robust to the confounding variables.
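The conditioning and summing steps described in the abstract can be illustrated with a small sketch. It assumes the standard back-door form of covariate adjustment, p(y | x) = Σ_z p(y | x, z) p(z), with a single discrete confounder z. The logistic-regression model, the synthetic two-word data, and every function name below are illustrative assumptions, not the paper's implementation or data.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def add_confounder(X, z, n_z):
    """Append a one-hot encoding of the confounder z to the text features."""
    return np.hstack([X, np.eye(n_z)[z]])

def fit_logistic(F, y, lr=0.5, epochs=2000, l2=1e-3):
    """Gradient-descent logistic regression on the feature matrix F."""
    w = np.zeros(F.shape[1])
    for _ in range(epochs):
        grad = F.T @ (sigmoid(F @ w) - y) / len(y) + l2 * w
        w -= lr * grad
    return w

def predict_given_z(w, X, z_val, n_z):
    """p(y=1 | x, z=z_val) for every row of X."""
    z = np.full(len(X), z_val)
    return sigmoid(add_confounder(X, z, n_z) @ w)

def predict_adjusted(w, X, p_z):
    """Sum the confounder out: p(y=1 | x) = sum_z p(y=1 | x, z) * p(z)."""
    return sum(p * predict_given_z(w, X, k, len(p_z))
               for k, p in enumerate(p_z))

# Synthetic training set (assumption): z is observed here, and y agrees
# with z 90% of the time, so z confounds the text/label relationship.
rng = np.random.default_rng(0)
n = 400
z_train = rng.integers(0, 2, size=n)
y_train = np.where(rng.random(n) < 0.9, z_train, 1 - z_train)
# Word 0 is driven by the label, word 1 only by the confounder.
word_y = (rng.random(n) < np.where(y_train == 1, 0.8, 0.2)).astype(float)
word_z = (rng.random(n) < np.where(z_train == 1, 0.8, 0.2)).astype(float)
X_train = np.column_stack([word_y, word_z])

# Training: condition on the observed confounder.
w = fit_logistic(add_confounder(X_train, z_train, 2), y_train.astype(float))
# p(z) is estimated from the training data.
p_z = np.bincount(z_train, minlength=2) / n

# Prediction: z is unobserved, so it is summed out.
X_test = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
probs = predict_adjusted(w, X_test, p_z)
print(probs)
```

Because the adjusted score is a weighted average over z, a word that only tracks the confounder (word 1 here) contributes little to the prediction, which is the intended source of robustness when the relationship between z and y changes at test time.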


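Cite this article

Dong Yuanyuan (董园园). Robust Text Categorization Using Covariates to Control Confounding Factors. Computer Systems & Applications (计算机系统应用), 2020, 29(3): 155-160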


History

Received: 2019-04-26
Last revised: 2019-05-21
Published online: 2020-03-02
Publication date: 2020-03-15
