Received: April 26, 2019  Revised: May 21, 2019
Abstract: Many text categorization methods rarely control for confounding variables, and their accuracy is not robust to shifts in the data distribution. To address this, a text categorization method based on covariate adjustment is proposed. First, the confounding factors (variables) in text categorization are assumed to be observable during training but not during testing. Then, the classifier is conditioned on the confounding factors during training, and the confounding factors are summed out at prediction time. Finally, based on Pearl's covariate adjustment, the confounding factors are controlled in order to observe how text features and class variables affect classifier accuracy. The method is evaluated on a microblog dataset and the IMDB dataset. The experimental results show that, compared with other methods, the proposed method achieves higher classification accuracy when confounding is present and is robust to the confounding variables.
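The prediction-stage summation over the confounding factor mentioned in the abstract can be read as Pearl's adjustment formula, p(y|x) = Σ_z p(y|x,z) p(z). The following is a minimal sketch of that formula only, not the paper's implementation: the function names `confounder_prior` and `backdoor_adjust`, the toy confounder values, and the class probabilities are all assumptions made for illustration.

```python
from collections import Counter


def confounder_prior(z_train):
    """Estimate p(z) from the confounder values observed during training."""
    counts = Counter(z_train)
    total = len(z_train)
    return {z: n / total for z, n in counts.items()}


def backdoor_adjust(p_y_given_xz, p_z):
    """Compute p(y|x) = sum_z p(y|x,z) * p(z) for a single document.

    p_y_given_xz maps each confounder value z to the list of class
    probabilities predicted for the document with the confounder set to z.
    p_z maps each confounder value z to its training-set frequency.
    """
    n_classes = len(next(iter(p_y_given_xz.values())))
    adjusted = [0.0] * n_classes
    for z, weight in p_z.items():
        for k, prob in enumerate(p_y_given_xz[z]):
            adjusted[k] += weight * prob
    return adjusted


# Hypothetical training confounder values (e.g. a binary attribute of the author).
z_train = [0, 0, 0, 1]
p_z = confounder_prior(z_train)

# Hypothetical outputs of a classifier trained on (x, z), evaluated for one
# test document under each possible confounder value.
p_y_given_xz = {0: [0.8, 0.2], 1: [0.6, 0.4]}

adjusted = backdoor_adjust(p_y_given_xz, p_z)
print(adjusted)
```

Because the confounder is unobserved at test time, the sketch evaluates the classifier once per possible confounder value and weights each prediction by how often that value occurred in training, so the final prediction does not depend on the test-time correlation between the confounder and the class.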
Funding: Shandong Provincial Social Science Planning Research Project (17CTYJ03)
Author Name | Affiliation |
DONG Yuan-Yuan | Qilu Normal University, Jinan 250013, China |
Citation:
DONG Yuan-Yuan. Robust Text Categorization Using Covariates to Control Confounding Factors. COMPUTER SYSTEMS APPLICATIONS, 2020, 29(3): 155-160