Abstract: Many text classification methods do not control for confounding variables and are not robust to shifts in the data distribution. To address this problem, a text classification method based on covariate adjustment is proposed. First, it is assumed that the confounding variables in text classification can be observed in the training stage but not in the testing stage. Then, in the prediction stage, the confounding variables are summed out, with the model conditioned on the confounding variables observed in the training stage. Finally, based on Pearl's covariate adjustment, the effect of the text features on the class variable is estimated while controlling for the confounding variables. The performance of the proposed method is verified on a microblog dataset and the IMDB dataset. The experimental results show that the proposed method achieves higher classification accuracy and greater robustness to confounding variables than the compared methods.
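As a rough illustration of the adjustment step described above, the following minimal Python sketch computes a covariate-adjusted class probability by summing over the values of a confounder, i.e. p(y | do(x)) = sum_z p(y | x, z) p(z), the standard form of Pearl's back-door adjustment. The function names (`estimate_confounder_prior`, `adjusted_probability`), the lookup-table stand-in for a trained classifier, and all numbers are assumptions made for illustration; they are not the implementation or the data used in this work.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable


def estimate_confounder_prior(train_z: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Estimate p(z) from the confounder values observed in the training set."""
    values = list(train_z)
    counts = Counter(values)
    return {z: c / len(values) for z, c in counts.items()}


def adjusted_probability(
    x: Hashable,
    p_y_given_xz: Callable[[Hashable, Hashable], float],
    p_z: Dict[Hashable, float],
) -> float:
    """Back-door adjustment: sum over z of p(y=1 | x, z) * p(z).

    The confounder z is not needed for the test instance itself; it is
    marginalized out using the prior estimated at training time.
    """
    return sum(p_y_given_xz(x, z) * pz for z, pz in p_z.items())


# Hypothetical stand-in for a classifier trained on (text feature, confounder)
# pairs; a real model would return p(y=1 | x, z) from learned parameters.
TOY_TABLE = {("good", 0): 0.5, ("good", 1): 0.25}


def toy_classifier(x: Hashable, z: Hashable) -> float:
    return TOY_TABLE[(x, z)]


# Confounder values seen in a toy training set: p(z=0) = 0.75, p(z=1) = 0.25.
p_z = estimate_confounder_prior([0, 0, 0, 1])

# 0.5 * 0.75 + 0.25 * 0.25
score = adjusted_probability("good", toy_classifier, p_z)
print(score)  # 0.4375
```

Because the prediction averages over the confounder rather than conditioning on a value inferred from the text, a change in the relationship between the confounder and the class label between training and testing has less influence on the output, which is the robustness property the abstract refers to.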