Abstract: The traditional random forest classification algorithm uses majority voting, which gives every decision tree an equal vote and therefore cannot distinguish strong classifiers from weak ones; its hyperparameters must also be tuned. This work studies the application of the random forest algorithm to text classification, analyzes its advantages and disadvantages, and optimizes it in two ways. First, the voting method is replaced by weighted voting that combines each decision tree's classification performance with its prediction probability. Second, an algorithm combining random search and grid search is proposed to optimize the random forest's hyperparameters. Experimental results in a Python environment show that the proposed method performs well in text classification.
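
As an illustration of the two optimizations named in the abstract, the following Python sketch shows one possible form of each. It is not the paper's implementation: the function names (`weighted_vote`, `majority_vote`, `random_then_grid`), the use of a per-tree accuracy as the tree weight, the weight-times-probability scoring rule, the integer neighborhood `radius`, and the toy objective `toy_score` are all assumptions made for the example.

```python
import itertools
import random


def majority_vote(tree_probs):
    """Baseline: each tree casts one equal vote for its most probable class."""
    n_classes = len(tree_probs[0])
    counts = [0] * n_classes
    for probs in tree_probs:
        counts[max(range(n_classes), key=probs.__getitem__)] += 1
    return max(range(n_classes), key=counts.__getitem__)


def weighted_vote(tree_probs, tree_weights):
    """Assumed weighting rule: score(class) = sum over trees of weight * probability.

    tree_probs[t][c] is the probability tree t assigns to class c.
    tree_weights[t] is a quality score for tree t (here, a hypothetical accuracy).
    """
    n_classes = len(tree_probs[0])
    scores = [0.0] * n_classes
    for probs, weight in zip(tree_probs, tree_weights):
        for c, p in enumerate(probs):
            scores[c] += weight * p
    return max(range(n_classes), key=scores.__getitem__)


def random_then_grid(score, bounds, n_random=20, radius=2, seed=0):
    """Coarse random search, then an exhaustive grid around the best random point.

    bounds maps each integer hyperparameter name to an inclusive (low, high) range.
    Returns (best random candidate, best grid candidate).
    """
    rng = random.Random(seed)
    names = list(bounds)
    # Stage 1: sample candidates uniformly at random inside the bounds.
    candidates = [{k: rng.randint(*bounds[k]) for k in names} for _ in range(n_random)]
    coarse = max(candidates, key=score)
    # Stage 2: grid over a small neighborhood of the coarse optimum, clipped to bounds.
    axes = [
        range(max(bounds[k][0], coarse[k] - radius), min(bounds[k][1], coarse[k] + radius) + 1)
        for k in names
    ]
    grid = [dict(zip(names, combo)) for combo in itertools.product(*axes)]
    fine = max(grid, key=score)
    return coarse, fine


# One confident, accurate tree versus two uncertain, weaker trees (made-up numbers).
probs = [[0.90, 0.10], [0.45, 0.55], [0.40, 0.60]]
weights = [0.95, 0.60, 0.55]
majority_label = majority_vote(probs)
weighted_label = weighted_vote(probs, weights)


def toy_score(params):
    """Stand-in for a cross-validated score; peaks at n_estimators=120, max_depth=9."""
    return -((params["n_estimators"] - 120) ** 2) - (params["max_depth"] - 9) ** 2


search_bounds = {"n_estimators": (10, 300), "max_depth": (2, 30)}
coarse_best, fine_best = random_then_grid(toy_score, search_bounds)
```

In this made-up example, majority voting follows the two weak trees, while the weighted vote follows the single strong, confident tree. Because the grid in the second stage always contains the best random candidate, the refined result can never score lower than the coarse one; in practice `toy_score` would be replaced by a cross-validated score of the forest on the text data.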