Received: November 23, 2018  Revised: December 12, 2018
Abstract: The traditional random forest classifier combines its trees by unweighted majority voting, which cannot distinguish strong classifiers from weak ones, and its hyperparameters need to be tuned. This paper studies the application of the random forest algorithm to text classification, along with its advantages and disadvantages, and improves it in two ways. First, the voting method is optimized: votes are weighted by combining each decision tree's classification performance with its prediction probability. Second, an algorithm that combines random search and grid search is proposed for tuning the hyperparameters. Experimental results in a Python environment show that the proposed method performs well on text classification.
Keywords: random forest; text classification; weighted voting; hyperparameter optimization; random search; grid search
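The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the weighting rule (tree weight times predicted probability), the use of a held-out accuracy as the tree weight, and the function names `weighted_vote` and `random_then_grid_search` are all assumptions made for illustration. The second function shows one plausible way to combine the two searches, with a coarse random search followed by an exhaustive grid around the best random candidate.

```python
import itertools
import random


def weighted_vote(tree_probas, tree_weights):
    """Pick a class by weighted soft voting (illustrative, not the paper's exact rule).

    tree_probas[t][c] is the probability tree t assigns to class c.
    tree_weights[t] is a quality score for tree t, e.g. its accuracy
    on held-out data (an assumption of this sketch).
    """
    n_classes = len(tree_probas[0])
    scores = [0.0] * n_classes
    for probs, weight in zip(tree_probas, tree_weights):
        for c, p in enumerate(probs):
            scores[c] += weight * p
    # Return the index of the class with the highest weighted score.
    return max(range(n_classes), key=scores.__getitem__)


def random_then_grid_search(objective, bounds, n_random=30, radius=2, seed=0):
    """Coarse random search, then a full grid around the best random point.

    objective: callable taking a dict of hyperparameters, returns a score to maximize.
    bounds: dict mapping each name to an inclusive integer range (low, high).
    """
    rng = random.Random(seed)
    names = sorted(bounds)

    # Stage 1: sample random candidates across the whole search space.
    candidates = [
        {n: rng.randint(*bounds[n]) for n in names} for _ in range(n_random)
    ]
    best = max(candidates, key=objective)

    # Stage 2: evaluate every point within `radius` of the best candidate,
    # clipped to the original bounds.
    axes = []
    for n in names:
        low, high = bounds[n]
        axes.append(range(max(low, best[n] - radius), min(high, best[n] + radius) + 1))
    grid = [dict(zip(names, values)) for values in itertools.product(*axes)]
    return max(grid, key=objective)


if __name__ == "__main__":
    # Two of three trees lean toward class 1, but the confident tree wins
    # when all trees carry equal weight; down-weighting it flips the result.
    probas = [[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]
    print(weighted_vote(probas, [1.0, 1.0, 1.0]))
    print(weighted_vote(probas, [0.2, 1.0, 1.0]))
```

In practice the objective passed to `random_then_grid_search` would be something like a cross-validated score of a random forest trained with the given hyperparameters; here it is left as a generic callable so the sketch stays self-contained.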
Citation:
刘勇,兴艳云.基于改进随机森林算法的文本分类研究与应用.计算机系统应用,2019,28(5):220-225
LIU Yong, XING Yan-Yun. Research and Application of Text Classification Based on Improved Random Forest Algorithm. Computer Systems Applications, 2019, 28(5): 220-225