Received: December 23, 2018; Revised: January 18, 2019
Abstract: C4.5 is a classical algorithm for generating decision trees. Although it handles noise well, its classification accuracy drops markedly when the missing rate of attribute values is high. In addition, building a decision tree requires scanning and sorting the data set many times and calling the logarithm function frequently. To address these shortcomings, this paper proposes an improved classification algorithm. A method based on naive Bayes' theorem is used to fill in missing attribute values and thereby improve classification accuracy. The calculation formula is also optimized and simplified so that it uses only the four basic arithmetic operations (addition, subtraction, multiplication, and division) in place of the original logarithmic operations, reducing the running time of decision tree construction. To verify the algorithm's performance, experiments were conducted on five data sets from the UCI database. The results show that the improved algorithm greatly improves running efficiency.
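The abstract names two mechanisms without giving their formulas. As an illustrative sketch only, the Python below shows the standard C4.5 gain-ratio criterion, whose repeated `log2` calls are what the improved formula reportedly replaces with arithmetic operations, together with one plausible naive Bayes imputation of a missing categorical attribute value. The helper names (`entropy`, `gain_ratio`, `impute_naive_bayes`), the Laplace smoothing, and conditioning on the class label are assumptions and not the paper's exact method. The paper's log-free formula is not reproduced here.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Standard C4.5 gain ratio of the categorical column `attr`.

    Every entropy() call and the split-information sum invoke log2,
    which is the cost the paper's simplified formula aims to remove.
    """
    n = len(labels)
    groups = defaultdict(list)  # attribute value -> class labels in that branch
    for row, y in zip(rows, labels):
        groups[row[attr]].append(y)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - cond
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def impute_naive_bayes(rows, labels, attr):
    """Return a copy of `rows` with None in column `attr` replaced.

    Hypothetical helper (assumption, not the paper's method): picks the value v
    maximising P(v) * P(y | v) * prod_j P(x_j | v), estimated from rows whose
    `attr` is known, with Laplace (+1) smoothing.
    """
    complete = [(r, y) for r, y in zip(rows, labels) if r[attr] is not None]
    value_counts = Counter(r[attr] for r, _ in complete)
    n_classes = len(set(labels))
    # Observed values of every other column, used as smoothing denominators.
    domains = {
        j: {r[j] for r in rows if r[j] is not None}
        for j in range(len(rows[0]))
        if j != attr
    }
    filled = []
    for r, y in zip(rows, labels):
        new = list(r)  # copy so the input rows are not mutated
        if new[attr] is None:
            best, best_score = None, -1.0
            for v, cnt in value_counts.items():
                branch = [(cr, cy) for cr, cy in complete if cr[attr] == v]
                score = cnt / len(complete)  # prior P(v)
                score *= (sum(cy == y for _, cy in branch) + 1) / (cnt + n_classes)
                for j, dom in domains.items():
                    if r[j] is None:
                        continue  # skip other missing attributes
                    hits = sum(cr[j] == r[j] for cr, _ in branch)
                    score *= (hits + 1) / (cnt + len(dom))
                if score > best_score:
                    best, best_score = v, score
            new[attr] = best
        filled.append(new)
    return filled


if __name__ == "__main__":
    rows = [["sunny", "hot"], ["sunny", "mild"], ["rain", "mild"],
            ["rain", "cool"], [None, "cool"]]
    labels = ["no", "no", "yes", "yes", "yes"]
    filled = impute_naive_bayes(rows, labels, 0)
    print(filled[4][0])
    for a in range(2):
        print(a, round(gain_ratio(filled, labels, a), 3))
```

On this toy data the missing first attribute of the last row is imputed as `rain`, after which column 0 separates the two classes perfectly and therefore has the higher gain ratio.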
Funding: Natural Science Foundation of the Fujian Provincial Department of Science and Technology (2017J01406)
Citation:
HAN Cun-Ge,YE Qiu-Sun.Research and Improvement of C4.5 Algorithm in Decision Tree Classification Algorithm.COMPUTER SYSTEMS APPLICATIONS,2019,28(6):198-202