Received: May 19, 2021 Revised: June 14, 2021
Abstract: Fast and efficient subgroup set discovery (FSSD) is a subgroup discovery algorithm that aims to return a diverse pattern set in a short time. To reduce running time, however, FSSD selects a feature subset whose features have small domains. When that subset is irrelevant or only weakly related to the target class, the quality of the pattern set decreases. To address this problem, this paper proposes an FSSD algorithm based on ensemble feature selection. In the preprocessing stage, it applies ensemble feature selection based on ReliefF (Relief-F) and analysis of variance to obtain a feature subset that is both diverse and strongly correlated with the target class, and then runs FSSD to return a high-quality pattern set. Experimental results on UCI datasets and the National Health and Nutrition Examination Survey (NHANES) dataset show that the improved FSSD algorithm raises the quality of the pattern set and summarizes more interesting knowledge. On the NHANES dataset, the feature validity and positive predictive value of the pattern set are analyzed further.
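The preprocessing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two selectors are combined, so summing the two rankings is an assumption, and a simplified two-class Relief (one nearest hit and one nearest miss per instance) stands in for full ReliefF. All function names (`anova_f`, `relief_weights`, `ensemble_select`) and the toy data are hypothetical.

```python
from statistics import mean

def anova_f(column, labels):
    """One-way ANOVA F statistic of one numeric feature against class labels.
    Assumes at least two classes and more instances than classes."""
    groups = {}
    for x, y in zip(column, labels):
        groups.setdefault(y, []).append(x)
    grand, n, k = mean(column), len(column), len(groups)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values()) / (k - 1)
    within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g) / (n - k)
    return between / within if within > 0 else float("inf")

def relief_weights(rows, labels):
    """Simplified Relief (assumption: stands in for ReliefF).
    Each class needs at least two instances."""
    n_feat = len(rows[0])
    # Range of each feature, used to scale differences into [0, 1].
    span = [(max(r[j] for r in rows) - min(r[j] for r in rows)) or 1.0
            for j in range(n_feat)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(n_feat))

    weights = [0.0] * n_feat
    for i, row in enumerate(rows):
        hits = [r for k, r in enumerate(rows) if k != i and labels[k] == labels[i]]
        misses = [r for k, r in enumerate(rows) if labels[k] != labels[i]]
        hit = min(hits, key=lambda r: dist(row, r))
        miss = min(misses, key=lambda r: dist(row, r))
        # Reward features that differ on the nearest miss, penalize those
        # that differ on the nearest hit.
        for j in range(n_feat):
            weights[j] += (diff(row, miss, j) - diff(row, hit, j)) / len(rows)
    return weights

def ranks(scores):
    """Rank 0 is the best (highest) score."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    rank = [0] * len(scores)
    for pos, j in enumerate(order):
        rank[j] = pos
    return rank

def ensemble_select(rows, labels, top_k):
    """Return the indices of the top_k features by summed rank (assumed rule)."""
    n_feat = len(rows[0])
    f_scores = [anova_f([r[j] for r in rows], labels) for j in range(n_feat)]
    r_scores = relief_weights(rows, labels)
    combined = [a + b for a, b in zip(ranks(f_scores), ranks(r_scores))]
    return sorted(range(n_feat), key=lambda j: combined[j])[:top_k]

# Toy data: feature 0 separates the two classes, feature 1 is noise.
rows = [[1.0, 5.0], [1.2, 1.0], [0.9, 3.0], [3.0, 4.9], [3.1, 1.1], [2.9, 3.1]]
labels = [0, 0, 0, 1, 1, 1]
selected = ensemble_select(rows, labels, top_k=1)
print(selected)
```

In a pipeline of this shape, the selected feature indices would be passed to FSSD in place of the subset it would otherwise choose by domain size.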
Funding: Natural Science Foundation of Fujian Province (2018J01794)
Citation:
ZHANG Yin, HE Zhen-Feng. FSSD Algorithm Based on Ensemble Feature Selection. COMPUTER SYSTEMS APPLICATIONS, 2022, 31(3): 275-281