Abstract: Fast and Efficient Subgroup Set Discovery (FSSD) is a subgroup discovery algorithm that aims to provide a diverse set of patterns in a short period of time. However, to reduce its running time, the algorithm selects a feature subset with a small number of domains. When this feature subset is irrelevant or only weakly related to the target class, the quality of the pattern set decreases. To solve this problem, this paper proposes an FSSD algorithm based on ensemble feature selection. In the preprocessing stage, ensemble feature selection based on ReliefF (Relief-F) and analysis of variance is used to obtain a diverse feature subset that is strongly correlated with the target class; the FSSD algorithm is then applied to this subset to return a high-quality pattern set. Experimental results on UCI datasets and the National Health and Nutrition Examination Survey (NHANES) dataset show that the improved FSSD algorithm improves the quality of the pattern set and thereby yields more interesting knowledge. In addition, the feature validity and positive predictive value of the pattern set are analyzed on the NHANES dataset.
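The preprocessing idea can be illustrated with a minimal sketch. It is not the paper's implementation: it substitutes a simplified single-neighbor Relief for full ReliefF, and the rule for combining the two scorers (mean rank) is an assumption, as are all function names and the toy data. It only shows how a Relief-style score and a one-way ANOVA F statistic might be combined to pick features before a subgroup discovery algorithm such as FSSD is run.

```python
from statistics import mean

def anova_f(X, y):
    """One-way ANOVA F statistic for each feature column."""
    n, d = len(X), len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(d):
        col = [row[j] for row in X]
        grand = mean(col)
        between = within = 0.0
        for c in classes:
            grp = [v for v, label in zip(col, y) if label == c]
            m = mean(grp)
            between += len(grp) * (m - grand) ** 2
            within += sum((v - m) ** 2 for v in grp)
        if within == 0:
            # Constant within every class: perfectly separating or uninformative.
            scores.append(float("inf") if between > 0 else 0.0)
        else:
            scores.append((between / (len(classes) - 1)) / (within / (n - len(classes))))
    return scores

def relief_scores(X, y):
    """Simplified Relief (one nearest hit and one nearest miss), not full ReliefF."""
    n, d = len(X), len(X[0])
    lo = [min(r[j] for r in X) for j in range(d)]
    span = [(max(r[j] for r in X) - lo[j]) or 1.0 for j in range(d)]
    Z = [[(r[j] - lo[j]) / span[j] for j in range(d)] for r in X]  # scale to [0, 1]
    w = [0.0] * d
    for i in range(n):
        def dist(k):
            return sum(abs(Z[i][j] - Z[k][j]) for j in range(d))
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        h, m = min(hits, key=dist), min(misses, key=dist)
        for j in range(d):
            # Reward features that differ on the miss and agree on the hit.
            w[j] += (abs(Z[i][j] - Z[m][j]) - abs(Z[i][j] - Z[h][j])) / n
    return w

def ranks(scores):
    """Rank 0 is the best (highest) score."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    r = [0] * len(scores)
    for pos, j in enumerate(order):
        r[j] = pos
    return r

def ensemble_select(X, y, k):
    """Assumed combination rule: keep the k features with the lowest mean rank."""
    r1, r2 = ranks(anova_f(X, y)), ranks(relief_scores(X, y))
    combined = [(a + b) / 2 for a, b in zip(r1, r2)]
    return sorted(sorted(range(len(combined)), key=lambda j: combined[j])[:k])

# Toy data (hypothetical): feature 0 separates the classes, 1 is noise, 2 is constant.
X = [[0.10, 5.0, 1.0], [0.20, 1.0, 1.0], [0.15, 3.0, 1.0],
     [0.90, 4.8, 1.0], [1.00, 1.2, 1.0], [0.95, 3.1, 1.0]]
y = [0, 0, 0, 1, 1, 1]
print(ensemble_select(X, y, k=1))  # → [0]
```

In this sketch the selected column indices would then be used to project the dataset before it is handed to the subgroup discovery step; averaging ranks rather than raw scores avoids having to put the two scorers on a common scale.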