Received: June 25, 2020  Revised: July 27, 2020
Abstract: Power grids now hold massive volumes of multi-source information, but the data are large in volume, varied in type, and high-dimensional, which makes efficient and effective retrieval difficult. Based on the data structure of an actual power operation system and an analysis of multi-source database samples, this paper proposes an improved decision tree algorithm based on mutual information as the data mining kernel, together with a parallel processing architecture suited to power systems. The approach retrieves multi-source data quickly and effectively and can also handle real-time data. During a search, information is extracted directly from the raw multi-source data according to a representative feature subset. The information content of each index is then evaluated and ranked to form the decision tree model, and multi-source data are extracted simultaneously through Spark MapReduce Python data decomposition and parallel retrieval, which shortens retrieval time. The method is verified by simulation on the database of a regional power grid. The results show that it can extract multi-source heterogeneous information from the distribution network, effectively avoid duplicate data, and meet the requirements of online engineering decision-making.
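The ranking step mentioned in the abstract (judging how much information each index carries and ordering the indices before the tree is built) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mutual_information` and `rank_features`, the field names `feeder` and `season`, and the toy records are all hypothetical, and the Spark-based parallel retrieval is not shown.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # Sum p(x,y) * log2( p(x,y) / (p(x) p(y)) ) over observed pairs only.
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def rank_features(rows, labels):
    """Order feature names by decreasing mutual information with the labels."""
    scores = {
        name: mutual_information([row[name] for row in rows], labels)
        for name in rows[0]
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy records: 'feeder' determines the label, 'season' does not.
rows = [
    {"feeder": "A", "season": "summer"},
    {"feeder": "A", "season": "winter"},
    {"feeder": "B", "season": "summer"},
    {"feeder": "B", "season": "winter"},
]
labels = ["fault", "fault", "normal", "normal"]

ranking = rank_features(rows, labels)
```

In this toy case `feeder` shares one full bit with the label and `season` shares none, so a tree built in ranked order would split on `feeder` first.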
Funding: National Natural Science Foundation of China (51877152)
Citation:
KE Qiang, CHEN Zhi-Hua, HU Jing-Wei, CHEN Huan-Jun, PI Zhi-Wang, ZHANG Han, ZHOU Xue-Song. Fast Multi-Source Data Retrieval Method for Distribution Network Based on Improved Decision Tree. Computer Systems Applications, 2021, 30(2): 97-102