Computer Systems & Applications, 2021, 30(2): 97-102
Fast Multi-Source Data Retrieval Method for Distribution Network Based on Improved Decision Tree
(1. Economic and Technical Research Institute, State Grid Huanggang Power Supply Company, Huanggang 438701, China; 2. Tianjin Chuneng Electric Power Technology Company, Tianjin 300392, China; 3. School of Electrical and Electronic Engineering, Tianjin University of Technology, Tianjin 300384, China)
Received:June 25, 2020    Revised:July 27, 2020
Abstract: Power grids now hold massive volumes of multi-source information, but the size, variety, and high dimensionality of these data make efficient and effective retrieval difficult. Based on the data structure of an actual power operation system and an analysis of multi-source database samples, this paper proposes an improved decision tree algorithm based on mutual information as the data mining kernel, together with a parallel processing architecture suited to power systems. The combination enables fast, effective retrieval of multi-source data and effective handling of real-time data. During a search, information is extracted directly from the original multi-source data according to a representative feature subset; the information content of each index is evaluated, and the indexes are ranked to form the decision tree model. Multi-source data are then extracted simultaneously through Spark MapReduce Python data decomposition and parallel retrieval, which shortens retrieval time. A regional power grid database is used as a case study for simulation and verification. The results show that the method can extract multi-source heterogeneous information from the distribution network, effectively avoids duplicate data, and meets the requirements of online engineering decision-making.
Keywords: decision tree; parallel computing; information retrieval; multi-source heterogeneous
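The two ideas named in the abstract, ranking index features by mutual information and retrieving from several sources in parallel before merging without duplicates, can be illustrated with a minimal sketch. This is not the authors' implementation: the record fields (`id`, `device`, `region`, `fault`), the helper names, and the toy data are all hypothetical, and a standard-library thread pool stands in for the Spark MapReduce layer described in the paper.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def mutual_information(xs, ys):
    """Mutual information I(X;Y), in bits, between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(records, feature_names, label_name):
    """Sort candidate index features by mutual information with the label."""
    labels = [r[label_name] for r in records]
    scores = {
        f: mutual_information([r[f] for r in records], labels)
        for f in feature_names
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def search_partition(partition, predicate):
    """'Map' step: filter one data source independently of the others."""
    return [r for r in partition if predicate(r)]


def parallel_retrieve(sources, predicate, key):
    """Search all sources concurrently, then merge and drop duplicates."""
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda p: search_partition(p, predicate), sources))
    seen, merged = set(), []
    for part in parts:  # 'reduce' step: order-preserving de-duplication
        for r in part:
            k = key(r)
            if k not in seen:
                seen.add(k)
                merged.append(r)
    return merged


# Hypothetical toy records; field names are assumptions for illustration only.
records = [
    {"id": 1, "device": "transformer", "region": "A", "fault": "yes"},
    {"id": 2, "device": "transformer", "region": "B", "fault": "yes"},
    {"id": 3, "device": "switch", "region": "A", "fault": "no"},
    {"id": 4, "device": "switch", "region": "B", "fault": "no"},
]

# The most informative feature would be tested first in the tree.
ranking = rank_features(records, ["device", "region"], "fault")

# Two overlapping "sources" (record 3 appears in both).
sources = [records[:3], records[2:]]
hits = parallel_retrieve(sources, lambda r: r["device"] == "switch", lambda r: r["id"])

print(ranking)
print([r["id"] for r in hits])
```

In this toy data `device` fully determines `fault` while `region` carries no information about it, so `device` ranks first. The overlapping record is returned only once after the merge, which mirrors the duplicate-avoidance claim in the abstract.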
Funding: National Natural Science Foundation of China (51877152)
Citation:
KE Qiang, CHEN Zhi-Hua, HU Jing-Wei, CHEN Huan-Jun, PI Zhi-Wang, ZHANG Han, ZHOU Xue-Song. Fast Multi-Source Data Retrieval Method for Distribution Network Based on Improved Decision Tree. Computer Systems & Applications, 2021, 30(2): 97-102