Fast Multi-Source Data Retrieval Method for Distribution Network Based on Improved Decision Tree
Authors: Ke Qiang, Chen Zhihua, Hu Jingwei, Chen Huanjun, Pi Zhiwang, Zhang Han, Zhou Xuesong
    Abstract:

    At present, the power grid holds a large volume of multi-source data, but the wide variety of data types and their high dimensionality make effective retrieval difficult. Based on the data structure of an actual power operation system and an analysis of multi-source database samples, this paper proposes an improved decision tree algorithm based on mutual information as the data-mining kernel, together with a parallel processing architecture suited to power systems, so that multi-source data can be retrieved quickly and efficiently. During a search, information is extracted directly from the raw multi-source data according to a representative feature subset. The index information is evaluated and ranked to build the decision tree model, and multi-source data are extracted simultaneously through data decomposition and parallel retrieval with Spark, MapReduce, and Python, which shortens retrieval time. Simulation and verification on a regional power grid database show that the method can extract multi-source heterogeneous information from the distribution network, effectively avoid duplicate data, and meet the requirements of online engineering decision-making.
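The abstract outlines a pipeline: rank the index fields by mutual information, organize them into a decision-tree model, and search decomposed multi-source data in parallel while avoiding duplicates. The Python sketch below illustrates those ideas under stated assumptions; it is not the authors' implementation. Because the abstract does not specify the "improved" algorithm, the sketch simply ranks fields by mutual information with a target field (the information-gain criterion used by ID3) and uses that ranking as the levels of a lookup tree. The field names, the `id` key, the functions `rank_features`, `build_tree`, `search`, and `parallel_retrieve`, and the use of `concurrent.futures.ThreadPoolExecutor` as a stand-in for Spark are all hypothetical.

```python
# Hypothetical sketch (not the paper's code): rank index fields by mutual
# information with a target field, build a decision-tree-style index per data
# source on the top-ranked fields, then query all sources concurrently and
# merge the hits without duplicates. A thread pool stands in for Spark.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from math import log2


def entropy(values):
    """Shannon entropy H(Y), in bits, of a sequence of discrete labels."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(Y) - H(Y|X) for two equal-length discrete sequences."""
    n = len(ys)
    h_y_given_x = 0.0
    for x_val, count in Counter(xs).items():
        branch = [y for x, y in zip(xs, ys) if x == x_val]
        h_y_given_x += (count / n) * entropy(branch)
    return entropy(ys) - h_y_given_x


def rank_features(records, target, k, exclude=("id",)):
    """Return the k fields with the highest mutual information with `target`."""
    ys = [r[target] for r in records]
    fields = [f for f in records[0] if f != target and f not in exclude]
    scores = {f: mutual_information([r[f] for r in records], ys) for f in fields}
    return sorted(fields, key=scores.get, reverse=True)[:k]


def build_tree(records, features):
    """Nested dicts, one level per ranked field; leaves hold record ids."""
    if not features:
        return [r["id"] for r in records]
    groups = {}
    for r in records:
        groups.setdefault(r[features[0]], []).append(r)
    return {value: build_tree(group, features[1:]) for value, group in groups.items()}


def search(tree, query, features):
    """Walk the tree along the query's field values; [] if a branch is missing."""
    node = tree
    for f in features:
        node = node.get(query.get(f))
        if node is None:
            return []
    return node


def parallel_retrieve(sources, query, features):
    """Search every source concurrently, then merge ids, dropping duplicates."""
    trees = [build_tree(src, features) for src in sources]
    with ThreadPoolExecutor() as pool:
        hits = list(pool.map(lambda t: search(t, query, features), trees))
    seen, merged = set(), []
    for ids in hits:  # pool.map keeps the order of `sources`
        for rid in ids:
            if rid not in seen:
                seen.add(rid)
                merged.append(rid)
    return merged


if __name__ == "__main__":
    # Toy records from two hypothetical sources; record 2 appears in both.
    scada = [
        {"id": 1, "feeder": "F1", "voltage": "10kV", "weather": "sunny", "fault": "yes"},
        {"id": 2, "feeder": "F1", "voltage": "10kV", "weather": "rain", "fault": "yes"},
        {"id": 3, "feeder": "F2", "voltage": "10kV", "weather": "sunny", "fault": "no"},
    ]
    maintenance = [
        {"id": 2, "feeder": "F1", "voltage": "10kV", "weather": "rain", "fault": "yes"},
        {"id": 4, "feeder": "F2", "voltage": "35kV", "weather": "rain", "fault": "no"},
    ]
    # Rank fields on the de-duplicated union of both sources.
    union = list({r["id"]: r for r in scada + maintenance}.values())
    features = rank_features(union, "fault", k=2)
    print(features)  # ['feeder', 'voltage']
    print(parallel_retrieve([scada, maintenance], {"feeder": "F1", "voltage": "10kV"}, features))  # [1, 2]
    print(parallel_retrieve([scada, maintenance], {"feeder": "F2", "voltage": "35kV"}, features))  # [4]
```

In the toy data, `feeder` fully determines `fault`, so it becomes the root level, and the duplicate record 2 is returned only once. Python threads do not speed up CPU-bound searches because of the global interpreter lock; a Spark job would instead distribute each partition's search across executors.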

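Citation

Ke Qiang, Chen Zhihua, Hu Jingwei, Chen Huanjun, Pi Zhiwang, Zhang Han, Zhou Xuesong. Fast multi-source data retrieval method for distribution network based on improved decision tree. Computer Systems & Applications, 2021, 30(2): 97–102.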
History
  • Received: June 25, 2020
  • Revised: July 27, 2020
  • Online: January 29, 2021
Copyright: Institute of Software, Chinese Academy of Sciences