Reservoir Data Mining and Analysis Based on Spark

doi:10.15888/j.cnki.csa.005985



Authors: WU Zhi-Jun, XIA Sheng-Yu, WANG Peng

Affiliation: College of Computer and Communication Engineering, China University of Petroleum (East China), Qingdao 266580, China

Abstract:

To support the analysis of reservoir data characteristics and the oil exploration and development process, this paper analyzes reservoir data with the Spark parallel computing framework, uses data mining algorithms to uncover latent relationships among reservoir attributes, and classifies and predicts different reservoir intervals. The main work includes: building a Spark distributed cluster and a data processing and analysis platform, Spark being a popular big data parallel computing framework that, compared with some traditional analysis methods and tools, can carry out data mining tasks quickly and accurately; establishing a multidimensional outlier detection function based on the characteristics of reservoir data and adding a new discriminant attribute Pr, the permeability-to-porosity ratio; and, for imbalanced data, proposing a cross-recall training model with an optimized cost function for logistic regression classification, together with KR-SMOTE, which oversamples minority-class samples, for decision tree classification. Both methods handle data imbalance effectively and improve classification accuracy.

Key words: Spark; data mining; outlier detection; imbalanced data; classification
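The abstract does not describe how KR-SMOTE works, so the following is only a minimal sketch of the standard SMOTE idea that it builds on: each synthetic minority-class sample is placed at a random point on the segment between an existing minority sample and one of its nearest minority neighbours. The function name `smote`, its parameters, and the (porosity, permeability) sample values are all assumptions made up for illustration; none of them come from the paper.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Plain SMOTE sketch (not the paper's KR-SMOTE): interpolate between
    a minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base (Euclidean), excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Hypothetical minority-class samples: (porosity, permeability), made-up values.
minority = [(0.10, 1.0), (0.12, 1.4), (0.11, 1.1), (0.15, 2.0)]
new_points = smote(minority, n_new=6, k=2)
balanced_minority = minority + new_points
print(len(balanced_minority))  # 10
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region the minority class already occupies. In a sketch like this, the enlarged minority set would then be merged with the majority class before training the decision tree.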

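Cite this article:

WU Zhi-Jun, XIA Sheng-Yu, WANG Peng. Reservoir Data Mining and Analysis Based on Spark. Computer Systems & Applications (计算机系统应用), 2017, 26(8): 9-15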


History

Received: 2016-12-09
Published online: 2017-10-31
