Naive Bayesian Lithology Recognition Based on EM and GMM
Computer Systems & Applications, 2019, Vol. 28, Issue 6: 38-44

Naive Bayesian Lithology Recognition Based on EM and GMM
ZHAO Ming1, JIN Da-Quan2, ZHANG Yan3, GAO Shi-Chen1, ZHONG Ting-Ting1
1. School of Science, China University of Geosciences, Beijing 100083, China;
2. Fourth Gas Production Plant, PetroChina Changqing Oilfield Company, Xi’an 710016, China;
3. School of Geophysics and Information Technology, China University of Geosciences, Beijing 100083, China
Abstract: The naive Bayes classifier can be applied to lithology identification. A single Gaussian distribution is often used to fit the probability distribution of continuous attributes, but it fits complex logging data poorly. To address this problem, a Gaussian mixture probability density estimate based on the EM algorithm is proposed. Logging data from the Lower Paleozoic gas wells in Block 41-33 of Sudong are used as training samples, and data from the 44-45 wells are used as test samples. The experiment first estimates the probability density of each logging variable with a Gaussian mixture model fitted by the EM algorithm, and then uses these estimates in the naive Bayes classifier for lithology identification. The fit of a single Gaussian distribution function serves as the comparison. The results show that the Gaussian mixture model fits the data better and that it improves the performance of the naive Bayes classifier in lithology identification.
Key words: probability density estimation; EM algorithm; naive Bayesian classification; lithology identification

1 Introduction

2 Naive Bayes

2.1 Bayesian Method

 $P(h|D) = \frac{{P(h)P(D|h)}}{{P(D)}}$ (1)
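
As a quick numerical illustration of Eq. (1), the sketch below computes the posterior of each hypothesis from its prior and likelihood. The function name `posterior` and all probability values are made up for illustration and do not come from the paper.

```python
def posterior(priors, likelihoods):
    """Bayes' rule: P(h|D) = P(h) * P(D|h) / P(D) for every hypothesis h."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(h) * P(D|h)
    evidence = sum(joint)  # P(D), by the law of total probability
    return [j / evidence for j in joint]

# Hypothetical numbers: two hypotheses with priors 0.3 and 0.7,
# and likelihoods P(D|h) of 0.8 and 0.2.
post = posterior([0.3, 0.7], [0.8, 0.2])
print(post)
```

The evidence $P(D)$ only rescales the result, so the ranking of hypotheses is already determined by the products $P(h)P(D|h)$.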

2.2 Naive Bayes

 $P(c|x) = \frac{{P(c)P(x|c)}}{{P(x)}} = \frac{{P(c)}}{{P(x)}}\prod\limits_{i = 1}^d {P({x_i}|c)}$ (2)

 $h(x) = \mathop {\arg \max }\limits_{c} P(c)\prod\limits_{i = 1}^d {P({x_i}|c)}$ (3)
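
The decision rule in Eq. (3) can be sketched as follows. This is a minimal illustration, assuming each class-conditional density $P(x_i|c)$ is a single Gaussian (the baseline that the abstract compares against, not the mixture-based estimate the paper proposes). The helper names `fit` and `predict`, the class labels "A" and "B", and the toy data are all hypothetical.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit(X, y):
    """Estimate the prior P(c) and per-attribute (mean, std) for each class c."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        params = []
        for col in zip(*rows):  # iterate over attributes
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col)
            params.append((mu, math.sqrt(var) + 1e-9))  # small floor avoids sigma = 0
        model[c] = (len(rows) / len(X), params)
    return model

def predict(model, x):
    """Eq. (3): argmax over c of P(c) * prod_i P(x_i|c), evaluated in log space."""
    def log_score(c):
        prior, params = model[c]
        return math.log(prior) + sum(
            math.log(gaussian_pdf(xi, mu, s) + 1e-300)  # guard against log(0)
            for xi, (mu, s) in zip(x, params)
        )
    return max(model, key=log_score)

# Toy two-attribute data (made up for illustration only).
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.1], [3.2, 3.9], [2.8, 4.0]]
y = ["A", "A", "A", "B", "B", "B"]
model = fit(X, y)
print(predict(model, [1.1, 2.0]), predict(model, [3.1, 4.0]))
```

Working with log probabilities keeps the product over many attributes from underflowing, and it leaves the argmax unchanged because the logarithm is monotonic.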

3 Probability Density Estimation

3.1 Gaussian Mixture Model

 $f(x) = \sum\limits_{i = 1}^m {{\varepsilon _i} \cdot \mathrm{Gauss}({\mu _i},{\sigma _i})}$ (4)
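
To make Eq. (4) concrete, the sketch below evaluates a one-dimensional mixture density. It assumes that $\sigma_i$ denotes a standard deviation and that the weights $\varepsilon_i$ are non-negative and sum to 1, which is what makes $f$ a valid density. The function name `gmm_pdf` and the parameter values are illustrative only.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def gmm_pdf(x, weights, mus, sigmas):
    """Eq. (4): weighted sum of m Gaussian component densities."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))

# Hypothetical two-component mixture.
weights, mus, sigmas = [0.4, 0.6], [0.0, 5.0], [1.0, 1.5]

# Sanity check: a Riemann sum over a wide interval should be close to 1.
step = 0.01
area = sum(gmm_pdf(-10.0 + step * k, weights, mus, sigmas) * step for k in range(3000))
print(round(area, 4))
```

Unlike a single Gaussian, such a mixture can be bimodal or skewed, which is the property the abstract relies on for complex logging data.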

3.2 EM Algorithm

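The EM algorithm is built on maximum likelihood estimation and estimates parameters iteratively. Its procedure consists of an E-step and an M-step: the distribution parameters $\theta$ are initialized first, and the E- and M-steps are then repeated until convergence [9, 11]: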

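E-step: using the initial value of $\theta$, or the value obtained in the previous iteration, compute the posterior probability of the latent variables (that is, their expectation) and take it as the estimate of the latent variables: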

 ${Q_i}({z^{(i)}}) := p({z^{(i)}}|{x^{(i)}};\theta )$ (5)

M-step: maximize the likelihood function to obtain the new parameter values:

 $\theta : = \arg \mathop {\max }\limits_\theta \sum\limits_i {\sum\limits_{{z^{(i)}}} {{Q_i}({z^{(i)}})\log \frac{{p({x^{(i)}},{z^{(i)}};\theta )}}{{{Q_i}({z^{(i)}})}}} }$ (6)
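
The E-step and M-step of Eqs. (5) and (6) can be sketched for a one-dimensional Gaussian mixture as below. This is an illustration, not the authors' implementation: the M-step uses the standard closed-form updates for Gaussian mixtures, and the quantile-based initialization, the stopping tolerance, the function name `em_gmm_1d`, and the synthetic data are all assumptions.

```python
import math
import random

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_gmm_1d(data, m=2, n_iter=200, tol=1e-8):
    """Fit an m-component 1-D Gaussian mixture with EM."""
    n = len(data)
    srt = sorted(data)
    # Initialization (an assumption): means at evenly spaced quantiles,
    # equal weights, and the overall standard deviation for every component.
    mus = [srt[int((k + 0.5) * n / m)] for k in range(m)]
    mean = sum(data) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    sigmas = [std] * m
    weights = [1.0 / m] * m
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step, Eq. (5): responsibilities Q_i(z = k) = p(z = k | x_i; theta)
        resp, ll = [], 0.0
        for x in data:
            joint = [w * gaussian_pdf(x, mu, s) for w, mu, s in zip(weights, mus, sigmas)]
            total = sum(joint) + 1e-300  # guard against division by zero
            ll += math.log(total)
            resp.append([j / total for j in joint])
        # M-step, Eq. (6): closed-form maximizers for a Gaussian mixture
        for k in range(m):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            nk = max(nk, 1e-12)
            mus[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigmas[k] = math.sqrt(var) + 1e-6  # small floor avoids sigma = 0
        # Stop when the log-likelihood no longer improves noticeably.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, mus, sigmas

# Synthetic data drawn from two Gaussians (made up for illustration).
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(300)] + [random.gauss(6.0, 1.0) for _ in range(300)]
weights, mus, sigmas = em_gmm_1d(data, m=2)
print(weights, mus, sigmas)
```

EM only guarantees a local optimum, so in practice the result depends on the initialization; running it from several starting points is a common precaution.
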

4 Case Study

 $2{P_e} = \int\limits_{ - \infty }^{{x_0}} {p(x|{\omega _2})dx + } \int\limits_{{x_0}}^{ + \infty } {p(x|{\omega _1})dx}$ (7)
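
Eq. (7) can be evaluated in closed form when both class-conditional densities are Gaussian. The sketch below makes that assumption (the equation itself holds for any densities), takes class $\omega_1$ to lie to the left of the threshold $x_0$, and uses hypothetical parameter values; the function name `error_probability` is illustrative.

```python
import math

def norm_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def error_probability(x0, mu1, sigma1, mu2, sigma2):
    """P_e from Eq. (7) for two equiprobable Gaussian classes split at x0."""
    left_error = norm_cdf(x0, mu2, sigma2)         # class 2 samples falling below x0
    right_error = 1.0 - norm_cdf(x0, mu1, sigma1)  # class 1 samples falling above x0
    return 0.5 * (left_error + right_error)

# Hypothetical classes N(0, 1) and N(2, 1) with the threshold midway at x0 = 1.
pe = error_probability(1.0, 0.0, 1.0, 2.0, 1.0)
print(round(pe, 4))  # about 0.1587
```
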

 Fig. 1 Comparison of AC estimation results for dolomite and mudstone

 Fig. 2 Example of the two regions R1 and R2 formed by the Bayesian classifier for two equiprobable classes

 Fig. 3 Comparison of Gaussian probability density estimation results

 Fig. 4 Comparison of Gaussian mixture probability density estimation results

5 Conclusion

 Fig. 5 Lithology identification results on the test set

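 [1] Zhou ZH. Machine Learning. Beijing: Tsinghua University Press, 2016. 147–154. (in Chinese)
 [2] Peng XY, Liu QS. Naive Bayes classification algorithm based on attribute clustering under different class variables. Journal of Computer Applications, 2011, 31(11): 3072-3074. (in Chinese)
 [3] Jin Z, Fan J, Chen F, et al. Adaptive spam SMS filtering system based on naive Bayes and support vector machine. Journal of Computer Applications, 2008, 28(3): 714-718. (in Chinese)
 [4] Li JH, Zhang XG, Chen H, et al. Research on an improved hidden naive Bayes algorithm. Journal of Chinese Computer Systems, 2013, 34(7): 1654-1658. DOI:10.3969/j.issn.1000-1220.2013.07.041 (in Chinese)
 [5] Wang W, Chen EH, Wang XF. Knowledge discovery based on Bayesian methods. Journal of Chinese Computer Systems, 2000, 21(7): 703-705. DOI:10.3969/j.issn.1000-1220.2000.07.009 (in Chinese)
 [6] Qin F, Ren SL, Cheng ZK, et al. Naive Bayes classification algorithm based on attribute weighting. Computer Engineering and Applications, 2008, 44(6): 107-109. DOI:10.3778/j.issn.1002-8331.2008.06.033 (in Chinese)
 [7] Zhong JQ, Gu LC, Tan JQ, et al. GMM parameter estimation based on a split EM algorithm. Computer Engineering and Applications, 2012, 48(34): 28-32, 59. DOI:10.3778/j.issn.1002-8331.1206-0419 (in Chinese)
 [8] Xu DJ, Shen C, Shen F. Variational Bayesian learning for parameter estimation of Gaussian mixture distributions. Journal of Shanghai Jiaotong University, 2013, 47(7): 1119-1125. (in Chinese)
 [9] Hobolth A, Jensen JL. Statistical inference in evolutionary models of DNA sequences via the EM algorithm. Statistical Applications in Genetics and Molecular Biology, 2005, 4(1): 18.
 [10] Taheri S, Mammadov M. Learning the naive Bayes classifier with optimization models. International Journal of Applied Mathematics and Computer Science, 2013, 23(4): 787-795. DOI:10.2478/amcs-2013-0059
 [11] Jiang LX, Wang DH, Cai ZH, et al. Survey of improving naive Bayes for classification. Proceedings of the 3rd International Conference on Advanced Data Mining and Applications. Harbin, China. 2007. 134–145.
 [12] Yuan ZW, Duan ZJ, Zhang CY, et al. Logging lithology interpretation of carbonate reservoirs based on a Markov probability model. Xinjiang Petroleum Geology, 2017, 38(1): 96-102. (in Chinese)
 [13] Gao SC, Zhang D. Application of multi-parameter probability fusion to prestack seismic reservoir prediction: a case study of Block Su194 in the Sulige gas field. Petroleum Geology and Recovery Efficiency, 2015, 22(6): 61-67. DOI:10.3969/j.issn.1009-9603.2015.06.011 (in Chinese)
 [14] Theodoridis S, Koutroumbas K. Pattern Recognition. Li JJ, Wang AX, Wang J, trans. Beijing: Publishing House of Electronics Industry, 2010: 8–10. (in Chinese)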