Computer Systems & Applications, 2018, Vol. 27, Issue 12: 268-273

Classification Method of Chronic Lesions Based on Rough Sets
HU Jian-Qiang, WANG Yuan
School of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China
Foundation item: National Natural Science Foundation of China (61872436); Natural Science Foundation of Fujian Province (2018J01570); Xiamen Science and Technology Foundation (3502Z20173035); Next Generation of Internet Technology of CERNET Innovation Program (NGI20160708)
Abstract: Because physiological monitoring data are time-continuous, imprecise, and fuzzy, traditional classification algorithms are difficult to apply to them directly. To address this problem, a classification method for chronic lesions based on rough sets is proposed. First, the physiological monitoring data are discretized by fusing correlation with the Chi-merge statistic. Then, an attribute reduction algorithm based on the compatibility matrix removes the redundant attributes of the data. Finally, classification rules are mined from batch and incremental data, and chronic diseases are classified intelligently by applying these rules on the MapReduce framework. Experiments show that the method achieves a high recognition rate, which helps individuals fully understand their health risks.
Key words: cloud health monitoring     intelligent classification     physiological monitoring data discretization

1 Method Principles

1.1 Attribute Discretization Fusing Correlation and the Chi-merge Statistic

 $V_{c_i} = \{ [v_{c_i}^{\min(x_1)}, v_{c_i}^{\max(x_1)}], [v_{c_i}^{\min(x_2)}, v_{c_i}^{\max(x_2)}], \cdots, [v_{c_i}^{\min(x_n)}, v_{c_i}^{\max(x_n)}] \}$ (1)

 $g(a,b) = \left\{ {\begin{array}{*{20}{l}}0 & {{b^{\max }} \le {a^{\min }},{a^{\max }} \le {b^{\min }}}\\{({a^{\max }} - {b^{\min }})/L} & {{a^{\min }} < {b^{\min }} \le {a^{\max }} \le {b^{\max }}}\\1 & {{a^{\min }} < {b^{\min }} \le {b^{\max }} \le {a^{\max }}}\\{({b^{\max }} - {a^{\min }})/L} & {{b^{\min }} < {a^{\min }} \le {b^{\max }} < {a^{\max }}}\\1 & {{b^{\min }} \le {a^{\min }} < {a^{\max }} \le {b^{\max }}}\end{array}} \right.$ (2)
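The piecewise definition in Eq. (2) can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the function name `interval_correlation` is hypothetical, the normalizing length $L$ is not defined in this section and is therefore passed in as the argument `length`, and containment is tested non-strictly so that intervals sharing an endpoint are also covered.

```python
def interval_correlation(a, b, length):
    """Correlation g(a, b) of two intervals a = (a_min, a_max), b = (b_min, b_max).

    `length` plays the role of L in Eq. (2); how L is chosen is an assumption
    left to the caller.
    """
    a_min, a_max = a
    b_min, b_max = b
    if length <= 0:
        raise ValueError("length must be positive")
    # Case 1: the intervals are disjoint or only touch at an endpoint.
    if b_max <= a_min or a_max <= b_min:
        return 0.0
    # Cases 3 and 5: one interval contains the other (non-strict, an assumption).
    if (a_min <= b_min and b_max <= a_max) or (b_min <= a_min and a_max <= b_max):
        return 1.0
    # Case 2: a starts first and the intervals partially overlap.
    if a_min < b_min:
        return (a_max - b_min) / length
    # Case 4: b starts first and the intervals partially overlap.
    return (b_max - a_min) / length


# Two heart-rate style intervals that share the range (70, 80].
print(interval_correlation((60, 80), (70, 90), 20))
```

Checking containment before partial overlap resolves the boundary situations in which two cases of Eq. (2) could both apply.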

 ${\chi ^2} = \sum\limits_{i = 1}^2 {\sum\limits_{j = 1}^p {\frac{{{{({A_{ij}} - {E_{ij}})}^2}}}{{{E_{ij}}}}} }$ (3)
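As an illustration of Eq. (3), the following Python sketch computes the $\chi^2$ statistic for two adjacent intervals. It assumes the usual Chi-merge reading of the symbols, which this section does not spell out: $A_{ij}$ is the number of objects of decision class $j$ in interval $i$, and $E_{ij}$ is the expected count, taken here as (row total × column total) / grand total. The function name `chi_square` is hypothetical, and terms with a zero expected count are skipped.

```python
def chi_square(counts_a, counts_b):
    """Chi-square statistic of two adjacent intervals.

    counts_a[j] and counts_b[j] are the numbers of objects of decision
    class j in the first and second interval (A_1j and A_2j in Eq. (3)).
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both intervals need one count per decision class")
    total = sum(counts_a) + sum(counts_b)
    if total == 0:
        return 0.0
    column_totals = [x + y for x, y in zip(counts_a, counts_b)]
    chi2 = 0.0
    for row in (counts_a, counts_b):
        row_total = sum(row)
        for observed, column_total in zip(row, column_totals):
            # Assumed definition of E_ij: row total * column total / grand total.
            expected = row_total * column_total / total
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2


# Identical class distributions give 0; fully separated classes give a large value.
print(chi_square([5, 5], [5, 5]), chi_square([10, 0], [0, 10]))
```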

① Based on the value characteristics of object $x_i$ on attribute $c_j$ (the lengths of the interval numbers and the overlap of adjacent intervals

② Sort the objects $U = \{ x_1, x_2, \cdots, x_n \}$ by their values on attribute $c_j$, and denote the result by $U^{(c_j)} = \{ x_1^{(c_j)}, x_2^{(c_j)}, \cdots, x_n^{(c_j)} \}$;

③ Traverse $U^{(c_j)}$ and compare adjacent objects $x_i^{(c_j)}$ ($1 \leqslant i \leqslant n$):

If $g(x_i^{(c_j)}, x_{i+1}^{(c_j)}) > g_j$, continue the traversal;

If $g(x_i^{(c_j)}, x_{i+1}^{(c_j)}) \leqslant g_j$ and $f(x_i^{(c_j)}) = f(x_{i+1}^{(c_j)})$, the decisions are the same, so continue the traversal;

If $g(x_i^{(c_j)}, x_{i+1}^{(c_j)}) \leqslant g_j$ and $f(x_i^{(c_j)}) \ne f(x_{i+1}^{(c_j)})$, add a breakpoint (the minimum of the intervals of $x_i^{(c_j)}$ and $x_{i+1}^{(c_j)}$);

④ Sort the breakpoints collected during the traversal in ascending order to discretize the continuous data; select a significance level and, together with the degrees of freedom ($r(d)-1$), determine the statistic threshold $\chi_j^2$ for $c_j$;

⑤ Compute the $\chi^2$ statistic of adjacent intervals of the discretized continuous data; if it is less than $\chi_j^2$, merge the adjacent intervals;

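⑥ Repeat step ⑤, merging iteratively, until the $\chi^2$ statistic of every pair of adjacent intervals exceeds the threshold $\chi_j^2$ or the final number of intervals to retain is reached.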
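The breakpoint search in steps ② to ④ can be sketched as follows. The code is an illustration under stated assumptions rather than the authors' implementation: `find_breakpoints` and `overlap_ratio` are hypothetical names, `overlap_ratio` is only a simple stand-in for the correlation $g$, and the breakpoint is taken to be the lower bound of the second interval of each pair, which is one possible reading of "the minimum of the intervals".

```python
def overlap_ratio(a, b):
    """Hypothetical stand-in for g(a, b): overlap length over the shorter interval."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    shorter = min(a[1] - a[0], b[1] - b[0])
    return max(overlap, 0.0) / shorter if shorter > 0 else 0.0


def find_breakpoints(objects, threshold, correlation=overlap_ratio):
    """objects: list of ((low, high), decision) pairs for one attribute c_j."""
    ordered = sorted(objects, key=lambda obj: obj[0])      # step 2: sort by value
    breakpoints = set()
    for (a, dec_a), (b, dec_b) in zip(ordered, ordered[1:]):  # step 3: adjacent pairs
        if correlation(a, b) > threshold:
            continue          # strongly correlated intervals stay together
        if dec_a == dec_b:
            continue          # same decision, no breakpoint needed
        breakpoints.add(b[0])  # assumption: cut at the lower bound of the later interval
    return sorted(breakpoints)  # step 4: breakpoints in ascending order


# Heart-rate style interval numbers with a low-risk (L) or high-risk (H) decision.
samples = [((90, 100), "H"), ((60, 70), "L"), ((95, 105), "H"), ((65, 75), "L")]
print(find_breakpoints(samples, threshold=0.3))
```

The resulting breakpoints would then be passed to the $\chi^2$-based merging of steps ⑤ and ⑥.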

1.2 Attribute Reduction Algorithm Based on the Compatibility Matrix

 ${\beta _c} = \frac{{\left| {\mathop \cup \limits_{{Y_j} \in U/D} {C_\alpha }{Y_j}} \right|}}{{\left| U \right|}}$ (4)

 $M_c = (r_{ij})_{|U| \times |U|}, \quad r_{ij} = \begin{cases} 0, & \exists c \in C,\ f(x_i,c) \ne f(x_j,c) \wedge f(x_i,c) \ne * \wedge f(x_j,c) \ne *,\ f(x_i,D) \ne f(x_j,D) \\ 1, & \text{otherwise} \end{cases} \quad 1 \le i,j \le |U|$ (5)

 $M_P \cap M_Q = (r_{ij}^{P})_{|U| \times |U|} \cap (r_{ij}^{Q})_{|U| \times |U|} = (\min(r_{ij}^{P}, r_{ij}^{Q}))_{|U| \times |U|}.$
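A small Python sketch of Eq. (5) and of the matrix intersection is given below. The function names and the toy decision table are hypothetical; the symbol `*` marks a missing value as in Eq. (5), and the intersection is taken as the element-wise minimum.

```python
MISSING = "*"


def compatibility_matrix(rows, decisions, attributes):
    """Build (r_ij) for the given condition attributes, following Eq. (5).

    r_ij = 0 when x_i and x_j have different decisions and differ on some
    attribute where neither value is missing; otherwise r_ij = 1.
    """
    n = len(rows)
    matrix = [[1] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if decisions[i] == decisions[j]:
                continue
            for attr in attributes:
                vi, vj = rows[i][attr], rows[j][attr]
                if vi != MISSING and vj != MISSING and vi != vj:
                    matrix[i][j] = 0
                    break
    return matrix


def intersect(m_p, m_q):
    """Element-wise minimum of two compatibility matrices."""
    return [[min(p, q) for p, q in zip(row_p, row_q)]
            for row_p, row_q in zip(m_p, m_q)]


# Toy incomplete decision table (values are illustrative only).
rows = [{"c1": 1, "c2": "*"}, {"c1": 1, "c2": 0}, {"c1": 2, "c2": 0}]
decisions = ["H", "L", "L"]
m_c1 = compatibility_matrix(rows, decisions, ["c1"])
m_c2 = compatibility_matrix(rows, decisions, ["c2"])
print(intersect(m_c1, m_c2))
```

In this toy table only $x_1$ and $x_3$ are distinguished, because they differ on `c1` and carry different decisions; the missing value on `c2` never separates two objects.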

① Using the compatibility matrix and a heuristic algorithm [13], convert the incomplete decision table $S = (U,R,V,f)$ into a complete data set;

② Represent all combinations of the condition attributes $C = \{ c_1, c_2, \cdots, c_m \}$ with binary codes;

③ For each code $j = 1, 2, \cdots, 2^m - 1$, compute the classification quality $\beta_{C_j}$ of the corresponding attribute combination $C_j$ under the parameter $\alpha$:

 $\beta_{C_j} = \frac{\left| \bigcup\limits_{Y_k \in U/D} (C_j)_\alpha Y_k \right|}{\left| U \right|}$ (6)

④ If $\beta_{C_j} = \beta_C$, replace the attribute set $C$ with the attribute combination $C_j$, that is, $C = C_j$; repeat step ②;

⑤ Delete redundant rows and all-zero rows to form the minimal reduced attribute set $C$.
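Steps ② to ④ can be illustrated with the following Python sketch. It rests on several assumptions that go beyond the text: the data are already complete (step ①), $C_\alpha Y$ is read as the $\alpha$-lower approximation of variable precision rough sets (an equivalence class counts when at least a fraction $\alpha$ of it falls in one decision class), and instead of the iterative replacement in step ④ the sketch makes one pass over all binary codes and keeps the smallest attribute combination that preserves the classification quality. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def classification_quality(rows, decisions, attributes, alpha=1.0):
    """Fraction of objects lying in some alpha-lower approximation (assumed reading of beta)."""
    blocks = defaultdict(list)
    for index, row in enumerate(rows):
        blocks[tuple(row[a] for a in attributes)].append(index)
    covered = 0
    for members in blocks.values():
        counts = Counter(decisions[i] for i in members)
        if max(counts.values()) / len(members) >= alpha:
            covered += len(members)
    return covered / len(rows)


def reduce_attributes(rows, decisions, attributes, alpha=1.0):
    """Enumerate codes 1 .. 2^m - 1 and keep the smallest quality-preserving subset."""
    target = classification_quality(rows, decisions, attributes, alpha)
    best = list(attributes)
    for code in range(1, 2 ** len(attributes)):
        # Bit k of the code decides whether attribute k is in the combination.
        subset = [a for bit, a in enumerate(attributes) if (code >> bit) & 1]
        if len(subset) >= len(best):
            continue
        quality = classification_quality(rows, decisions, subset, alpha)
        if math.isclose(quality, target):
            best = subset
    return best


# Toy complete decision table: c1 alone already determines the decision.
rows = [{"c1": 0, "c2": 0, "c3": 1}, {"c1": 0, "c2": 1, "c3": 1},
        {"c1": 1, "c2": 0, "c3": 0}, {"c1": 1, "c2": 1, "c3": 0}]
decisions = ["L", "L", "H", "H"]
print(reduce_attributes(rows, decisions, ["c1", "c2", "c3"]))
```

Exhaustive enumeration grows as $2^m$, so a sketch of this kind is only practical for a small number of condition attributes such as the nine used in the experiment.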

1.3 Classification Rule Mining Combining Batch and Incremental Data

 $C({x_1}),C({x_2}),\cdots,C({x_n}) \to {D_1}({x_1}),{D_2}({x_2}),\cdots,{D_{r(d)}}({x_n}),$

 $str(C,D) = \frac{{\left| {{C_U} \cap {D_U}} \right|}}{{\left| U \right|}}$ (7)
 $cer(C,D) = \frac{{\left| {{C_U} \cap {D_U}} \right|}}{{\left| {{C_U}} \right|}}$ (8)
 $\operatorname{cov} (C,D) = \frac{{\left| {{C_U} \cap {D_U}} \right|}}{{\left| {{D_U}} \right|}}$ (9)
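Equations (7) to (9) can be evaluated directly on sets of object indices, as in the sketch below. It assumes that $C_U$ is the set of objects satisfying the condition part of a rule and $D_U$ the set of objects satisfying its decision part, which this section does not state explicitly; the function name `rule_measures` is hypothetical.

```python
def rule_measures(condition_matches, decision_matches, universe_size):
    """Return (strength, certainty, coverage) of a rule C -> D.

    condition_matches: objects satisfying the condition part (assumed C_U).
    decision_matches:  objects satisfying the decision part (assumed D_U).
    """
    c_u = set(condition_matches)
    d_u = set(decision_matches)
    both = len(c_u & d_u)
    strength = both / universe_size              # Eq. (7)
    certainty = both / len(c_u) if c_u else 0.0  # Eq. (8)
    coverage = both / len(d_u) if d_u else 0.0   # Eq. (9)
    return strength, certainty, coverage


# 10 objects; the rule condition matches 3 of them, the decision holds for 4.
print(rule_measures({0, 1, 2}, {1, 2, 3, 4}, 10))
```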

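① Determine the condition attributes $C$ and the decision attribute $D$ of the objects from the physiological monitoring data, form the decision table $S = (U,R,V,f)$, and select the physiological monitoring data set $Data_0$;

② Preprocess $Data_0$: discretize the continuous attribute values with the attribute discretization that fuses correlation and the Chi-merge statistic, then reduce the attributes based on the compatibility matrix to obtain the data set $Data_C$;

③ Call the rough-set-based classification algorithm [14] to obtain an optimal set of classification rules and save it in the classification rule base;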

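④ Read the incremental data set and preprocess it (attribute discretization fusing correlation and the Chi-merge statistic) to obtain $\Delta Data_C$; if the incremental data set is empty, jump to ⑥;

⑤ Classify $\Delta Data_C$ with the rules in the current classification rule base; if the expected effect is reached (rule strength, certainty, and coverage), merge $Data_C$ and $\Delta Data_C$, that is, $Data_C = Data_C \cup \Delta Data_C$, and jump to ④; otherwise execute ③;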

⑥ Output the current classification rule set.
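The control flow of steps ③ to ⑥ can be sketched as a short loop. The sketch is not the authors' implementation: `mine_rules` and `is_acceptable` are hypothetical callables standing in for the rough-set rule miner [14] and for the check on strength, certainty, and coverage, the data are assumed to be already preprocessed, and the increment is merged before rules are mined again so that the new rules can reflect it. The toy miner simply predicts the majority label.

```python
from collections import Counter


def mine_incrementally(base_data, increments, mine_rules, is_acceptable):
    """Batch mining followed by incremental updates (steps 3 to 6)."""
    data = list(base_data)
    rules = mine_rules(data)                       # step 3: batch mining
    for delta in increments:                       # step 4: read the next increment
        if not delta:                              # empty increment: go to step 6
            break
        acceptable = is_acceptable(rules, delta)   # step 5: test the current rules
        data.extend(delta)                         # merge the increment (assumption)
        if not acceptable:
            rules = mine_rules(data)               # back to step 3
    return rules                                   # step 6: output the rules


# Toy stand-ins, for illustration only.
def majority_label(records):
    return Counter(label for _, label in records).most_common(1)[0][0]


def mostly_correct(rule, records):
    hits = sum(1 for _, label in records if label == rule)
    return hits / len(records) >= 0.5


base = [(1, "L"), (2, "L")]
increments = [[(3, "L")], [(4, "H"), (5, "H"), (6, "H"), (7, "H")], []]
final_rules = mine_incrementally(base, increments, majority_label, mostly_correct)
print(final_rules)
```

Re-mining only when the current rules fail on new data keeps the expensive batch step off the common path, which is the motivation for combining the two modes.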

2 Experiments

 Figure 1 Cloud health monitoring system

 $\begin{array}{l}{R_1}:f(x,{c_1}) \ge 57 \wedge f(x,{c_2}) \ge 26.4 \wedge f(x,{c_4}) \ge 0\\ \wedge f(x,{c_5}) = 1 \wedge f(x,{c_6}) \ge 0.89 \wedge f(x,{c_7}) \ge (60,90]\\ \wedge f(x,{c_8}) = (6.98,9.16] \wedge f(x,{c_9}) \ge (115,148] \to\\ f(x,d) = H\end{array}$
 $\begin{array}{l}{R_2}:f(x,{c_1}) \ge 57 \wedge f(x,{c_2}) \ge 26.4 \wedge f(x,{c_3}) \ge 0.83\\ \wedge f(x,{c_4}) \ge 1 \wedge f(x,{c_5}) = 1 \wedge f(x,{c_6}) \ge 0.89\\ \wedge f(x,{c_7}) \ge (60,90] \wedge f(x,{c_8}) = (6.98,9.16]\\ \wedge f(x,{c_9}) \ge (115,148] \to f(x,d) = H\end{array}$
 $\begin{array}{l}{R_3}:f(x,{c_1}) \ge 54 \wedge f(x,{c_2}) \ge 23.6 \wedge f(x,{c_4}) = 0\\ \wedge f(x,{c_5}) = 1 \wedge f(x,{c_6}) \ge 0.91 \wedge f(x,{c_7}) \ge (60,94]\\ \wedge f(x,{c_8}) = (6.65,7.12] \wedge f(x,{c_9}) \ge (127,164] \to\\ f(x,d) = L\end{array}$

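Applying rule ${R_1}$ to the data of ${x_1}$: 57 years old, male, body mass index (BMI) of 26.9 (Chinese standard: [24, 27.9], overweight), a history of smoking, a liking for sweet food, blood oxygen saturation (SpO2) of 92% (Chinese standard: <94%, insufficient oxygen supply), heart rate (Chinese standard: [60, 100], normal), fasting blood glucose (FBG) in the range (6.98, 7.48] (Chinese standard: a fasting whole-blood glucose FBG ≥ 6.7 mmol/L on two occasions is diagnostic of diabetes), and systolic blood pressure in (132, 136] (Chinese standard: ≥140 mmHg, hypertension). The classification rules assign this case to high risk (H). Applying rule ${R_3}$ to the data of ${x_7}$: 54 years old, male, BMI of 24.1, a history of smoking, a liking for sweet food, SpO2 of 91%, heart rate in (76, 83], FBG in the range (6.7, 7.06], and systolic blood pressure in (145, 157]. The classification rules assign this case to low risk (L), and after a comprehensive diagnosis the doctor leaned toward hypertension (low risk L). The method reaches a recognition accuracy of 89.23% and a misrecognition rate of 10.77%; the root cause of the misrecognition is that physiological monitoring data are time-continuous, imprecise, and fuzzy.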

3 Conclusions and Outlook

References

 [1] Xia HN, Asif I, Zhao XP. Cloud-ECG for real time ECG monitoring and analysis. Computer Methods and Programs in Biomedicine, 2013, 110(3): 253-259. DOI:10.1016/j.cmpb.2012.11.008
 [2] Banos O, Villalonga C, Damas M, et al. PhysioDroid: Combining wearable health sensors and mobile devices for a ubiquitous, continuous, and personal monitoring. The Scientific World Journal, 2014, 2014: 490824.
 [3] Wang SL, Chen YL, Kuo AMH, et al. Design and evaluation of a cloud-based mobile health information recommendation system on wireless sensor networks. Computers & Electrical Engineering, 2016, 49: 221-235.
 [4] Su JH, Wang BW, Hsiao YC, et al. Personalized rough-set-based recommendation by integrating multiple contents and collaborative information. Information Sciences, 2010, 180(1): 113-131. DOI:10.1016/j.ins.2009.08.005
 [5] Firoozabadi R, Gregg RE, Babaeizadeh S. Intelligent use of advanced capabilities of diagnostic ECG algorithms in a monitoring environment. Journal of Electrocardiology, 2017, 50(5): 615-619. DOI:10.1016/j.jelectrocard.2017.04.013
 [6] Zhou P, Li ZC, Wang F, et al. Portable wireless ECG monitor with fabric electrodes. Chinese Journal of Biomedical Engineering, 2016, 25(4): 172-178.
 [7] Zhang Y, Bin GY, Wu SC. Design of a wearable multi-parameter cardiac activity monitoring device. China Medical Devices, 2018, 33(3): 18-21. DOI:10.3969/j.issn.1674-1633.2018.03.005 (in Chinese)
 [8] Majumder S, Mondal T, Deen MJ. Wearable sensors for remote health monitoring. Sensors, 2017, 17(1): 130. DOI:10.3390/s17010130
 [9] Fan JP, Zhang YT, Wang L. Implementing the Haiyun project to achieve low-cost health. Journal of Integration Technology, 2012, 1(2): 4-9. (in Chinese)
 [10] Zou N. Research and design of a data mining program for a ubiquitous human health monitoring system based on Android phones [Master's thesis]. Wuhan: Huazhong University of Science and Technology, 2012. (in Chinese)
 [11] Hu JQ. Design of a home health monitoring system connected to the "health cloud". Journal of Xiamen University (Natural Science), 2016, 55(1): 114-120. (in Chinese)
 [12] Xie H, Cheng HZ, Niu DX. Discretization algorithm for continuous attributes in rough sets based on information entropy. Chinese Journal of Computers, 2005, 28(9): 1570-1574. DOI:10.3321/j.issn:0254-4164.2005.09.021 (in Chinese)
 [13] Chen H, Yang JA, Zhuang ZQ. Attribute core and minimal attribute reduction algorithm for variable precision rough sets. Chinese Journal of Computers, 2012, 35(5): 1011-1017. (in Chinese)
 [14] Margret AS, Clara MLJ, Jeevitha P, et al. Design of a diabetic diagnosis system using rough sets. Cybernetics and Information Technologies, 2013, 13(3): 124-139. DOI:10.2478/cait-2013-0030