Computer Systems & Applications, 2019, Vol. 28, Issue (4): 131–138

Improved MIMLSVM Algorithm Based on Global and Local Label Correlations
LI Cun-He, ZHANG Zhen-Kai
School of Computer & Communication Engineering, China University of Petroleum, Qingdao 266580, China
Foundation item: Natural Science Foundation of Shandong Province (ZR2014FQ018)
Abstract: Multi-Instance Multi-Label (MIML) learning is a machine learning framework in which an example is described by multiple instances and associated with multiple class labels. One way to design a MIML algorithm is to identify its equivalent task in the traditional supervised learning framework, using multi-instance learning or multi-label learning as the bridge, and then to train a model with an existing algorithm. However, this degradation process loses information, which harms classification accuracy. This study improves the MIMLSVM algorithm by replacing its multi-label step with GLOCAL, a multi-label algorithm that considers both global and local label correlations. Experimental results show that the improved algorithm achieves sound classification performance.
Key words: Multi-Instance Multi-Label (MIML)     locally     globally     degradation     MIMLSVM     GLOCAL

1 Related Work

2 The MIMLSVM Algorithm and Its Improvement

2.1 The MIMLSVM Algorithm

The MIMLSVM algorithm uses multi-label learning as the bridge to degrade a multi-instance multi-label learning problem into a traditional supervised learning problem. First, MIMLSVM transforms each multi-instance multi-label example $(X_i,Y_i)$ into a single-instance multi-label example $(\phi (X_i),Y_i)$, where the function $\phi$ maps each bag of instances $X_i$ to a single instance $Z_i$ by constructive clustering [25]. The clustering treats each bag in the collection $\Lambda = \{ {X_1},\cdots,{X_m}\}$ as an atomic object, measures the distance between bags with the Hausdorff distance, and partitions $\Lambda$ into several clusters with the k-medoids algorithm. Each dimension of $Z_i$ is then the distance from bag $X_i$ to the corresponding cluster medoid; in this way, the multi-instance multi-label data set is converted into a single-instance multi-label data set. After this step, MIMLSVM solves the resulting multi-label learning problem with the MLSVM algorithm [26]. MLSVM decomposes the multi-label learning problem into independent binary classification problems and builds one SVM classifier for each class label; every instance $Z_i$ participates in the construction of every classifier. If the label set $Y_i$ of instance $Z_i$ contains label $y$, i.e. $y \in {Y_i}$, then $\Phi ({Z_i},y) = + 1$; otherwise $\Phi ({Z_i},y) = - 1$. At this point, multi-label learning has been reduced to traditional supervised learning, and SVMs can be trained for classification. For an unseen bag, constructive clustering first converts it into a single instance, which is then given to the $\left| y \right|$ trained classifiers; each classifier produces a different output on this instance, and the bag is labeled with the classes of those classifiers whose output is positive. Note, however, that if every classifier outputs a negative value on the instance, it would receive no class label at all. To avoid this, the T-Criterion [27] is adopted for prediction: when all classifier outputs are negative, the instance is labeled with the class of the classifier whose output value is largest.
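The degradation step above can be sketched in a few lines of Python. The snippet below is illustrative only (function names are our own, and the k-medoids clustering that selects the medoid bags is omitted): it computes the maximal Hausdorff distance between two bags and maps each bag to the vector of its distances to a given set of medoid bags.

```python
import numpy as np

def hausdorff(A, B):
    """Maximal Hausdorff distance between two bags (one instance per row)."""
    # pairwise Euclidean distances between instances of A and instances of B
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    # max over each bag of the distance to the nearest instance in the other bag
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def bags_to_vectors(bags, medoid_indices):
    """Map each bag X_i to the vector of its distances to the medoid bags (Z_i)."""
    medoids = [bags[i] for i in medoid_indices]
    return np.array([[hausdorff(X, M) for M in medoids] for X in bags])
```

Each row of the returned matrix is the single-instance representation $Z_i$ of bag $X_i$, on which an ordinary multi-label learner can then be trained.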

2.2 The Improved MIMLSVM Algorithm

The GLOCAL algorithm [28] is a multi-label algorithm proposed by Zhu Y, Kwok JT, and Zhou ZH in their work on multi-label learning. By learning a latent label representation and optimizing the label manifold, it handles both the full-label and the missing-label settings, while exploiting global and local label correlations simultaneously. The success of GLOCAL can be attributed to four points: (1) it uses the low-rank structure of the label matrix to obtain a more compact and more abstract latent label representation, which also provides a natural solution for recovering missing labels (introduced in Section 2.2.1); (2) it exploits both global and local label correlations, so the classifier for each label can use information from all labels (introduced in Section 2.2.2); (3) it learns label correlations directly from the data, without the tedious and difficult manual specification of a correlation matrix (introduced in Section 2.2.3); (4) it combines the above into a joint learning problem, optimized with an efficient alternating optimization strategy (introduced in Section 2.2.4).

2.2.1 Basic Model

 $\tilde Y \simeq UV$ (1)

 $\mathop {\min }\limits_{U,V,W} \left\| {{\prod _{\Omega} }(Y - UV)} \right\|_F^2 + \lambda \left\| {V - {W^{\rm{T}}}X} \right\|_F^2 + {\lambda _2}R(U,V,W)$ (2)
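As a numerical illustration of objective (2), the following sketch evaluates it with NumPy. It assumes (as in GLOCAL) that the indicator of the observed set $\Omega$ is given as a 0/1 mask $J$ and that the regularizer $R(U,V,W)$ is the sum of squared Frobenius norms; the function name is our own.

```python
import numpy as np

def glocal_basic_objective(Y, J, U, V, W, X, lam, lam2):
    """Value of the basic GLOCAL objective, Eq. (2).

    J[i, j] = 1 iff label entry (i, j) is observed (the role of Pi_Omega).
    R(U, V, W) is taken as the sum of squared Frobenius norms (an assumption).
    """
    recon = np.linalg.norm(J * (Y - U @ V), 'fro') ** 2      # label reconstruction
    fit = lam * np.linalg.norm(V - W.T @ X, 'fro') ** 2      # latent labels fit features
    reg = lam2 * sum(np.linalg.norm(M, 'fro') ** 2 for M in (U, V, W))
    return recon + fit + reg
```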

2.2.2 Exploiting Global and Local Correlations

 $\sum\limits_{i,j} {{{\left[ {{S_0}} \right]}_{i,j}}} \left\| {{f_{i,:}} - {f_{j,:}}} \right\|_2^2$ (3)

 \begin{aligned} \mathop {\min }\limits_{U,V,W} &\left\| {{\prod _{\Omega} }(Y - UV)} \right\|_F^2 + \lambda \left\| {V - {W^{\rm{T}}}X} \right\|_F^2 + {\lambda _2}R(U,V,W)\\ &+ {\lambda _3}tr(F_0^{\rm{T}}{L_0}{F_0}) + \displaystyle\sum\limits_{m = 1}^g {{\lambda _4}tr(F_m^{\rm{T}}{L_m}{F_m})} \end{aligned} (4)

 \begin{aligned} \mathop {\min }\limits_{U,V,W} & \left\| {{\prod _{\Omega} }(Y - UV)} \right\|_F^2 + \lambda \left\| {V - {W^{\rm{T}}}X} \right\|_F^2\\ & + \displaystyle\sum\limits_{m = 1}^g {(\dfrac{{{\lambda _3}{n_m}}}{n}tr(F_0^{\rm{T}}{L_m}{F_0}) + {\lambda _4}tr(F_m^{\rm{T}}{L_m}{F_m}))} \\ &+ {\lambda _2}R(U,V,W) \end{aligned} (5)
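The trace terms in (4) and (5) come from the standard identity linking the pairwise form (3) with the graph Laplacian $L = D - S$, where $D$ is the degree matrix of the correlation matrix $S$: the weighted sum of squared row differences equals $2\,tr(F^{\rm T}LF)$. A quick numerical check (with hypothetical random data):

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 4, 6
S = rng.random((k, k)); S = (S + S.T) / 2        # symmetric label correlation matrix
F = rng.random((k, n))                           # row i: predictions for label i
L = np.diag(S.sum(axis=1)) - S                   # graph Laplacian of S

# pairwise form, as in Eq. (3), summed over ordered pairs (i, j)
pairwise = sum(S[i, j] * np.linalg.norm(F[i] - F[j]) ** 2
               for i in range(k) for j in range(k))
trace_form = 2 * np.trace(F.T @ L @ F)           # Laplacian trace form
assert np.isclose(pairwise, trace_form)
```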
2.2.3 Learning Label Correlations

Let $J = \left[ {{J_{ij}}} \right]$ be the indicator matrix, with ${J_{ij}} = 1$ when $(i,j) \in \Omega$ and 0 otherwise. Then ${\prod _{\Omega} }(Y - UV)$ can be rewritten as the Hadamard product $J \circ (Y - UV)$. Combining the decomposition of the Laplacian matrices with the diagonal constraints on ${Z_m}$, we obtain the optimization problem:

 \left\{ \begin{aligned} \mathop {\min }\limits_{U,V,W,Z} & \left\| {J \circ (Y - UV)} \right\|_F^2 + \lambda \left\| {V - {W^{\rm{T}}}X} \right\|_F^2 \\ &+\displaystyle\sum\limits_{m = 1}^g {(\frac{{{\lambda _3}{n_m}}}{n}tr(F_0^{\rm{T}}{Z_m}Z_m^{\rm{T}}{F_0}) + {\lambda _4}tr(F_m^{\rm{T}}{Z_m}Z_m^{\rm{T}}{F_m}))} \\ & +{\lambda _2}R(U,V,W)\\ {\rm{s}}{\rm{.t.}}&\;\;diag({Z_m}Z_m^{\rm{T}}) = 1,\;{{m}} = 1,2,\cdots,{{g}}\;. \end{aligned}\right. (6)

2.2.4 Optimization by Alternating Minimization

2.2.4.1 Updating ${Z_m}$

 $\left\{ \begin{array}{l} \mathop {\min }\limits_{{Z_m}} \dfrac{{{\lambda _3}{n_m}}}{n}tr(F_0^{\rm{T}}{Z_m}Z_m^{\rm{T}}{F_0}) + {\lambda _4}tr(F_m^{\rm{T}}{Z_m}Z_m^{\rm{T}}{F_m})\\ {\rm{s}}{\rm{.t}}{\rm{.}}\;\;\;diag({Z_m}Z_m^{\rm{T}}) = 1,\;\;{{m}} \in \{ 1,\cdots,g\} \end{array}\right.$ (7)

${Z_{m,j,:}} \leftarrow {Z_{m,j,:}}/\left\| {{Z_{m,j,:}}} \right\|$, where ${Z_{m,j,:}}$ is the $j$-th row of ${Z_m}$.
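GLOCAL itself optimizes ${Z_m}$ on a manifold (cf. the Manopt toolbox [34]); as a simplified sketch, one projected-gradient step for problem (7) can be written as follows. It uses the fact that the gradient of $tr(F^{\rm T}ZZ^{\rm T}F)$ with respect to $Z$ is $2FF^{\rm T}Z$, and enforces the constraint $diag(Z_mZ_m^{\rm T}) = 1$ by the row normalization above. The function name and step size are our own.

```python
import numpy as np

def update_Zm(Zm, F0, Fm, lam3, lam4, nm, n, step=0.01):
    """One projected-gradient step for Z_m in problem (7) (illustrative)."""
    # gradient of the two trace terms: d/dZ tr(F^T Z Z^T F) = 2 F F^T Z
    grad = (2 * (lam3 * nm / n) * (F0 @ F0.T) @ Zm
            + 2 * lam4 * (Fm @ Fm.T) @ Zm)
    Zm = Zm - step * grad
    # project back onto the constraint diag(Z_m Z_m^T) = 1: normalize each row
    return Zm / np.linalg.norm(Zm, axis=1, keepdims=True)
```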

2.2.4.2 Updating $V$

 $\mathop {\min }\limits_V \left\| {J \circ (Y - UV)} \right\|_F^2 + \lambda \left\| {V - {W^{\rm{T}}}X} \right\|_F^2 + {\lambda _2}\left\| V \right\|_F^2$ (8)

 $\mathop {\min }\limits_{{v_i}} {\left\| {diag({j_i}){y_i} - diag({j_i})U{v_i}} \right\|^2} + \lambda {\left\| {{v_i} - {W^{\rm{T}}}{x_i}} \right\|^2} + {\lambda _2}{\left\| {{v_i}} \right\|^2}$
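This per-column subproblem is a ridge-regression-like problem with a closed-form solution: since $j_i$ is binary, $diag(j_i)^{\rm T}diag(j_i) = diag(j_i)$, and setting the gradient to zero gives $(U^{\rm T}DU + (\lambda + \lambda_2)I)\,v_i = U^{\rm T}Dy_i + \lambda W^{\rm T}x_i$ with $D = diag(j_i)$. A sketch (function name is our own):

```python
import numpy as np

def update_v_col(U, W, x_i, y_i, j_i, lam, lam2):
    """Closed-form minimizer of the per-column V-subproblem for v_i."""
    D = np.diag(j_i)                       # observed-entry mask for this column
    k = U.shape[1]
    A = U.T @ D @ U + (lam + lam2) * np.eye(k)
    b = U.T @ D @ y_i + lam * (W.T @ x_i)
    return np.linalg.solve(A, b)
```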

2.2.4.3 Updating $U$

 \begin{aligned} \mathop {\min }\limits_U &\left\| {J \circ (Y - UV)} \right\|_F^2 + {\lambda _2}\left\| U \right\|_F^2\\ & +\displaystyle\sum\limits_{m = 1}^g {(\frac{{{\lambda _3}{n_m}}}{n}tr(F_0^{\rm{T}}{Z_m}Z_m^{\rm{T}}{F_0}) + {\lambda _4}tr(F_m^{\rm{T}}{Z_m}Z_m^{\rm{T}}{F_m}))} \end{aligned} (9)

 \begin{aligned} \nabla U &= (J \circ (UV - Y)){V^{\rm{T}}} + {\lambda _2}U\\ &+ \displaystyle\sum\limits_{m = 1}^g {{Z_m}} Z_m^{\rm{T}}U\left(\frac{{{\lambda _3}{n_m}}}{n}{W^{\rm{T}}}X{X^{\rm{T}}}W + {\lambda _4}{W^{\rm{T}}}{X_m}X_m^{\rm{T}}W\right) \end{aligned}
2.2.4.4 Updating $W$

 \begin{aligned} \mathop {\min }\limits_W \lambda & \left\| {V - {W^{\rm{T}}}X} \right\|_F^2 + {\lambda _2}\left\| W \right\|_F^2\\ & + \displaystyle\sum\limits_{m = 1}^g {(\frac{{{\lambda _3}{n_m}}}{n}tr(F_0^{\rm{T}}{Z_m}Z_m^{\rm{T}}{F_0}) + {\lambda _4}tr(F_m^{\rm{T}}{Z_m}Z_m^{\rm{T}}{F_m}))} \end{aligned} (10)

 \begin{aligned} \nabla {{W}} = & \lambda X({X^{\rm{T}}}W - {V^{\rm{T}}}) + {\lambda _2}W \\ &+\displaystyle\sum\limits_{m = 1}^g {(\frac{{{\lambda _3}{n_m}}}{n}X{X^{\rm{T}}} + {\lambda _4}{X_m}X_m^{\rm{T}})} W{U^{\rm{T}}}{Z_m}Z_m^{\rm{T}}U \end{aligned}
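Taking the gradient formula above at face value, a direct NumPy transcription might look as follows. Here `Xs[m]` and `Zs[m]` are assumed to hold the group data $X_m$ and the factors $Z_m$; the function name and argument layout are our own.

```python
import numpy as np

def grad_W(W, U, V, X, Xs, Zs, lam, lam2, lam3, lam4, n):
    """Gradient of the W-subproblem, Eq. (10), as displayed in the text."""
    g = lam * X @ (X.T @ W - V.T) + lam2 * W     # fit term + regularizer
    for Xm, Zm in zip(Xs, Zs):
        nm = Xm.shape[1]                          # group size n_m
        # global (lambda_3) and local (lambda_4) correlation terms
        g += ((lam3 * nm / n) * (X @ X.T) + lam4 * (Xm @ Xm.T)) \
             @ W @ (U.T @ Zm @ Zm.T @ U)
    return g
```

With the correlation terms switched off ($\lambda_3 = \lambda_4 = 0$) and $\lambda_2 = 0$, the gradient vanishes exactly when $X^{\rm T}W = V^{\rm T}$, as expected from the fit term.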

MIMLSVM is an algorithm designed for the multi-instance multi-label framework; after constructive clustering turns each bag into a single instance, the task becomes a single-instance multi-label problem. Since GLOCAL targets exactly single-instance multi-label classification, it can be used to improve the original MIMLSVM by replacing the MLSVM step, yielding the MIMLSVM-GLOCAL algorithm. As mentioned above, GLOCAL does not start from only the local or only the global side but takes both into account, and it can also handle the case of missing labels. The MIMLSVM-GLOCAL algorithm is described in Table 1.
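As a sketch of the clustering half of this pipeline, here is a plain k-medoids on a precomputed distance matrix (illustrative; in MIMLSVM-GLOCAL the matrix would hold Hausdorff distances between bags, and the resulting single-instance representation is then passed to GLOCAL instead of MLSVM):

```python
import numpy as np

def k_medoids(D, k, iters=100, seed=0):
    """Plain k-medoids on a precomputed symmetric distance matrix D."""
    rng = np.random.default_rng(seed)
    m = D.shape[0]
    medoids = rng.choice(m, size=k, replace=False)    # random initial medoids
    for _ in range(iters):
        assign = np.argmin(D[:, medoids], axis=1)     # nearest-medoid assignment
        new = medoids.copy()
        for c in range(k):
            members = np.where(assign == c)[0]
            if len(members):
                # new medoid: member minimizing total intra-cluster distance
                intra = D[np.ix_(members, members)].sum(axis=1)
                new[c] = members[np.argmin(intra)]
        if np.array_equal(new, medoids):              # converged
            break
        medoids = new
    return medoids, np.argmin(D[:, medoids], axis=1)
```

Each object's distances to the final medoids then form its feature vector, exactly as in the constructive-clustering step of Section 2.1.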

3 Experiments

4 Conclusion

[1] Zhou ZH, Zhang ML, Huang SJ, et al. Multi-instance multi-label learning. Artificial Intelligence, 2012, 176(1): 2291–2320. DOI:10.1016/j.artint.2011.10.002
[2] Zhou ZH, Zhang ML, Huang SJ, et al. MIML: A framework for learning with ambiguous objects. arXiv: 0808.3231v2, 2008.
[3] Zhou ZH, Zhang ML. Multi-instance multi-label learning with application to scene classification. Proceedings of the 19th International Conference on Neural Information Processing Systems. Canada. 2006. 1609–1616.
[4] Ding SF, Qi BJ, Tan HY. A survey on theory and algorithms of support vector machines. Journal of University of Electronic Science and Technology of China, 2011, 40(1): 2–10. (in Chinese)
[5] Kwok JT, Zhou ZH, Xu L. Machine learning. Kacprzyk J, Pedrycz W, eds. Springer Handbook of Computational Intelligence. Berlin, Heidelberg: Springer, 2015. 495–522.
[6] Maron O, Lozano-Pérez T. A framework for multiple-instance learning. Proceedings of 1997 Conference on Advances in Neural Information Processing Systems. Denver, CO, USA. 1998. 570–576.
[7] Zhang Q, Goldman SA. EM-DD: An improved multiple-instance learning technique. Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic. Vancouver, Canada. 2001. 1073–1080.
[8] Ruffo G. Learning single and multiple instance decision trees for computer security applications [Ph.D. thesis]. Torino, Italy: University of Turin, 2000.
[9] Zhang ML, Zhou ZH. Adapting RBF neural networks to multi-instance learning. Neural Processing Letters, 2006, 23(1): 1–26. DOI:10.1007/s11063-005-2192-z
[10] Liu DY. Research on Basic Problems of Knowledge Science. Beijing: Tsinghua University Press, 2006. (in Chinese)
[11] Zhang ML, Zhou ZH. A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(8): 1819–1837. DOI:10.1109/TKDE.2013.39
[12] Boutell MR, Luo JB, Shen XP, et al. Learning multi-label scene classification. Pattern Recognition, 2004, 37(9): 1757–1771. DOI:10.1016/j.patcog.2004.03.009
[13] Fürnkranz J, Hüllermeier E, Mencía E, et al. Multilabel classification via calibrated label ranking. Machine Learning, 2008, 73(2): 133–153. DOI:10.1007/s10994-008-5064-8
[14] Read J, Pfahringer B, Holmes G, et al. Classifier chains for multi-label classification. Machine Learning, 2011, 85(3): 333–359. DOI:10.1007/s10994-011-5256-5
[15] Huang KH, Lin HT. Cost-sensitive label embedding for multi-label classification. Machine Learning, 2017, 106(9–10): 1725–1746. DOI:10.1007/s10994-017-5659-z
[16] Yeh CK, Wu WC, Ko WJ, et al. Learning deep latent space for multi-label classification. Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco, CA, USA. 2017. 2838–2844.
[17] Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer, 2001.
[18] Zhang ML, Zhou ZH. M3MIML: A maximum margin method for multi-instance multi-label learning. Proceedings of the 2008 8th IEEE International Conference on Data Mining. Pisa, Italy. 2008. 688–697.
[19] Xu X, Frank E. Logistic regression and boosting for labeled bags of instances. Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Sydney, Australia. 2004. 272–281.
[20] Gärtner T, Flach PA, Kowalczyk A, et al. Multi-instance kernels. Proceedings of the 19th International Conference on Machine Learning. San Francisco, CA, USA. 2002. 179–186.
[21] Cheung PM, Kwok JT. A regularization framework for multiple-instance learning. Proceedings of the 23rd International Conference on Machine Learning. Pittsburgh, PA, USA. 2006. 193–200.
[22] Smola AJ, Vishwanathan SVN, Hofmann T. Kernel methods for missing variables. Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics. Barbados. 2005. 325–332.
[23] Cristianini N, Shawe-Taylor J. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge, UK: Cambridge University Press, 2000.
[24] Huang SJ, Zhou ZH. Multi-label learning by exploiting label correlations locally. Proceedings of the 26th AAAI Conference on Artificial Intelligence. Toronto, Ontario, Canada. 2012. 949–955.
[25] Zhou ZH, Zhang ML. Solving multi-instance problems with classifier ensemble based on constructive clustering. Knowledge and Information Systems, 2007, 11(2): 155–170. DOI:10.1007/s10115-006-0029-3
[26] Boutell MR, Luo JB, Shen XP, et al. Learning multi-label scene classification. Pattern Recognition, 2004, 37(9): 1757–1771. DOI:10.1016/j.patcog.2004.03.009
[27] Andrews S, Tsochantaridis I, Hofmann T. Support vector machines for multiple-instance learning. Advances in Neural Information Processing Systems. 2002. 577–584.
[28] Zhu Y, Kwok JT, Zhou ZH. Multi-label learning with global and local label correlation. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(6): 1081–1094. DOI:10.1109/TKDE.2017.2785795
[29] Belkin M, Niyogi P, Sindhwani V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. The Journal of Machine Learning Research, 2006, 7: 2399–2434.
[30] Melacci S, Belkin M. Laplacian support vector machines trained in the primal. The Journal of Machine Learning Research, 2011, 12: 1149–1184.
[31] Luo DJ, Ding C, Huang H, et al. Non-negative Laplacian embedding. Proceedings of the 2009 9th IEEE International Conference on Data Mining. Miami, FL, USA. 2009. 337–346.
[32] Wang H, Huang H, Ding C. Image annotation using multi-label correlated Green's function. Proceedings of the 2009 IEEE 12th International Conference on Computer Vision. Kyoto, Japan. 2009. 2029–2034.
[33] Chung FRK. Spectral Graph Theory. Providence, RI: American Mathematical Society, 1997.
[34] Boumal N, Mishra B, Absil PA, et al. Manopt, a Matlab toolbox for optimization on manifolds. The Journal of Machine Learning Research, 2014, 15(1): 1455–1459.