计算机系统应用 (Computer Systems & Applications) 2019, Vol. 28, Issue (1): 169-175

Research on Collaborative Deep Learning Recommendation Algorithm
FENG Chu-Ying, SITU Guo-Qiang, NI Wei-Long
School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
Abstract: To address the problem that recommendation performance degrades significantly under data sparsity when user ratings are scarce, a Collaborative In-Deep Learning (CIDL) algorithm is proposed. The algorithm first performs deep learning on a large amount of data, and then applies collaborative filtering to the rating (feedback) matrix to produce recommended items for the user. This study tests CIDL on real movie data and compares it with four other excellent algorithms, showing that CIDL effectively alleviates the performance loss caused by data sparsity and improves recommendation accuracy.
Key words: recommender systems; deep learning; collaborative filtering; text mining; stacked denoising autoencoders

1 Related Work

2 Proposed Algorithm

2.1 Notation and Formula Description

2.2 Algorithm Workflow

(1) Preprocess the data

Figure 1. Algorithm flowchart

(2) Train the SDAE autoencoder to output the feature matrix

 ${\tilde R^{\left( u \right)}} = \left( {{{\tilde R}_{u1}}, \cdots ,{{\tilde R}_{un}}} \right) \in {\mathbb{R}^n}$ (1)

 ${\hat R^{\left( u \right)}} = f\left( {{W'_1}g\left( {{W_1}{{\tilde R}^{\left( u \right)}} + {b_1}} \right) + {b'_1}} \right)$ (2)

 ${E_U} = \frac{1}{{2m}}\sum\limits_{u = 1}^m \parallel {R^{(u)}} - {\hat R^{(u)}}{\parallel ^2} + \frac{\lambda }{2}\parallel {W_1}{\parallel ^2} + \frac{\lambda }{2}\parallel W{'_1}{\parallel ^2}$ (3)
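As an illustrative sketch of step (2), the one-hidden-layer denoising-autoencoder loss of Eq. (3) can be written in NumPy. The choice of sigmoid for both f and g, the 30% masking noise, and all dimensions below are assumptions made for the example, not the paper's actual settings.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dae_loss(R, R_tilde, W1, b1, W1p, b1p, lam):
    """Reconstruction error E_U of Eq. (3): encode the corrupted
    ratings R_tilde with g, decode with f (both sigmoids here), then
    add L2 weight decay on W1 and W1'."""
    m = R.shape[0]                        # number of users
    H = sigmoid(R_tilde @ W1 + b1)        # hidden features g(W1*R~ + b1)
    R_hat = sigmoid(H @ W1p + b1p)        # reconstruction f(W1'*g(.) + b1')
    rec = np.sum((R - R_hat) ** 2) / (2 * m)
    reg = (lam / 2) * (np.sum(W1 ** 2) + np.sum(W1p ** 2))
    return rec + reg

# toy data: 4 users, 6 items, 3 hidden units
rng = np.random.default_rng(0)
R = rng.random((4, 6))
R_tilde = R * (rng.random((4, 6)) > 0.3)  # mask ~30% of entries as noise
W1 = rng.normal(0, 0.1, (6, 3)); b1 = np.zeros(3)
W1p = rng.normal(0, 0.1, (3, 6)); b1p = np.zeros(6)
print(dae_loss(R, R_tilde, W1, b1, W1p, b1p, lam=0.01))
```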

(3) Use the CIDL algorithm to generate a new rating matrix

2.3 Stacked Denoising Autoencoder

Figure 2. SDAE schematic

 $\mathop {\min }\limits_{\{ {W_l}\} ,\{ {b_l}\} } \parallel {X_C} - {X_L}\parallel _F^2 + \lambda \sum\limits_l {\parallel {W_l}\parallel _F^2}$ (4)

2.4 Generalized Bayesian SDAE

Generating the clean input XC from XL is part of the Bayesian SDAE generative process, while generating the noise-corrupted input X0 from XC is an artificial noise-injection step that helps the SDAE learn more robust feature representations. Given the clean input XC and the corrupted input X0, the generative process can be defined as follows:

(1) For each layer l of the SDAE network:

1) For each column n of the weight matrix Wl, draw:

 ${W_{l,*n}} \sim {\cal{N}}(0,\lambda _W^{ - 1}{I_{{K_l}}})$ (5)

2) Draw the bias vector:

 ${b_l} \sim {\cal{N}}(0,\lambda _{{W}}^{ - 1}{I_{{K_l}}})$ (6)

3) For each row j of Xl, draw:

 ${X_{l,j*}} \sim {\cal{N}}(\delta ({X_{l - 1,j*}}{W_l} + {b_l}),\lambda _S^{ - 1}{I_{{K_l}}})$ (7)

(2) For each j, draw a clean input:

 ${X_{C,j*}} \sim {\cal{N}}({X_{L,j*}},\lambda _n^{ - 1}{I_J})$ (8)
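The sampling steps of Eqs. (5)-(8) can be sketched as a single NumPy pass. Taking δ to be a sigmoid, and the layer sizes and precision values used below, are illustrative assumptions for the example.

```python
import numpy as np

def sample_bayesian_sdae(X0, layer_dims, lam_w=1.0, lam_s=1e6, lam_n=1.0, seed=0):
    """One sample from the generative process of Eqs. (5)-(8):
    Gaussian weights and biases per layer (Eqs. 5-6), Gaussian layer
    outputs around delta(X_{l-1} W_l + b_l) with delta = sigmoid
    (Eq. 7), and a clean input X_C drawn around the final output X_L
    (Eq. 8)."""
    rng = np.random.default_rng(seed)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    X = X0
    for k_in, k_out in zip(layer_dims[:-1], layer_dims[1:]):
        W = rng.normal(0.0, lam_w ** -0.5, (k_in, k_out))          # Eq. (5)
        b = rng.normal(0.0, lam_w ** -0.5, k_out)                  # Eq. (6)
        X = sigmoid(X @ W + b) \
            + rng.normal(0.0, lam_s ** -0.5, (X.shape[0], k_out))  # Eq. (7)
    X_C = X + rng.normal(0.0, lam_n ** -0.5, X.shape)              # Eq. (8)
    return X, X_C

X0 = np.random.default_rng(1).random((5, 8))    # 5 items, 8 input features
X_L, X_C = sample_bayesian_sdae(X0, [8, 4, 8])  # encoder-decoder layer sizes
print(X_L.shape, X_C.shape)  # (5, 8) (5, 8)
```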

2.5 Collaborative Deep Learning

Figure 3. CIDL schematic

The generative process of CIDL is as follows:

(1) For each layer l of the SDAE network, proceed as in the SDAE steps of the previous subsection.

(2) For each j, draw a clean input as in the SDAE steps above, then:

1) Draw a latent item offset vector:

 ${\varepsilon _j} \sim {\cal{N}}(0,\lambda _v^{ - 1}{I_K})$ (9)

2) Set the latent item vector to:

 ${v_j} = {\varepsilon _j} + X_{\frac{L}{2},j*}^{\rm T}$ (10)

3) Draw a latent user vector for each user:

 ${u_i} \sim {\cal{N}}(0,\lambda _u^{ - 1}{I_K})$ (11)

4) For each user-item pair (i, j), draw the rating $R_{i,j}$:

 ${R_{i,j}} \sim {\cal{N}}(u_i^{\rm T}{v_j},C_{ij}^{ - 1})$ (12)
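Steps 1)-4) above can be sketched in NumPy as follows; the constant confidence C_ij, the dimensions, and the precision values are assumptions made for the example.

```python
import numpy as np

def sample_cidl_ratings(X_half, n_users, lam_v=1.0, lam_u=1.0, c_obs=1.0, seed=0):
    """Sample latent item vectors v_j = eps_j + X_{L/2,j*}^T (Eqs. 9-10),
    latent user vectors u_i (Eq. 11), and ratings R_ij ~ N(u_i^T v_j,
    C_ij^{-1}) (Eq. 12). X_half holds the middle-layer SDAE output, one
    row per item; the confidence C_ij is taken constant (= c_obs) here."""
    rng = np.random.default_rng(seed)
    n_items, K = X_half.shape
    eps = rng.normal(0.0, lam_v ** -0.5, (n_items, K))  # Eq. (9)
    V = eps + X_half                                    # Eq. (10): rows are v_j
    U = rng.normal(0.0, lam_u ** -0.5, (n_users, K))    # Eq. (11): rows are u_i
    R = U @ V.T + rng.normal(0.0, c_obs ** -0.5, (n_users, n_items))  # Eq. (12)
    return U, V, R

X_half = np.random.default_rng(2).random((6, 3))  # 6 items, K = 3
U, V, R = sample_cidl_ratings(X_half, n_users=4)
print(R.shape)  # (4, 6): one sampled rating per user-item pair
```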

2.6 Maximum A Posteriori Estimation

 $\begin{split}{\cal{L}} =& - \frac{{\lambda _u}}{2}\sum\limits_i {||{u_i}||_2^2} - \frac{{\lambda _w}}{2}\sum\limits_l {(||{W_l}||_F^2 + ||{b_l}||_2^2)} \\& - \frac{{\lambda _v}}{2}\sum\limits_j {||{v_j} - X_{\frac{L}{2},j*}^{\rm T}||_2^2} - \frac{{\lambda _n}}{2}\sum\limits_j {||{X_{L,j*}} - {X_{C,j*}}||_2^2} \\& - \frac{{\lambda _s}}{2}\sum\limits_l {\sum\limits_j {||\delta ({X_{l - 1,j*}}{W_l} + {b_l}) - {X_{l,j*}}||_2^2} } \\& - \sum\limits_{i,j} {\frac{{C_{ij}}}{2}{{({R_{ij}} - u_i^{\rm T}{v_j})}^2}} \end{split}$ (13)

As λS approaches infinity, this likelihood becomes:

 $\begin{split}{\cal{L}} =& - \frac{{\lambda _u}}{2}\sum\limits_i {||{u_i}||_2^2} - \frac{{\lambda _w}}{2}\sum\limits_l {(||{W_l}||_F^2 + ||{b_l}||_2^2)} \\& - \frac{{\lambda _v}}{2}\sum\limits_j {||{v_j} - {f_e}{{({X_{0,j*}},{W^ + })}^{\rm T}}||_2^2} \\& - \frac{{\lambda _n}}{2}\sum\limits_j {||{f_r}({X_{0,j*}},{W^ + }) - {X_{C,j*}}||_2^2} \\& - \sum\limits_{i,j} {\frac{{C_{ij}}}{2}{{({R_{ij}} - u_i^{\rm T}{v_j})}^2}} \end{split}$ (14)

When the ratio λn/λv approaches positive infinity, CIDL degenerates into a two-step model in which the latent representation learned by the SDAE is fed directly into a collaborative topic regression model. The other critical point occurs as λn/λv approaches zero, where the SDAE decoder essentially loses its function. Experiments confirm that in both cases the predictive performance of the model degrades considerably.

 ${u_i} \leftarrow {(V{C_i}{V^{\rm T}} + {\lambda _u}{I_K})^{ - 1}}V{C_i}{R_i}$ (15)
 ${v_j} \leftarrow {(U{C_j}{U^{\rm T}} + {\lambda _v}{I_K})^{ - 1}}(U{C_j}{R_j} + {\lambda _v}{f_e}{({X_{0,j*}},{W^ + })^{\rm T}})$ (16)

 $\begin{split}\nabla {w_l}{\cal{L}} =& - {\lambda _w}{W_l}\\ &- {\lambda _v}\displaystyle\sum\limits_j {\nabla {w_l}{f_e}{{({X_{0,j*}},{W^ + })}^{\rm T}}({f_e}{{({X_{0,j*}},{W^ + })}^{\rm T}} - {v_j})} \\ &- {\lambda _n}\displaystyle\sum\limits_j {\nabla {w_l}{f_r}({X_{0,j*}},{W^ + })({f_r}({X_{0,j*}},{W^ + }) - {X_{c,j*}})} \end{split}$ (17)
 $\begin{split}\nabla {b_l}{\cal{L}} = & - {\lambda _w}{b_l}\\& - {\lambda _v}\displaystyle\sum\limits_j {\nabla {b_l}{f_e}{{({X_{0,j*}},{W^ + })}^{\rm T}}({f_e}{{({X_{0,j*}},{W^ + })}^{\rm T}} - {v_j})} \\& - {\lambda _n}\displaystyle\sum\limits_j {\nabla {b_l}{f_r}({X_{0,j*}},{W^ + })({f_r}({X_{0,j*}},{W^ + }) - {X_{c,j*}})} \end{split}$ (18)
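A minimal sketch of the alternating updates of Eqs. (15)-(16), with U and V stored row-wise and the encoder output f_e(X_0, W+) precomputed; the gradient steps of Eqs. (17)-(18) for the SDAE weights and biases are omitted. The constant confidence weights and all dimensions are assumptions for the example.

```python
import numpy as np

def update_user_item(R, C, U, V, X_enc, lam_u, lam_v):
    """One sweep of the block-coordinate updates of Eqs. (15)-(16).
    R: ratings (n_users x n_items); C: confidence weights (same shape);
    U, V: latent factors, one row per user/item; X_enc: encoder output
    f_e(X_0, W+), one row per item. U and V are updated in place."""
    K = U.shape[1]
    for i in range(R.shape[0]):                    # Eq. (15)
        Ci = np.diag(C[i, :])
        A = V.T @ Ci @ V + lam_u * np.eye(K)
        U[i] = np.linalg.solve(A, V.T @ Ci @ R[i, :])
    for j in range(R.shape[1]):                    # Eq. (16)
        Cj = np.diag(C[:, j])
        A = U.T @ Cj @ U + lam_v * np.eye(K)
        V[j] = np.linalg.solve(A, U.T @ Cj @ R[:, j] + lam_v * X_enc[j])
    return U, V

# toy data: 4 users, 6 items, K = 3
rng = np.random.default_rng(3)
R = rng.random((4, 6))
C = np.ones_like(R)                                # constant confidence
U = rng.normal(0, 0.1, (4, 3))
V = rng.normal(0, 0.1, (6, 3))
X_enc = rng.random((6, 3))
U, V = update_user_item(R, C, U, V, X_enc, lam_u=0.1, lam_v=0.1)
```

Each half-sweep solves the exact normal equations for its block, so the regularized objective is non-increasing across sweeps.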

Input: rating matrix R, feature-vector dimension K, learning rate η, scale parameter α, regularization parameters λU, λV, λQ

Output: U, V

(1) Initialization: randomly initialize U, V, Q with small values;

(2) while (the error on the validation set decreases):

(3) $\scriptstyle {\nabla _U}{\cal{L}} = I\left( {{U^{\rm T}}V - R} \right)V + \alpha \left( {{U^{\rm T}}Q - H} \right)Q + {\lambda _U}U$

$\scriptstyle {\nabla _V}{\cal{L}} = {[I({U^{\rm T}}V - R)]^{\rm T}}U + {\lambda _V}V$

$\scriptstyle {\nabla _Q}{\cal{L}} = \alpha {({U^{\rm T}}Q - H)^{\rm T}}U + {\lambda _Q}Q$

(4) $\scriptstyle {\rm{set}}\;\eta = 0.1$

(5) $\scriptstyle {\rm{while}}\;({\cal{L}}(U - \eta {\nabla _U}{\cal{L}},\;V - \eta {\nabla _V}{\cal{L}},\;Q - \eta {\nabla _Q}{\cal{L}}) > {\cal{L}}(U,V,Q))$

(6) $\scriptstyle {\rm{set}}\;\eta = {\rm{ }}\eta/2$

(7) $\scriptstyle {\rm{End while}}$

(8) $\scriptstyle U = U - \eta {\nabla _U}{\cal{L}}$

$\scriptstyle V = V - \eta {\nabla _V}{\cal{L}}$

$\scriptstyle Q = Q - \eta {\nabla _Q}{\cal{L}}$

(9) $\scriptstyle {\rm{End\;while}}$

(10) $\scriptstyle {\rm{Return}}\;U,V$
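The pseudocode above can be sketched in Python as follows. The roles of Q, H, and α are only implied by the gradients in step (3), so this sketch treats H as a given auxiliary target matrix and I as the indicator of observed ratings; the toy data, dimensions, and iteration caps are illustrative assumptions.

```python
import numpy as np

def run_mf_gd(R, H, K, alpha, lam_u, lam_v, lam_q, max_iter=50, seed=0):
    """Gradient descent with step halving, following steps (1)-(10).
    R: rating matrix; H: auxiliary target for the extra factor Q;
    I masks observed ratings. Returns U (K x n_users), V (K x n_items)."""
    rng = np.random.default_rng(seed)
    n_u, n_v = R.shape
    I = (R > 0).astype(float)                       # observed-rating indicator
    U = rng.normal(0, 0.01, (K, n_u))               # step (1)
    V = rng.normal(0, 0.01, (K, n_v))
    Q = rng.normal(0, 0.01, (K, H.shape[1]))

    def loss(U_, V_, Q_):
        return (0.5 * np.sum(I * (U_.T @ V_ - R) ** 2)
                + 0.5 * alpha * np.sum((U_.T @ Q_ - H) ** 2)
                + 0.5 * lam_u * np.sum(U_ ** 2)
                + 0.5 * lam_v * np.sum(V_ ** 2)
                + 0.5 * lam_q * np.sum(Q_ ** 2))

    prev = np.inf
    for _ in range(max_iter):                       # step (2)
        cur = loss(U, V, Q)
        if cur >= prev:                             # error stopped decreasing
            break
        prev = cur
        E = I * (U.T @ V - R)                       # masked rating residual
        F = U.T @ Q - H                             # auxiliary residual
        gU = V @ E.T + alpha * Q @ F.T + lam_u * U  # step (3) gradients
        gV = U @ E + lam_v * V
        gQ = alpha * U @ F + lam_q * Q
        eta = 0.1                                   # step (4)
        for _ in range(30):                         # steps (5)-(7): halve eta
            if loss(U - eta * gU, V - eta * gV, Q - eta * gQ) <= cur:
                break
            eta /= 2
        U, V, Q = U - eta * gU, V - eta * gV, Q - eta * gQ  # step (8)
    return U, V                                     # step (10)

R = np.array([[5, 3, 0, 1], [4, 0, 0, 1],
              [1, 1, 0, 5], [0, 0, 5, 4]], dtype=float)
H = np.random.default_rng(1).random((4, 2))
U, V = run_mf_gd(R, H, K=2, alpha=0.1, lam_u=0.01, lam_v=0.01, lam_q=0.01)
print(U.shape, V.shape)  # (2, 4) (2, 4)
```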

3 Experiments and Analysis

3.1 Evaluation Scheme

${\rm{recall}}@M = \dfrac{\text{number of items the user likes among the top }M}{\text{total number of items the user likes}}$
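A direct implementation of this recall@M metric; the ranked recommendation list and liked-item set below are hypothetical.

```python
def recall_at_m(ranked_items, liked_items, M):
    """recall@M: fraction of the items a user likes that appear among
    the top-M recommendations, per the formula above."""
    liked = set(liked_items)
    if not liked:
        return 0.0
    hits = sum(1 for item in ranked_items[:M] if item in liked)
    return hits / len(liked)

# user likes items {2, 5, 7}; the recommender ranks items 5, 1, 2, 9, 7, 0
print(recall_at_m([5, 1, 2, 9, 7, 0], [2, 5, 7], M=4))  # 2 of 3 liked items in top 4
```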

3.2 Baselines and Experimental Settings

3.3 Average Recall Analysis

Figure 4. Comparison of the algorithms at P = 1

Figure 5. Comparison of the algorithms at P = 10

As λn approaches infinity, λn/λv approaches positive infinity and CIDL degenerates into two separate models. In this case, the SDAE learns the latent item representation in an unsupervised manner, which is then fed directly into a simplified version of CTR. Since there is then no interaction between the Bayesian SDAE and the matrix-factorization-based collaborative filtering component, prediction performance degrades considerably. At the other extreme, when λn is infinitesimally small, λn/λv approaches zero and CIDL degenerates to the case where the decoder of the Bayesian SDAE component essentially vanishes; the encoder is then left to learn the latent item vectors through simple matrix factorization. As shown in Figure 6, prediction performance varies with λn: when λn < 0.1, recall@M is already very close to (or even worse than) the result of PMF.

Figure 6. Comparison of the algorithms under different values of λn

4 Conclusion
