Computer Systems & Applications, 2019, Vol. 28, Issue (9): 162-167

Clustering Variational Autoencoder for Collaborative Filtering
HAN Hao-Xian, YE Chun-Ming
Business School, University of Shanghai for Science and Technology, Shanghai 200093, China
Foundation item: National Natural Science Foundation of China (71840003); Science and Technology Development Program of University of Shanghai for Science and Technology (2018KJFZ043)
Abstract: To address the data-sparsity problem of collaborative filtering models, a variational autoencoder with a clustering latent variable is proposed for processing implicit feedback data. The deep generative model not only learns the feature distribution of the latent variable but also clusters the features. The original data are reconstructed under a multinomial likelihood, the parameters are estimated by Bayesian inference, and a regularization coefficient is introduced into the model; by adjusting its size, over-regularization is avoided and the model fits better. The nonlinear probabilistic model is better able to predict missing ratings. Experimental results on three MovieLens datasets show that the proposed algorithm achieves better recommendation performance than other advanced baselines.
Key words: recommendation system; collaborative filtering; deep generative model; Variational Auto-Encoder (VAE); clustering

1 Introduction

(1) Unlike previous methods that cluster user and content features, this work directly defines the latent features as a pair of variables with a clustering effect, unifying clustering into the overall framework of the algorithm;

(2) Four models are compared experimentally on large-scale data, their performance is evaluated, and the hyperparameter of the regularization term is studied to avoid over-regularization.

2 Variational Autoencoder

 $\log {p_\theta }({x_u}\left| {{z_u}} \right.) = \sum\limits_i {{x_{ui}}\log {\pi _i}} ({f_\theta }({z_u}))$ (1)
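As a minimal sketch (not the paper's implementation), the multinomial log-likelihood of Eq. (1) can be computed from the decoder's logits, where the softmax produces the item probabilities $\pi_i(f_\theta(z_u))$:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the item dimension.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def multinomial_log_likelihood(x, logits):
    # Eq. (1): log p(x_u | z_u) = sum_i x_ui * log pi_i(f_theta(z_u)).
    # A small constant guards against log(0).
    log_pi = np.log(softmax(logits) + 1e-10)
    return (x * log_pi).sum(axis=-1)
```

Here `x` is a user's binary interaction vector and `logits` stands in for the decoder output $f_\theta(z_u)$.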
 Figure 1 Structure of the variational autoencoder

 $\begin{split}&KL({q_\phi }({z_u}\left| {{x_u}} \right.)\left\| {{p_\theta }({z_u}\left| {{x_u}} \right.)} \right.) = \\ &\quad{E_{{q_\phi }({z_u}\left| {{x_u}} \right.)}}[\log {q_\phi }({z_u}\left| {{x_u}} \right.) - \log {p_\theta }({z_u}\left| {{x_u}} \right.)]\end{split}$ (2)

 $\log {p_\theta }({x_u}) \ge L({x_u};\theta ,\phi )$ (3)

 $\begin{split}L({x_u};\theta ,\phi ) =& {E_{{q_\phi }({z_u}\left| {{x_u}} \right.)}}[\log {p_\theta }({x_u}\left| {{z_u}} \right.)] \\ &-KL({q_\phi }({z_u}\left| {{x_u}} \right.)\left\| {{p_\theta }({z_u})} \right.) \end{split}$ (4)

 $z = \mu + \varepsilon *\sigma ,\varepsilon \sim N(0,I)$ (5)
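The reparameterization trick of Eq. (5) can be sketched as follows; representing the encoder's variance output as a log-variance is a common numerical convention and an assumption here, not stated in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    # Eq. (5): z = mu + eps * sigma, with eps ~ N(0, I).
    # log_var is the encoder's log-variance output (assumed convention).
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(mu.shape)
    return mu + eps * sigma
```

Because the noise `eps` is sampled outside the deterministic transform, gradients can flow through `mu` and `log_var` during training.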
3 Recommendation Algorithm

3.1 Constructing the Click Matrix
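A click matrix for implicit feedback is typically built by binarizing user-item interactions. The sketch below assumes explicit ratings are thresholded into clicks; the threshold value and the triple format are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def build_click_matrix(ratings, n_users, n_items, threshold=4):
    # Binarize explicit (user, item, rating) triples into an implicit
    # click matrix: a rating of at least `threshold` counts as a click.
    x = np.zeros((n_users, n_items), dtype=np.float32)
    for u, i, r in ratings:
        if r >= threshold:
            x[u, i] = 1.0
    return x
```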

3.2 Constructing the CVAE Algorithm

 ${q_\phi }({z_u},{y_u}\left| {{x_u}} \right.) = {q_\phi }({y_u}\left| {{z_u}} \right.){q_\phi }({z_u}\left| {{x_u}} \right.)$ (6)
 ${p_\theta }({x_u}\left| {{z_u},{y_u}} \right.) = {p_\theta }({x_u}\left| {{z_u}} \right.)$ (7)

 $\begin{split}&KL({q_\phi }({z_u},{y_u}\left| {{x_u}} \right.)\left\| {{p_\theta }({z_u},{y_u}\left| {{x_u}} \right.)} \right.) = \\ &\sum\limits_y {\int {{q_\phi }({y_u}\left| {{z_u}} \right.){q_\phi }({z_u}\left| {{x_u}} \right.)\ln }} \frac{{{q_\phi }({y_u}\left| {{z_u}} \right.){q_\phi }({z_u}\left| {{x_u}} \right.)}}{{{p_\theta }({x_u}\left| {{z_u}} \right.){p_\theta }({z_u}\left| {{y_u}} \right.)}}dz\end{split}$ (8)

 $\begin{split}&L({x_u};\theta ,\phi ) = {E_{{q_\phi }({z_u}\left| {{x_u}} \right.)}}[\log {p_\theta }({x_u}\left| {{z_u}} \right.)] \\ &-\sum\limits_y {{q_\phi }({y_u}\left| {{z_u}} \right.)\log \frac{{{q_\phi }({z_u}\left| {{x_u}} \right.)}}{{{p_\theta }({z_u}\left| {{y_u}} \right.)}}} - KL({q_\phi }({y_u}\left| {{z_u}} \right.)\left\| {{p_\theta }({y_u})} \right.)\end{split}$ (9)

3.3 Introducing a Regularization Coefficient

 $\begin{split}L({x_u};\theta ,\phi ) =& -{E_{{q_\phi }({z_u}\left| {{x_u}} \right.)}}[\log {p_\theta }({x_u}\left| {{z_u}} \right.)]\\ &+\beta \sum\limits_y {{q_\phi }({y_u}\left| {{z_u}} \right.)\log \frac{{{q_\phi }({z_u}\left| {{x_u}} \right.)}}{{{p_\theta }({z_u}\left| {{y_u}} \right.)}}} \\ &+\beta KL({q_\phi }({y_u}\left| {{z_u}} \right.)\left\| {{p_\theta }({y_u})} \right.)\end{split}$ (10)
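To illustrate how β scales the regularization terms in Eq. (10), the following is a simplified sketch using a standard normal prior; the clustering terms involving $y_u$ are omitted here for brevity, so this is an illustration of the β-weighting idea, not the full CVAE objective:

```python
import numpy as np

def beta_vae_loss(x, logits, mu, log_var, beta):
    # Negative multinomial log-likelihood (reconstruction term)
    # plus a beta-weighted KL(q(z|x) || N(0, I)) regularizer.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_pi = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    recon = -(x * log_pi).sum(axis=-1)
    # Closed-form KL between a diagonal Gaussian and the standard normal.
    kl = 0.5 * (np.exp(log_var) + mu ** 2 - 1.0 - log_var).sum(axis=-1)
    return recon + beta * kl
```

Setting β below 1 weakens the pull toward the prior, which is the mechanism the paper tunes to avoid over-regularization.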

3.4 SGD Training and Prediction

The stochastic gradient descent (SGD) algorithm for CVAE computes the gradients ${\nabla _\theta }L$ and ${\nabla _\phi }L$ from a training sample ${x_u}$ and its reconstruction $x_u^{'}$, averages the gradients over a batch, and uses the average to update the network parameters:

 $\theta = \theta - \alpha \frac{{\displaystyle \sum {{\nabla _\theta }L} }}{n}$ (11)
 $\phi = \phi - \alpha \frac{{\displaystyle \sum {{\nabla _\phi }L} }}{n}$ (12)
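The update rule of Eqs. (11) and (12) can be sketched generically; the dictionary-of-arrays layout is an assumption for illustration:

```python
import numpy as np

def sgd_step(params, grads, lr):
    # Eqs. (11)-(12): average the per-example gradients over the batch
    # (axis 0) and take one gradient-descent step of size lr (alpha).
    updated = {}
    for name, p in params.items():
        g = np.mean(grads[name], axis=0)
        updated[name] = p - lr * g
    return updated
```

In practice both parameter sets θ (decoder) and φ (encoder) would appear as entries in `params` and be updated in the same pass.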

4 Experiments

4.1 Datasets and Experimental Environment

4.2 Evaluation Metrics

 $\operatorname{Re} call@K = \frac{{\displaystyle \sum\nolimits_{k = 1}^K {h[w(k) \in {I_u}]} }}{{\min (K,\left| {{I_u}} \right|)}}$ (13)
 $NDCG@K = {Z_K}\sum\limits_{k = 1}^K {\frac{{{2^{h[w(k) \in {I_u}]}} - 1}}{{\log (k + 1)}}}$ (14)
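Eqs. (13) and (14) can be sketched as follows; log base 2 in the discount is assumed, as is standard for NDCG, and since the hit indicator $h$ is binary, $2^h - 1$ reduces to $h$:

```python
import numpy as np

def recall_at_k(ranked_items, held_out, k):
    # Eq. (13): fraction of relevant items in the top-k list,
    # normalized by min(k, |I_u|).
    hits = sum(1 for item in ranked_items[:k] if item in held_out)
    return hits / min(k, len(held_out))

def ndcg_at_k(ranked_items, held_out, k):
    # Eq. (14): discounted cumulative gain over the top-k list,
    # divided by the ideal DCG (the Z_K normalizer).
    dcg = sum((1.0 if ranked_items[i] in held_out else 0.0) / np.log2(i + 2)
              for i in range(min(k, len(ranked_items))))
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(k, len(held_out))))
    return dcg / ideal
```

Here `ranked_items` is the model's ranked list $w(\cdot)$ for a user and `held_out` is the relevant set $I_u$.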

4.3 Parameter Settings and Baselines

4.3.1 Baselines

DAE[4]: during training of the denoising autoencoder, part of the input data is "corrupted"; the model encodes and decodes the corrupted data, then predicts the missing values of the rating matrix as close to the original data as possible.

SDAE: the stacked denoising autoencoder chains several autoencoders on top of the partially corrupted input to perform layer-by-layer feature extraction; the resulting features are fed to a classifier to predict the probabilities of recommended items.

WMF[11]: weighted matrix factorization, a linear, low-rank matrix factorization model.

SLIM[12]: sparse linear model, a generalization of item-similarity methods.

CDAE[13]: the collaborative denoising autoencoder represents user preferences by adding a per-user latent factor to the input, and also adds a bias term in the latent layer.

The results for DAE and SDAE on ML-100k and ML-1M were obtained from our own experiments; the results for WMF, SLIM, and CDAE on ML-20M are taken from reference [4].

4.3.2 Parameters

4.4 Experimental Results and Analysis

 Figure 2 Effect of the β value on the results

 Figure 3 Effect of the number of clusters on the results

 Figure 4 NDCG@100 over training iterations

5 Conclusion

[1] Salakhutdinov R, Mnih A, Hinton G. Restricted Boltzmann machines for collaborative filtering. Proceedings of the 24th International Conference on Machine Learning. Corvallis, OR, USA. 2007. 791-798.
[2] Strub F, Mary J. Collaborative filtering with stacked denoising autoencoders and sparse inputs. Proceedings of the 2015 NIPS Workshop on Machine Learning for eCommerce. Montreal, Canada. 2015.
[3] Cheng HT, Koc L, Harmsen J, et al. Wide & deep learning for recommender systems. Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. Boston, MA, USA. 2016. 7-10.
[4] Liang DW, Krishnan RG, Hoffman MD, et al. Variational autoencoders for collaborative filtering. arXiv:1802.05814, 2018.
[5] He XN, Liao LZ, Zhang HW, et al. Neural collaborative filtering. Proceedings of the 26th International Conference on World Wide Web. Perth, Australia. 2017. 173-182.
[6] Huo H, Zheng DY, Gao LP, et al. Tag-aware collaborative filtering recommendation algorithm based on stacked denoising autoencoders. Journal of Chinese Computer Systems, 2018, 39(1): 7-11. DOI:10.3969/j.issn.1000-1220.2018.01.003
[7] Li XJ, Gu JZ, Cheng J. Collaborative recommendation method based on variational recurrent autoencoder. Computer Applications and Software, 2018, 35(9): 258-263, 280. DOI:10.3969/j.issn.1000-386x.2018.09.046
[8] Kingma DP, Welling M. Auto-encoding variational bayes. arXiv:1312.6114, 2013.
[9] Su JL. Variational autoencoders (IV): A one-step clustering scheme. https://spaces.ac.cn/archives/5887, [2018-09-17].
[10] Higgins I, Matthey L, Pal A, et al. β-VAE: Learning basic visual concepts with a constrained variational framework. Proceedings of the 2017 International Conference on Learning Representations. Toulon, France. 2017. 1-13.
[11] Hu YF, Koren Y, Volinsky C. Collaborative filtering for implicit feedback datasets. Proceedings of the 2008 Eighth IEEE International Conference on Data Mining. Pisa, Italy. 2008. 263-272.
[12] Ning X, Karypis G. SLIM: Sparse linear methods for top-N recommender systems. Proceedings of the 2011 IEEE 11th International Conference on Data Mining. Vancouver, Canada. 2011. 497-506.
[13] Wu Y, Dubois C, Zheng AX, et al. Collaborative denoising auto-encoders for top-N recommender systems. Proceedings of the 9th ACM International Conference on Web Search and Data Mining. San Francisco, CA, USA. 2016. 153-162.
[14] Sedhain S, Menon AK, Sanner S, et al. AutoRec: Autoencoders meet collaborative filtering. Proceedings of the 24th International Conference on World Wide Web. Florence, Italy. 2015. 111-112.