Computer Systems & Applications (计算机系统应用), 2020, Vol. 29(10): 222-227


Clustering Method Based on VAE with Convolution Optimization
YAN Xiao-Ming1,2
1. College of Mathematics and Informatics, Fujian Normal University, Fuzhou 350117, China;
2. Digital Fujian Internet-of-Things Laboratory of Environmental Monitoring, Fujian Normal University, Fuzhou 350117, China
Abstract: The traditional Variational AutoEncoder (VAE) takes the flattened sample directly as its input. When the samples are image data, this approach learns poorly. In this study, a VAE with convolution optimization is proposed: image data are preprocessed by multiple convolution networks of variable depth, each convolution network processes the input with its own parameters, and the outputs of the different networks are spliced together as the input of the VAE. Clustering is implemented by adding a category encoder and computing the distance between the category label distribution of the original dataset and the category distribution of each sample. The experimental results show that, compared with the non-optimized VAE, the proposed convolution optimization improves the clustering accuracy, the quality of the generated images, and the diversity of the generated samples in edge and shape.
Key words: convolution; Variational AutoEncoder (VAE); clustering; clustering accuracy

1 Loss Function

 $\log q\left( z \mid x^{(i)} \right) = \log N\left( z; \mu^{(i)}, \sigma^{2(i)} I \right)$ (1)

 $reconstruction\_loss = \dfrac{1}{n}\sum\limits_{i = 1}^n {{{({x^{(i)}} - {{\hat x}^{(i)}})}^2}}$ (2)

 $\begin{split} kl\_loss &= KL\left( N\left( \mu, \sigma^2 \right) \| N(0,1) \right)\\ &= \frac{1}{2}\left( -\log \sigma^2 + \mu^2 + \sigma^2 - 1 \right) \end{split}$ (3)

 $KL\left( q(y|x) \,\|\, p(y) \right) = \int q(y|x) \ln \frac{q(y|x)}{p(y)} \, dy$ (4)

 $\int {q(y|x)\ln \dfrac{{q(y|x)}}{{p(y)}}dy} \approx \dfrac{1}{k}\sum\limits_{j = 1}^k {\ln \dfrac{{q({y^j}|{x^j})}}{{p({y^j})}}}$ (5)

 $category\_loss = \frac{1}{k}\left(\sum\limits_{j = 1}^k {\ln (q({y^j}|{x^j}))} - \sum\limits_{j = 1}^k {\ln (p({y^j}))}\right)$ (6)

 $category\_loss = \frac{1}{k}\sum\limits_{j = 1}^k {\ln (q({y^j}|{x^j}))}$ (7)
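The three loss terms above can be sketched in plain numpy. This is a minimal illustration of Eqs. (2), (3), and (7), not the paper's implementation; the tensor shapes, the per-sample summation in the reconstruction term, and the use of the maximum-probability category in place of a sampled $y^j$ are all assumptions:

```python
import numpy as np

def reconstruction_loss(x, x_hat):
    """Eq. (2): squared error, summed over pixels and averaged over the batch."""
    return np.mean(np.sum((x - x_hat) ** 2, axis=1))

def kl_loss(mu, log_var):
    """Eq. (3): KL(N(mu, sigma^2) || N(0, 1)), summed over latent
    dimensions and averaged over the batch."""
    return np.mean(0.5 * np.sum(-log_var + mu ** 2 + np.exp(log_var) - 1.0, axis=1))

def category_loss(q_y):
    """Eq. (7): mean log posterior probability of each sample's category.
    The uniform-prior term ln p(y) of Eq. (6) is a constant and is dropped.
    Here the most probable category stands in for a sampled y^j (an assumption)."""
    chosen = q_y[np.arange(len(q_y)), q_y.argmax(axis=1)]
    return np.mean(np.log(chosen + 1e-10))
```

For example, a standard-normal encoder output (`mu = 0`, `log_var = 0`) gives a KL loss of exactly zero, as Eq. (3) requires.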
2 VAE with Convolution Optimization

 Figure 1. VAE clustering model with convolution optimization
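The preprocessing stage of Figure 1, several convolution networks of different depths whose outputs are spliced into one vector, can be sketched as follows. The kernel size, the two depths, and the random (untrained) kernels are illustrative assumptions; in the model the kernels are learned:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(42)

def conv_stack(img, depth, kernel_size=3):
    """Apply `depth` successive 2D valid convolutions (random kernels here,
    standing in for learned ones) and flatten the result."""
    out = img
    for _ in range(depth):
        kernel = rng.normal(size=(kernel_size, kernel_size))
        out = convolve2d(out, kernel, mode='valid')
    return out.ravel()

img = rng.normal(size=(28, 28))  # one MNIST-sized sample
# Two convolution networks of different depths; results spliced together
# form the input vector of the VAE.
features = np.concatenate([conv_stack(img, depth=2), conv_stack(img, depth=5)])
```

With 3x3 valid convolutions, the depth-2 branch yields a 24x24 map (576 values) and the depth-5 branch an 18x18 map (324 values), so the spliced VAE input has 900 dimensions.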

1) Compute the concatenated multi-convolution-layer representation of the dataset.

2) Build a fully connected network and compute the per-sample category loss $\scriptstyle category\_loss$ by Eq. (7).

3) Build two fully connected networks to fit the mean ${\scriptstyle \mu ^{(i)}}$ and variance $\scriptstyle \log {\sigma ^{2(i)}}$ of the Gaussian distribution to which sample $\scriptstyle{x^{(i)}}$ belongs.

4) Compute $\scriptstyle kl\_loss$ by Eq. (3).

5) Sample from the Gaussian distribution obtained in 3), build a fully connected network, and compute $\scriptstyle reconstruction\_loss$ by Eq. (2).

6) Let the total loss $\scriptstyle loss$ be the sum of the three losses from steps 2), 4), and 5), and minimize $\scriptstyle loss$ by gradient descent.

7) Return to 2) until the specified number of iterations is reached.

8) Obtain the specified generated samples through deconvolution and compute the clustering accuracy.
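Step 8's clustering accuracy is not spelled out above; a common choice in VAE-based clustering work (an assumption here, not necessarily the authors' exact metric) is unsupervised accuracy, where predicted cluster ids are mapped one-to-one onto true class labels with the Hungarian algorithm:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cluster_accuracy(y_true, y_pred):
    """Best one-to-one mapping of predicted cluster ids to true labels,
    found with the Hungarian algorithm; returns the fraction matched."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    k = max(y_true.max(), y_pred.max()) + 1
    count = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1
    # Negate counts so the minimum-cost assignment maximizes matches.
    row, col = linear_sum_assignment(-count)
    return count[row, col].sum() / len(y_true)
```

For example, `cluster_accuracy([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0: the cluster ids are a perfect relabeling of the classes.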

3 Experiments and Result Analysis

Fashion_MNIST images have the same width and height as MNIST, but the garments occupy a larger area than the handwritten digits, i.e., each sample contains more nonzero elements than in MNIST. Under identical experimental parameters this yields a larger total loss, which is the main reason the clustering accuracy on this dataset is lower than on MNIST. The convolution networks also need more layers to process the images in this dataset: the best clustering accuracy of 68% was obtained with a latent dimension of 60, 5 convolution layers in each of the two convolution networks, and 512 encoder neurons in total, after which overfitting appeared. In the clustering experiment of the traditional VAE on Fashion_MNIST, accuracy peaked at 55% with a latent dimension of 25 and 200 encoder neurons in total; thus the proposed method also achieves better clustering accuracy on the garment dataset.
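The claim that denser images inflate the total loss follows directly from Eq. (2): at the same per-pixel relative reconstruction error, an image with more nonzero pixels accumulates a larger squared error. A small numpy illustration with synthetic images (the pixel counts are rough stand-ins, not measured from the actual datasets):

```python
import numpy as np

def mse_under_relative_error(img, rel_err=0.1):
    """Eq. (2)-style reconstruction loss when every pixel is
    reconstructed with the same 10% relative error."""
    recon = img * (1.0 - rel_err)
    return np.mean((img - recon) ** 2)

sparse = np.zeros(784); sparse[:150] = 1.0  # digit-like: ~150 nonzero pixels
dense = np.zeros(784); dense[:400] = 1.0    # garment-like: ~400 nonzero pixels

# More nonzero pixels -> larger loss at the same relative error.
assert mse_under_relative_error(dense) > mse_under_relative_error(sparse)
```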

 Figure 2. Diversity comparison between the traditional VAE and the improved VAE on the bag category

 Figure 3. Diversity comparison between the traditional VAE and the improved VAE on the short-sleeve (T-shirt) category

 Figure 4. Diversity comparison between the traditional VAE and the improved VAE on the boot category

4 Conclusion
