计算机系统应用 (Computer Systems & Applications), 2021, Vol. 30, Issue (1): 122–128

Image Inpainting Based on New Encoder and Similarity Constraint
LIN Zhu, WANG Min
College of Computer and Information, Hohai University, Nanjing 211100, China
Abstract: Existing image inpainting methods suffer from visible repair traces, semantic discontinuity, and blurred results. To address these problems, this study proposes an inpainting method based on a new encoder and a context-aware loss. A generative adversarial network is adopted as the basic architecture. To learn image features more fully and obtain sharper repair results, SE-ResNet blocks are introduced to extract effective image features. At the same time, the generative network is trained jointly with a context-aware loss that constrains the similarity of local features, so that the repaired image is closer to the original and looks more natural. Experiments on several public datasets show that the proposed method repairs damaged images better than existing approaches.
Key words: generative adversarial network; image inpainting; residual network; contextual loss

(1) Residual blocks based on SE-ResNet are added to the generative network and to the global and local context discriminator networks to extract features more effectively.

(2) A context-aware loss network is added to help constrain the similarity of local high-frequency features when repairing images.

1 Related Work

Yu et al. [10] proposed an end-to-end image inpainting model that uses a stacked generative network to ensure coherence with surrounding colors and textures, and introduces an attention module that borrows features resembling the missing region from more distant areas.
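The borrowing step of that attention module can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: features are assumed already flattened into vectors, and `scale` is a hypothetical sharpening factor.

```python
import numpy as np

def contextual_attention(fg, bg, scale=10.0, eps=1e-8):
    """Minimal sketch of the attention step of Yu et al. [10]: each feature
    vector from the region to be repaired (fg) is rebuilt as a
    softmax-weighted combination of background feature vectors (bg),
    weighted by cosine similarity. fg: (Nf, C), bg: (Nb, C)."""
    fg_n = fg / (np.linalg.norm(fg, axis=1, keepdims=True) + eps)
    bg_n = bg / (np.linalg.norm(bg, axis=1, keepdims=True) + eps)
    scores = scale * fg_n @ bg_n.T             # sharpened cosine similarities
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)    # softmax over background patches
    return attn @ bg                           # borrow matching background features
```

With a large `scale`, the softmax approaches a hard nearest-neighbor copy of the best-matching background feature.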

Liu et al. [11] proposed updating the mask during convolution and normalizing the convolution kernel weights by the updated mask values, so that the kernel focuses only on valid pixels.
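The mask-update rule can be sketched for a single channel as follows; this is an illustrative simplification (one channel, no bias, square kernel), not the cited implementation.

```python
import numpy as np

def partial_conv2d(x, mask, kernel, eps=1e-8):
    """Single-channel partial-convolution sketch: outputs are re-weighted by
    the ratio of window size to the number of valid (mask == 1) pixels in
    each window, and the mask is updated so a location becomes valid once
    any valid pixel falls inside its window."""
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x * mask, pad)           # zero out invalid pixels, then pad
    mp = np.pad(mask, pad)
    h, w = x.shape
    out = np.zeros_like(x, dtype=float)
    new_mask = np.zeros_like(mask, dtype=float)
    for i in range(h):
        for j in range(w):
            win_x = xp[i:i + k, j:j + k]
            win_m = mp[i:i + k, j:j + k]
            valid = win_m.sum()
            if valid > 0:
                # normalize by the fraction of valid pixels in the window
                out[i, j] = (kernel * win_x).sum() * (k * k) / (valid + eps)
                new_mask[i, j] = 1.0
    return out, new_mask
```

Applied repeatedly, the updated mask shrinks the hole layer by layer until every location is valid.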

Yu et al. [12] introduced gated convolution, which learns a dynamic selection mechanism over feature channels to improve color consistency, and proposed an efficient SN-PatchGAN discriminator to help repair images with randomly missing regions.
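The gating idea can be sketched with a 1×1 convolution; the shapes and activations here are illustrative assumptions, not the exact configuration of [12].

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_conv1x1(x, w_feat, w_gate):
    """1x1 gated-convolution sketch: two parallel convolutions produce
    features and a soft gate; the sigmoid gate decides, per location and
    output channel, how much of the feature passes through -- a learnable
    version of partial convolution's hard mask.
    x: (C_in, H, W); w_feat, w_gate: (C_in, C_out)."""
    feat = np.moveaxis(np.tensordot(x, w_feat, axes=([0], [0])), -1, 0)
    gate = np.moveaxis(np.tensordot(x, w_gate, axes=([0], [0])), -1, 0)
    return np.tanh(feat) * sigmoid(gate)
```

When the gate branch saturates negative, the channel is shut off entirely; when it saturates positive, the feature passes through unchanged.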

2 Network Architecture

2.1 SE-ResNet
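The squeeze-and-excitation recalibration performed inside an SE-ResNet block [4, 5] can be sketched as follows; the weight shapes and the identity stand-in for the convolutional body are illustrative assumptions.

```python
import numpy as np

def se_recalibrate(x, w1, w2):
    """Squeeze-and-Excitation channel recalibration sketch.
    Squeeze: global average pooling per channel. Excitation: two fully
    connected layers (ReLU then sigmoid) produce per-channel weights
    that rescale the feature map. x: (C, H, W); w1: (C/r, C); w2: (C, C/r)."""
    s = x.mean(axis=(1, 2))                  # squeeze: (C,)
    z = np.maximum(w1 @ s, 0.0)              # excitation FC1 + ReLU
    w = 1.0 / (1.0 + np.exp(-(w2 @ z)))      # excitation FC2 + sigmoid
    return x * w[:, None, None]              # rescale each channel

def se_resnet_block(x, w1, w2):
    # In SE-ResNet the recalibrated features are added back through the
    # residual shortcut; the conv body is omitted here for brevity.
    return x + se_recalibrate(x, w1, w2)
```

The learned per-channel weights let the network emphasize informative feature channels and suppress uninformative ones before the residual addition.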

2.2 Generative Network

 ${L_{\rm {adv}}} = - {E_{x \sim {p_r}(x)}}D(G(M \odot x))$ (1)
Fig. 1 Architecture of the generative network

 ${L_{\rm {res}}} = {E_{x \sim {p_r}(x)}}[{\left\| {M \odot (x - G(M \odot x))} \right\|_2}]$ (2)

 ${L_{ {CX}}} = - \log [CX(\Phi (x),\Phi (G(M \odot x)))]$ (3)

 ${L_G} = {L_{\rm {res}}} + {\lambda _1}{L_{\rm {adv}}} + {\lambda _2}{L_{{CX}}}$ (4)
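The generator objective of Eqs. (2) and (4) can be written out directly; the weight values used below are placeholders, not the paper's settings.

```python
import numpy as np

def masked_l2(x, x_hat, mask):
    """Reconstruction term of Eq. (2): the L2 norm of the residual,
    restricted by the binary mask M to the region being repaired."""
    return np.linalg.norm((mask * (x - x_hat)).ravel())

def generator_loss(l_res, l_adv, l_cx, lam1, lam2):
    """Total generator objective of Eq. (4): reconstruction plus weighted
    adversarial and context-aware terms."""
    return l_res + lam1 * l_adv + lam2 * l_cx
```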
2.3 Discriminator Networks

Fig. 2 Architecture of the discriminator networks

 ${L_{\rm {dis}}} = - {E_{x \sim {p_r}}}[\log (D(x)) + \log (1 - D(G(M \odot x)))]$ (5)
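Eq. (5) is the standard binary cross-entropy discriminator objective, which can be sketched as:

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Discriminator objective of Eq. (5): binary cross-entropy pushing
    D(x) toward 1 for real images and D(G(M * x)) toward 0 for repaired
    ones. d_real, d_fake: arrays of discriminator outputs in [0, 1]."""
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
```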
2.4 Context-Aware Loss

 ${L_{ {CX}}} = - \log [CX(\Phi (x),\Phi (G(M \odot x)))]$ (6)

 $CX(x,y) = CX(X,Y) = \frac{1}{N}\sum\limits_j {\mathop {\max }\limits_i } C{X_{ij}}$ (7)

 $C{X_{ij}} = {w_{ij}}\Bigg/\sum\limits_k {{w_{ik}}}$ (8)

 ${w_{ij}} = \exp \left( {\dfrac{{1 - {d_{\rm {similar}}}}}{h}} \right)$ (9)

 ${d_{\rm {similar}}} = \frac{{{d_{ij}}}}{{{{\min }_k}{d_{ik}} + \varepsilon }}$ (10)

where ${d}_{ij}$ is normalized as in Eq. (10), and ${d}_{ij}$ is the cosine distance between ${x}_{i}$ and ${y}_{j}$, computed as:

 ${d_{ij}} = \left( {1 - \frac{{({x_i} - {\mu _y}) \cdot ({y_j} - {\mu _y})}}{{{{\left\| {{x_i} - {\mu _y}} \right\|}_2}{{\left\| {{y_j} - {\mu _y}} \right\|}_2}}}} \right)$ (11)

 ${\mu _y} = \frac{1}{N}\sum\limits_j {{y_j}}$ (12)
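Eqs. (7)–(12) can be combined into one function over flattened feature sets; this is a sketch following the contextual loss of Mechrez et al. [6], with an illustrative bandwidth `h`.

```python
import numpy as np

def contextual_similarity(X, Y, h=0.5, eps=1e-5):
    """Contextual similarity CX of Eqs. (7)-(12).
    X, Y: (N, C) feature sets. Cosine distances to the mean-centered
    targets (Eqs. 11-12) are normalized per row by the minimum distance
    (Eq. 10), turned into affinities with bandwidth h (Eq. 9),
    row-normalized (Eq. 8), and averaged over the best matches (Eq. 7)."""
    mu_y = Y.mean(axis=0)                               # Eq. (12)
    Xc, Yc = X - mu_y, Y - mu_y
    Xn = Xc / (np.linalg.norm(Xc, axis=1, keepdims=True) + eps)
    Yn = Yc / (np.linalg.norm(Yc, axis=1, keepdims=True) + eps)
    d = 1.0 - Xn @ Yn.T                                 # Eq. (11): d[i, j]
    d_sim = d / (d.min(axis=1, keepdims=True) + eps)    # Eq. (10)
    w = np.exp((1.0 - d_sim) / h)                       # Eq. (9)
    cx_ij = w / w.sum(axis=1, keepdims=True)            # Eq. (8)
    return cx_ij.max(axis=0).mean()                     # Eq. (7)
```

The loss of Eq. (6) is then `-log` of this value: identical feature sets give CX near 1 (loss near 0), while unrelated sets give a small CX and a large loss.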

3 Experiments

3.1 Datasets

3.2 Training Process

3.3 Effect of SE-ResNet

Fig. 3 Comparison of results with and without the SE-ResNet residual blocks

3.4 Effect of the Context-Aware Loss

Fig. 4 Comparison of results with and without the context-aware loss

3.5 Comparison with Existing Methods

Fig. 5 Comparison with the method of [3] on center-hole inpainting

Fig. 6 Comparison with the method of [3] on random-hole inpainting

Fig. 7 Comparison of edge-detection maps

4 Conclusion

References
[1] Darabi S, Shechtman E, Barnes C, et al. Image melding: Combining inconsistent images using patch-based synthesis. ACM Transactions on Graphics, 2012, 31(4): 82.
[2] Pathak D, Krähenbühl P, Donahue J, et al. Context encoders: Feature learning by inpainting. Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA. 2016. 2536–2544.
[3] Iizuka S, Simo-Serra E, Ishikawa H. Globally and locally consistent image completion. ACM Transactions on Graphics, 2017, 36(4): 107.
[4] Hu J, Shen L, Sun G. Squeeze-and-excitation networks. Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA. 2018. 7132–7141.
[5] He KM, Zhang XY, Ren SQ, et al. Deep residual learning for image recognition. Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA. 2016. 770–778.
[6] Mechrez R, Talmi I, Zelnik-Manor L. The contextual loss for image transformation with non-aligned data. Proceedings of the 15th European Conference on Computer Vision. Munich, Germany. 2018. 800–815.
[7] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv: 1409.1556, 2014.
[8] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge, MA, USA. 2014. 2672–2680.
[9] Song YH, Yang C, Shen YJ, et al. SPG-Net: Segmentation prediction and guidance network for image inpainting. arXiv: 1805.03356, 2018.
[10] Yu JH, Lin Z, Yang JM, et al. Generative image inpainting with contextual attention. Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA. 2018. 5505–5514.
[11] Liu HY, Jiang B, Xiao Y, et al. Coherent semantic attention for image inpainting. Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Republic of Korea. 2019. 4169–4178.
[12] Yu JH, Lin Z, Yang JM, et al. Free-form image inpainting with gated convolution. Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Republic of Korea. 2019. 4471–4480.
[13] Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. arXiv: 1511.07122, 2015.
[14] Liu ZW, Luo P, Wang XG, et al. Large-scale CelebFaces Attributes (CelebA) dataset. http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html, 2018.
[15] Huang GB, Ramesh M, Berg T, et al. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Amherst: University of Massachusetts, 2007.
[16] Tong YB, Zhang QS, Qi YP. Image quality assessment model based on the combination of PSNR and SSIM. Journal of Image and Graphics, 2006, 11(12): 1758–1763. DOI: 10.11834/jig.2006012307 (in Chinese).
[17] Horé A, Ziou D. Image quality metrics: PSNR vs. SSIM. Proceedings of the 20th International Conference on Pattern Recognition. Istanbul, Turkey. 2010. 2366–2369.