Self-supervised Learning and Multi-scale Spatio-temporal Feature Fusion for Video Quality Assessment

Authors: Yu Li, Wang Situo, Chen Yadang, Gao Pan, Sun Yubao

Funding: National Natural Science Foundation of China (62002172, 62276139, U2001211)
    Abstract:

    Faced with insufficient labeled data in the field of video quality assessment, researchers have begun to turn to self-supervised learning, aiming to learn video quality assessment models from large amounts of unlabeled data. However, existing self-supervised methods focus primarily on video distortion types and content information, ignoring the dynamic information and spatiotemporal features of videos as they change over time, which leads to unsatisfactory performance in complex dynamic scenes. To address these issues, a new self-supervised learning method is proposed. By taking playback speed prediction as an auxiliary pretraining task, the model can better capture the dynamic changes and spatiotemporal features of videos; combined with distortion type prediction and contrastive learning, the model's sensitivity to video quality differences is enhanced. In addition, to capture the spatiotemporal features of videos more comprehensively, a multi-scale spatiotemporal feature extraction module is designed to strengthen the model's spatiotemporal modeling capability. Experimental results demonstrate that the proposed method significantly outperforms existing self-supervised learning-based approaches on the LIVE, CSIQ, and LIVE-VQC datasets; on LIVE-VQC, it improves PLCC by 7.90% on average and by up to 17.70%. It also shows considerable competitiveness on the KoNViD-1k dataset. These results indicate that the proposed self-supervised learning framework effectively enhances the video quality assessment model's ability to capture dynamic features and exhibits distinct advantages in processing complex dynamic videos.
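
    The auxiliary pretext task described above, playback speed prediction, trains the network to recognize how fast a clip is being played, which forces it to attend to temporal dynamics. A minimal sketch of the data side of such a task (the pace set, clip length, and function name are illustrative, not taken from the paper):

    ```python
    import random

    def sample_pace_clip(num_frames, clip_len, paces=(1, 2, 4)):
        """Sample a clip at a random playback pace from a video.

        Returns the frame indices of the clip and the pace class label
        that the pretext classification head must predict.
        """
        label = random.randrange(len(paces))        # pace class, the pretext target
        pace = paces[label]
        span = (clip_len - 1) * pace + 1            # frames the strided clip covers
        start = random.randrange(num_frames - span + 1)
        indices = [start + i * pace for i in range(clip_len)]
        return indices, label
    ```

    During pretraining, the backbone would encode the frames at `indices` and a small head would be trained with cross-entropy against `label`; correctly telling a 4x clip from a 1x clip requires modeling motion, not just appearance.
    
    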
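
    The contrastive learning component in this line of work (cf. [17,24]) typically uses an InfoNCE-style loss that pulls an anchor clip toward its positive view and away from negatives. A hedged scalar sketch of the per-anchor loss, written with the max-logit trick for numerical stability (the paper's exact loss formulation may differ):

    ```python
    import math

    def info_nce(sim_pos, sim_negs, temperature=0.1):
        """InfoNCE loss for one anchor: -log softmax of the positive's
        similarity against the negatives' similarities."""
        logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
        m = max(logits)                               # subtract max before exp
        denom = sum(math.exp(l - m) for l in logits)
        return -(logits[0] - m - math.log(denom))
    ```

    The loss shrinks as the positive similarity grows relative to the negatives, which is what makes the representation sensitive to quality differences between views.
    
    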
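
    PLCC, the metric in which the reported gains are measured, is simply the Pearson linear correlation between predicted quality scores and subjective mean opinion scores (MOS); a self-contained reference implementation:

    ```python
    import math

    def plcc(pred, mos):
        """Pearson linear correlation coefficient between predicted
        scores and ground-truth MOS values (1.0 = perfect linear fit)."""
        n = len(pred)
        mp = sum(pred) / n
        mm = sum(mos) / n
        cov = sum((p - mp) * (m - mm) for p, m in zip(pred, mos))
        sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
        sm = math.sqrt(sum((m - mm) ** 2 for m in mos))
        return cov / (sp * sm)
    ```

    In VQA evaluation practice (e.g. per VQEG [27]), predictions are often passed through a monotonic logistic mapping before computing PLCC; that step is omitted here for brevity.
    
    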

    References
    [1] Wang Z, Rehman A. Begin with the end in mind: A unified end-to-end quality-of-experience monitoring, optimization and management framework. Proceedings of the 2017 SMPTE Annual Technical Conference and Exhibition. SMPTE, 2017. 1–11.
    [2] Kim W, Kim J, Ahn S, et al. Deep video quality assessor: From spatio-temporal visual sensitivity to a convolutional neural aggregation network. Proceedings of the 15th European Conference on Computer Vision. Munich: Springer, 2018. 219–234.
    [3] Xu MN, Chen JM, Wang HQ, et al. C3DVQA: Full-reference video quality assessment with 3D convolutional neural network. Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Barcelona: IEEE, 2020. 4447–4451.
    [4] Soundararajan R, Bovik AC. Video quality assessment by reduced reference spatio-temporal entropic differencing. IEEE Transactions on Circuits and Systems for Video Technology, 2013, 23(4): 684–694.
    [5] Wu W, Li QY, Chen ZZ, et al. Semantic information oriented no-reference video quality assessment. IEEE Signal Processing Letters, 2021, 28: 204–208.
    [6] Chen P, Li L, Ma L, et al. RIRNet: Recurrent-in-recurrent network for video quality assessment. Proceedings of the 28th ACM International Conference on Multimedia. Seattle: ACM, 2020. 834–842.
    [7] Huang DJ, Kao YT, Chuang TH, et al. SB-VQA: A stack-based video quality assessment framework for video enhancement. Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Vancouver: IEEE, 2023. 1613–1622.
    [8] You JY, Lin Y. Efficient Transformer with locally shared attention for video quality assessment. Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP). Bordeaux: IEEE, 2022. 356–360.
    [9] Wu HN, Liao L, Wang AN, et al. Towards robust text-prompted semantic criterion for in-the-wild video quality assessment. arXiv:2304.14672, 2023.
    [10] Lin LQ, Wang Z, He JC, et al. Deep quality assessment of compressed videos: A subjective and objective study. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(6): 2616–2626.
    [11] Kou TC, Liu XH, Sun W, et al. StableVQA: A deep no-reference quality assessment model for video stability. Proceedings of the 31st ACM International Conference on Multimedia. Ottawa: ACM, 2023. 1066–1076.
    [12] Shi WJ, Sun YJ, Zuo HW, et al. No-reference video quality assessment for mobile terminals based on natural video statistics. Journal of Electronics & Information Technology, 2018, 40(1): 143–150. (in Chinese)
    [13] Yao JC, Shen J, Huang CR. Objective no-reference video quality assessment based on a multilayer BP neural network. Acta Automatica Sinica, 2022, 48(2): 594–607. (in Chinese)
    [14] Zhang Z, Wu H, Ji Z, et al. Q-Boost: On visual quality assessment ability of low-level multi-modality foundation models. Proceedings of the 2024 IEEE International Conference on Multimedia and Expo Workshops. Niagara Falls: IEEE, 2024. 1–6.
    [15] Tu ZZ, Wang YL, Birkbeck N, et al. UGC-VQA: Benchmarking blind video quality assessment for user generated content. IEEE Transactions on Image Processing, 2021, 30: 4449–4464.
    [16] Wang JL, Jiao JB, Liu YH. Self-supervised video representation learning by pace prediction. Proceedings of the 16th European Conference on Computer Vision. Glasgow: Springer, 2020. 504–521.
    [17] Chen PF, Li LD, Wu JJ, et al. Contrastive self-supervised pre-training for video quality assessment. IEEE Transactions on Image Processing, 2022, 31: 458–471.
    [18] Mitra S, Soundararajan R. Multiview contrastive learning for completely blind video quality assessment of user generated content. Proceedings of the 30th ACM International Conference on Multimedia. Lisbon: ACM, 2022. 1914–1924.
    [19] Madhusudana PC, Birkbeck N, Wang YL, et al. CONVIQT: Contrastive video quality estimator. IEEE Transactions on Image Processing, 2023, 32: 5138–5152.
    [20] Jiang SJ, Sang QB, Hu ZY, et al. Self-supervised representation learning for video quality assessment. IEEE Transactions on Broadcasting, 2023, 69(1): 118–129.
    [21] Madhusudana PC, Birkbeck N, Wang YL, et al. Image quality assessment using contrastive learning. IEEE Transactions on Image Processing, 2022, 31: 4149–4161.
    [22] Yang LX, Zhang RY, Li LD, et al. SimAM: A simple, parameter-free attention module for convolutional neural networks. Proceedings of the 38th International Conference on Machine Learning. PMLR, 2021. 11863–11874.
    [23] Liu Z, Lin YT, Cao Y, et al. Swin Transformer: Hierarchical vision Transformer using shifted windows. Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021. 9992–10002.
    [24] van den Oord A, Li YZ, Vinyals O. Representation learning with contrastive predictive coding. arXiv:1807.03748, 2018.
    [25] He KM, Zhang XY, Ren SQ, et al. Deep residual learning for image recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. 770–778.
    [26] Wu HN, Chen CF, Hou JW, et al. FAST-VQA: Efficient end-to-end video quality assessment with fragment sampling. Proceedings of the 17th European Conference on Computer Vision. Tel Aviv: Springer, 2022. 538–554.
    [27] Video Quality Experts Group. Final report from the video quality experts group on the validation of objective models of video quality assessment. Proceedings of the 2000 VQEG Meeting. Ottawa: VQEG, 2000.
Cite this article:

Yu L, Wang ST, Chen YD, Gao P, Sun YB. Self-supervised learning and multi-scale spatio-temporal feature fusion for video quality assessment. Computer Systems & Applications, 2025, 34(3): 51–61. (in Chinese)
History
  • Received: 2024-08-23
  • Revised: 2024-09-19
  • Published online: 2024-12-09