Self-supervised Learning and Multi-scale Spatio-temporal Feature Fusion for Video Quality Assessment

DOI: 10.15888/j.cnki.csa.009784

Published in: Computer Systems & Applications, 2025, Vol. 34, No. 3, pp. 51-61

CSTR: 32024.14.csa.009784
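Authors:
YU Li (School of Computer Science & School of Cyber Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China)
WANG Si-Tuo (School of Computer Science & School of Cyber Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China)
CHEN Ya-Dang (School of Computer Science & School of Cyber Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China)
GAO Pan (College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
SUN Yu-Bao (School of Computer Science & School of Cyber Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China)
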
Funding: National Natural Science Foundation of China (62002172, 62276139, U2001211)

Abstract:

Because labeled data are scarce in video quality assessment (VQA), researchers have begun turning to self-supervised learning, which aims to learn VQA models from large amounts of unlabeled data. However, existing self-supervised methods focus mainly on distortion types and video content and overlook how videos change over time, that is, their temporal dynamics and spatiotemporal features. As a result, they perform poorly on complex dynamic scenes. To address this, a new self-supervised learning method is proposed that uses playback speed prediction as an auxiliary pretraining task, so that the model better captures the dynamic changes and spatiotemporal features of videos. This task is combined with distortion type prediction and contrastive learning to make the model more sensitive to differences in video quality. In addition, to capture spatiotemporal features more comprehensively, a multi-scale spatiotemporal feature extraction module, among other components, is designed to strengthen the model's spatiotemporal modeling. Experiments show that the proposed method significantly outperforms existing self-supervised VQA methods on the LIVE, CSIQ, and LIVE-VQC datasets; on LIVE-VQC, it improves PLCC (Pearson linear correlation coefficient) over these methods by 7.90% on average and by up to 17.70%. It is also competitive on the KoNViD-1k dataset. These results indicate that the proposed self-supervised framework effectively strengthens the ability of VQA models to capture dynamic features and gives them distinct advantages on complex dynamic videos.
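The abstract gives no implementation details, so the following is only a minimal sketch under stated assumptions. It shows one common way to obtain playback-speed labels for free: skip frames at a chosen stride so that a clip looks faster, then use the stride index as the class label. It also includes a plain PLCC computation, the metric behind the reported gains. The speed set (1, 2, 4, 8), the uniform label sampling, and the names `SPEEDS`, `sample_speed_clip`, `make_speed_batch`, and `plcc` are hypothetical choices, not the authors' configuration.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # a frame can be any object (array, image path, index, ...)

# Hypothetical speed set: frame-skip factors, where 1 = original playback speed.
SPEEDS: Tuple[int, ...] = (1, 2, 4, 8)


def sample_speed_clip(frames: Sequence[T], clip_len: int, speed_idx: int,
                      rng: random.Random) -> List[T]:
    """Take `clip_len` frames spaced SPEEDS[speed_idx] apart from a random start.

    Skipping frames makes the clip look as if it were played back faster,
    so speed_idx serves as a label that costs nothing to obtain.
    """
    stride = SPEEDS[speed_idx]
    span = (clip_len - 1) * stride + 1  # source frames covered by the clip
    if len(frames) < span:
        raise ValueError(f"need at least {span} frames, got {len(frames)}")
    start = rng.randrange(len(frames) - span + 1)
    return [frames[start + i * stride] for i in range(clip_len)]


def make_speed_batch(frames: Sequence[T], clip_len: int, batch_size: int,
                     rng: random.Random) -> List[Tuple[List[T], int]]:
    """Build (clip, speed label) pairs; labels are drawn uniformly to balance classes."""
    batch = []
    for _ in range(batch_size):
        label = rng.randrange(len(SPEEDS))
        batch.append((sample_speed_clip(frames, clip_len, label, rng), label))
    return batch


def plcc(pred: Sequence[float], mos: Sequence[float]) -> float:
    """Pearson linear correlation between predicted scores and subjective MOS.

    Raw correlation only: VQA benchmarks commonly fit a nonlinear logistic
    mapping to the predictions before computing PLCC.
    """
    if len(pred) != len(mos) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(pred)
    mp, mm = sum(pred) / n, sum(mos) / n
    cov = sum((p - mp) * (m - mm) for p, m in zip(pred, mos))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sm = math.sqrt(sum((m - mm) ** 2 for m in mos))
    if sp == 0 or sm == 0:
        raise ValueError("PLCC is undefined for constant inputs")
    return cov / (sp * sm)
```

For example, `make_speed_batch(list(range(300)), 16, 8, random.Random(0))` returns eight 16-frame clips of frame indices, each paired with its speed label. Because every clip comes from the same footage, a classifier trained on these labels has to rely on motion cues rather than content. This is the intuition behind using speed prediction to learn temporal dynamics, although the paper's actual sampling scheme may differ.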

Keywords: video quality assessment (VQA); self-supervised learning; multi-task learning; playback speed prediction; multi-scale

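Cite this article:

YU Li, WANG Si-Tuo, CHEN Ya-Dang, GAO Pan, SUN Yu-Bao. Self-supervised Learning and Multi-scale Spatio-temporal Feature Fusion for Video Quality Assessment. Computer Systems & Applications, 2025, 34(3): 51-61.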

History:
Received: 2024-08-23
Last revised: 2024-09-19
Published online: 2024-12-09
