Self-supervised Learning and Multi-scale Spatio-temporal Feature Fusion for Video Quality Assessment
CSTR:
Author:
Affiliation:

Clc Number:

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    Faced with insufficient labeled data in the field of video quality assessment, researchers begin to turn to self-supervised learning methods, aiming to learn video quality assessment models with the help of large amounts of unlabeled data. However, existing self-supervised learning methods primarily focus on video distortion types and content information, while ignoring dynamic information and spatiotemporal features of videos changing over time. This leads to unsatisfactory evaluation performance in complex dynamic scenes. To address these issues, a new self-supervised learning method is proposed. By taking playback speed prediction as an auxiliary pretraining task, the model can better capture dynamic changes and spatiotemporal features of videos. Combined with distortion type prediction and contrastive learning, the model’s sensitivity to video quality differences is enhanced. At the same time, to more comprehensively capture the spatiotemporal features of videos, a multi-scale spatiotemporal feature extraction module is further designed to enhance the model’s spatiotemporal modeling capability. Experimental results demonstrate that the proposed method significantly outperforms existing self-supervised learning-based approaches on the LIVE, CSIQ, and LIVE-VQC datasets. On the LIVE-VQC dataset, the proposed method achieves an average improvement of 7.90% and a maximum improvement of 17.70% in the PLCC index. Similarly, it also shows considerable competitiveness on the KoNViD-1k dataset. These results indicate that the proposed self-supervised learning framework effectively enhances the dynamic feature capture ability of the video quality assessment model and exhibits unique advantages in processing complex dynamic videos.

    Reference
    Related
    Cited by
Get Citation

于莉,王思拓,陈亚当,高攀,孙玉宝.基于自监督学习与多尺度时空特征融合的视频质量评估.计算机系统应用,,():1-11

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:August 23,2024
  • Revised:September 19,2024
  • Adopted:
  • Online: December 09,2024
  • Published:
Article QR Code
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-3
Address:4# South Fourth Street, Zhongguancun,Haidian, Beijing,Postal Code:100190
Phone:010-62661041 Fax: Email:csa (a) iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063