Abstract: Videos captured in low-illumination environments often suffer from low contrast, high noise, and unclear details, which seriously degrade computer vision tasks such as object detection and segmentation. Most existing low-light video enhancement methods are built on convolutional neural networks. Because convolution cannot fully exploit the long-range dependencies between pixels, the generated videos often exhibit loss of detail and color distortion in some regions. To address these problems, this study proposes a Siamese low-light video enhancement network that couples local and global features. The model extracts local features of video frames with a deformable-convolution-based local feature extraction module and captures global features with a lightweight self-attention module. Finally, the extracted local and global features are combined by a feature fusion module, which guides the model to generate enhanced videos with more realistic colors and details. Experimental results show that the proposed method effectively improves the brightness of low-light videos and produces videos with richer colors and details. It also outperforms recently proposed methods on evaluation metrics such as peak signal-to-noise ratio (PSNR) and structural similarity (SSIM).
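To make the dual-branch idea in the abstract concrete, the following is a minimal sketch of how a local deformable-convolution branch, a lightweight self-attention branch, and a fusion step could be wired together. All module names, hyperparameters, and the attention downsampling trick are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of the local/global/fusion structure described above.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class LocalBranch(nn.Module):
    """Local feature extraction via deformable convolution (assumed design)."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # Offsets for the deformable kernel are predicted from the input itself.
        self.offset = nn.Conv2d(channels, 2 * kernel_size * kernel_size,
                                kernel_size, padding=pad)
        self.deform = DeformConv2d(channels, channels, kernel_size, padding=pad)

    def forward(self, x):
        return torch.relu(self.deform(x, self.offset(x)))


class GlobalBranch(nn.Module):
    """Lightweight self-attention over a downsampled feature map (assumed design)."""
    def __init__(self, channels: int, reduction: int = 4, heads: int = 4):
        super().__init__()
        self.reduction = reduction
        self.down = nn.AvgPool2d(reduction)          # fewer tokens keeps attention light
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.up = nn.Upsample(scale_factor=reduction, mode="bilinear",
                              align_corners=False)

    def forward(self, x):
        b, c, h, w = x.shape
        t = self.down(x).flatten(2).transpose(1, 2)  # (B, HW/r^2, C) tokens
        t, _ = self.attn(t, t, t)                    # model long-range dependencies
        hs, ws = h // self.reduction, w // self.reduction
        t = t.transpose(1, 2).reshape(b, c, hs, ws)
        return self.up(t)


class Fusion(nn.Module):
    """Fuse local and global features with a 1x1 convolution (assumed design)."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, local_feat, global_feat):
        return self.proj(torch.cat([local_feat, global_feat], dim=1))


if __name__ == "__main__":
    x = torch.randn(1, 32, 64, 64)                   # a feature map of one video frame
    fused = Fusion(32)(LocalBranch(32)(x), GlobalBranch(32)(x))
    print(fused.shape)                               # torch.Size([1, 32, 64, 64])
```

In this sketch the global branch attends over a downsampled token grid to stay lightweight, while the local branch samples features at learned offsets; the fusion step simply concatenates the two and projects back to the original channel count. The paper's actual modules may differ in structure and detail.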