Abstract: Video analysis is usually performed on individual frames, but consecutive frames are highly redundant, so key frame extraction is a crucial step. Existing methods based on traditional handcrafted features often miss key frames or retain redundant ones. Deep convolutional networks offer much stronger image feature extraction than handcrafted methods. This study therefore proposes a key frame extraction method that combines deep features of video frames with traditional handcrafted features. First, a convolutional neural network extracts deep features from the video frames; next, content features are extracted with a traditional handcrafted method; finally, the content features and deep features are fused to select the key frames. Experimental results show that the proposed method outperforms previous key frame extraction methods.
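The three-stage pipeline described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the handcrafted content feature (a per-channel color histogram), the fusion rule (weighted concatenation of L2-normalized vectors with a hypothetical weight `alpha`), and the selection rule (a distance `threshold` against the last selected key frame) are all assumptions chosen for illustration. The deep features are assumed to be precomputed by some CNN and are replaced here by toy vectors.

```python
import numpy as np

def color_histogram(frame, bins=8):
    """Assumed handcrafted content feature: per-channel histogram, L1-normalized."""
    hists = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
             for c in range(frame.shape[-1])]
    h = np.concatenate(hists).astype(np.float64)
    return h / max(h.sum(), 1.0)

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so both feature types are comparable."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def fuse_features(deep_feats, content_feats, alpha=0.5):
    """Assumed fusion rule: weighted concatenation of normalized features."""
    deep = l2_normalize(np.asarray(deep_feats, dtype=np.float64))
    content = l2_normalize(np.asarray(content_feats, dtype=np.float64))
    return np.hstack([alpha * deep, (1.0 - alpha) * content])

def select_key_frames(fused, threshold=0.3):
    """Assumed selection rule: keep a frame when its fused feature differs
    from the last selected key frame by more than `threshold`."""
    keys = [0]
    for i in range(1, len(fused)):
        if np.linalg.norm(fused[i] - fused[keys[-1]]) > threshold:
            keys.append(i)
    return keys

# Toy video: three black frames followed by three white frames.
black = np.zeros((4, 4, 3), dtype=np.uint8)
white = np.full((4, 4, 3), 255, dtype=np.uint8)
frames = [black] * 3 + [white] * 3

# Stand-in for CNN outputs (hypothetical 2-D deep features, one row per frame).
deep_feats = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)

content_feats = np.stack([color_histogram(f) for f in frames])
fused = fuse_features(deep_feats, content_feats)
keys = select_key_frames(fused)
print(keys)  # prints [0, 3]
```

In this toy example the first frame of each of the two scenes is selected, and the repeated frames within a scene are discarded as redundant. In practice the threshold and fusion weight would need tuning, and a clustering-based selection could replace the simple thresholding shown here.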