Abstract: In this study, a convolutional neural network (CNN) is applied to video understanding, and a driver fatigue detection algorithm based on the fusion of multiple facial features is proposed. A Multi-Task Cascaded Convolutional Neural Network (MTCNN) is used to locate the driver's mouth and left eye. A CNN extracts static features from the mouth and left-eye images; these are combined with dynamic features that a CNN extracts from the optical flow of the mouth and left-eye regions, and the combined features are used to train a classifier. The experimental results show that the algorithm reaches an accuracy of 87.4%, outperforming fatigue detection that uses only static images, and that it can reliably distinguish yawning from speaking, two actions that look similar in static images.
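The fusion step described above can be illustrated with a minimal sketch. The abstract does not state how the static and dynamic features are combined, so simple concatenation followed by a linear softmax classifier is assumed here. The feature vectors are random placeholders standing in for CNN outputs, and all names, dimensions, and class labels (`fuse_features`, `FEAT_DIM`, `LABELS`) are hypothetical rather than taken from the study.

```python
import math
import random


def fuse_features(static_feats, dynamic_feats):
    """Concatenate per-region feature vectors into one fused vector.

    Concatenation is an assumed fusion rule, not the paper's stated method.
    """
    fused = []
    for vec in list(static_feats) + list(dynamic_feats):
        fused.extend(vec)
    return fused


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, bias):
    """Linear layer followed by softmax; returns class probabilities."""
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(scores)


random.seed(0)
FEAT_DIM = 8                               # hypothetical feature size per stream
LABELS = ["normal", "talking", "yawning"]  # hypothetical class labels


def random_vec(n):
    # Placeholder for a CNN feature vector.
    return [random.gauss(0.0, 1.0) for _ in range(n)]


# Static features: mouth image, left-eye image.
static_feats = [random_vec(FEAT_DIM), random_vec(FEAT_DIM)]
# Dynamic features: mouth optical flow, left-eye optical flow.
dynamic_feats = [random_vec(FEAT_DIM), random_vec(FEAT_DIM)]

fused = fuse_features(static_feats, dynamic_feats)

# Untrained random classifier weights, for shape illustration only.
weights = [random_vec(len(fused)) for _ in LABELS]
bias = [0.0] * len(LABELS)

probs = classify(fused, weights, bias)
predicted = LABELS[probs.index(max(probs))]
```

In a real system, the placeholder vectors would be replaced by the outputs of the trained static and optical-flow CNNs, and the classifier weights would be learned jointly on labeled video clips.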