Abstract: This paper briefly introduces the history and applications of speech recognition, traditional speech recognition techniques, and current research progress. Traditional speech recognition relies on statistical methods and spectral features to train a Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) hybrid model. Today, speech recognition models are mainly based on deep learning. In general, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) can effectively extract features for building acoustic models. More recent research relies on end-to-end techniques, chiefly Connectionist Temporal Classification (CTC) and attention, to avoid error propagation between separately trained models. The latest models and methods emphasize attention and attempt to integrate it with CTC to achieve better results. Finally, drawing on the authors' understanding, the paper summarizes the open problems and future directions in speech recognition.