Most current facial expression recognition research uses convolutional neural networks (CNNs) to extract and classify facial features. A drawback of CNNs is that their network structures are complex and consume substantial computing resources. In response, this study applies the Mixer Layer network structure, which is based on the multilayer perceptron (MLP), to facial expression recognition. Data augmentation and transfer learning are employed to address the shortage of samples in the data sets, and Mixer Layer networks with different numbers of layers are built. In experimental comparisons, the 4-layer Mixer Layer network reaches recognition accuracies of 98.71% and 95.93% on the CK+ and JAFFE data sets, respectively, and the 8-layer Mixer Layer network reaches 63.06% on the Fer2013 data set. These results show that Mixer Layer networks, which contain no convolutional structure, exhibit sound learning and generalization abilities in facial expression recognition tasks.
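
To make the convolution-free structure concrete, the following is a minimal NumPy sketch of a stack of Mixer Layers. It assumes the commonly described MLP-Mixer design (a token-mixing MLP across image patches followed by a channel-mixing MLP, each with a skip connection), which may differ in detail from the networks built in this study. The function names, the patch count, the channel width, and the hidden sizes are all hypothetical, the layer normalization omits learnable scale and shift, and the weights are random rather than trained.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the last axis (no learnable scale/shift in this sketch).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gelu(x):
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x, w1, b1, w2, b2):
    # Two fully connected layers with a GELU in between.
    return gelu(x @ w1 + b1) @ w2 + b2

def init_mixer_layer(rng, num_patches, channels, token_hidden, channel_hidden):
    # Hypothetical initializer: small random weights, zero biases.
    def dense(n_in, n_out):
        return rng.normal(0.0, 0.02, size=(n_in, n_out)), np.zeros(n_out)
    return {
        "token": dense(num_patches, token_hidden) + dense(token_hidden, num_patches),
        "channel": dense(channels, channel_hidden) + dense(channel_hidden, channels),
    }

def mixer_layer(x, params):
    # x has shape (num_patches, channels).
    # Token mixing: transpose so the MLP acts across patches, then add the skip.
    x = x + mlp(layer_norm(x).T, *params["token"]).T
    # Channel mixing: the MLP acts across channels of each patch, then add the skip.
    x = x + mlp(layer_norm(x), *params["channel"])
    return x

# Assumed sizes for illustration only: 16 patches, 32 channels, 4 layers.
rng = np.random.default_rng(0)
num_patches, channels = 16, 32
layers = [init_mixer_layer(rng, num_patches, channels, 64, 128) for _ in range(4)]

x = rng.normal(size=(num_patches, channels))  # stand-in for embedded face patches
for params in layers:
    x = mixer_layer(x, params)

# Global average pooling over patches gives one feature vector per image,
# which a final linear classifier would map to expression classes.
features = layer_norm(x).mean(axis=0)
```

Every operation here is a matrix multiplication, a transpose, or an element-wise function, which is what allows the network to dispense with convolutions. Changing the length of `layers` gives a shallower or deeper variant, analogous to the 4-layer and 8-layer networks compared in the experiments.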