To address the parameter redundancy and low computational efficiency of convolutional layers in convolutional neural networks (CNNs), this paper proposes a CNN model compression method based on statistical analysis. While preserving the network's information-processing ability, the method compresses a well-trained CNN by pruning the convolution kernels in each convolutional layer that have little influence on the overall model, thereby reducing the number of parameters, and hence the amount of computation, without loss of model accuracy. Experiments show that the proposed method effectively compresses CNN models while maintaining good performance.
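To make the pruning step concrete, the sketch below removes low-influence filters from a single convolutional layer in PyTorch. It is a minimal illustration, not the paper's implementation: the per-filter L1 norm stands in for the paper's statistical influence measure, and the function name prune_conv_filters, the keep_ratio parameter, and the choice of criterion are all assumptions made for this example.

```python
import torch
import torch.nn as nn

def prune_conv_filters(conv: nn.Conv2d, keep_ratio: float = 0.7) -> nn.Conv2d:
    """Return a new Conv2d that keeps only the highest-scoring filters.

    Hypothetical criterion: per-filter L1 norm as the 'influence' statistic;
    the paper's statistical-analysis criterion may differ. Assumes groups=1.
    """
    weight = conv.weight.data                    # shape: (out_ch, in_ch, kH, kW)
    scores = weight.abs().sum(dim=(1, 2, 3))     # L1 norm of each output filter
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    keep_idx = torch.argsort(scores, descending=True)[:n_keep]

    # Build a smaller layer and copy over the retained filters (and biases).
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = weight[keep_idx].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep_idx].clone()
    return pruned
```

In a full compression pipeline, the input channels of the following layer (and any batch-normalization parameters) would also have to be sliced to match the retained filters, and the pruned network would typically be fine-tuned to recover accuracy, as is standard in filter-pruning work.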