[1] HAN S, POOL J, TRAN J, et al. Learning both weights and connections for efficient neural network[C]//Proceedings of the 29th Annual Conference on Neural Information Processing Systems, December 7-12, 2015, Montreal. La Jolla, CA: NIPS, 2015: 1135-1143.
[2] ZHU M, GUPTA S. To prune, or not to prune: Exploring the efficacy of pruning for model compression[EB/OL]. (2017-11-13)[2021-08-31]. https://arxiv.org/abs/1710.01878.
[3] HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[EB/OL]. (2015-03-09)[2021-08-31]. https://arxiv.org/abs/1503.02531.
[4] IOANNOU Y, ROBERTSON D, SHOTTON J, et al. Training CNNs with low-rank filters for efficient image classification[C]//Proceedings of the 4th International Conference on Learning Representations, May 2-4, 2016, San Juan, Puerto Rico. Appleton, WI: ICLR, 2016: 1-17.
[5] KIAEE F, GAGNÉ C, ABBASI M. Alternating direction method of multipliers for sparse convolutional neural networks[EB/OL]. (2017-01-15)[2021-08-31]. https://arxiv.org/abs/1611.01590.
[6] WU W, FAN Q, ZURADA J M, et al. Batch gradient method with smoothing L1/2 regularization for training of feedforward neural networks[J]. Neural Networks, 2014, 50: 72-78.
[7] LIU Y, BI S, PAN S. Equivalent Lipschitz surrogates for zero-norm and rank optimization problems[J]. Journal of Global Optimization, 2018, 72(4): 679-704.
[8] KINGMA D P, BA J. Adam: A method for stochastic optimization[C]//Proceedings of the 3rd International Conference on Learning Representations, May 7-9, 2015, San Diego, CA. Appleton, WI: ICLR, 2015: 1-15.
[9] WEN W, WU C, WANG Y, et al. Learning structured sparsity in deep neural networks[C]//Proceedings of the 30th Annual Conference on Neural Information Processing Systems, December 5-10, 2016, Barcelona. La Jolla, CA: NIPS, 2016: 2082-2090.
[10] LOUIZOS C, WELLING M, KINGMA D P. Learning sparse neural networks through L0 regularization[C]//Proceedings of the 6th International Conference on Learning Representations, April 30-May 3, 2018, Vancouver. Appleton, WI: ICLR, 2018: 1-13.
[11] ROCKAFELLAR R T. Convex analysis[M]. Princeton, NJ: Princeton University Press, 1970.
[12] ZHANG T, YE S, ZHANG K, et al. A systematic DNN weight pruning framework using alternating direction method of multipliers[C]//Proceedings of the 15th European Conference on Computer Vision, September 8-14, 2018, Munich. Cham: Springer, 2018: 191-207.
[13] WANG Y, YIN W, ZENG J. Global convergence of ADMM in nonconvex nonsmooth optimization[J]. Journal of Scientific Computing, 2019, 78(1): 29-63.
[14] LI Y, JI S. L0-ARM: Network sparsification via stochastic binary optimization[C]//Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, September 16-20, 2019, Würzburg. Cham: Springer, 2020: 432-448.
[15] GLOROT X, BENGIO Y. Understanding the difficulty of training deep feedforward neural networks[C]//Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, May 13-15, 2010, Sardinia. Cambridge, MA: JMLR, 2010: 249-256.