Operations Research and Management Science ›› 2023, Vol. 32 ›› Issue (10): 102-107. DOI: 10.12005/orms.2023.0326

• Theory Analysis and Methodology Study •

Pruning Approach to Neural Networks Based on Zero-norm Regularization

LIU Zhi

  1. School of Mathematics, South China University of Technology, Guangzhou 510000, China
  • Received: 2021-08-31  Online: 2023-10-25  Published: 2024-01-31
  • About the author: LIU Zhi (1996-), male, born in Chenzhou, Hunan; master's degree; research interests: optimization theory, algorithms and their applications.
  • Supported by: General Program of the National Natural Science Foundation of China (11971177)

Abstract: This paper proposes an effective pruning method for neural networks. The method introduces a zero-norm regularization term into the neural network training model to promote sparsity of the model weights, and compresses the model by removing the weights that take the value zero. For the proposed zero-norm regularized training model, an equivalent locally Lipschitz surrogate is obtained by establishing a global exact penalty for its equivalent MPEC form; the network is then trained and pruned by solving this Lipschitz surrogate with the alternating direction method of multipliers. Tests on the MLP and LeNet-5 models achieve sparsity of 97.43% and 99.50% at errors of 2.2% and 1%, respectively, demonstrating a strong pruning effect.
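
The zero-norm regularized training model referred to above can be sketched in the following generic form; the notation is illustrative and does not reproduce the paper's exact loss or network parameterization:

$$\min_{W}\; \mathcal{L}(W) + \lambda\,\|W\|_0,$$

where $W$ collects the network weights, $\mathcal{L}$ is the training loss, $\|W\|_0$ counts the nonzero entries of $W$, and $\lambda>0$ trades off accuracy against sparsity; weights driven exactly to zero are then removed from the network.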


Abstract: Deep neural networks (DNNs) have become ubiquitous in daily life, with applications ranging from autonomous driving to smart homes, and deploying DNN models on mobile devices and embedded systems has become an inevitable trend. Parameter redundancy, however, has long been the main obstacle to efficient neural network inference and makes such models difficult to deploy on mobile systems.
In recent years, academia and industry have proposed many model compression methods, such as knowledge distillation and network pruning. Neural network pruning, an important means of compressing network models, reduces the number of parameters by removing some neural connections, thereby alleviating the high computational cost and large memory footprint caused by weight redundancy. The method in this article further extends the network pruning model and its solution algorithm.
In this work, we propose an effective pruning method for neural networks to address the high computational cost and considerable memory bandwidth caused by the huge complexity and parameter redundancy of neural network models. The method promotes sparsity of the model weights by introducing a zero-norm regularization term into the training model, and compresses the model by deleting the weights that become zero. For the proposed zero-norm regularized model, we obtain an equivalent locally Lipschitz surrogate by establishing a global exact penalty for its equivalent MPEC (mathematical program with equilibrium constraints) reformulation.
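One standard way to build the MPEC reformulation and its exact penalty, taken from the zero-norm exact-penalty literature (the constants and the precise statement used in the paper may differ), is the following. For $w\in\mathbb{R}^p$ and the all-ones vector $e$,

$$\|w\|_0 \;=\; \min_{0\le v\le e}\;\big\{\langle e,\, e-v\rangle \;:\; \langle v,\, |w|\rangle = 0\big\},$$

so the zero-norm regularized model is equivalent to the MPEC

$$\min_{w,\,v}\; \mathcal{L}(w) + \lambda\,\langle e,\, e-v\rangle \quad \text{s.t.}\quad \langle v,\, |w|\rangle = 0,\; 0\le v\le e,$$

and, for every sufficiently large $\rho>0$,

$$\min_{w,\,0\le v\le e}\; \mathcal{L}(w) + \lambda\,\langle e,\, e-v\rangle + \rho\lambda\,\langle v,\, |w|\rangle$$

is a global exact penalty of it. Minimizing over $v$ componentwise then yields the locally Lipschitz, capped-$\ell_1$ type surrogate

$$\min_{w}\; \mathcal{L}(w) + \lambda\sum_{i}\min\big(1,\; \rho\,|w_i|\big).$$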
Based on this locally Lipschitz surrogate, and observing that when the activation function is the sigmoid the objective of the final optimization model is a combination of a smooth term and a nonsmooth term, where the smooth part can be handled by existing computational-graph frameworks and the subproblem associated with the nonsmooth part can be solved exactly, we design a proximal alternating direction method of multipliers (P-ADMM) to solve the smooth-loss model induced by the sigmoid activation function. Numerical experiments validate the efficiency of P-ADMM: the tests on the MLP and LeNet-5 networks yield 97.43% and 99.50% sparsity, respectively, without loss of accuracy. The results show that our method effectively reduces model complexity and achieves a higher sparsity ratio than other pruning methods, while being convenient to implement and easy to extend.
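To make the splitting concrete, the following is a minimal NumPy sketch of a proximal-ADMM pruning loop of the kind described above. The single-layer logistic loss, the capped-l1 surrogate, the prox formula, the step sizes, and all function names are illustrative assumptions, not the paper's actual algorithm or code.

    # Minimal proximal-ADMM sketch: min_W loss(W) + lam*phi(Z)  s.t.  W = Z,
    # where phi is a capped-l1 surrogate of the zero norm (assumed form).
    import numpy as np

    def loss_and_grad(W, X, y):
        """Smooth logistic-type loss with sigmoid activation (illustrative)."""
        p = 1.0 / (1.0 + np.exp(-(X @ W)))                 # sigmoid output
        loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        grad = X.T @ (p - y) / X.shape[0]
        return loss, grad

    def prox_capped_l1(v, lam, rho, step):
        """Componentwise prox of step*lam*min(1, rho*|.|), computed exactly by
        comparing the two candidate minimizers of the piecewise subproblem."""
        soft = np.sign(v) * np.maximum(np.abs(v) - step * lam * rho, 0.0)
        out = np.empty_like(v)
        for i in range(v.size):
            cands = (soft[i], v[i])                        # thresholded vs. kept
            vals = [0.5 * (c - v[i]) ** 2 + step * lam * min(1.0, rho * abs(c))
                    for c in cands]
            out[i] = cands[int(np.argmin(vals))]
        return out

    def padmm_prune(X, y, lam=1e-2, rho=10.0, beta=1.0, lr=0.1, iters=200):
        d = X.shape[1]
        W = np.zeros(d)        # weights updated through the smooth loss
        Z = np.zeros(d)        # auxiliary copy carrying the sparse regularizer
        U = np.zeros(d)        # scaled dual variable for the constraint W = Z
        for _ in range(iters):
            _, g = loss_and_grad(W, X, y)
            W = W - lr * (g + beta * (W - Z + U))            # gradient step on smooth part
            Z = prox_capped_l1(W + U, lam, rho, 1.0 / beta)  # exact prox step
            U = U + W - Z                                    # dual update
        return Z               # exactly sparse; zero entries are pruned

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 50))
        w_true = np.zeros(50); w_true[:5] = 1.0
        y = (X @ w_true + 0.1 * rng.normal(size=200) > 0).astype(float)
        print("sparsity:", np.mean(padmm_prune(X, y) == 0.0))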
In summary, this article proposes a proximal ADMM (P-ADMM) method for solving the smooth-loss network pruning model. Because the neural network model is highly nonconvex, the convergence of the algorithm slows down in the later iterations, even though the model is solved by alternating minimization within a computational-graph framework. One future research direction is therefore to develop an acceleration strategy that improves the convergence rate, and to investigate whether the nonconvex nonsmooth model can be solved directly by gradient-based methods built on backpropagation and computational-graph frameworks. Another interesting direction is how to design effective algorithms, and to establish their convergence properties, when the loss function itself is nonsmooth.

Key words: network pruning, zero-norm regularization, ADMM
