
DepGraph: Towards Any Structural Pruning

Problem : 

 

Structural pruning enables model acceleration by physically removing structurally-grouped parameters from neural networks.

However, the parameter-grouping patterns vary widely across different models, making architecture-specific pruners, which rely on manually-designed grouping schemes, non-generalizable to new architectures.

 

Abstract :

 

We study any structural pruning, to tackle general structural pruning of arbitrary architectures such as CNNs, RNNs, GNNs, and Transformers.

The most prominent obstacle to this goal lies in structural coupling: coupled layers depend on one another and must be pruned simultaneously, and all removed parameters are expected to be consistently unimportant, thereby avoiding structural issues and performance degradation.

 

A. Pruning approaches can be categorized into two schemes :

 

1. Structural pruning : changes the structure of the neural network by physically removing grouped parameters.

 

2. Unstructured pruning : zeroes out a subset of weights without modifying the network structure.

 

→ Compared to unstructured pruning, structural pruning does not rely on specific AI accelerators or software to reduce memory consumption and computation costs, as the sketch below illustrates.
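A minimal PyTorch sketch of the difference (illustrative only, not code from the paper): unstructured pruning zeroes individual weights while tensor shapes stay fixed, whereas structural pruning physically removes a channel and shrinks the tensor.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3)  # weight shape: [8, 3, 3, 3]

# Unstructured pruning: zero out small weights in place.
# The shape is unchanged, so real speedups need sparse-aware hardware/software.
with torch.no_grad():
    conv.weight.mul_((conv.weight.abs() > 0.05).float())
print(conv.weight.shape)  # torch.Size([8, 3, 3, 3])

# Structural pruning: physically remove output channels 2 and 5.
# The layer becomes genuinely smaller, so dense hardware benefits directly.
keep = [i for i in range(8) if i not in (2, 5)]
pruned = nn.Conv2d(3, len(keep), kernel_size=3)
with torch.no_grad():
    pruned.weight.copy_(conv.weight[keep])
    pruned.bias.copy_(conv.bias[keep])
print(pruned.weight.shape)  # torch.Size([6, 3, 3, 3])
```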

 

B. Dependency Modeling

 

Instead of modeling relationships at the layer level, we concentrate on the dependencies between the inputs and outputs of layers, which facilitates easier dependency modeling.

 

We denote the component for each layer by its input f_i^- and output f_i^+ :

$$ f_i = ( f_i^- , f_i^+ ) $$

 

Two cases arise: inter-layer dependency (i ≠ j) and intra-layer dependency (i = j). The symbol ↔ signifies the connectivity between two adjacent layers.

 

 

Inter-layer Dependency : An inter-layer dependency f_i^+ ↔ f_j^- arises whenever the output of layer i is connected to the input of layer j, so the corresponding channels must be pruned together. See Figure 2.

 

Intra-layer Dependency : An intra-layer dependency necessitates that both the input and the output of a single layer be pruned simultaneously. Batch Normalization is intra-layer dependent because its input and output share the same pruning scheme (channel k in maps directly to channel k out), whereas a Conv operation is not: its kernel fully connects input channels to output channels, so the two sides can be pruned independently.

Look closely at the OR operation and the AND operation.
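In practice, the dependency graph does this bookkeeping automatically. The paper's official library, Torch-Pruning, traces the model once, and pruning one layer pulls in every inter- and intra-layer dependent layer. A minimal sketch following the Torch-Pruning README (API names may differ slightly across versions):

```python
import torch
import torchvision.models as models
import torch_pruning as tp

model = models.resnet18(weights=None)
example_inputs = torch.randn(1, 3, 224, 224)

# Build the dependency graph by tracing the model once.
DG = tp.DependencyGraph().build_dependency(model, example_inputs=example_inputs)

# Ask for the full coupled group triggered by pruning
# output channels [2, 6, 9] of the first conv layer.
group = DG.get_pruning_group(model.conv1, tp.prune_conv_out_channels, idxs=[2, 6, 9])

# The group gathers every dependent layer (BN, downstream convs, ...);
# executing it prunes all of them consistently.
if DG.check_pruning_group(group):  # avoid pruning a layer down to zero channels
    group.prune()
```

Printing `group` shows the full chain of coupled operations (conv → BN → downstream conv), which is exactly the grouping that manual, architecture-specific pruners had to hard-code.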

C. Group-level Pruning

Given a parameter group : $$ g = \{ w_1, w_2, \dots, w_{|g|} \} $$

 

Aggregated score (where I(w) can be the L2-norm) : $$ I(g) = \sum_{w \in g} I(w) $$
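As a toy illustration of this aggregation (hypothetical three-layer group, not the library's code): for a group coupling channel k of a conv, its BatchNorm, and the next conv's input, the group score sums the per-layer L2 importances.

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 8, 3)
bn = nn.BatchNorm2d(8)
conv2 = nn.Conv2d(8, 16, 3)

def group_importance(k: int) -> torch.Tensor:
    """Aggregated L2 importance of channel k over the coupled group:
    conv1's k-th output channel, bn's k-th scale, conv2's k-th input channel."""
    return (
        conv1.weight[k].norm(p=2)       # filter producing channel k
        + bn.weight[k].abs()            # BN scale acting on channel k
        + conv2.weight[:, k].norm(p=2)  # kernels consuming channel k
    )

scores = torch.stack([group_importance(k) for k in range(8)])
print(scores.argsort())  # channels ranked from least to most important
```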

Regularization term used for sparse training:

$$ R(g) = \sum_{k=1}^{K} \gamma_k \cdot I_{g,k} $$

Importance of the k-th prunable dimension with respect to the group:

$$ I_{g,k} = \sum_{w \in g} \| w[k] \|_2^2 $$

A controllable exponential form is used for the shrinkage strength, driven by the normalized (relative) score of each dimension:

$$ \gamma_k = 2^{\alpha \, ( I_{g}^{\max} - I_{g,k} ) / ( I_{g}^{\max} - I_{g}^{\min} )} $$

so that dimensions with a low relative score receive stronger shrinkage during sparse training.
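In Torch-Pruning, this regularizer is applied during training through the pruner's `regularize` step. A sketch of the sparse-training loop (dummy data stands in for a real dataloader; argument names such as `pruning_ratio` vary across library versions):

```python
import torch
import torch.nn.functional as F
import torchvision.models as models
import torch_pruning as tp

model = models.resnet18(weights=None)
example_inputs = torch.randn(1, 3, 224, 224)

# Group-level sparse training: the pruner injects the group-sparsity
# gradient (the regularizer R(g)) before each optimizer step.
pruner = tp.pruner.GroupNormPruner(
    model,
    example_inputs,
    importance=tp.importance.GroupNormImportance(p=2),
    pruning_ratio=0.5,
    ignored_layers=[model.fc],
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
for _ in range(3):  # stand-in for a real dataloader loop
    x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 1000, (8,))
    optimizer.zero_grad()
    F.cross_entropy(model(x), y).backward()
    pruner.regularize(model)  # accumulate sparsity gradients on grouped weights
    optimizer.step()

pruner.step()  # after sparse training, remove the now-unimportant groups
```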

 

You can use sparse training with GroupNormPruner, or simply use MetaPruner with magnitude-based importance.
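For the magnitude-based route, a minimal one-shot MetaPruner sketch adapted from the Torch-Pruning README (again, argument names are version-dependent; the classifier head is excluded so the output dimension stays intact):

```python
import torch
import torchvision.models as models
import torch_pruning as tp

model = models.resnet18(weights=None)
example_inputs = torch.randn(1, 3, 224, 224)

# Magnitude-based importance: L2-norm of the grouped weights.
imp = tp.importance.MagnitudeImportance(p=2)

pruner = tp.pruner.MetaPruner(
    model,
    example_inputs,
    importance=imp,
    pruning_ratio=0.5,          # target ~50% channel sparsity per layer
    ignored_layers=[model.fc],  # keep the 1000-class output head intact
)
pruner.step()  # one-shot structural pruning: the model is physically smaller

macs, nparams = tp.utils.count_ops_and_params(model, example_inputs)
print(f"{macs / 1e9:.2f} GMACs, {nparams / 1e6:.2f} M params after pruning")
```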