Problem :
Structural pruning enables model acceleration by removing structurally-grouped parameters from neural networks.
However, parameter-grouping patterns vary widely across models, so architecture-specific pruners, which rely on manually-designed grouping schemes, do not generalize to new architectures.
Abstract :
We study any structural pruning, tackling general structural pruning of arbitrary architectures such as CNNs, RNNs, GNNs, and Transformers.
The most prominent obstacle to this goal is structural coupling, which forces dependent layers to be pruned simultaneously and expects all removed parameters to be consistently unimportant, thereby avoiding structural issues and performance degradation.
A. Pruning approaches can be categorized into two schemes :
1. Structural pruning : Structural pruning changes the structure of the NN by physically removing grouped parameters.
2. Unstructured pruning : Unstructured pruning zeroes out individual weights without modifying the network structure.
→ Compared to unstructured pruning, structural pruning does not rely on specific AI accelerators or software to reduce memory consumption and computation costs; the sketch below contrasts the two schemes.
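A minimal PyTorch sketch to make the distinction concrete; the layer sizes and the choice of pruned channel are illustrative assumptions, not from the paper :

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3)

# Unstructured pruning: zero out individual weights; the tensor shape
# (and hence memory/compute on standard hardware) is unchanged.
with torch.no_grad():
    mask = torch.rand_like(conv.weight) > 0.5
    conv.weight *= mask

# Structural pruning: physically remove output channel 0, producing a
# genuinely smaller layer that speeds up without sparse-aware hardware.
keep = [i for i in range(conv.out_channels) if i != 0]
pruned = nn.Conv2d(in_channels=8, out_channels=len(keep), kernel_size=3)
with torch.no_grad():
    pruned.weight.copy_(conv.weight[keep])
    pruned.bias.copy_(conv.bias[keep])

print(conv.weight.shape)    # torch.Size([16, 8, 3, 3])
print(pruned.weight.shape)  # torch.Size([15, 8, 3, 3])
```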
B. Dependency Modeling
Instead of modeling relationships at the layer level, we concentrate on the dependencies between the inputs and outputs of layers, which facilitates easier dependency modeling.
We denote the input and output components of each layer as :
$$f_i = ( f_i^{-},\, f_i^{+} )$$
where $f_i^{-}$ is the input of layer $i$ and $f_i^{+}$ is its output; for a convolution, these correspond to the input-channel and output-channel dimensions, respectively.
Inter-layer Dependency : An inter-layer dependency arises when the output $f_{i-1}^{+}$ of one layer is connected to the input $f_i^{-}$ of the next, forcing both to be pruned with the same indices. See Figure 2.
Intra-layer Dependency : An intra-layer dependency necessitates that the input and output of a single layer be pruned simultaneously. Batch Normalization is intra-layer dependent because its input and output share the same pruning scheme, whereas a Conv operation is not, since its input and output channels can be pruned independently. A grouping sketch follows.
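A simplified, illustrative sketch of dependency-driven grouping, assuming a toy Conv-BN-Conv chain with made-up component names; it shows only the coupling idea (components joined by inter-/intra-layer edges must be pruned together), not the paper's actual implementation :

```python
from collections import defaultdict, deque

edges = defaultdict(set)

def add_dep(a, b):
    # Dependencies are symmetric: pruning one component prunes the other.
    edges[a].add(b)
    edges[b].add(a)

# Inter-layer dependency: layer i-1's output feeds layer i's input, so the
# same channel indices must be removed from both (hypothetical layer names).
add_dep("conv1+", "bn1-")
add_dep("bn1+", "conv2-")

# Intra-layer dependency: BatchNorm's input and output share one pruning
# scheme, so its two components are coupled. Conv layers get no such edge,
# since their input and output channels can be pruned independently.
add_dep("bn1-", "bn1+")

def group(seed):
    """Collect every component that must be pruned together with `seed`."""
    seen, queue = {seed}, deque([seed])
    while queue:
        for nxt in edges[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)

# Pruning conv1's output channels drags in the whole coupled group:
print(group("conv1+"))  # ['bn1+', 'bn1-', 'conv1+', 'conv2-']
```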
C. Group-level Pruning
Given a parameter group : $$ g = \{ w_1, w_2, \ldots, w_{|g|} \} $$
Aggregated score ( where $I(w)$ can be the L2-norm ) : $$ I(g) = \sum_{w \in g} I(w) $$
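A minimal sketch of this aggregation, assuming per-channel L2-norms as $I(w)$ and random stand-in tensors for a coupled group; in practice the group members would be the parameter slices found by dependency modeling :

```python
import torch

def importance(w: torch.Tensor) -> torch.Tensor:
    # I(w): per-channel L2 norm over the leading (channel) axis.
    return w.reshape(w.shape[0], -1).norm(p=2, dim=1)

# A coupled group sharing 16 channel indices (illustrative shapes):
conv1_w = torch.randn(16, 8, 3, 3)   # output channels on axis 0
bn1_w   = torch.randn(16)            # one scale per channel
conv2_w = torch.randn(32, 16, 3, 3)  # input channels on axis 1

# Bring every member's shared channel axis to the front before scoring.
group = [conv1_w, bn1_w, conv2_w.transpose(0, 1)]

# I(g) = sum of I(w) over w in g: channel-wise sum across group members.
group_score = sum(importance(w) for w in group)
print(group_score.shape)            # torch.Size([16])
print(group_score.argmin().item())  # index of the least important channel
```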