[C15] PAIRS: Pruning-AIded Row-Skipping for SDK-Based Convolutional Weight Mapping in Processing-In-Memory Architectures

Processing-in-memory (PIM) architectures are emerging as a promising candidate for convolutional neural network (CNN) inference. A recent weight mapping method, shift and duplicate kernel (SDK), improves array utilization by shifting duplicated copies of the same kernel into otherwise idle columns. However, this method inevitably produces idle cells with an irregular distribution, which limits how far the weight matrix can be compressed. To compress the weight matrix in the PIM array effectively, prior works have introduced row-wise pruning, a structured weight pruning scheme that skips the operation on a row by zeroing out all weights in that row (we call this row-skipping). However, because SDK mapping deploys shifted kernel copies, zeroing out every weight in the same row becomes difficult. To address this issue, we propose pruning-aided row-skipping (PAIRS), which effectively reduces the number of rows occupied by convolutional weights under SDK mapping. By pairing an SDK-mapping-aware pruning pattern design with row-wise pruning, PAIRS achieves a higher row-skipping ratio. Compared to existing pruning methods, PAIRS skips up to 1.95× more rows and achieves up to a 4× higher compression rate with similar or even better inference accuracy.
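To make the interaction between SDK mapping and row-skipping concrete, here is a minimal NumPy sketch (not from the paper; `sdk_weight_matrix`, `skippable_rows`, and the two-window layout are illustrative assumptions) of how shifted kernel copies occupy adjacent columns, and how a pruning pattern aligned with those shifts zeroes out entire matrix rows:

```python
import numpy as np

def sdk_weight_matrix(kernel, pw=2):
    """Toy SDK-style mapping: place `pw` horizontally shifted copies of one
    K x K kernel into a single weight matrix. Rows correspond to cells of a
    shared K x (K + pw - 1) input patch; each column computes one output
    pixel. Cells not covered by any copy stay zero (idle)."""
    K = kernel.shape[0]
    W = np.zeros((K * (K + pw - 1), pw))
    for c in range(pw):                      # one column per parallel window
        for i in range(K):
            for j in range(K):
                # row index of input pixel (i, j + c) in the shared patch
                W[i * (K + pw - 1) + (j + c), c] = kernel[i, j]
    return W

def skippable_rows(W):
    """Rows whose cells are all zero can be skipped at inference time."""
    return np.where(np.all(W == 0, axis=1))[0]

kernel = np.arange(1.0, 10.0).reshape(3, 3)
kernel[1, :] = 0.0                           # SDK-aware row-wise pruning
W = sdk_weight_matrix(kernel, pw=2)
print(skippable_rows(W))                     # -> [4 5 6 7]
```

Because both shifted copies share the pruned kernel row, the same span of matrix rows (rows 4 through 7 here) goes to zero in every column, which is the kind of coordinated pairing of pruning pattern and mapping that PAIRS exploits; pruning an unaligned set of weights would leave each of those rows with a non-zero cell in some column and nothing could be skipped.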

[C14] Weight-Aware Activation Mapping for Energy-Efficient Convolution on PIM Arrays

Convolutional weight mapping plays a pivotal role in facilitating convolution operations on processing-in-memory (PIM) architectures, which are, in essence, matrix-vector multiplication (MVM) accelerators. Despite its importance, convolutional mapping remains under-studied, and existing mapping methods fail to exploit the sparse and redundant characteristics of heavily quantized convolutional weights, leading to low array utilization and ineffectual computations. To address these issues, this paper proposes a novel weight-aware activation mapping method in which activations, rather than weights, are mapped onto the memory cells. The proposed method significantly reduces the number of computing cycles by skipping zero-valued weights and merging PIM array rows that share the same weight values. Experimental results on ResNet-18 demonstrate that the proposed weight-aware activation mapping achieves up to 90% energy savings and latency reduction compared to conventional approaches.
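As a rough illustration of where the cycle savings come from (a sketch under an assumed ternary quantization; `weight_aware_mvm` is a hypothetical helper, not the paper's implementation), the idea is that when activations occupy the cells, each distinct non-zero weight value costs one accumulate-and-scale cycle, no matter how many rows carry it, and zero weights cost nothing:

```python
import numpy as np

def weight_aware_mvm(weights, activations):
    """Toy model of weight-aware activation mapping for one output.
    Activations sit in the array; quantized weights steer the computation:
    zero weights are skipped outright, and rows sharing the same weight
    value are merged into a single accumulate-then-scale cycle."""
    acc, cycles = 0.0, 0
    for w in np.unique(weights):
        if w == 0:
            continue                                # zero weights: skipped
        acc += w * activations[weights == w].sum()  # merged rows: one cycle
        cycles += 1
    return acc, cycles

rng = np.random.default_rng(0)
w = rng.choice([-1, 0, 1], size=64, p=[0.25, 0.5, 0.25])  # ternary weights
x = rng.random(64)
out, cycles = weight_aware_mvm(w, x)
assert np.isclose(out, w @ x)                # matches the dense MVM result
print(f"{cycles} cycles instead of {len(w)}")  # e.g. 2 instead of 64
```

Under this assumption, the cycle count for a column is bounded by the number of distinct non-zero weight values rather than the number of weights, which is why heavy quantization makes the merging especially effective.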