- Model compression as constrained optimization, with application to neural nets. Part I: general framework
- Model compression as constrained optimization, with application to neural nets. Part II: quantization
- A Survey of Model Compression and Acceleration for Deep Neural Networks
- Recent Advances in Efficient Computation of Deep Convolutional Neural Networks
- Rethinking the Value of Network Pruning [code]
- To prune, or not to prune: exploring the efficacy of pruning for model compression [code]
- Low-Memory Neural Network Training: A Technical Report
- Rethinking Bottleneck Structure for Efficient Mobile Network Design [code]
- GhostNet: More Features from Cheap Operations [code]
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks [code]
- Searching for MobileNetV3 [code] [code]
- Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation [code]
- Pelee: A Real-Time Object Detection System on Mobile Devices [code]
- PVANet: Lightweight Deep Neural Networks for Real-time Object Detection [code]
- Receptive Field Block Net for Accurate and Fast Object Detection [code]
- clcNet: Improving the Efficiency of Convolutional Neural Network Using Channel Local Convolutions [code]
- ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions [code]
- Accelerating Deep Neural Networks with Spatial Bottleneck Modules [code]
- Neural Architecture Optimization [code]
- Neural Architecture Search with Reinforcement Learning [code]
- Regularized Evolution for Image Classifier Architecture Search [code]
- IRLAS: Inverse Reinforcement Learning for Architecture Search
- Auto-Keras: Efficient Neural Architecture Search with Network Morphism [code]
- EffNet: An Efficient Structure for Convolutional Neural Networks [code]
- MnasNet: Platform-Aware Neural Architecture Search for Mobile [code:pytorch] [code:caffe]
- Efficient Neural Architecture Search via Parameter Sharing [code:tensorflow] [code:pytorch]
- DARTS: Differentiable Architecture Search [code]
- Path-Level Network Transformation for Efficient Architecture Search [code]
- SqueezeNext: Hardware-Aware Neural Network Design [code]
- LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks [code]
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices [code]
- ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design [code]
- CondenseNet: An Efficient DenseNet using Learned Group Convolutions [code]
- ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware [code]
- Interleaved Group Convolutions for Deep Neural Networks [code]
- IGCV2: Interleaved Structured Sparse Convolutional Neural Networks
- IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks [code]
- Dynamic Capacity Networks [code]
- ResNeXt: Aggregated Residual Transformations for Deep Neural Networks [code]
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications [code]
- Xception: Deep Learning with Depthwise Separable Convolutions [code]
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression [code]
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size [code]
- Residual Attention Network for Image Classification [code]
- SEP-Nets: Small and Effective Pattern Networks [code]
- Deep Networks with Stochastic Depth [code]
- Learning Infinite Layer Networks Without the Kernel Trick
- Coordinating Filters for Faster Deep Neural Networks [code]
- ResBinNet: Residual Binary Neural Network
- SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks [code]
- Efficient Sparse-Winograd Convolutional Neural Networks [code]
- DSD: Dense-Sparse-Dense Training for Deep Neural Networks [code]
- Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video
- Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation [code]
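
Most of the efficient architectures above (MobileNets, Xception, the inverted-residual design) are built around depthwise separable convolutions. Below is a minimal PyTorch sketch of that building block; the class and variable names are illustrative and not taken from any of the linked repos.

```python
# Minimal sketch of a depthwise-separable convolution block: a per-channel
# (groups=in_ch) 3x3 spatial convolution followed by a 1x1 pointwise
# convolution that mixes channels. Names are illustrative.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # depthwise: one 3x3 filter per input channel
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        # pointwise: 1x1 convolution that combines channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))

# Example: 32 -> 64 channels on a 56x56 feature map
block = DepthwiseSeparableConv(32, 64)
y = block(torch.randn(1, 32, 56, 56))  # -> (1, 64, 56, 56)
```
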
- Paraphrasing Complex Network: Network Compression via Factor Transfer
- Moonshine: Distilling with Cheap Convolutions
- Knowledge Distillation by On-the-Fly Native Ensemble
- Dark knowledge [code]
- Deep Mutual Learning [code]
- FitNets: Hints for Thin Deep Nets [code]
- Net2Net: Accelerating Learning via Knowledge Transfer [code]
- Distilling the Knowledge in a Neural Network [code]
- MobileID: Face Model Compression by Distilling Knowledge from Neurons [code]
- DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer [code]
- Deep Model Compression: Distilling Knowledge from Noisy Teachers
- Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer [code]
- Sequence-Level Knowledge Distillation [code]
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer [code]
- Learning Efficient Object Detection Models with Knowledge Distillation
- Data-Free Knowledge Distillation For Deep Neural Networks [code]
- Training Shallow and Thin Networks for Acceleration via Knowledge Distillation with Conditional Adversarial Networks
- Knowledge Projection for Effective Design of Thinner and Faster Deep Neural Networks
- Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification
- A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning
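
The distillation entries above share one core mechanism: training a small student against a teacher's temperature-softened outputs. A minimal sketch of that soft-target loss in PyTorch follows; `alpha` and `T` are illustrative hyperparameters, not values from any specific paper listed here.

```python
# Minimal sketch of the temperature-scaled soft-target loss from
# "Distilling the Knowledge in a Neural Network".
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # hard-label term: ordinary cross-entropy against the ground truth
    hard = F.cross_entropy(student_logits, labels)
    # soft-label term: KL divergence between temperature-softened
    # teacher and student distributions, scaled by T^2
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction='batchmean') * (T * T)
    return alpha * soft + (1.0 - alpha) * hard

# Example with random logits for a batch of 8 over 10 classes
s, t = torch.randn(8, 10), torch.randn(8, 10)
loss = distillation_loss(s, t, torch.randint(0, 10, (8,)))
```
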
- Scalable Methods for 8-bit Training of Neural Networks [code]
- Local Binary Convolutional Neural Networks [code]
- Training Competitive Binary Neural Networks from Scratch
- Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration
- Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
- DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
- Learning Sparse Neural Networks via Sensitivity-Driven Regularization
- Effective Quantization Methods for Recurrent Neural Networks
- The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning
- Quantized Convolutional Neural Networks for Mobile Devices
- Compressing Deep Convolutional Networks using Vector Quantization
- Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
- Fixed-Point Performance Analysis of Recurrent Neural Networks
- Loss-aware Binarization of Deep Networks
- Towards the Limit of Network Quantization
- Deep Learning with Low Precision by Half-wave Gaussian Quantization
- ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks
- Trained Ternary Quantization
- Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM
- From Hashing to CNNs: Training Binary Weight Networks via Hashing
- Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks
- QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding
- TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
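
A common baseline behind many of the quantization papers above is symmetric uniform quantization of weights; the binary, ternary, and learned schemes listed each modify this recipe in their own way. Below is a minimal "fake quantization" sketch in PyTorch, assuming a single per-tensor scale (per-channel scales are also common).

```python
# Minimal sketch of symmetric uniform quantization of a weight tensor to b bits,
# simulated in floating point ("fake quantization"). Assumes per-tensor scaling.
import torch

def fake_quantize(w, bits=8):
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits
    scale = w.abs().max() / qmax          # one scale for the whole tensor
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q * scale                      # dequantized values used in the forward pass

w = torch.randn(64, 128)
w_q = fake_quantize(w, bits=4)
print((w - w_q).abs().max())              # quantization error shrinks as bits grow
```
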
- On Implicit Filter Level Sparsity in Convolutional Neural Networks
- Structured Pruning of Neural Networks with Budget-Aware Regularization
- Importance Estimation for Neural Network Pruning
- Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration [code]
- Towards Optimal Structured CNN Pruning via Generative Adversarial Learning [code]
- Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure [code]
- Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search
- The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
- Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks [code]
- ResNet Can Be Pruned 60x: Introducing Network Purification and Unused Path Removal (P-RM) after Weight Pruning
- Discrimination-aware Channel Pruning for Deep Neural Networks
- Data-Driven Sparse Structure Selection for Deep Neural Networks
- Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization
- Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing
- Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning
- Pruning Filters for Efficient ConvNets
- Pruning Convolutional Neural Networks for Resource Efficient Inference
- Soft Weight-Sharing for Neural Network Compression
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
- Learning both Weights and Connections for Efficient Neural Networks
- Dynamic Network Surgery for Efficient DNNs
- ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA
- Faster CNNs with Direct Sparse Convolutions and Guided Pruning
- Channel Pruning for Accelerating Very Deep Neural Networks
- Combined Group and Exclusive Sparsity for Deep Neural Networks
- SBNet: Sparse Blocks Network for Fast Inference
- Learning Efficient Convolutional Networks through Network Slimming
- meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting
- Variational Dropout Sparsifies Deep Neural Networks
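
The simplest mechanism underlying several of the pruning entries above (e.g. Deep Compression, "Learning both Weights and Connections") is unstructured magnitude pruning with a binary mask. A minimal sketch follows; the 90% sparsity target is illustrative.

```python
# Minimal sketch of unstructured magnitude pruning: zero out the
# smallest-magnitude weights and return the binary mask so it can be
# reapplied after each training update.
import torch

def magnitude_prune(w, sparsity=0.9):
    threshold = torch.quantile(w.abs(), sparsity)   # magnitude below which weights are dropped
    mask = (w.abs() > threshold).float()
    return w * mask, mask

w = torch.randn(256, 512)
w_pruned, mask = magnitude_prune(w, sparsity=0.9)
print(1.0 - mask.mean().item())                     # achieved sparsity, ~0.9
```
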
- Wide Compression: Tensor Ring Nets
- Tensorizing Neural Networks
- Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition
- Accelerating Convolutional Neural Networks for Mobile Applications
- Low-rank Bilinear Pooling for Fine-Grained Classification
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
- Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications
- Efficient and Accurate Approximations of Nonlinear Convolutional Networks
- Accelerating Very Deep Convolutional Networks for Classification and Detection
- Convolutional neural networks with low-rank regularization
- Speeding up convolutional neural networks with low rank expansions
- Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition
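
The low-rank entries above all replace a large weight tensor with a product of smaller factors; for a fully connected layer this reduces to a truncated SVD, which the CP, Tucker, and tensor-ring methods listed generalize to convolutional kernels. A minimal sketch:

```python
# Minimal sketch of low-rank approximation of an m x n weight matrix via
# truncated SVD: W is replaced by two rank-r factors A (m x r) and B (r x n),
# storing r*(m+n) values instead of m*n.
import torch

def low_rank_factorize(W, rank):
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]        # m x r, singular values folded into the left factor
    B = Vh[:rank, :]                  # r x n
    return A, B                       # W ~= A @ B

W = torch.randn(512, 1024)
A, B = low_rank_factorize(W, rank=64)
print((W - A @ B).norm() / W.norm())  # relative reconstruction error
```
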