Why Quantization Won, and Where Pruning Survived: A Practitioner's View of LLM Compression
If both quantization and pruning compress models and speed up inference, why does only one ship in production? Quantization, almost always. A practitioner's view of model compression for LLM inference: why quantization dominates, the four niches where pruning survives, and the layered recipe that real systems actually use.