Quantization

A technique that reduces the memory and compute footprint of an AI model by representing its weights with lower numerical precision, making it faster and cheaper to run.

Related terms