BitNet.cpp is Microsoft’s official inference framework for 1-bit LLMs. It runs large quantized models on standard CPUs, with no GPU required. The framework provides optimized kernels for lossless inference of 1.58-bit models built on the BitNet b1.58 architecture, whose weights are ternary (-1, 0, or +1). The first release focuses on CPU inference, with GPU and NPU support planned. This lets you run models that...
How to Run 1-Bit LLMs on CPU with BitNet.cpp
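
To make the "1.58-bit" figure in the introduction concrete, here is a minimal sketch of ternary weight quantization, assuming the absmean scheme described for BitNet b1.58: divide each weight by the mean absolute value, round, and clip to -1, 0, or +1. Three possible values per weight carry log2(3) ≈ 1.58 bits, which is where the name comes from. The function name `absmean_ternary_quantize` and the sample weights are hypothetical and for illustration only; this is not how BitNet.cpp's optimized kernels are implemented.

```python
import math


def absmean_ternary_quantize(weights, eps=1e-8):
    """Quantize a list of floats to {-1, 0, 1} using an absmean scale.

    Illustrative only: a hypothetical helper, not part of BitNet.cpp.
    """
    # gamma is the mean absolute value of the weights.
    gamma = sum(abs(w) for w in weights) / len(weights)
    # Scale by gamma, round to the nearest integer, then clip to [-1, 1].
    quantized = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return quantized, gamma


# A weight with three possible states carries log2(3) bits of information.
bits_per_weight = math.log2(3)

weights = [0.9, -0.05, -1.4, 0.3, 0.0, -0.7]  # made-up sample values
q, gamma = absmean_ternary_quantize(weights)

print(f"bits per weight: {bits_per_weight:.2f}")
print(f"scale (gamma):   {gamma:.4f}")
print(f"ternary weights: {q}")
```

Because every quantized weight is -1, 0, or +1, a matrix multiply against these weights reduces to additions and subtractions of activations, which is part of why this format suits CPU inference.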
