Tagstream

Flash-moe: How to Stream 397B MoE Models from SSD on MacBook Pro

F

Flash-moe runs a 400-billion parameter model on a MacBook Pro by streaming weights from SSD. It turns a server-rack problem into a local one with pure C and Metal shaders. The engine streams Qwen3.5-397B-A17B from disk at 4.4 tokens per second. It achieves production-quality output including tool calling on consumer hardware. This breakthrough democratizes massive model inference for developers...

Get in touch

Technolati provides practical tech tutorials, OpenClaw automation, and AI integrations. Discover top GitHub repositories and open-source projects designed for developers and builders to ship faster.