Tagstream

Flash-moe: How to Stream 397B MoE Models from SSD on a MacBook Pro

Flash-moe runs a 397-billion-parameter Mixture-of-Experts (MoE) model on a MacBook Pro by streaming weights from the SSD. Written in pure C with Metal shaders, it turns a server-rack problem into a local one. The engine streams Qwen3.5-397B-A17B from disk at 4.4 tokens per second and produces production-quality output, including tool calling, on consumer hardware. This makes inference on very large models accessible to developers...
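To make the streaming idea concrete, here is a minimal sketch in C of reading only the experts a router selected for one token, using positional reads against a weights file. It is not Flash-moe's actual code: the file layout (experts stored back to back at a fixed size), the constants `NUM_EXPERTS`, `TOP_K`, and `EXPERT_BYTES`, and the function names are all hypothetical and chosen for illustration. The Metal compute step is left out entirely.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical toy sizes; a real model has far more and far larger experts. */
enum { NUM_EXPERTS = 8, TOP_K = 2, EXPERT_BYTES = 4096 };

/* Write a toy weights file in which every byte of expert e equals e,
 * so a reader can check that it fetched the right slice. */
int write_toy_weights(const char *path) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    unsigned char buf[EXPERT_BYTES];
    for (int e = 0; e < NUM_EXPERTS; e++) {
        memset(buf, e, sizeof buf);
        if (fwrite(buf, 1, sizeof buf, f) != sizeof buf) {
            fclose(f);
            return -1;
        }
    }
    return fclose(f);
}

/* Read exactly one expert's bytes from its assumed offset in the file.
 * pread may return fewer bytes than requested, so loop until done. */
int load_expert(int fd, int expert_id, unsigned char *dst) {
    off_t offset = (off_t)expert_id * EXPERT_BYTES;
    size_t done = 0;
    while (done < EXPERT_BYTES) {
        ssize_t n = pread(fd, dst + done, EXPERT_BYTES - done,
                          offset + (off_t)done);
        if (n <= 0) return -1; /* I/O error or unexpected end of file */
        done += (size_t)n;
    }
    return 0;
}

/* Stream only the k routed experts for one token into dst.
 * Returns the number of bytes read from disk, or -1 on failure. */
long stream_routed_experts(const char *path, const int *ids, int k,
                           unsigned char *dst) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    long total = 0;
    for (int i = 0; i < k; i++) {
        if (ids[i] < 0 || ids[i] >= NUM_EXPERTS ||
            load_expert(fd, ids[i], dst + (size_t)i * EXPERT_BYTES) != 0) {
            close(fd);
            return -1;
        }
        total += EXPERT_BYTES;
    }
    close(fd);
    return total;
}
```

In this sketch the per-token disk traffic is `TOP_K * EXPERT_BYTES` rather than the size of the whole file. That is the property that makes streaming an MoE model from SSD plausible: only the routed experts are touched for each token. A real engine would also have to hand the loaded bytes to the GPU and overlap reads with compute, which this example does not attempt.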
