How to Access Apple’s Hidden 3B LLM on Your Mac

Your Apple Silicon Mac already has a 3-billion-parameter language model sitting on disk, locked behind Siri. Apfel by Arthur-Ficial is a native Swift project that wraps Apple’s on-device Foundation Model into a CLI tool and an OpenAI-compatible HTTP server. No API keys, no cloud, no per-token costs. It runs entirely on your Neural Engine.

Apfel GitHub repository homepage

What Is Apfel?

Apfel is for Apple Silicon Mac users who want to run a local AI model without APIs or cloud dependencies. It gives you direct access to the hardware-accelerated model already installed on your machine.

The project exposes two interfaces:

  1. CLI Tool – Run the model directly from your terminal for quick queries and script integration.
  2. OpenAI-compatible HTTP Server – Spin up a local server that mimics the OpenAI API, allowing you to use existing tools that talk to OpenAI endpoints, but with zero network latency and no usage costs.

Inference happens entirely on the Neural Engine, keeping everything on-device and private.
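Since the server mimics the OpenAI API, a standard chat-completions request against the local address should work with nothing but the standard library. This is a sketch, not code from the project: the `/v1/chat/completions` path and the `apple-foundation-3b` model name are assumptions based on the OpenAI API shape, not confirmed from Apfel’s documentation.

```python
import json
import urllib.request

# Assumed endpoint: the article documents localhost:8080; the path
# follows the OpenAI chat-completions convention.
BASE_URL = "http://localhost:8080/v1/chat/completions"


def build_request(prompt: str) -> dict:
    # Standard OpenAI-style chat payload. The model name is a
    # placeholder; the local server may ignore or rename it.
    return {
        "model": "apple-foundation-3b",
        "messages": [{"role": "user", "content": prompt}],
    }


def ask(prompt: str) -> str:
    # POST the payload to the local server and pull out the reply text.
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the request shape is plain OpenAI, any existing client that lets you override its base URL should work the same way.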

Technical Details and Limitations

The model is a 3B parameter LLM, but there are caveats around quantization:

“It’s a 3B, but quantized at 3-bit (allegedly) uniformly, which neuters its stability, predictability, and accuracy by quite a bit. At a minimum, you want LLMs to be quantized to 4 bits with some more important weights at higher bits. Furthermore, it seems like Apple’s own LLM model hasn’t been fine-tuned or retrained for quite a while, so your mileage may vary a lot compared to newer releases.”

– @lilwingflyhigh
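To put those bit widths in perspective, here is the back-of-the-envelope arithmetic for the raw weight footprint at uniform quantization (illustrative only; Apple’s actual packing, scales, and any mixed-precision layers are not public):

```python
# Approximate raw weight storage for an N-parameter model at a
# uniform bit width: params * bits / 8 bytes. Ignores quantization
# metadata and any higher-precision layers.
def weight_bytes(params: float, bits: int) -> float:
    return params * bits / 8


PARAMS = 3e9  # 3-billion-parameter model

for bits in (3, 4, 16):
    gb = weight_bytes(PARAMS, bits) / 1e9
    print(f"{bits}-bit: {gb:.2f} GB")
# → 3-bit: 1.12 GB, 4-bit: 1.50 GB, 16-bit: 6.00 GB
```

At 3 bits the weights fit in roughly 1.1 GB, which helps explain why Apple can afford to ship the model on every device, and also why accuracy takes the hit the quote describes.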

Community discussion about Apfel’s capabilities

What It’s Good For

  • Explaining code snippets – Plain-English breakdowns of short scripts.
  • Spotting simple issues – Identifying obvious syntax or logic errors.
  • Educational scenarios – Step-by-step explanations of shell commands or basic programming constructs.

What It Struggles With

  • Fixing full scripts – Loses context and structure on larger files.
  • Multi-step debugging – Can’t maintain coherent debugging sessions across many lines.
  • Production-grade code generation – Not reliable for real-world development tasks.

“Tried it. Result: fascinating, not ready for prime time. Gave it a deliberately broken LAN-mapping script. Instead of repairing it, it eventually overflowed context, lost the plot, and started explaining random shell lines incorrectly one by one. It became more of a confused tutor than an actual debugger.”

– @mikelawsonmusic

More commentary on Apfel’s practical utility

Getting Started

Project link: https://github.com/Arthur-Ficial/apfel

Install via Homebrew or build from source. Once installed, run the CLI or start the HTTP server and point your OpenAI-compatible tools to http://localhost:8080.

While Apfel isn’t a drop-in replacement for cloud-based models like GPT-4, it represents a step toward truly local, private AI inference. For developers curious about what their Mac’s Neural Engine can already do, it’s a free window into edge AI.

Need more power? Check out Flash-moe, which streams massive 397B-parameter MoE models directly from your SSD.

About the author

Hairun Wicaksana

Hi, I’m just another vibecoder from Southeast Asia, currently based in Stockholm. I build startup experiments while staying close to the KTH Innovation startup ecosystem. I focus on AI tools, automation, and fast product experiments, sharing the journey while turning ideas into working software.

Get in touch
