Hooking into NVIDIA’s NGC inference endpoints brings production-grade GPU acceleration to your Openclaw setup without managing infrastructure.
- NVIDIA’s optimized inference stack delivers low-latency responses from state-of-the-art models such as Nemotron and Llama 3.
- Developers often struggle with API key configuration and model naming conventions.
- This guide walks through a streamlined setup that connects Openclaw directly to NVIDIA’s model catalog.
Start by obtaining an API key from the NVIDIA NGC portal. Keys follow a predictable format, beginning with the `nvapi-` prefix.

Step 1: Set Environment Variable
Export your API key on the gateway host:

```bash
export NVIDIA_API_KEY="nvapi-..."
```
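Before wiring the key into Openclaw, a quick sanity check catches malformed keys early. This is a minimal sketch based only on the documented `nvapi-` prefix; the `validate_nvidia_key` helper is illustrative, not part of Openclaw or the NGC tooling:

```python
import os

def validate_nvidia_key(key: str) -> bool:
    """Heuristic check: NGC API keys start with the 'nvapi-' prefix."""
    return key.startswith("nvapi-") and len(key) > len("nvapi-")

# Prefer the exported environment variable; fall back to a placeholder for the demo.
key = os.environ.get("NVIDIA_API_KEY", "nvapi-example")
print(validate_nvidia_key(key))
```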
Step 2: Configure via CLI
Skip interactive auth and set your model directly:

```bash
openclaw onboard --auth-choice skip
openclaw models set nvidia/nvidia/llama-3.1-nemotron-70b-instruct
```
Step 3: Manual Configuration
For persistent settings, edit `~/.openclaw/openclaw.json`:

```json
{
  "env": { "NVIDIA_API_KEY": "nvapi-..." },
  "models": {
    "providers": {
      "nvidia": {
        "baseUrl": "https://integrate.api.nvidia.com/v1",
        "api": "openai-completions"
      }
    }
  },
  "agents": {
    "defaults": {
      "model": { "primary": "nvidia/nvidia/llama-3.1-nemotron-70b-instruct" }
    }
  }
}
```
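A malformed config file can silently break the gateway, so it can help to lint the edits before restarting. The checker below is a hypothetical helper, not an Openclaw command, and it assumes only the schema shown above:

```python
import json

def check_openclaw_config(text: str) -> list:
    """Return a list of problems found in an openclaw.json document."""
    problems = []
    cfg = json.loads(text)  # raises ValueError on JSON syntax errors
    nvidia = cfg.get("models", {}).get("providers", {}).get("nvidia", {})
    if nvidia.get("baseUrl") != "https://integrate.api.nvidia.com/v1":
        problems.append("unexpected or missing baseUrl for the nvidia provider")
    if nvidia.get("api") != "openai-completions":
        problems.append("provider 'api' should be 'openai-completions'")
    primary = cfg.get("agents", {}).get("defaults", {}).get("model", {}).get("primary", "")
    if not primary.startswith("nvidia/"):
        problems.append("default model is not routed through the nvidia provider")
    return problems

# Example: lint the live config file at the path documented above.
# from pathlib import Path
# print(check_openclaw_config(Path("~/.openclaw/openclaw.json").expanduser().read_text()))
```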
Available Models
NVIDIA offers several optimized models:
- `nvidia/llama-3.1-nemotron-70b-instruct` — Default, general purpose
- `meta/llama-3.3-70b-instruct` — Latest Llama variant
- `nvidia/mistral-nemo-minitron-8b-8k-instruct` — Efficient smaller model
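Because the provider is configured with `"api": "openai-completions"`, any of these models can also be exercised outside Openclaw with a plain OpenAI-style HTTP request. A minimal stdlib-only sketch, assuming a `/chat/completions` route on the baseUrl from the config above (note that the model id here omits the Openclaw provider prefix):

```python
import json
import os
import urllib.request

BASE_URL = "https://integrate.api.nvidia.com/v1"

def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

def chat(model: str, prompt: str) -> str:
    """Send one prompt and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Requires NVIDIA_API_KEY to be exported, as in Step 1:
# print(chat("nvidia/llama-3.1-nemotron-70b-instruct", "Say hello in one word."))
```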
For alternative GPU inference options, consider Moonshot AI Models or Ollama Models for local deployment.
Troubleshooting & Best Practices
- Key format: Ensure your key starts with `nvapi-`.
- Region selection: Choose the closest NVIDIA region for lowest latency.
- Model updates: NVIDIA frequently updates models; check the NGC catalog for the latest versions.
The NVIDIA integration gives Openclaw access to high-performance inference without the complexity of self-hosting GPU clusters.
