Which Ollama Models Work with Hermes Agent? A Quick Context Window Check
If you’ve ever tried to run Hermes Agent only to get a cryptic error about context windows, you’re not alone. Here’s a quick guide to what’s happening, and how to find a compatible model in Ollama.
The Error
When you launch Hermes Agent with an incompatible model, you’ll see something like this:
```
Model deepseek-coder:33b has a context window of 16,384 tokens, which is below
the minimum 64,000 required by Hermes Agent. Choose a model with at least 64K
context, or set model.context_length in config.yaml to override.
```
Hermes Agent is designed to handle long, multi-step reasoning and tool-use chains. For that to work reliably, it needs a model with a context window of at least 64,000 tokens. Models with smaller windows simply can’t hold enough conversation history and tool output in memory — so the agent stops before it can do any useful work.
How to Check Context Window Sizes in Ollama
Not all Ollama models advertise their context size prominently. Here’s a handy one-liner you can run in your terminal to list every local model alongside its context length:
```shell
for model in $(ollama list | tail -n +2 | awk '{print $1}'); do
  cl=$(ollama show "${model}" | grep "context length" | awk '{print $3}')
  echo "${model} - ${cl}"
done
```

Here’s what that output looks like on a real system with a variety of models installed:
| Model | Context Length (tokens) | Hermes Agent Compatible? |
|---|---|---|
| qwen3:8b | 40,960 | ✗ Too small |
| glm-5.1:cloud | 202,752 | ✓ Compatible |
| qwen3.6:latest | 262,144 | ✓ Compatible |
| gemma4:latest | 131,072 | ✓ Compatible |
| deepseek-coder:33b | 16,384 | ✗ Too small |
| deepseek-coder:6.7b | 16,384 | ✗ Too small |
| deepseek-v3.2:cloud | 163,840 | ✓ Compatible |
| qwen3.5:latest | 262,144 | ✓ Compatible |
| llama3:latest | 8,192 | ✗ Too small |
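If you want to skip the manual comparison, the audit loop above can be extended to print only the models that clear the 64K bar. This is a sketch that assumes `ollama show` reports the context length in the same format the one-liner relies on; the `meets_minimum` helper is just a name chosen here for illustration:

```shell
#!/bin/sh
MIN_CONTEXT=64000

# Succeeds (exit 0) when a context length meets Hermes Agent's 64K minimum.
meets_minimum() {
  [ "$1" -ge "$MIN_CONTEXT" ]
}

# Loop over locally installed models and print only the compatible ones.
# Requires a running Ollama install; parsing mirrors the one-liner above.
for model in $(ollama list | tail -n +2 | awk '{print $1}'); do
  cl=$(ollama show "${model}" | grep "context length" | awk '{print $3}')
  if meets_minimum "${cl}"; then
    echo "${model} - ${cl} - compatible"
  fi
done
```

Run against the library shown in the table, this would list only the five models marked compatible.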
Recommended Models for Hermes Agent
Based on the output above, the following locally available models meet the 64K minimum:
- qwen3.5:latest and qwen3.6:latest — both offer a massive 262,144-token context, making them excellent choices for long agentic sessions.
- gemma4:latest — 131,072 tokens, a solid mid-range option with strong general capabilities.
- deepseek-v3.2:cloud and glm-5.1:cloud — cloud-backed models with large context windows, though they require an internet connection.
The Override Option
If you really need to use a model with a smaller context window (perhaps because of hardware constraints), Hermes Agent lets you bypass the check. In your config.yaml, set:
```yaml
model:
  context_length: 65536
```

Be aware that this override is at your own risk: the agent may behave unpredictably or fail mid-task if the model runs out of context during a long chain of actions.
Tip: Get locally installed Ollama models using your browser
If Ollama is running locally, you can view all installed models in JSON format by opening this URL in any browser:
http://127.0.0.1:11434/v1/models
This returns a simple JSON list of your installed models. It’s a convenient way to see what’s available at a glance, but it does not include details like context window size.
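If you prefer the terminal over a browser, the same endpoint can be queried with `curl` and the model names pulled out with standard text tools. The sketch below runs against a sample payload whose shape I am assuming matches the endpoint's OpenAI-style response; swap the `echo` for the commented `curl` line to query a live Ollama server:

```shell
#!/bin/sh
# Sample payload (assumed shape, for illustration only):
sample='{"object":"list","data":[{"id":"qwen3.5:latest","object":"model"},{"id":"llama3:latest","object":"model"}]}'

# Extract just the model names. Against a live server, replace the echo with:
#   curl -s http://127.0.0.1:11434/v1/models
echo "${sample}" | grep -o '"id":"[^"]*"' | cut -d'"' -f4
```

Because the response carries only model identifiers, you would still need the `ollama show` loop above to learn each model's context length.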
Summary
Hermes Agent’s 64K context requirement isn’t arbitrary — agentic workflows accumulate a lot of state quickly. Choosing the right model upfront saves a lot of debugging later. The one-liner above is a quick way to audit your Ollama library whenever you install new models. Keep it handy.