
Which Ollama Models Work with Hermes Agent? A Quick Context Window Check

If you’ve ever tried to run Hermes Agent only to get a cryptic error about context windows, you’re not alone. Here’s a quick guide to what’s happening, and how to find a compatible model in Ollama.

The Error

When you launch Hermes Agent with an incompatible model, it stops with an error complaining that the model’s context window is too small.

Hermes Agent is designed to handle long, multi-step reasoning and tool-use chains. For that to work reliably, it needs a model with a context window of at least 64,000 tokens. Models with smaller windows simply can’t hold enough conversation history and tool output in memory — so the agent stops before it can do any useful work.
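To get a feel for how quickly that state piles up, here is a purely illustrative back-of-the-envelope budget. Every number below is an assumption for illustration, not a measurement taken from Hermes Agent:

```shell
# Illustrative only: assume a 2,000-token system prompt plus tool schemas,
# and 30 agent steps that each add roughly 1,500 tokens of reasoning,
# tool calls, and tool output.
echo $((2000 + 30 * 1500))   # 47000 tokens consumed
```

At that rate, a 16K or even 40K window is exhausted mid-task, while a 64K window still leaves headroom for the model’s replies.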

How to Check Context Window Sizes in Ollama

Not all Ollama models advertise their context size prominently. Here’s a short shell loop you can run in your terminal to list every local model alongside its context length:

Shell
# List every installed model together with its context length.
for model in $(ollama list | tail -n +2 | awk '{print $1}'); do
  # "ollama show" prints a "context length" line; the third field is the number.
  cl=$(ollama show "${model}" | grep "context length" | awk '{print $3}')
  echo "${model} - ${cl}"
done
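If you only care about which models clear the 64K bar, the same parsing can feed a small filter. The `filter_compatible` helper below is a hypothetical name introduced here for illustration; the 65536 threshold matches the 64K minimum discussed above:

```shell
# Hypothetical helper: reads "model context_length" pairs on stdin and
# keeps only those with at least 65536 tokens of context.
filter_compatible() {
  while read -r model cl; do
    if [ "${cl:-0}" -ge 65536 ]; then
      echo "${model} - ${cl}"
    fi
  done
}

# Pipe the audit loop's output through the filter:
for model in $(ollama list | tail -n +2 | awk '{print $1}'); do
  cl=$(ollama show "${model}" | grep "context length" | awk '{print $3}')
  echo "${model} ${cl}"
done | filter_compatible
```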

Here’s what that output looks like on a real system with a variety of models installed:

Model                  Context Length (tokens)   Hermes Agent Compatible?
qwen3:8b               40,960                    ✗ Too small
glm-5.1:cloud          202,752                   ✓ Compatible
qwen3.6:latest         262,144                   ✓ Compatible
gemma4:latest          131,072                   ✓ Compatible
deepseek-coder:33b     16,384                    ✗ Too small
deepseek-coder:6.7b    16,384                    ✗ Too small
deepseek-v3.2:cloud    163,840                   ✓ Compatible
qwen3.5:latest         262,144                   ✓ Compatible
llama3:latest          8,192                     ✗ Too small

Recommended Models for Hermes Agent

Based on the output above, the following locally available models meet the 64K minimum:

  • qwen3.5:latest and qwen3.6:latest — both offer a massive 262,144-token context, making them excellent choices for long agentic sessions.
  • gemma4:latest — 131,072 tokens, a solid mid-range option with strong general capabilities.
  • deepseek-v3.2:cloud and glm-5.1:cloud — cloud-backed models with large context windows, though they require an internet connection.

The Override Option

If you really need to use a model with a smaller context window (perhaps because of hardware constraints), Hermes Agent lets you bypass the check. In your config.yaml, set:

YAML
model:
  context_length: 65536

Be aware that this override is at your own risk: the agent may behave unpredictably or fail mid-task if the model runs out of context during a long chain of actions.

Tip: View locally installed Ollama models in your browser

If Ollama is running locally, you can view all installed models in JSON format by opening this URL in any browser:

http://127.0.0.1:11434/v1/models

This returns a simple JSON list of your installed models. It’s a convenient way to see what’s available at a glance, but it does not include details like context window size.
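If you want the same list from a script rather than a browser, you can pull it with curl. The snippet below assumes the jq utility is installed and that the endpoint returns the usual OpenAI-style {"data": [{"id": ...}]} shape; the fallback keeps the pipeline from erroring when the server isn’t running:

```shell
# Fetch the model list from a local Ollama server; fall back to an empty
# list if the server is unreachable. Requires jq for JSON parsing.
models_json=$(curl -s http://127.0.0.1:11434/v1/models || echo '{"data":[]}')
echo "${models_json}" | jq -r '.data[].id'
```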

Summary

Hermes Agent’s 64K context requirement isn’t arbitrary: agentic workflows accumulate a lot of state quickly. Choosing the right model upfront saves a lot of debugging later. The shell loop above is a quick way to audit your Ollama library whenever you install new models. Keep it handy.
