
Which Ollama Models Work with Hermes Agent? A Quick Context Window Check

If you’ve ever tried to run Hermes Agent only to get a cryptic error about context windows, you’re not alone. Here’s a quick guide to what’s happening, and how to find a compatible model in Ollama.

The Error

When you launch Hermes Agent with an incompatible model, it stops with an error complaining that the model’s context window is too small.

Hermes Agent is designed to handle long, multi-step reasoning and tool-use chains. For that to work reliably, it needs a model with a context window of at least 64,000 tokens. Models with smaller windows simply can’t hold enough conversation history and tool output in memory — so the agent stops before it can do any useful work.
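To get a feel for how quickly that state piles up, here is a purely illustrative back-of-the-envelope budget. Every number below is an assumption for illustration, not a measurement taken from Hermes Agent:

```shell
# Illustrative only: assume a 2,000-token system prompt plus tool schemas,
# and 30 agent steps that each add roughly 1,500 tokens of reasoning,
# tool calls, and tool output.
echo $((2000 + 30 * 1500))   # 47000 tokens consumed
```

At that rate, a 16K or even 40K window is exhausted mid-task, while a 64K window still leaves headroom for the model’s replies.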

How to Check Context Window Sizes in Ollama

Not all Ollama models advertise their context size prominently. Here’s a short shell loop you can run in your terminal to list every local model alongside its context length:

Shell
# List every installed model together with its context length.
for model in $(ollama list | tail -n +2 | awk '{print $1}'); do
  # "ollama show" prints a "context length" line; the third field is the number.
  cl=$(ollama show "${model}" | grep "context length" | awk '{print $3}')
  echo "${model} - ${cl}"
done
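If you only care about which models clear the 64K bar, the same parsing can feed a small filter. The `filter_compatible` helper below is a hypothetical name introduced here for illustration; the 65536 threshold matches the 64K minimum discussed above:

```shell
# Hypothetical helper: reads "model context_length" pairs on stdin and
# keeps only those with at least 65536 tokens of context.
filter_compatible() {
  while read -r model cl; do
    if [ "${cl:-0}" -ge 65536 ]; then
      echo "${model} - ${cl}"
    fi
  done
}

# Pipe the audit loop's output through the filter:
for model in $(ollama list | tail -n +2 | awk '{print $1}'); do
  cl=$(ollama show "${model}" | grep "context length" | awk '{print $3}')
  echo "${model} ${cl}"
done | filter_compatible
```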

Here’s what that output looks like on a real system with a variety of models installed:

Model                  Context Length (tokens)   Hermes Agent Compatible?
qwen3:8b               40,960                    ✗ Too small
glm-5.1:cloud          202,752                   ✓ Compatible
qwen3.6:latest         262,144                   ✓ Compatible
gemma4:latest          131,072                   ✓ Compatible
deepseek-coder:33b     16,384                    ✗ Too small
deepseek-coder:6.7b    16,384                    ✗ Too small
deepseek-v3.2:cloud    163,840                   ✓ Compatible
qwen3.5:latest         262,144                   ✓ Compatible
llama3:latest          8,192                     ✗ Too small

Recommended Models for Hermes Agent

Based on the output above, the following locally available models meet the 64K minimum:

  • qwen3.5:latest and qwen3.6:latest — both offer a massive 262,144-token context, making them excellent choices for long agentic sessions.
  • gemma4:latest — 131,072 tokens, a solid mid-range option with strong general capabilities.
  • deepseek-v3.2:cloud and glm-5.1:cloud — cloud-backed models with large context windows, though they require an internet connection.

The Override Option

If you really need to use a model with a smaller context window (perhaps because of hardware constraints), Hermes Agent lets you bypass the check. In your config.yaml, set:

YAML
model:
  context_length: 65536

Be aware that this override is at your own risk: the agent may behave unpredictably or fail mid-task if the model runs out of context during a long chain of actions.

Tip: View locally installed Ollama models in your browser

If Ollama is running locally, you can view all installed models in JSON format by opening this URL in any browser:

http://127.0.0.1:11434/v1/models

This returns a simple JSON list of your installed models. It’s a convenient way to see what’s available at a glance, but it does not include details like context window size.
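If you want the same list from a script rather than a browser, you can pull it with curl. The snippet below assumes the jq utility is installed and that the endpoint returns the usual OpenAI-style {"data": [{"id": ...}]} shape; the fallback keeps the pipeline from erroring when the server isn’t running:

```shell
# Fetch the model list from a local Ollama server; fall back to an empty
# list if the server is unreachable. Requires jq for JSON parsing.
models_json=$(curl -s http://127.0.0.1:11434/v1/models || echo '{"data":[]}')
echo "${models_json}" | jq -r '.data[].id'
```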

Summary

Hermes Agent’s 64K context requirement isn’t arbitrary: agentic workflows accumulate a lot of state quickly. Choosing the right model upfront saves a lot of debugging later. The shell loop above is a quick way to audit your Ollama library whenever you install new models. Keep it handy.
