{"id":1183,"date":"2026-04-19T11:42:07","date_gmt":"2026-04-19T11:42:07","guid":{"rendered":"https:\/\/techno.slomka.biz\/?p=1183"},"modified":"2026-04-21T09:56:19","modified_gmt":"2026-04-21T09:56:19","slug":"which-ollama-models-work-with-hermes-agent-a-quick-context-window-check","status":"publish","type":"post","link":"https:\/\/techno.slomka.biz\/?p=1183","title":{"rendered":"Which Ollama Models Work with Hermes Agent? A Quick Context Window Check"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">If you&#8217;ve ever tried to run <strong>Hermes Agent<\/strong> only to get a cryptic error about context windows, you&#8217;re not alone. Here&#8217;s a quick guide to understanding what&#8217;s happening \u2014 and how to find a compatible model in Ollama.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Error<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">When you launch Hermes Agent with an incompatible model, you&#8217;ll see something like this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted has-vivid-red-color has-text-color has-link-color wp-elements-4a3197757833c88b79c2872d67b48ee5\">Model deepseek-coder:33b has a context window of 16,384 tokens,\nwhich is below the minimum 64,000 required by Hermes Agent.\n\nChoose a model with at least 64K context, or set\nmodel.context_length in config.yaml to override.\n<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Hermes Agent is designed to handle long, multi-step reasoning and tool-use chains. For that to work reliably, it needs a model with a context window of <strong>at least 64,000 tokens<\/strong>. 
Models with smaller windows simply can&#8217;t hold enough conversation history and tool output in memory \u2014 so the agent stops before it can do any useful work.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What Is a Context Window?<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The <strong>context window<\/strong> (also called <strong>context length<\/strong>) is the maximum amount of text a language model can &#8220;see&#8221; and work with at any one time \u2014 measured in <strong>tokens<\/strong>. A token is roughly \u00be of a word, so 64,000 tokens is approximately 48,000 words, roughly the length of a short novel.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Think of it as the model&#8217;s working memory. Everything relevant to the current task \u2014 your instructions, the conversation history, tool outputs, retrieved documents, and the model&#8217;s own previous responses \u2014 must fit inside this window. Once the limit is reached, older content gets pushed out and the model effectively &#8220;forgets&#8221; it.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For simple one-off questions this rarely matters. But for an <strong>AI agent<\/strong> like Hermes, which plans multi-step tasks, calls tools, reads results, and reasons across long chains of actions, the context window fills up fast. A window that&#8217;s too small means the agent loses track of what it was doing \u2014 or stops entirely, as you&#8217;ve seen.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">See the Ollama documentation on context length: <a href=\"https:\/\/docs.ollama.com\/context-length\">https:\/\/docs.ollama.com\/context-length<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How to Check Context Window Sizes in Ollama<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Not all Ollama models advertise their context size prominently. 
Here&#8217;s a short shell loop you can run in your terminal to list every installed model alongside its context length:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#EEFFFF;--cbp-line-number-width:calc(1 * 0.6 * .875rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#304047;color:#d5ffff\">Bash<\/span><span role=\"button\" tabindex=\"0\" style=\"color:#EEFFFF;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>for model in $(ollama list | tail -n +2 | awk '{print $1}'); do\n  cl=$(ollama show \"${model}\" | grep \"context length\" | awk '{print $3}')\n  echo \"${model} - ${cl}\"\ndone<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki material-theme\" style=\"background-color: #263238\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #89DDFF; font-style: 
italic\">for<\/span><span style=\"color: #EEFFFF\"> model <\/span><span style=\"color: #89DDFF; font-style: italic\">in<\/span><span style=\"color: #EEFFFF\"> $<\/span><span style=\"color: #89DDFF\">(<\/span><span style=\"color: #EEFFFF\">ollama <\/span><span style=\"color: #FFCB6B\">list<\/span><span style=\"color: #EEFFFF\"> <\/span><span style=\"color: #89DDFF\">|<\/span><span style=\"color: #EEFFFF\"> tail <\/span><span style=\"color: #89DDFF\">-<\/span><span style=\"color: #EEFFFF\">n <\/span><span style=\"color: #89DDFF\">+<\/span><span style=\"color: #F78C6C\">2<\/span><span style=\"color: #EEFFFF\"> <\/span><span style=\"color: #89DDFF\">|<\/span><span style=\"color: #EEFFFF\"> awk <\/span><span style=\"color: #89DDFF\">&#39;<\/span><span style=\"color: #C3E88D\">{print $1}<\/span><span style=\"color: #89DDFF\">&#39;<\/span><span style=\"color: #89DDFF\">)<\/span><span style=\"color: #EEFFFF\">; do<\/span><\/span>\n<span class=\"line\"><span style=\"color: #EEFFFF\">  cl<\/span><span style=\"color: #89DDFF\">=<\/span><span style=\"color: #EEFFFF\">$<\/span><span style=\"color: #89DDFF\">(<\/span><span style=\"color: #EEFFFF\">ollama show <\/span><span style=\"color: #89DDFF\">&quot;<\/span><span style=\"color: #C3E88D\">$<\/span><span style=\"color: #F78C6C\">{model}<\/span><span style=\"color: #89DDFF\">&quot;<\/span><span style=\"color: #EEFFFF\"> <\/span><span style=\"color: #89DDFF\">|<\/span><span style=\"color: #EEFFFF\"> grep <\/span><span style=\"color: #89DDFF\">&quot;<\/span><span style=\"color: #C3E88D\">context length<\/span><span style=\"color: #89DDFF\">&quot;<\/span><span style=\"color: #EEFFFF\"> <\/span><span style=\"color: #89DDFF\">|<\/span><span style=\"color: #EEFFFF\"> awk <\/span><span style=\"color: #89DDFF\">&#39;<\/span><span style=\"color: #C3E88D\">{print $3}<\/span><span style=\"color: #89DDFF\">&#39;<\/span><span style=\"color: #89DDFF\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #EEFFFF\">  echo <\/span><span 
style=\"color: #89DDFF\">&quot;<\/span><span style=\"color: #C3E88D\">$<\/span><span style=\"color: #F78C6C\">{model}<\/span><span style=\"color: #C3E88D\"> - $<\/span><span style=\"color: #F78C6C\">{cl}<\/span><span style=\"color: #89DDFF\">&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #EEFFFF\">done<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Here are the results from a real system with a variety of models installed, with a compatibility column added:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th class=\"has-text-align-left\" data-align=\"left\">Model<\/th><th class=\"has-text-align-right\" data-align=\"right\">Context Length (tokens)<\/th><th class=\"has-text-align-center\" data-align=\"center\">Hermes Agent Compatible?<\/th><\/tr><\/thead><tbody><tr><td>qwen3:8b<\/td><td class=\"has-text-align-right\" data-align=\"right\">40,960<\/td><td class=\"has-text-align-center\" data-align=\"center\">\u2717 Too small<\/td><\/tr><tr><td>glm-5.1:cloud<\/td><td class=\"has-text-align-right\" data-align=\"right\">202,752<\/td><td class=\"has-text-align-center\" data-align=\"center\">\u2713 Compatible<\/td><\/tr><tr><td>qwen3.6:latest<\/td><td class=\"has-text-align-right\" data-align=\"right\">262,144<\/td><td class=\"has-text-align-center\" data-align=\"center\">\u2713 Compatible<\/td><\/tr><tr><td>gemma4:latest<\/td><td class=\"has-text-align-right\" data-align=\"right\">131,072<\/td><td class=\"has-text-align-center\" data-align=\"center\">\u2713 Compatible<\/td><\/tr><tr><td>deepseek-coder:33b<\/td><td class=\"has-text-align-right\" data-align=\"right\">16,384<\/td><td class=\"has-text-align-center\" data-align=\"center\">\u2717 Too small<\/td><\/tr><tr><td>deepseek-coder:6.7b<\/td><td class=\"has-text-align-right\" data-align=\"right\">16,384<\/td><td class=\"has-text-align-center\" data-align=\"center\">\u2717 Too small<\/td><\/tr><tr><td>deepseek-v3.2:cloud<\/td><td 
class=\"has-text-align-right\" data-align=\"right\">163,840<\/td><td class=\"has-text-align-center\" data-align=\"center\">\u2713 Compatible<\/td><\/tr><tr><td>qwen3.5:latest<\/td><td class=\"has-text-align-right\" data-align=\"right\">262,144<\/td><td class=\"has-text-align-center\" data-align=\"center\">\u2713 Compatible<\/td><\/tr><tr><td>llama3:latest<\/td><td class=\"has-text-align-right\" data-align=\"right\">8,192<\/td><td class=\"has-text-align-center\" data-align=\"center\">\u2717 Too small<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Recommended Models for Hermes Agent<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Based on the table above, the following installed models meet the 64K minimum:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>qwen3.5:latest<\/strong> and <strong>qwen3.6:latest<\/strong> \u2014 both offer a massive 262,144-token context, making them excellent choices for long agentic sessions.<\/li>\n\n\n\n<li><strong>gemma4:latest<\/strong> \u2014 131,072 tokens, a solid mid-range option with strong general capabilities.<\/li>\n\n\n\n<li><strong>deepseek-v3.2:cloud<\/strong> (163,840 tokens) and <strong>glm-5.1:cloud<\/strong> (202,752 tokens) \u2014 cloud-backed models with large context windows, though they require an internet connection.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">The Override Option<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">If you really need to use a model with a smaller context window (perhaps because of hardware constraints), Hermes Agent lets you bypass the check. 
In your <code>config.yaml<\/code>, set:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#EEFFFF;--cbp-line-number-width:calc(1 * 0.6 * .875rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#304047;color:#d5ffff\">YAML<\/span><span role=\"button\" tabindex=\"0\" style=\"color:#EEFFFF;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>model:\n  context_length: 65536<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki material-theme\" style=\"background-color: #263238\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #F07178\">model<\/span><span style=\"color: #89DDFF\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #EEFFFF\">  <\/span><span style=\"color: #F07178\">context_length<\/span><span style=\"color: #89DDFF\">:<\/span><span style=\"color: 
#EEFFFF\"> <\/span><span style=\"color: #F78C6C\">65536<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Use this override at your own risk. The setting only changes the context length Hermes Agent assumes; it does not change the context window the model was trained to handle. The agent may behave unpredictably or fail mid-task if the model runs out of context during a long chain of actions.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Tip: List Locally Installed Ollama Models in Your Browser<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">If Ollama is running locally, you can view all installed models in JSON format by opening this URL in any browser:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"http:\/\/127.0.0.1:11434\/v1\/models\">http:\/\/127.0.0.1:11434\/v1\/models<\/a><br><br>This returns a simple JSON list of your installed models. It&#8217;s a convenient way to see what&#8217;s available at a glance, but it does <strong>not<\/strong> include details like context window size. For that, use <code>ollama show<\/code> as in the loop above.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Summary<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Hermes Agent&#8217;s 64K context requirement isn&#8217;t arbitrary \u2014 agentic workflows accumulate a lot of state quickly. Choosing the right model upfront saves a lot of debugging later. The shell loop above is a quick way to audit your Ollama library whenever you install new models. Keep it handy.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you&#8217;ve ever tried to run Hermes Agent only to get a cryptic error about context windows, you&#8217;re not alone. Here&#8217;s a quick guide to understand what&#8217;s happening \u2014 and how to find a compatible model in Ollama. The Error When you launch Hermes Agent with an incompatible model, you&#8217;ll see something like this: Model deepseek-coder:33b has a context window of 16,384 tokens, which is below the minimum 64,000 required by Hermes Agent. Choose a model with at least 64K context, or set model.context_length in config.yaml to override. 
Hermes Agent is designed to handle long, multi-step reasoning and tool-use chains. For that to work reliably, it needs a model with a context window of at least 64,000 tokens. Models with smaller windows simply can&#8217;t hold enough conversation history and tool output in memory \u2014 so the agent stops before it can do any useful work. What Is a Context Window? The context window (also called context length) is the maximum amount of text a language model can &#8220;see&#8221; and work with at any one time \u2014 measured in tokens. A token is roughly \u00be of a word, so 64,000 tokens is approximately 48,000 words, or a short novel. Think of it as the model&#8217;s working memory. Everything relevant to the current task \u2014 your instructions, the conversation history, tool outputs, retrieved documents, and the model&#8217;s own previous responses \u2014 must fit inside this window. Once the limit is reached, older content gets pushed out and the model effectively &#8220;forgets&#8221; it. For simple one-off questions this rarely matters. But for an AI agent like Hermes, which plans multi-step tasks, calls tools, reads results, and reasons across long chains of actions, the context window fills up fast. A window that&#8217;s too small means the agent loses track of what it was doing \u2014 or stops entirely, as you&#8217;ve seen. See https:\/\/docs.ollama.com\/context-length How to Check Context Window Sizes in Ollama Not all Ollama models advertise their context size prominently. Here&#8217;s a handy one-liner you can run in your terminal to list every local model alongside its context length: Here&#8217;s what that output looks like on a real system with a variety of models installed: Model Context Length (tokens) Hermes Agent Compatible? 
qwen3:8b 40,960 \u2717 Too small glm-5.1:cloud 202,752 \u2713 Compatible qwen3.6:latest 262,144 \u2713 Compatible gemma4:latest 131,072 \u2713 Compatible deepseek-coder:33b 16,384 \u2717 Too small deepseek-coder:6.7b 16,384 \u2717 Too small deepseek-v3.2:cloud 163,840 \u2713 Compatible qwen3.5:latest 262,144 \u2713 Compatible llama3:latest 8,192 \u2717 Too small Recommended Models for Hermes Agent Based on the output above, the following locally available models meet the 64K minimum: The Override Option If you really need to use a model with a smaller context window (perhaps because of hardware constraints), Hermes Agent lets you bypass the check. In your config.yaml, set: Be aware that this is an override at your own risk \u2014 the agent may behave unpredictably or fail mid-task if the model runs out of context during a long chain of actions. Tip: List Locally Installed Ollama Models in Your Browser If Ollama is running locally, you can view all installed models in JSON format by opening this URL in any browser: http:\/\/127.0.0.1:11434\/v1\/models This returns a simple JSON list of your installed models. It&#8217;s a convenient way to see what&#8217;s available at a glance, but it does not include details like context window size. Summary Hermes Agent&#8217;s 64K context requirement isn&#8217;t arbitrary \u2014 agentic workflows accumulate a lot of state quickly. Choosing the right model upfront saves a lot of debugging later. The one-liner above is a quick way to audit your Ollama library whenever you install new models. 
Keep it handy.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[150,143],"tags":[147,152,149],"class_list":["post-1183","post","type-post","status-publish","format-standard","hentry","category-ai-agents","category-ollama","tag-ai","tag-ollama","tag-wsl"],"_links":{"self":[{"href":"https:\/\/techno.slomka.biz\/index.php?rest_route=\/wp\/v2\/posts\/1183","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techno.slomka.biz\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techno.slomka.biz\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techno.slomka.biz\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/techno.slomka.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1183"}],"version-history":[{"count":5,"href":"https:\/\/techno.slomka.biz\/index.php?rest_route=\/wp\/v2\/posts\/1183\/revisions"}],"predecessor-version":[{"id":1198,"href":"https:\/\/techno.slomka.biz\/index.php?rest_route=\/wp\/v2\/posts\/1183\/revisions\/1198"}],"wp:attachment":[{"href":"https:\/\/techno.slomka.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1183"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techno.slomka.biz\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1183"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techno.slomka.biz\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1183"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}