Running Ollama with NVIDIA GPU inside WSL (Ubuntu) – Step-by-Step Guide

Running large language models locally with GPU acceleration inside WSL2 is not only possible—it’s surprisingly efficient once properly configured. This guide walks through a working setup using Ubuntu, NVIDIA GPU passthrough, and Ollama.

🧩 Target Setup

Windows host with NVIDIA GPU
WSL2 (Ubuntu)
GPU passthrough via WSL
Ollama using GPU acceleration

1. Prepare Windows Host

Check Windows Version

Ensure you’re on a supported version. GPU compute in WSL2 requires Windows 11, or Windows 10 version 21H2 or later:

PowerShell

winver

Recommended:

Windows 11 (25H2+)

Enable WSL2

PowerShell

wsl --install
wsl --set-default-version 2

Install NVIDIA Driver (with WSL Support) on your Windows machine

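Install a current NVIDIA driver for Windows that supports CUDA on WSL. Do not install a Linux NVIDIA driver inside Ubuntu: WSL exposes the Windows driver to the distribution.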

Verify:

PowerShell

nvidia-smi

If this fails, stop here—GPU passthrough will not work.

2. Prepare Ubuntu (WSL)

Start WSL:

PowerShell

wsl

Update packages:

Bash

sudo apt update && sudo apt upgrade -y

3. Verify GPU inside WSL

Bash

nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 4070 Ti SUPER (UUID: GPU-0122fdb1-cb26-cf9a-8c28-675c70ee828d)

nvidia-smi
Wed Mar 18 16:02:22 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.54                 Driver Version: 595.79         CUDA Version: 13.2     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 ...    On  |   00000000:01:00.0  On |                  N/A |
|  0%   38C    P8             17W /  285W |    1032MiB /  16376MiB |     11%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

Expected:

Your GPU is listed
Driver info is visible
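
If you want to script this check, the sketch below counts the devices that nvidia-smi -L reports. The helper name count_gpus is made up for this example, and the pattern assumes the "GPU <index>: <name>" line format shown in the output above.

```shell
# Count the GPUs listed in `nvidia-smi -L` output read from stdin.
# Each detected device is expected on its own line starting with "GPU <index>:".
count_gpus() {
  # grep -c prints 0 but exits non-zero when nothing matches; keep the function's status at 0
  grep -c '^GPU [0-9][0-9]*:' || true
}

# Only call the real tool when it is installed; otherwise say what is missing.
if command -v nvidia-smi >/dev/null 2>&1; then
  echo "GPUs visible inside WSL: $(nvidia-smi -L | count_gpus)"
else
  echo "nvidia-smi not found; check the Windows driver installation"
fi
```

A count of 0 means the GPU is not reaching WSL, and the later steps will fall back to the CPU.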

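4. (Optional) Install CUDA Toolkit

Ollama ships its own CUDA runtime libraries, so this step is only needed if you want to compile CUDA code yourself.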

Bash

sudo apt install -y nvidia-cuda-toolkit

Verify:

Bash

nvcc --version

5. Install Ollama

⚠️ Required Dependency for Ollama

Before installing Ollama, install zstd (this is required and often missing):

Bash

sudo apt-get install -y zstd

Download and Install Ollama

Bash

curl -fsSL https://ollama.com/install.sh | sh

6. Verify Ollama Installation

Bash

ollama --version

7. Run Your First Model

Bash

ollama run llama3

🔍 Verify GPU Usage

In a second terminal, while the model is generating a response:

Bash

nvidia-smi
Wed Mar 18 16:08:13 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.54                 Driver Version: 595.79         CUDA Version: 13.2     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 ...    On  |   00000000:01:00.0  On |                  N/A |
|  0%   58C    P2            240W /  285W |    6331MiB /  16376MiB |     91%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

You should see:

A running ollama process
Higher GPU utilization (here 91%)
Higher GPU memory usage (here about 6 GB, up from about 1 GB in the first nvidia-smi output)
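
To watch just these two numbers instead of reading the full table, nvidia-smi can print selected fields as CSV. A minimal sketch, assuming the --query-gpu fields utilization.gpu, memory.used and memory.total; summarize_gpu is a hypothetical helper name used only here.

```shell
# Turn one CSV line "util, used, total" (no header, no units) into a short summary.
summarize_gpu() {
  # Split on commas and the spaces that follow them
  IFS=', ' read -r gpu_util mem_used mem_total
  echo "GPU utilization ${gpu_util}%, memory ${mem_used} / ${mem_total} MiB"
}

# Query the first reported GPU when nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total \
             --format=csv,noheader,nounits | head -n 1 | summarize_gpu
else
  echo "nvidia-smi not found"
fi
```

Run it once before starting the model and again while it is answering; both values should be clearly higher the second time.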

⚙️ Troubleshooting

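Ollama uses CPU instead of GPU

Ollama detects a usable GPU automatically when the server starts. OLLAMA_USE_GPU is not among the environment variables Ollama documents, so exporting it is unlikely to change anything. First check where the loaded model is actually running (the PROCESSOR column shows the CPU/GPU split):

Bash

ollama ps

If it reports CPU only, confirm that nvidia-smi works inside WSL (step 3), then restart the Ollama server and check its log for GPU detection messages.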

Model too large

Test with a smaller model:

Bash

ollama run phi

GPU not visible in WSL

Check Windows driver again
Ensure WSL2 is used (run wsl -l -v in PowerShell; the VERSION column should show 2)
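
The same question can be approached from inside Ubuntu. The sketch below is a rough check, not a definitive test: it assumes the stock Microsoft kernel, whose release string contains "WSL2" (a custom kernel may not), and it looks for /dev/dxg and /usr/lib/wsl/lib/libcuda.so.1, the places where WSL2 normally exposes the GPU device and the driver's CUDA library. The function name is_wsl2_kernel is made up for this example.

```shell
# Succeed when the given kernel release string looks like a WSL2 kernel.
is_wsl2_kernel() {
  case "$1" in
    *[Ww][Ss][Ll]2*) return 0 ;;
    *) return 1 ;;
  esac
}

if is_wsl2_kernel "$(uname -r)"; then
  echo "kernel: WSL2"
else
  echo "kernel: not recognized as WSL2"
fi

# /dev/dxg is the paravirtualized GPU device that WSL2 presents to Linux.
if [ -e /dev/dxg ]; then echo "/dev/dxg: present"; else echo "/dev/dxg: missing"; fi

# The Windows driver's CUDA user-mode library is mapped into this directory.
if [ -e /usr/lib/wsl/lib/libcuda.so.1 ]; then
  echo "libcuda: present"
else
  echo "libcuda: missing"
fi
```

If either of the last two reports "missing", the problem is on the Windows side (driver or WSL version) rather than in Ollama.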

🧪 Test Ollama API

Bash

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Hello"
}'
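
The request above streams its answer as a series of JSON objects, one per line. For a quick test it can be easier to ask for a single reply by adding "stream": false and printing only the "response" field. A minimal sketch, assuming python3 is installed (as in the api/ps example in this post); extract_response is a hypothetical helper name.

```shell
# Print only the "response" field of a non-streaming /api/generate reply (JSON on stdin).
extract_response() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["response"])'
}

# Same request as above, with streaming turned off so one JSON object comes back.
request='{"model": "llama3", "prompt": "Hello", "stream": false}'

if reply=$(curl --silent --fail --max-time 120 \
             http://localhost:11434/api/generate -d "$request") && [ -n "$reply" ]; then
  printf '%s' "$reply" | extract_response
else
  echo "Request failed (is the Ollama server running and the model pulled?)"
fi
```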

Get all running models:

Bash

curl --silent http://localhost:11434/api/ps | python3 -m json.tool
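
The api/ps reply also gives another way to confirm GPU use: each loaded model reports its total size and, in size_vram, how much of it sits in GPU memory. The sketch below prints that share per model. It assumes the "models", "name", "size" and "size_vram" fields of the api/ps response; vram_share is a made-up helper name.

```shell
# Print "<model name> <percent held in VRAM>" for each loaded model.
# Reads the JSON body of GET /api/ps from stdin.
vram_share() {
  python3 -c '
import json, sys
for m in json.load(sys.stdin).get("models", []):
    size = m.get("size", 0)
    pct = 100 * m.get("size_vram", 0) // size if size else 0
    print(m.get("name", "?"), str(pct) + "%")
'
}

if body=$(curl --silent --max-time 5 http://localhost:11434/api/ps) && [ -n "$body" ]; then
  printf '%s' "$body" | vram_share
else
  echo "Ollama API not reachable on localhost:11434"
fi
```

A value of 100% means the whole model is on the GPU; a lower value means part of it was left in system RAM, which usually points to a model that is too large for the card. No output means no model is currently loaded.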

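You can also reach the API from the Windows host that runs WSL. By default Ollama listens on 127.0.0.1:11434 rather than on all interfaces, but WSL2's localhost forwarding normally makes that port available from Windows at:
http://localhost:11434/api/ps

If this does not work, stop any running Ollama server and start it bound to all interfaces: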

Bash

export OLLAMA_HOST=0.0.0.0
ollama serve

