Ollama Python Library

The Ollama Python library provides the easiest way to integrate Python 3.8+ projects with Ollama.

Prerequisites

Ollama should be installed and running
Pull a model to use with the library: ollama pull <model> e.g. ollama pull gemma3
- See Ollama.com for more information on the models available.

Install

pip install ollama

Usage

from ollama import chat
from ollama import ChatResponse

response: ChatResponse = chat(model='gemma3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])
# or access fields directly from the response object
print(response.message.content)

See _types.py for more information on the response types.

Streaming responses

Response streaming can be enabled by setting stream=True.

from ollama import chat

stream = chat(
    model='gemma3',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

for chunk in stream:
  print(chunk['message']['content'], end='', flush=True)

Cloud Models

Run larger models by offloading to Ollama’s cloud while keeping your local workflow.

Supported models: deepseek-v3.1:671b-cloud, gpt-oss:20b-cloud, gpt-oss:120b-cloud, kimi-k2:1t-cloud, qwen3-coder:480b-cloud, kimi-k2-thinking See Ollama Models - Cloud for more information

Run via local Ollama

Sign in (one-time):

ollama signin

Pull a cloud model:

ollama pull gpt-oss:120b-cloud

Make a request:

from ollama import Client

client = Client()

messages = [
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
]

for part in client.chat('gpt-oss:120b-cloud', messages=messages, stream=True):
  print(part.message.content, end='', flush=True)

Cloud API (ollama.com)

Access cloud models directly by pointing the client at https://ollama.com.

Create an API key from ollama.com , then set:

export OLLAMA_API_KEY=your_api_key

(Optional) List models available via the API:

curl https://ollama.com/api/tags

Generate a response via the cloud API:

import os
from ollama import Client

client = Client(
    host='https://ollama.com',
    headers={'Authorization': 'Bearer ' + os.environ.get('OLLAMA_API_KEY')}
)

messages = [
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
]

for part in client.chat('gpt-oss:120b', messages=messages, stream=True):
  print(part.message.content, end='', flush=True)

Custom client

A custom client can be created by instantiating Client or AsyncClient from ollama.

All extra keyword arguments are passed into the httpx.Client.

from ollama import Client
client = Client(
  host='http://localhost:11434',
  headers={'x-some-header': 'some-value'}
)
response = client.chat(model='gemma3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])

Async client

The AsyncClient class is used to make asynchronous requests. It can be configured with the same fields as the Client class.

import asyncio
from ollama import AsyncClient

async def chat():
  message = {'role': 'user', 'content': 'Why is the sky blue?'}
  response = await AsyncClient().chat(model='gemma3', messages=[message])

asyncio.run(chat())

Setting stream=True modifies functions to return a Python asynchronous generator:

import asyncio
from ollama import AsyncClient

async def chat():
  message = {'role': 'user', 'content': 'Why is the sky blue?'}
  async for part in await AsyncClient().chat(model='gemma3', messages=[message], stream=True):
    print(part['message']['content'], end='', flush=True)

asyncio.run(chat())

API

The Ollama Python library's API is designed around the Ollama REST API

Chat

ollama.chat(model='gemma3', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])

Generate

ollama.generate(model='gemma3', prompt='Why is the sky blue?')

List

ollama.list()

Show

ollama.show('gemma3')

Create

ollama.create(model='example', from_='gemma3', system="You are Mario from Super Mario Bros.")

Copy

ollama.copy('gemma3', 'user/gemma3')

Delete

ollama.delete('gemma3')

Pull

ollama.pull('gemma3')

Push

ollama.push('user/gemma3')

Embed

ollama.embed(model='gemma3', input='The sky is blue because of rayleigh scattering')

Embed (batch)

ollama.embed(model='gemma3', input=['The sky is blue because of rayleigh scattering', 'Grass is green because of chlorophyll'])

Ps

ollama.ps()

Errors

Errors are raised if requests return an error status or if an error is detected while streaming.

model = 'does-not-yet-exist'

try:
  ollama.chat(model)
except ollama.ResponseError as e:
  print('Error:', e.error)
  if e.status_code == 404:
    ollama.pull(model)

Name		Name	Last commit message	Last commit date
Latest commit History 287 Commits
.github		.github
examples		examples
ollama		ollama
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Ollama Python Library

Prerequisites

Install

Usage

Streaming responses

Cloud Models

Run via local Ollama

Cloud API (ollama.com)

Custom client

Async client

API

Chat

Generate

List

Show

Create

Copy

Delete

Pull

Push

Embed

Embed (batch)

Ps

Errors

About

Uh oh!

Releases 33

Used by 31.8k

Contributors 41

Languages

License

ollama/ollama-python

Folders and files

Latest commit

History

Repository files navigation

Ollama Python Library

Prerequisites

Install

Usage

Streaming responses

Cloud Models

Run via local Ollama

Cloud API (ollama.com)

Custom client

Async client

API

Chat

Generate

List

Show

Create

Copy

Delete

Pull

Push

Embed

Embed (batch)

Ps

Errors

About

Topics

Resources

License

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 33

Used by 31.8k

Contributors 41

Languages