The Ollama Python library provides the easiest way to integrate Python 3.8+ projects with Ollama.
- Ollama should be installed and running
- Pull a model to use with the library: `ollama pull <model>`, e.g. `ollama pull gemma3`
- See Ollama.com for more information on the models available.
Install the library:

```shell
pip install ollama
```

Then make a chat request:

```python
from ollama import chat
from ollama import ChatResponse

response: ChatResponse = chat(model='gemma3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])
# or access fields directly from the response object
print(response.message.content)
```

See _types.py for more information on the response types.
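Other response fields can be read the same way. A minimal sketch (attribute names assumed to follow the REST API chat response; check _types.py for the authoritative list):

```python
# assumes `response` from the example above
print(response.model)           # which model produced the reply
print(response.total_duration)  # total request time, in nanoseconds
```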
Response streaming can be enabled by setting stream=True.
```python
from ollama import chat

stream = chat(
  model='gemma3',
  messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
  stream=True,
)

for chunk in stream:
  print(chunk['message']['content'], end='', flush=True)
```

Run larger models by offloading to Ollama’s cloud while keeping your local workflow.
- Supported models: `deepseek-v3.1:671b-cloud`, `gpt-oss:20b-cloud`, `gpt-oss:120b-cloud`, `kimi-k2:1t-cloud`, `qwen3-coder:480b-cloud`, `kimi-k2-thinking`. See Ollama Models - Cloud for more information.
- Sign in (one-time): `ollama signin`
- Pull a cloud model: `ollama pull gpt-oss:120b-cloud`
- Make a request:

  ```python
  from ollama import Client

  client = Client()

  messages = [
    {
      'role': 'user',
      'content': 'Why is the sky blue?',
    },
  ]

  for part in client.chat('gpt-oss:120b-cloud', messages=messages, stream=True):
    print(part.message.content, end='', flush=True)
  ```

Access cloud models directly by pointing the client at https://ollama.com.
- Create an API key from ollama.com, then set:

  ```shell
  export OLLAMA_API_KEY=your_api_key
  ```

- (Optional) List models available via the API:

  ```shell
  curl https://ollama.com/api/tags
  ```
- Generate a response via the cloud API:

  ```python
  import os

  from ollama import Client

  client = Client(
    host='https://ollama.com',
    headers={'Authorization': 'Bearer ' + os.environ.get('OLLAMA_API_KEY')}
  )

  messages = [
    {
      'role': 'user',
      'content': 'Why is the sky blue?',
    },
  ]

  for part in client.chat('gpt-oss:120b', messages=messages, stream=True):
    print(part.message.content, end='', flush=True)
  ```

A custom client can be created by instantiating `Client` or `AsyncClient` from `ollama`.
All extra keyword arguments are passed into the httpx.Client.
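For example (a minimal sketch; the timeout value is illustrative), a request timeout can be forwarded to the underlying `httpx.Client`:

```python
from ollama import Client

# `timeout` is not an Ollama-specific option; it is forwarded to the underlying httpx client
client = Client(host='http://localhost:11434', timeout=30.0)
```

The example below configures a custom host and headers: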
```python
from ollama import Client

client = Client(
  host='http://localhost:11434',
  headers={'x-some-header': 'some-value'}
)

response = client.chat(model='gemma3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
```

The `AsyncClient` class is used to make asynchronous requests. It can be configured with the same fields as the `Client` class.
```python
import asyncio

from ollama import AsyncClient

async def chat():
  message = {'role': 'user', 'content': 'Why is the sky blue?'}
  response = await AsyncClient().chat(model='gemma3', messages=[message])

asyncio.run(chat())
```

Setting `stream=True` modifies functions to return a Python asynchronous generator:
```python
import asyncio

from ollama import AsyncClient

async def chat():
  message = {'role': 'user', 'content': 'Why is the sky blue?'}
  async for part in await AsyncClient().chat(model='gemma3', messages=[message], stream=True):
    print(part['message']['content'], end='', flush=True)

asyncio.run(chat())
```

The Ollama Python library's API is designed around the Ollama REST API.
- Chat: `ollama.chat(model='gemma3', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])`
- Generate: `ollama.generate(model='gemma3', prompt='Why is the sky blue?')`
- List: `ollama.list()`
- Show: `ollama.show('gemma3')`
- Create: `ollama.create(model='example', from_='gemma3', system="You are Mario from Super Mario Bros.")`
- Copy: `ollama.copy('gemma3', 'user/gemma3')`
- Delete: `ollama.delete('gemma3')`
- Pull: `ollama.pull('gemma3')`
- Push: `ollama.push('user/gemma3')`
- Embed: `ollama.embed(model='gemma3', input='The sky is blue because of rayleigh scattering')`
- Embed (batch): `ollama.embed(model='gemma3', input=['The sky is blue because of rayleigh scattering', 'Grass is green because of chlorophyll'])`
- Ps: `ollama.ps()`

Errors are raised if requests return an error status or if an error is detected while streaming.
```python
import ollama

model = 'does-not-yet-exist'

try:
  ollama.chat(model)
except ollama.ResponseError as e:
  print('Error:', e.error)
  if e.status_code == 404:
    ollama.pull(model)
```
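Streaming requests surface errors the same way; a minimal sketch of catching one mid-stream (the model and prompt are illustrative):

```python
import ollama

# errors detected while streaming are raised as ollama.ResponseError as well
try:
  for chunk in ollama.chat(model='gemma3', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}], stream=True):
    print(chunk['message']['content'], end='', flush=True)
except ollama.ResponseError as e:
  print('Error while streaming:', e.error)
```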