
Vision models accept images alongside text so the model can describe, classify, and answer questions about what it sees.

Quick start

ollama run gemma3 ./image.png "What's in this image?"

Usage with Ollama’s API

Provide an images array. SDKs accept file paths, URLs, or raw bytes, while the REST API expects base64-encoded image data.
# 1. Download a sample image
curl -L -o test.jpg "https://upload.wikimedia.org/wikipedia/commons/3/3a/Cat03.jpg"

# 2. Encode the image
IMG=$(base64 < test.jpg | tr -d '\n')

# 3. Send it to Ollama
curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3",
    "messages": [{
      "role": "user",
      "content": "What is in this image?",
      "images": ["'"$IMG"'"]
    }],
    "stream": false
  }'
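The same request can be built programmatically. Below is a minimal Python sketch that base64-encodes raw image bytes and assembles the /api/chat payload shown above; the helper names (encode_image, build_chat_payload) are illustrative, not part of any Ollama SDK, and the placeholder bytes stand in for a real image file.

```python
import base64
import json


def encode_image(data: bytes) -> str:
    """Base64-encode raw image bytes, as Ollama's REST API expects."""
    return base64.b64encode(data).decode("ascii")


def build_chat_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Assemble a /api/chat request body with one image attached."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": prompt,
            "images": [encode_image(image_bytes)],
        }],
        "stream": False,
    }


# Placeholder bytes for illustration; in practice read the file:
#   image_bytes = open("test.jpg", "rb").read()
payload = build_chat_payload("gemma3", "What is in this image?", b"\x89PNG")
print(json.dumps(payload, indent=2))
```

POSTing this payload to http://localhost:11434/api/chat (with any HTTP client) is equivalent to the curl command above.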