Multimodal

YouRouter supports multimodal model calls in addition to plain text chat. Use this page when your integration needs images, visual understanding, provider-native multimodal formats, or video generation tasks.

Which API Should I Use?

| Use case | Recommended API | Notes |
| --- | --- | --- |
| Text chat | `POST /v1/chat/completions` | Standard OpenAI-compatible model call. |
| Image understanding | `POST /v1/chat/completions` | Send text plus `image_url` content blocks. |
| PDF / document understanding | `POST /v1/chat/completions`, `POST /v1/projects/...:generateContent`, or `POST /v1/messages` | Depends on the target model and upstream format. OpenAI-compatible calls can use `file` content blocks; Gemini and Claude can use provider-native document formats. |
| Gemini native multimodal | `POST /v1/projects/...:generateContent` | Use when you need Google’s native `contents` / `parts` format. |
| Claude native messages | `POST /v1/messages` | Use when you need Anthropic’s native Messages format. |
| Text-to-video / image-to-video | `POST /api/v3/contents/generations/tasks` | Task-based generation flow. Create a task, then poll for the result. |

For most chat and vision integrations, start with https://api.yourouter.ai/v1 and the OpenAI-compatible Chat Completions format.

Image Input with Chat Completions

Use messages[].content as an array of content blocks. Include one text block and one or more image_url blocks.

curl https://api.yourouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $YOUROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image.jpg"
            }
          }
        ]
      }
    ]
  }'

Read the generated answer from:

choices[0].message.content

Base64 Image Input

For private images, send a data URL.

{
  "type": "image_url",
  "image_url": {
    "url": "data:image/png;base64,<BASE64_IMAGE>"
  }
}

Base64 encoding makes the payload about one third larger than the original file, so keep inline images small. For large images, use a temporary HTTPS URL when possible.

Python Example

import base64
import os
from openai import OpenAI

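# Point the OpenAI SDK at YouRouter's OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["YOUROUTER_API_KEY"],
    base_url="https://api.yourouter.ai/v1",
)

# Read the local image and base64-encode it for use in a data URL.
with open("image.jpg", "rb") as image_file:
    encoded = base64.b64encode(image_file.read()).decode("utf-8")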

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{encoded}"
                    },
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)

PDF / Document Input

PDF support depends on the target model and upstream provider. Before sending a request, confirm that the selected model supports document or vision understanding; a model that does not support PDFs returns an upstream error.

For models that support OpenAI-compatible file content blocks, add a file block to messages[].content. The file_data value is the base64-encoded raw PDF bytes; do not include a data:application/pdf;base64, prefix.

curl https://api.yourouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $YOUROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "file",
            "file": {
              "filename": "document.pdf",
              "file_data": "<BASE64_PDF>"
            }
          },
          {
            "type": "text",
            "text": "Summarize the key points in this PDF."
          }
        ]
      }
    ]
  }'
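
The same message structure can be assembled in Python. This is a minimal sketch: the helper name `pdf_messages` is illustrative, and it assumes the target model accepts OpenAI-compatible file blocks as described above.

```python
import base64


def pdf_messages(pdf_bytes: bytes, filename: str, question: str) -> list:
    """Build a messages list with one file block followed by one text block."""
    # Raw base64 only: no "data:application/pdf;base64," prefix.
    file_data = base64.b64encode(pdf_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "file",
                    "file": {"filename": filename, "file_data": file_data},
                },
                {"type": "text", "text": question},
            ],
        }
    ]
```

Pass the returned list as `messages` to `client.chat.completions.create(...)`, using a client configured as in the Python Example section.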

If your use case depends on a specific upstream provider’s document features, use the provider-native format:

Gemini: use inlineData.mimeType: "application/pdf"; see Google Generate Content.
Claude: use a document content block with media_type: "application/pdf"; see Anthropic Messages.

PDFs usually consume more of the context window and increase the request body size. In production, limit file size and page count. Split large files, or use an upload-based flow only after confirming that the target upstream file API is available for your integration.
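
One way to enforce a file size limit before building a request is to estimate the encoded size up front. The sketch below relies only on the fact that padded base64 emits 4 output bytes for every 3 input bytes. The 10 MB budget and the function names are assumptions for illustration, not documented YouRouter limits.

```python
# Hypothetical budget for one inline file; not a documented YouRouter limit.
MAX_INLINE_BYTES = 10 * 1024 * 1024


def base64_size(raw_size: int) -> int:
    """Length in bytes of the padded base64 encoding of raw_size bytes."""
    return 4 * ((raw_size + 2) // 3)


def fits_inline(raw_size: int, budget: int = MAX_INLINE_BYTES) -> bool:
    """Return True if the base64-encoded file stays within the budget."""
    return base64_size(raw_size) <= budget
```

A caller could check `fits_inline(os.path.getsize(path))` and fall back to splitting the file when the check fails.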

Gemini Native Multimodal

Use Google’s generateContent format when your integration depends on Gemini-native request fields.

curl https://api.yourouter.ai/v1/projects/cognition/locations/us/publishers/google/models/gemini-2.5-flash:generateContent \
  -H "Authorization: Bearer $YOUROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "vendor: google" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          { "text": "Describe this image in one sentence." },
          {
            "inlineData": {
              "mimeType": "image/jpeg",
              "data": "<BASE64_IMAGE>"
            }
          }
        ]
      }
    ]
  }'

See Google Generate Content for the reference page.
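
The curl request above can be sketched in Python with only the standard library. The function names are illustrative, and `gemini_text` assumes the response is passed through in Google's usual `candidates[0].content.parts` shape; verify that against a real response before relying on it.

```python
import base64
import json
import os
import urllib.request

GEMINI_URL = (
    "https://api.yourouter.ai/v1/projects/cognition/locations/us"
    "/publishers/google/models/gemini-2.5-flash:generateContent"
)


def gemini_body(prompt: str, image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build a generateContent body with one text part and one inline image."""
    return {
        "contents": [
            {
                "role": "user",
                "parts": [
                    {"text": prompt},
                    {
                        "inlineData": {
                            "mimeType": mime_type,
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        }
                    },
                ],
            }
        ]
    }


def gemini_text(response: dict) -> str:
    """Join the text parts of the first candidate (assumed response shape)."""
    parts = response["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


def describe_image(path: str) -> str:
    """Send the image at `path` and return the model's text answer."""
    with open(path, "rb") as image_file:
        body = gemini_body("Describe this image in one sentence.", image_file.read())
    request = urllib.request.Request(
        GEMINI_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['YOUROUTER_API_KEY']}",
            "Content-Type": "application/json",
            "vendor": "google",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return gemini_text(json.load(response))
```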

Claude Native Messages

Use Anthropic’s Messages format when you need Claude-native request behavior.

curl https://api.yourouter.ai/v1/messages \
  -H "Authorization: Bearer $YOUROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -H "vendor: anthropic" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 300,
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          {
            "type": "image",
            "source": {
              "type": "base64",
              "media_type": "image/jpeg",
              "data": "<BASE64_IMAGE>"
            }
          }
        ]
      }
    ]
  }'

See Anthropic Messages for the reference page.
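
For a Python integration, the request body above and the handling of the reply can be sketched as follows. The helper names are illustrative. `claude_text` assumes the response keeps Anthropic's usual shape, where `content` is a list of blocks and text blocks carry a `text` field.

```python
import base64


def claude_body(
    prompt: str,
    image_bytes: bytes,
    media_type: str = "image/jpeg",
    model: str = "claude-sonnet-4-20250514",
    max_tokens: int = 300,
) -> dict:
    """Build a Messages request body with one text block and one base64 image."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": media_type,
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        },
                    },
                ],
            }
        ],
    }


def claude_text(response: dict) -> str:
    """Concatenate the text blocks of a Messages response (assumed shape)."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )
```

Send the body as JSON to `/v1/messages` with the same headers as the curl example.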

Video Generation Tasks

Video generation is task-based: create a task, then poll until the task is finished.

curl -X POST https://api.yourouter.ai/api/v3/contents/generations/tasks \
  -H "Authorization: Bearer $YOUROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-1-0-pro-250528",
    "content": [
      {
        "type": "text",
        "text": "A cinematic product shot with soft studio lighting --duration 5 --resolution 1080p"
      }
    ]
  }'

The create call returns a task id. Query that task until it is complete:

curl https://api.yourouter.ai/api/v3/contents/generations/tasks/{id} \
  -H "Authorization: Bearer $YOUROUTER_API_KEY"

See Ark Text-to-Video for the full task flow.
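
The create-then-poll flow can be sketched as a small polling loop. This is an illustration under stated assumptions: the `status` field name and the terminal values `succeeded`, `failed`, and `cancelled` are not confirmed by this page and should be checked against the Ark task reference, and the interval and timeout defaults are arbitrary.

```python
import json
import os
import time
import urllib.request
from typing import Callable

TASKS_URL = "https://api.yourouter.ai/api/v3/contents/generations/tasks"

# Assumed terminal states; confirm against the Ark task reference.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def fetch_task(task_id: str) -> dict:
    """GET the task and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{TASKS_URL}/{task_id}",
        headers={"Authorization": f"Bearer {os.environ['YOUROUTER_API_KEY']}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_task(
    task_id: str,
    fetch: Callable[[str], dict] = fetch_task,
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task reaches an assumed terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch(task_id)
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout} seconds")
        sleep(interval)
```

Passing `fetch` and `sleep` as parameters keeps the loop testable without network access or real delays.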

Integration Tips

Keep model IDs configurable so you can switch vision and multimodal models without code changes.
Use the vendor header only when you need a provider-specific format or behavior.
Prefer HTTPS URLs or split processing for large files; use base64 for private/local test images and small PDFs.
PDF support varies by model and upstream capability. Run a real-file test with the target model before shipping.
Preserve request IDs from responses when troubleshooting provider-specific multimodal issues.
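
As one way to keep model IDs configurable, the model can be read from the environment with a fallback. The variable name `YOUROUTER_VISION_MODEL` and the helper name are hypothetical, chosen only for this sketch.

```python
import os
from typing import Mapping

DEFAULT_VISION_MODEL = "gpt-4o"


def vision_model(env: Mapping[str, str] = os.environ) -> str:
    """Return the configured vision model, or the default if unset or empty."""
    # YOUROUTER_VISION_MODEL is a hypothetical variable name for this sketch.
    return env.get("YOUROUTER_VISION_MODEL") or DEFAULT_VISION_MODEL
```

Request code can then pass `model=vision_model()` instead of a hard-coded string.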
