YouRouter supports multimodal model calls in addition to plain text chat. Use this page when your integration needs images, visual understanding, provider-native multimodal formats, or video generation tasks.
Which API Should I Use?
| Use case | Recommended API | Notes |
|---|---|---|
| Text chat | POST /v1/chat/completions | Standard OpenAI-compatible model call. |
| Image understanding | POST /v1/chat/completions | Send text plus image_url content blocks. |
| Gemini native multimodal | POST /v1/projects/...:generateContent | Use when you need Google’s native contents / parts format. |
| Claude native messages | POST /v1/messages | Use when you need Anthropic’s native Messages format. |
| Text-to-video / image-to-video | POST /api/v3/contents/generations/tasks | Task-based generation flow. Create a task, then poll for the result. |
For most chat and vision integrations, start with https://api.yourouter.ai/v1 and the OpenAI-compatible Chat Completions format.
Use messages[].content as an array of content blocks. Include one text block and one or more image_url blocks.
curl https://api.yourouter.ai/v1/chat/completions \
-H "Authorization: Bearer $YOUROUTER_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image in one sentence."
},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/image.jpg"
}
}
]
}
]
}'
Read the generated answer from:
choices[0].message.content
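If you call the endpoint without an SDK, the same path can be read from the raw JSON body. The following is a minimal sketch; the helper name `extract_answer` and the sample body are illustrative assumptions, not part of the YouRouter API.

```python
import json


def extract_answer(response_body: str) -> str:
    """Return choices[0].message.content from a Chat Completions JSON body."""
    payload = json.loads(response_body)
    choices = payload.get("choices") or []
    if not choices:
        # An empty list usually means the call failed or was filtered upstream.
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written sample body for illustration only (not a captured response).
sample = '{"choices": [{"message": {"role": "assistant", "content": "A red bicycle."}}]}'
print(extract_answer(sample))
```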
For private images that are not reachable through a public URL, send a base64 data URL in the same image_url field.
{
"type": "image_url",
"image_url": {
"url": "data:image/png;base64,<BASE64_IMAGE>"
}
}
Base64 encoding inflates the payload by roughly one third, so keep inline images small. For large images, use a temporary HTTPS URL when possible.
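The content-block layout and the data URL form described above can be assembled with a couple of small helpers. This is a sketch under stated assumptions: `to_data_url` and `build_vision_message` are hypothetical names, and the MIME type is guessed from the file name using the standard library.

```python
import base64
import mimetypes
from typing import Dict, List


def to_data_url(image_bytes: bytes, filename: str) -> str:
    """Encode raw image bytes as a base64 data URL (MIME type guessed from the name)."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_vision_message(prompt: str, image_urls: List[str]) -> Dict:
    """Build one user message with a text block followed by image_url blocks."""
    content: List[Dict] = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}


# Mix a remote HTTPS image with an inline data URL in one message.
message = build_vision_message(
    "Compare these two images.",
    ["https://example.com/image.jpg", to_data_url(b"abc", "local.png")],
)
print(message["content"][2]["image_url"]["url"])
```

The resulting dictionary can be passed as one entry of `messages` in a Chat Completions request.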
Python Example
import base64
import os

from openai import OpenAI

# Point the OpenAI SDK at YouRouter's OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["YOUROUTER_API_KEY"],
    base_url="https://api.yourouter.ai/v1",
)

# Read the local image and base64-encode it for use in a data URL.
with open("image.jpg", "rb") as image_file:
    encoded = base64.b64encode(image_file.read()).decode("utf-8")

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{encoded}"
                    },
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
Gemini Native Multimodal
Use Google’s generateContent format when your integration depends on Gemini-native request fields.
curl https://api.yourouter.ai/v1/projects/cognition/locations/us/publishers/google/models/gemini-2.5-flash:generateContent \
-H "Authorization: Bearer $YOUROUTER_API_KEY" \
-H "Content-Type: application/json" \
-H "vendor: google" \
-d '{
"contents": [
{
"role": "user",
"parts": [
{ "text": "Describe this image in one sentence." },
{
"inlineData": {
"mimeType": "image/jpeg",
"data": "<BASE64_IMAGE>"
}
}
]
}
]
}'
See Google Generate Content for the reference page.
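For integrations that build this body in code, the contents / parts structure from the curl example might be assembled as follows. This is only a sketch: `build_gemini_request` is a hypothetical helper, and it covers just the fields shown above.

```python
import base64
import json
from typing import Dict


def build_gemini_request(prompt: str, image_bytes: bytes, mime_type: str = "image/jpeg") -> Dict:
    """Build a generateContent body with one text part and one inline image part."""
    return {
        "contents": [
            {
                "role": "user",
                "parts": [
                    {"text": prompt},
                    {
                        "inlineData": {
                            "mimeType": mime_type,
                            # Gemini expects raw base64 here, not a data URL.
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        }
                    },
                ],
            }
        ]
    }


# Three placeholder bytes stand in for a real JPEG file.
body = build_gemini_request("Describe this image in one sentence.", b"\xff\xd8\xff")
print(json.dumps(body, indent=2))
```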
Claude Native Messages
Use Anthropic’s Messages format when you need Claude-native request behavior.
curl https://api.yourouter.ai/v1/messages \
-H "Authorization: Bearer $YOUROUTER_API_KEY" \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-H "vendor: anthropic" \
-d '{
"model": "claude-sonnet-4-20250514",
"max_tokens": 300,
"messages": [
{
"role": "user",
"content": [
{ "type": "text", "text": "Describe this image in one sentence." },
{
"type": "image",
"source": {
"type": "base64",
"media_type": "image/jpeg",
"data": "<BASE64_IMAGE>"
}
}
]
}
]
}'
See Anthropic Messages for the reference page.
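If you already have OpenAI-style content blocks with base64 data URLs, they can be mapped onto the Claude image block shape shown above. The sketch below assumes every image is a base64 data URL; the function names are hypothetical and remote HTTPS URLs are deliberately not handled.

```python
from typing import Dict, List


def data_url_to_claude_image(data_url: str) -> Dict:
    """Split a base64 data URL into the media_type and data fields of an image block."""
    header, separator, data = data_url.partition(",")
    if not header.startswith("data:") or not separator or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL such as data:image/png;base64,...")
    media_type = header[len("data:"):-len(";base64")]
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }


def openai_content_to_claude(blocks: List[Dict]) -> List[Dict]:
    """Convert text and image_url blocks into Messages-format content blocks."""
    converted: List[Dict] = []
    for block in blocks:
        if block["type"] == "text":
            converted.append({"type": "text", "text": block["text"]})
        elif block["type"] == "image_url":
            converted.append(data_url_to_claude_image(block["image_url"]["url"]))
        else:
            raise ValueError(f"unsupported block type: {block['type']!r}")
    return converted


claude_content = openai_content_to_claude(
    [
        {"type": "text", "text": "Describe this image in one sentence."},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/"}},
    ]
)
print(claude_content[1]["source"]["media_type"])
```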
Video Generation Tasks
Video generation is task-based: create a task, then poll until the task is finished.
curl -X POST https://api.yourouter.ai/api/v3/contents/generations/tasks \
-H "Authorization: Bearer $YOUROUTER_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "doubao-seedance-1-0-pro-250528",
"content": [
{
"type": "text",
"text": "A cinematic product shot with soft studio lighting --duration 5 --resolution 1080p"
}
]
}'
The create call returns a task id. Query that task until it is complete:
curl https://api.yourouter.ai/api/v3/contents/generations/tasks/{id} \
-H "Authorization: Bearer $YOUROUTER_API_KEY"
See Ark Text-to-Video for the full task flow.
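The create-then-poll flow can be wrapped in a small loop with a timeout. This sketch makes several assumptions that are not confirmed by this page: the task object is assumed to expose a `status` field, and the terminal values `succeeded` and `failed` are placeholders, so check the task reference for the real field names. The HTTP call is injected as `fetch_task`, which lets the loop run here against a fake fetcher.

```python
import time
from typing import Any, Callable, Collection, Dict


def poll_task(
    fetch_task: Callable[[str], Dict[str, Any]],
    task_id: str,
    done_statuses: Collection[str],
    interval_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_task until the task reports a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        task = fetch_task(task_id)
        # "status" is an assumed field name; adjust it to the actual task schema.
        if task.get("status") in done_statuses:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout_seconds} seconds")
        sleep(interval_seconds)


# Fake fetcher standing in for GET /api/v3/contents/generations/tasks/{id}.
_fake_responses = iter(
    [
        {"id": "task-123", "status": "running"},
        {"id": "task-123", "status": "running"},
        {"id": "task-123", "status": "succeeded"},
    ]
)


def fake_fetch(task_id: str) -> Dict[str, Any]:
    return next(_fake_responses)


result = poll_task(
    fake_fetch,
    "task-123",
    done_statuses={"succeeded", "failed"},  # placeholder terminal values
    interval_seconds=0.0,
)
print(result["status"])
```

In a real integration, `fetch_task` would issue the authenticated GET request shown above and return the parsed JSON body.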
Integration Tips
- Keep model IDs configurable so you can switch vision and multimodal models without code changes.
- Use the vendor header only when you need a provider-specific format or behavior.
- Prefer HTTPS image URLs for large files; use base64 data URLs for private or local test images.
- Preserve request IDs from responses when troubleshooting provider-specific multimodal issues.
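One way to keep model IDs configurable, as the first tip suggests, is to resolve them from environment variables with defaults. The variable names below (`YOUROUTER_VISION_MODEL`, `YOUROUTER_VIDEO_MODEL`) are hypothetical conventions chosen for this sketch, not settings that YouRouter reads itself.

```python
import os
from typing import Mapping

# Defaults taken from the examples on this page.
DEFAULT_MODELS = {
    "vision": "gpt-4o",
    "video": "doubao-seedance-1-0-pro-250528",
}


def resolve_model(task: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the model ID for a task, preferring an environment override."""
    if task not in DEFAULT_MODELS:
        raise KeyError(f"unknown task: {task!r}")
    # Hypothetical variable name, for example YOUROUTER_VISION_MODEL.
    override = env.get(f"YOUROUTER_{task.upper()}_MODEL")
    return override or DEFAULT_MODELS[task]


print(resolve_model("vision", env={}))
print(resolve_model("vision", env={"YOUROUTER_VISION_MODEL": "gemini-2.5-flash"}))
```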