Multimodal

Beyond plain text chat, YouRouter supports calls to multimodal models. Use this page when you need image input, visual understanding, provider-native multimodal formats, or video generation tasks.

Which API should I use?

Use case	Recommended API	Notes
Text chat	`POST /v1/chat/completions`	Standard OpenAI-compatible call.
Image understanding	`POST /v1/chat/completions`	Send text together with `image_url` content blocks.
PDF / document understanding	`POST /v1/chat/completions`, `POST /v1/projects/...:generateContent`, or `POST /v1/messages`	Depends on the target model and the upstream provider's format. OpenAI-compatible calls can use a `file` content block; Gemini and Claude can use their provider-native document formats.
Gemini native multimodal	`POST /v1/projects/...:generateContent`	When you need Google's native `contents` / `parts` structure.
Claude native messages	`POST /v1/messages`	When you need Anthropic's native Messages format.
Text-to-video / image-to-video	`POST /api/v3/contents/generations/tasks`	Task-based generation: create a task, then poll for the result.

For most chat and vision integrations, we recommend starting with https://api.yourouter.ai/v1 and the OpenAI-compatible Chat Completions format.

Sending image input with Chat Completions

Set messages[].content to an array of content blocks. It typically contains one text block and one or more image_url blocks.

curl https://api.yourouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $YOUROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image.jpg"
            }
          }
        ]
      }
    ]
  }'

The generated result is usually found at:

choices[0].message.content

Base64 image input

For private images, you can send a data URL directly.

{
  "type": "image_url",
  "image_url": {
    "url": "data:image/png;base64,<BASE64_IMAGE>"
  }
}

Keep the payload as small as possible. For very large images, a temporary HTTPS URL is recommended.
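As a minimal sketch of building the block above from raw image bytes, the helper below infers the MIME type from the file name and assembles the data URL. The function names `to_data_url` and `image_block` are hypothetical and not part of any YouRouter SDK.

```python
import base64
import mimetypes


def to_data_url(path: str, data: bytes) -> str:
    """Return a data URL for the given image bytes (hypothetical helper)."""
    # Infer the MIME type from the file extension, e.g. "image/png".
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {path!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_block(path: str, data: bytes) -> dict:
    """Wrap the data URL in an image_url content block."""
    return {"type": "image_url", "image_url": {"url": to_data_url(path, data)}}
```

The returned dictionary can be placed directly in the messages[].content array next to a text block.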

Python example

import base64
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["YOUROUTER_API_KEY"],
    base_url="https://api.yourouter.ai/v1",
)

# Read the local image and base64-encode it for use in a data URL.
with open("image.jpg", "rb") as image_file:
    encoded = base64.b64encode(image_file.read()).decode("utf-8")

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{encoded}"
                    },
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)

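PDF / document input

PDF support depends on the target model and the upstream provider. Before sending a request, confirm that the selected model supports document or visual understanding. Models that do not support PDFs return an upstream provider error.

For models that support OpenAI-compatible file content blocks, you can add a file block to messages[].content. The file_data value is the raw PDF bytes encoded as base64, without the data:application/pdf;base64, prefix.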

curl https://api.yourouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $YOUROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "file",
            "file": {
              "filename": "document.pdf",
              "file_data": "<BASE64_PDF>"
            }
          },
          {
            "type": "text",
            "text": "Summarize the key points in this PDF."
          }
        ]
      }
    ]
  }'
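The same request body can be assembled in Python. This is a sketch only: `pdf_file_block` and `pdf_messages` are hypothetical helper names, and the block layout simply mirrors the curl example above.

```python
import base64


def pdf_file_block(filename: str, pdf_bytes: bytes) -> dict:
    """Build a file content block; file_data is bare base64 with no data: prefix."""
    return {
        "type": "file",
        "file": {
            "filename": filename,
            "file_data": base64.b64encode(pdf_bytes).decode("ascii"),
        },
    }


def pdf_messages(filename: str, pdf_bytes: bytes, prompt: str) -> list:
    """Build a messages array with the PDF block followed by the text prompt."""
    return [
        {
            "role": "user",
            "content": [
                pdf_file_block(filename, pdf_bytes),
                {"type": "text", "text": prompt},
            ],
        }
    ]
```

The result of `pdf_messages(...)` can be passed as the messages argument of `client.chat.completions.create`, using a client configured as in the Python example.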

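If you need a specific upstream provider's document features, use the provider-native format.

Gemini: use inlineData.mimeType: "application/pdf". See Google Generate Content.
Claude: use a document content block with media_type: "application/pdf". See Anthropic Messages.

PDFs usually consume more of the context window and produce a larger request body. In production, limit file size and page count. For large files, either split them or use an upload-based flow after confirming that the target upstream provider's file API is available in your integration.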
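One way to enforce a size limit before sending is to estimate the base64-encoded size up front, since base64 expands every 3 raw bytes into 4 characters. The sketch below assumes a 10 MiB budget; `MAX_REQUEST_BYTES` is a hypothetical value, not a documented YouRouter limit.

```python
# Hypothetical budget for the encoded file; adjust to your own limits.
MAX_REQUEST_BYTES = 10 * 1024 * 1024


def base64_size(raw_size: int) -> int:
    """Return the length of the padded base64 encoding of raw_size bytes."""
    return 4 * ((raw_size + 2) // 3)


def fits_inline(raw_size: int, limit: int = MAX_REQUEST_BYTES) -> bool:
    """Return True if the base64-encoded file stays within the limit."""
    return base64_size(raw_size) <= limit
```

A file that fails this check is a candidate for splitting or for an upload-based flow.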

Gemini native multimodal

If your integration depends on Gemini-native fields, use Google's generateContent format.

curl https://api.yourouter.ai/v1/projects/cognition/locations/us/publishers/google/models/gemini-2.5-flash:generateContent \
  -H "Authorization: Bearer $YOUROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "vendor: google" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          { "text": "Describe this image in one sentence." },
          {
            "inlineData": {
              "mimeType": "image/jpeg",
              "data": "<BASE64_IMAGE>"
            }
          }
        ]
      }
    ]
  }'

For the reference page, see Google Generate Content.
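The request body from the curl example can be built programmatically. The sketch below only constructs the JSON body and the headers; `gemini_body` and `gemini_headers` are hypothetical helper names. Passing "application/pdf" as the MIME type yields the PDF variant mentioned in the PDF section.

```python
import base64


def gemini_body(prompt: str, data: bytes, mime_type: str) -> dict:
    """Build a generateContent body with one text part and one inlineData part."""
    return {
        "contents": [
            {
                "role": "user",
                "parts": [
                    {"text": prompt},
                    {
                        "inlineData": {
                            "mimeType": mime_type,
                            "data": base64.b64encode(data).decode("ascii"),
                        }
                    },
                ],
            }
        ]
    }


def gemini_headers(api_key: str) -> dict:
    """Headers matching the curl example, including the vendor header."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "vendor": "google",
    }
```

Serialize the body with `json.dumps` and send it with any HTTP client to the generateContent URL shown above.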

Claude native Messages

If you need Claude-native request behavior, use the Anthropic Messages format.

curl https://api.yourouter.ai/v1/messages \
  -H "Authorization: Bearer $YOUROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -H "vendor: anthropic" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 300,
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          {
            "type": "image",
            "source": {
              "type": "base64",
              "media_type": "image/jpeg",
              "data": "<BASE64_IMAGE>"
            }
          }
        ]
      }
    ]
  }'

For the reference page, see Anthropic Messages.
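As a sketch of how an integration might choose between the image block above and the document block used for PDFs, the hypothetical helper below picks the block type from the media type. The helper names and the default `max_tokens` value are illustrative only.

```python
import base64


def claude_media_block(data: bytes, media_type: str) -> dict:
    """Return a document block for PDFs and an image block otherwise."""
    block_type = "document" if media_type == "application/pdf" else "image"
    return {
        "type": block_type,
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("ascii"),
        },
    }


def claude_body(model: str, prompt: str, data: bytes, media_type: str,
                max_tokens: int = 300) -> dict:
    """Build a Messages request body with a text block and one media block."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    claude_media_block(data, media_type),
                ],
            }
        ],
    }
```

The body is sent to /v1/messages with the same headers as the curl example.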

Video generation tasks

Video generation is task-based. Create a task, then poll until it completes.

curl -X POST https://api.yourouter.ai/api/v3/contents/generations/tasks \
  -H "Authorization: Bearer $YOUROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "doubao-seedance-1-0-pro-250528",
    "content": [
      {
        "type": "text",
        "text": "A cinematic product shot with soft studio lighting --duration 5 --resolution 1080p"
      }
    ]
  }'

The create call returns a task id. Use that id to query the task status.

curl https://api.yourouter.ai/api/v3/contents/generations/tasks/{id} \
  -H "Authorization: Bearer $YOUROUTER_API_KEY"

For the full task flow, see Ark Text-to-Video.
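The create-then-poll flow can be sketched as a small loop. This page does not specify the response schema, so the `status` field name and the terminal values below are assumptions; check them against the Ark Text-to-Video reference. `fetch_status` is a hypothetical callable that performs the GET request shown above and returns the parsed JSON.

```python
import time
from typing import Callable

# Assumed terminal status values; confirm against the task reference.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def poll_task(
    fetch_status: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status until the task reaches a terminal status or times out."""
    deadline = clock() + timeout
    while True:
        task = fetch_status()
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if clock() >= deadline:
            raise TimeoutError("video generation task did not finish in time")
        # Wait before the next status check to avoid hammering the API.
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays; in production the defaults are sufficient.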

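Integration tips

Manage model IDs in configuration so that switching between vision and multimodal models requires minimal code changes.
Use vendor only when you clearly need a provider-specific format or behavior.
For large files, prefer HTTPS URLs or splitting; base64 is suitable for private or local test images and small PDFs.
PDF support varies with the model and the upstream provider's capabilities. Test with real files against the target model before deploying.
Keep the request ID from the response when troubleshooting provider-specific multimodal issues.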
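One possible way to keep model IDs in configuration is to read them from environment variables with defaults. The variable naming scheme (`YOUROUTER_<TASK>_MODEL`) and the `model_for` helper are hypothetical; the default model IDs are the ones used in the examples on this page.

```python
import os
from typing import Mapping

# Defaults taken from the examples on this page.
DEFAULT_MODELS = {
    "vision": "gpt-4o",
    "video": "doubao-seedance-1-0-pro-250528",
}


def model_for(task: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the model ID for a task, preferring an environment override."""
    override = env.get(f"YOUROUTER_{task.upper()}_MODEL")
    return override or DEFAULT_MODELS[task]
```

With this in place, switching the vision model becomes a deployment setting rather than a code change.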
