YouRouter offers two main ways to use chat models:
  1. OpenAI-compatible API: fits the vast majority of use cases; call different models through one unified interface.
  2. Provider-native APIs: for advanced scenarios that need upstream-specific capabilities the unified interface does not expose.
For how to choose a provider and model, see the Routing Guide.

OpenAI-Compatible API

This is the simplest and most flexible way to use YouRouter. You can keep using the familiar OpenAI SDK and switch between models and providers with minimal code changes.

Basic Usage

The example below sends a basic chat completion request. Change the model and the vendor request header to switch between models and upstream providers.
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.yourouter.ai/v1"
)

# Target OpenAI's gpt-4o model
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    extra_headers={"vendor": "openai"}
)

print(response.choices[0].message.content)
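Switching providers only requires changing the model name and the vendor header; everything else in the request stays the same. As a sketch of that idea (the chat_request helper below is illustrative, not part of any SDK; the model and vendor values come from other examples on this page):

```python
def chat_request(model, vendor, messages):
    """Build keyword arguments for client.chat.completions.create().

    Only `model` and the `vendor` header differ between providers;
    the OpenAI-compatible message format stays the same.
    """
    return {
        "model": model,
        "messages": messages,
        "extra_headers": {"vendor": vendor},
    }

msgs = [{"role": "user", "content": "What is the capital of France?"}]

openai_kwargs = chat_request("gpt-4o", "openai", msgs)
claude_kwargs = chat_request("claude-3-haiku-20240307", "anthropic", msgs)
# client.chat.completions.create(**openai_kwargs) routes to OpenAI;
# passing **claude_kwargs instead sends the same conversation to Anthropic.
```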

Advanced Features

Multi-Turn Conversations

To maintain a continuous conversation, simply include the full conversation history in the messages array.
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.yourouter.ai/v1"
)

messages = [
    {"role": "system", "content": "You are a witty assistant that tells jokes."},
    {"role": "user", "content": "Tell me a joke about computers."},
    {"role": "assistant", "content": "Why did the computer keep sneezing? It had a virus!"},
    {"role": "user", "content": "That was a good one. Tell me another."}
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)

print(response.choices[0].message.content)
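In a real chat loop, you append each completed exchange to the history before sending the next request. A minimal sketch of that bookkeeping (extend_history is a hypothetical helper, not an SDK function):

```python
def extend_history(history, user_text, assistant_text):
    """Append one completed user/assistant exchange to the running history."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})
    return history

history = [{"role": "system", "content": "You are a witty assistant that tells jokes."}]
history = extend_history(
    history,
    "Tell me a joke about computers.",
    "Why did the computer keep sneezing? It had a virus!",
)
# `history` now holds three messages and can be passed as messages=history
# in the next client.chat.completions.create() call.
```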

Streaming Responses

For real-time interactive use cases such as chatbots, you can stream the output as it is generated. Just pass stream=True in the request.
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.yourouter.ai/v1"
)

stream = client.chat.completions.create(
    model="claude-3-haiku-20240307",
    messages=[{"role": "user", "content": "Write a short poem about the ocean."}],
    stream=True,
    extra_headers={"vendor": "anthropic"}
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

Function Calling / Tool Use

You can let the model call tools or functions to interact with external systems. This is typically a multi-step flow:
  1. Send a request that includes tool definitions.
  2. The model responds with which tools it wants to call.
  3. Your code actually executes those tools.
  4. Send the tool results back to the model, which then produces the final natural-language reply.
import json
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.yourouter.ai/v1"
)

def get_current_weather(location, unit="celsius"):
    """Get the current weather in a given location"""
    if "boston" in location.lower():
        return json.dumps({"location": "Boston", "temperature": "10", "unit": unit})
    else:
        return json.dumps({"location": location, "temperature": "unknown"})

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }
]

messages = [{"role": "user", "content": "What's the weather like in Boston, MA?"}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)

response_message = response.choices[0].message
tool_calls = response_message.tool_calls

if tool_calls:
    available_functions = {
        "get_current_weather": get_current_weather,
    }
    messages.append(response_message)

    for tool_call in tool_calls:
        function_name = tool_call.function.name
        function_to_call = available_functions[function_name]
        function_args = json.loads(tool_call.function.arguments)

        function_response = function_to_call(
            location=function_args.get("location"),
            unit=function_args.get("unit"),
        )

        messages.append(
            {
                "tool_call_id": tool_call.id,
                "role": "tool",
                "name": function_name,
                "content": function_response,
            }
        )

    second_response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )

    final_response = second_response.choices[0].message.content
    print(final_response)

Vision (Multimodal Completions)

Many models accept multimodal input, so you can attach images directly to a request. This is useful for image captioning, image analysis, and visual question answering. Models such as gpt-4o, claude-3-5-sonnet-20240620, and gemini-1.5-pro-latest support vision.
import base64
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.yourouter.ai/v1"
)

def encode_image(image_path):
  with open(image_path, "rb") as image_file:
    return base64.b64encode(image_file.read()).decode('utf-8')

image_path = "image.jpg"
base64_image = encode_image(image_path)

response = client.chat.completions.create(
    model="claude-3-5-sonnet-20240620",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    max_tokens=300,
    extra_headers={"vendor": "anthropic"}
)

print(response.choices[0].message.content)

Parameters

Parameter          Type              Description                                                        Default
model              string            ID of the model to use.                                            Required
messages           array             The conversation messages so far.                                  Required
max_tokens         integer           Maximum number of tokens to generate in the reply.                 null
temperature        number            Sampling temperature, typically between 0 and 2.                   1
top_p              number            Nucleus sampling, an alternative way to control sampling.          1
n                  integer           How many completions to generate per input message.                1
stream             boolean           If set, partial message deltas are streamed back, as in ChatGPT.   false
stop               string or array   Up to 4 sequences at which generation stops.                       null
presence_penalty   number            Adjusts new-token probability based on whether the token has       0
                                     appeared in the text so far.
frequency_penalty  number            Adjusts new-token probability based on the token's frequency       0
                                     in the text so far.
logit_bias         map               Modifies the likelihood of specified tokens appearing.             null
user               string            A unique end-user identifier for monitoring and abuse detection.   null
tool_choice        string or object  Controls whether and how the model calls tools.                    none
tools              array             A list of tools the model may call.                                null
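As a hedged sketch of how several of these parameters combine in one request (the build_params helper is illustrative, and the specific values are arbitrary examples, not recommended defaults):

```python
def build_params(prompt, **overrides):
    """Assemble chat-completion parameters; overrides replace the defaults below."""
    params = {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 100,   # cap the length of the reply
        "temperature": 0.2,  # low temperature for more deterministic output
        "stop": ["\n\n"],    # stop at the first blank line
    }
    params.update(overrides)
    return params

params = build_params("List three colors.", temperature=0.7, n=2)
# client.chat.completions.create(**params) would send the request.
```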

Provider-Native APIs

For advanced scenarios where you need provider-specific fields or capabilities that the unified OpenAI-compatible interface does not expose, you can call the provider's native API directly. In this case the vendor request header is required.
YouRouter forwards the entire request body, plus all request headers except Authorization, to the upstream provider unchanged. See Request Passthrough for details.

Gemini (Google)

Generate Content

Endpoint: POST /v1/projects/cognition/locations/us/publishers/google/models/{model}:generateContent
import requests
import json

url = "https://api.yourouter.ai/v1/projects/cognition/locations/us/publishers/google/models/gemini-1.5-pro-latest:generateContent"

headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
    "vendor": "google"
}

data = {
    "contents": [{
        "parts": [{"text": "Write a short story about a time-traveling historian."}]
    }]
}

response = requests.post(url, headers=headers, json=data)

print(json.dumps(response.json(), indent=2))

Safety Settings

You can configure content-safety thresholds by adding safetySettings to the request. For the full list of categories and thresholds, see the official Google AI documentation.
import requests
import json

url = "https://api.yourouter.ai/v1/projects/cognition/locations/us/publishers/google/models/gemini-pro:generateContent"

headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
    "vendor": "google"
}

data = {
    "contents": [{"parts": [{"text": "Tell me a potentially controversial joke."}]}],
    "safetySettings": [
        {
            "category": "HARM_CATEGORY_HATE_SPEECH",
            "threshold": "BLOCK_LOW_AND_ABOVE"
        },
        {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "threshold": "BLOCK_MEDIUM_AND_ABOVE"
        }
    ]
}

response = requests.post(url, headers=headers, json=data)

print(json.dumps(response.json(), indent=2))

Claude (Anthropic)

Messages API

Endpoint: POST /v1/messages
import requests
import json

url = "https://api.yourouter.ai/v1/messages"

headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
    "anthropic-version": "2023-06-01",
    "vendor": "anthropic"
}

data = {
    "model": "claude-3-5-sonnet-20240620",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Explain the concept of neural networks to a 5-year-old."}
    ]
}

response = requests.post(url, headers=headers, json=data)

print(json.dumps(response.json(), indent=2))

Tool Use with Claude

You can give Claude a set of tools, and it decides when to call them based on the user's request. The flow is still a multi-step conversation: your code executes the tools and sends the results back to Claude. Below is a complete tool-use lifecycle example:
import requests
import json

def get_weather(location):
    if "san francisco" in location.lower():
        return json.dumps({"location": "San Francisco", "temperature": "15°C", "forecast": "Cloudy"})
    else:
        return json.dumps({"location": location, "temperature": "unknown"})

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather in a given location.",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                }
            },
            "required": ["location"]
        }
    }
]

messages = [{"role": "user", "content": "What is the weather like in San Francisco?"}]

initial_data = {
    "model": "claude-3-opus-20240229",
    "max_tokens": 1024,
    "tools": tools,
    "messages": messages
}

response = requests.post(
    "https://api.yourouter.ai/v1/messages",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
        "anthropic-version": "2023-06-01",
        "vendor": "anthropic"
    },
    json=initial_data
)

response_data = response.json()

if response_data.get("stop_reason") == "tool_use":
    tool_use_block = next(
        (block for block in response_data["content"] if block.get("type") == "tool_use"), None
    )

    if tool_use_block:
        tool_name = tool_use_block["name"]
        tool_input = tool_use_block["input"]
        tool_use_id = tool_use_block["id"]

        if tool_name == "get_weather":
            tool_result = get_weather(tool_input.get("location", ""))

            messages.append({"role": "assistant", "content": response_data["content"]})
            messages.append({
                "role": "user",
                "content": [
                    {
                        "type": "tool_result",
                        "tool_use_id": tool_use_id,
                        "content": tool_result,
                    }
                ],
            })

            final_data = {
                "model": "claude-3-opus-20240229",
                "max_tokens": 1024,
                "tools": tools,
                "messages": messages
            }

            final_response = requests.post(
                "https://api.yourouter.ai/v1/messages",
                headers={
                    "Authorization": "Bearer YOUR_API_KEY",
                    "Content-Type": "application/json",
                    "anthropic-version": "2023-06-01",
                    "vendor": "anthropic"
                },
                json=final_data
            ).json()

            final_text = next(
                (block["text"] for block in final_response["content"] if block.get("type") == "text"),
                "No final text response found."
            )
            print(final_text)

Best Practices

  • Routing: in production, prefer auto for higher availability. Pin a provider only when you need a fixed model version or provider-specific capabilities. See the Routing Guide.
  • Error handling: network issues and provider outages do happen; implement robust error handling with exponential-backoff retries, especially for long-running tasks.
  • Streaming: enable streaming for any user-facing interactive application to improve responsiveness and user experience.
  • System prompts: a well-crafted system prompt strongly shapes the model's behavior, tone, and style; keep testing and refining it.
  • Token management: always mind the token limits for both input context and output generation, and use the usage field in responses to track cost and truncation risk.
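The error-handling point above can be sketched as a small exponential-backoff helper. This is an illustrative pattern, not a YouRouter API; in real code you would catch the SDK's specific error types rather than bare Exception:

```python
import random
import time

def backoff_delays(max_retries=5, base=1.0, cap=30.0):
    """Yield exponentially growing delays with a little jitter: ~1s, ~2s, ~4s, ... capped."""
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        yield delay + random.uniform(0, delay * 0.1)

def call_with_retry(send_request, max_retries=5):
    """Call send_request(); on failure, sleep and retry up to max_retries times."""
    delays = list(backoff_delays(max_retries))
    for attempt, delay in enumerate(delays):
        try:
            return send_request()
        except Exception:  # in real code, catch the SDK's specific error types
            if attempt == len(delays) - 1:
                raise
            time.sleep(delay)

# Usage sketch: wrap any request in the retry helper, e.g.
# result = call_with_retry(lambda: client.chat.completions.create(**params))
```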