YouRouter provides two primary ways to interact with chat models:
  1. OpenAI-Compatible API: The recommended method for most use cases, providing a unified interface for all models.
  2. Native Provider APIs: For advanced use cases requiring provider-specific features not exposed through the unified API.
For details on how to select providers and models, see the Router Guide.

OpenAI-Compatible API

This is the simplest and most flexible way to use YouRouter. It allows you to use the familiar OpenAI SDKs and switch between different models and providers with minimal code changes.

Basic Usage

The following example shows how to send a basic chat completion request. You can change the model and the vendor header to target different models and providers.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["YOUROUTER_API_KEY"],
    base_url="https://api.yourouter.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    extra_headers={"vendor": "openai"}
)

print(response.choices[0].message.content)

Advanced Features

Multi-turn Conversation

To maintain a continuous conversation, pass the entire chat history in the messages array with each request.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["YOUROUTER_API_KEY"],
    base_url="https://api.yourouter.ai/v1"
)

messages = [
    {"role": "system", "content": "You are a witty assistant that tells jokes."},
    {"role": "user", "content": "Tell me a joke about computers."},
    {"role": "assistant", "content": "Why did the computer keep sneezing? It had a virus!"},
    {"role": "user", "content": "That was a good one. Tell me another."}
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)

print(response.choices[0].message.content)

Streaming Responses

For real-time applications like chatbots, you can stream the response as it’s being generated. Set stream=True in your request.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["YOUROUTER_API_KEY"],
    base_url="https://api.yourouter.ai/v1"
)

stream = client.chat.completions.create(
    model="claude-3-haiku-20240307",
    messages=[{"role": "user", "content": "Write a short poem about the ocean."}],
    stream=True,
    extra_headers={"vendor": "anthropic"}
)

for chunk in stream:
    # Some providers emit chunks with an empty choices list (e.g. the final usage chunk).
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

Function Calling / Tool Use

You can enable models to use tools or call functions to interact with external systems. This is a multi-step process:
  1. You send a request with a list of available tools.
  2. The model responds with a request to call one or more of those tools.
  3. You execute the tools in your code.
  4. You send the tool results back to the model, which then generates a final, natural-language response.
import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["YOUROUTER_API_KEY"],
    base_url="https://api.yourouter.ai/v1"
)

def get_current_weather(location, unit="celsius"):
    """Get the current weather in a given location"""
    if "boston" in location.lower():
        return json.dumps({"location": "Boston", "temperature": "10", "unit": unit})
    else:
        return json.dumps({"location": location, "temperature": "unknown"})

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }
]

messages = [{"role": "user", "content": "What's the weather like in Boston, MA?"}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)

response_message = response.choices[0].message
tool_calls = response_message.tool_calls

if tool_calls:
    available_functions = {
        "get_current_weather": get_current_weather,
    }
    messages.append(response_message)

    for tool_call in tool_calls:
        function_name = tool_call.function.name
        function_to_call = available_functions[function_name]
        function_args = json.loads(tool_call.function.arguments)

        function_response = function_to_call(
            location=function_args.get("location"),
            unit=function_args.get("unit", "celsius"),  # avoid passing None when the model omits unit
        )

        messages.append(
            {
                "tool_call_id": tool_call.id,
                "role": "tool",
                "name": function_name,
                "content": function_response,
            }
        )

    second_response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )

    final_response = second_response.choices[0].message.content
    print(final_response)

Vision (Multimodal Completions)

Many models support multimodal inputs, allowing you to include images in your requests. This is useful for tasks like image description, analysis, and visual Q&A. This feature is not exclusive to any single provider; models like gpt-4o, claude-3-5-sonnet-20240620, and gemini-1.5-pro-latest all have vision capabilities.
import base64
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["YOUROUTER_API_KEY"],
    base_url="https://api.yourouter.ai/v1"
)

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

image_path = "image.jpg"
base64_image = encode_image(image_path)

response = client.chat.completions.create(
    model="claude-3-5-sonnet-20240620",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    max_tokens=300,
    extra_headers={"vendor": "anthropic"}
)

print(response.choices[0].message.content)

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| model | string | ID of the model to use. | Required |
| messages | array | A list of messages comprising the conversation so far. | Required |
| max_tokens | integer | The maximum number of tokens to generate in the chat completion. | null |
| temperature | number | What sampling temperature to use, between 0 and 2. | 1 |
| top_p | number | An alternative to sampling with temperature, called nucleus sampling. | 1 |
| n | integer | How many chat completion choices to generate for each input message. | 1 |
| stream | boolean | If set, partial message deltas will be sent, like in ChatGPT. | false |
| stop | string or array | Up to 4 sequences where the API will stop generating further tokens. | null |
| presence_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far. | 0 |
| frequency_penalty | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far. | 0 |
| logit_bias | map | Modify the likelihood of specified tokens appearing in the completion. | null |
| user | string | A unique identifier representing your end-user, which can help to monitor and detect abuse. | null |
| tool_choice | string or object | Controls if and how the model uses tools. | none (auto when tools are provided) |
| tools | array | A list of tools the model may call. | null |
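The parameters above map directly onto keyword arguments of `chat.completions.create`. The sketch below builds a request payload combining several of them; the specific values are illustrative, not recommendations:

```python
# Illustrative request payload; values are examples only.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "List three uses for a paperclip."},
    ],
    "max_tokens": 256,        # cap the completion length
    "temperature": 0.7,       # lower = more deterministic
    "n": 1,                   # number of completion choices to generate
    "stop": ["\n\n"],         # stop generating at the first blank line
    "presence_penalty": 0.5,  # discourage revisiting topics already mentioned
    "user": "user-1234",      # opaque end-user ID for abuse monitoring
}

# With the OpenAI SDK, the dict can be splatted into the call:
# response = client.chat.completions.create(**payload)
```

Note that `temperature` and `top_p` both control sampling randomness; it is common to tune one and leave the other at its default.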

Native Provider APIs

For advanced use cases that require parameters or features not available in the OpenAI-compatible API, you can make requests directly to the native provider endpoints. You must include the vendor header in these requests.
YouRouter forwards the entire request body (and all headers except Authorization) to the upstream provider. See the Request Forwarding guide for more details.

Gemini (Google)

Generate Content

Endpoint: POST /v1/projects/cognition/locations/us/publishers/google/models/{model}:generateContent
import os
import requests
import json

url = "https://api.yourouter.ai/v1/projects/cognition/locations/us/publishers/google/models/gemini-1.5-pro-latest:generateContent"

headers = {
    "Authorization": f"Bearer {os.environ['YOUROUTER_API_KEY']}",
    "Content-Type": "application/json",
    "vendor": "google"
}

data = {
    "contents": [{
        "parts": [{"text": "Write a short story about a time-traveling historian."}]
    }]
}

response = requests.post(url, headers=headers, json=data)

print(json.dumps(response.json(), indent=2))

Safety Settings

You can configure content thresholds by including the safetySettings object in your request. Refer to the official Google AI documentation for a full list of categories and thresholds.
import os
import requests
import json

url = "https://api.yourouter.ai/v1/projects/cognition/locations/us/publishers/google/models/gemini-pro:generateContent"

headers = {
    "Authorization": f"Bearer {os.environ['YOUROUTER_API_KEY']}",
    "Content-Type": "application/json",
    "vendor": "google"
}

data = {
    "contents": [{"parts": [{"text": "Tell me a potentially controversial joke."}]}],
    "safetySettings": [
        {
            "category": "HARM_CATEGORY_HATE_SPEECH",
            "threshold": "BLOCK_LOW_AND_ABOVE"
        },
        {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "threshold": "BLOCK_MEDIUM_AND_ABOVE"
        }
    ]
}

response = requests.post(url, headers=headers, json=data)

print(json.dumps(response.json(), indent=2))

Claude (Anthropic)

Messages API

Endpoint: POST /v1/messages
import os
import requests
import json

url = "https://api.yourouter.ai/v1/messages"

headers = {
    "Authorization": f"Bearer {os.environ['YOUROUTER_API_KEY']}",
    "Content-Type": "application/json",
    "anthropic-version": "2023-06-01",
    "vendor": "anthropic"
}

data = {
    "model": "claude-3-5-sonnet-20240620",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Explain the concept of neural networks to a 5-year-old."}
    ]
}

response = requests.post(url, headers=headers, json=data)

print(json.dumps(response.json(), indent=2))

Tool Use with Claude

You can equip Claude with a set of tools, and it will intelligently decide when to use them to answer a user’s request. This process involves a multi-step conversation where your code executes the tool and sends the result back to Claude. Here’s a complete example demonstrating the full tool-use lifecycle:
import os
import requests
import json

def get_weather(location):
    if "san francisco" in location.lower():
        return json.dumps({"location": "San Francisco", "temperature": "15°C", "forecast": "Cloudy"})
    else:
        return json.dumps({"location": location, "temperature": "unknown"})

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather in a given location.",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                }
            },
            "required": ["location"]
        }
    }
]

messages = [{"role": "user", "content": "What is the weather like in San Francisco?"}]

initial_data = {
    "model": "claude-3-opus-20240229",
    "max_tokens": 1024,
    "tools": tools,
    "messages": messages
}

response = requests.post(
    "https://api.yourouter.ai/v1/messages",
    headers={
        "Authorization": f"Bearer {os.environ['YOUROUTER_API_KEY']}",
        "Content-Type": "application/json",
        "anthropic-version": "2023-06-01",
        "vendor": "anthropic"
    },
    json=initial_data
)

response_data = response.json()

if response_data.get("stop_reason") == "tool_use":
    tool_use_block = next(
        (block for block in response_data["content"] if block.get("type") == "tool_use"), None
    )

    if tool_use_block:
        tool_name = tool_use_block["name"]
        tool_input = tool_use_block["input"]
        tool_use_id = tool_use_block["id"]

        if tool_name == "get_weather":
            tool_result = get_weather(tool_input.get("location", ""))

            messages.append({"role": "assistant", "content": response_data["content"]})
            messages.append({
                "role": "user",
                "content": [
                    {
                        "type": "tool_result",
                        "tool_use_id": tool_use_id,
                        "content": tool_result,
                    }
                ],
            })

            final_data = {
                "model": "claude-3-opus-20240229",
                "max_tokens": 1024,
                "tools": tools,
                "messages": messages
            }

            final_response = requests.post(
                "https://api.yourouter.ai/v1/messages",
                headers={
                    "Authorization": f"Bearer {os.environ['YOUROUTER_API_KEY']}",
                    "Content-Type": "application/json",
                    "anthropic-version": "2023-06-01",
                    "vendor": "anthropic"
                },
                json=final_data
            ).json()

            final_text = next(
                (block["text"] for block in final_response["content"] if block.get("type") == "text"),
                "No final text response found."
            )
            print(final_text)

Best Practices

  • Routing: For production applications, use the auto routing mode for high availability. For specific model versions or features, use manual routing. See the Router Guide for details.
  • Error Handling: Network issues and provider outages can occur. Implement robust error handling with retries and exponential backoff, especially for long-running tasks.
  • Streaming for UX: For any user-facing application, use streaming to provide a responsive, real-time experience.
  • System Prompts: A well-crafted system prompt is crucial for guiding the model’s behavior, tone, and personality. Test and refine your prompts thoroughly.
  • Token Management: Always be mindful of token limits for both the input context and the output generation. Monitor the usage data returned in the API response to track costs and avoid unexpected truncation.
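The retry advice above can be sketched as a small helper. The backoff constants, the exception type, and the flaky request function are illustrative assumptions, not part of the YouRouter API:

```python
import random
import time

class TransientError(Exception):
    """Stand-in for a retryable failure, e.g. an HTTP 429 or 5xx response."""

def with_backoff(request_fn, max_attempts=5, base_delay=0.5, max_delay=8.0):
    """Call request_fn, retrying transient failures with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # retries exhausted; let the caller handle the error
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay + random.uniform(0, delay / 2))  # jitter avoids retry storms

# Example: a fake request that fails twice, then succeeds.
calls = {"n": 0}
def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("simulated 503")
    return "ok"

result = with_backoff(flaky_request, base_delay=0.01)  # returns "ok" after two retries
```

In a real application, `request_fn` would wrap the SDK or HTTP call and translate rate-limit and server errors into the retryable exception type.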