- OpenAI-Compatible API: The recommended method for most use cases, providing a unified interface for all models.
- Native Provider APIs: For advanced use cases requiring provider-specific features not exposed through the unified API.
OpenAI-Compatible API
This is the simplest and most flexible way to use YouRouter. It allows you to use the familiar OpenAI SDKs and switch between different models and providers with minimal code changes.
Basic Usage
The following example shows how to send a basic chat completion request. You can change the `model` field and the `vendor` header to target different models and providers.
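As a minimal sketch using only the standard library, a request can be assembled as follows. The base URL and API key here are placeholders, and the exact casing of the `vendor` header is an assumption; substitute your deployment's values.

```python
import json

# Hypothetical base URL and API key -- substitute your own deployment values.
BASE_URL = "https://yourouter.example.com/v1"
API_KEY = "sk-your-key"

def build_chat_request(model: str, vendor: str, messages: list) -> dict:
    """Assemble the URL, headers, and JSON body for a chat completion call.

    The `vendor` header routes the request to a specific provider; the body
    follows the OpenAI chat-completions schema.
    """
    return {
        "url": f"{BASE_URL}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
            "vendor": vendor,
        },
        "body": json.dumps({"model": model, "messages": messages}),
    }

req = build_chat_request(
    "gpt-4o",
    "openai",
    [{"role": "user", "content": "Hello!"}],
)
# Send `req` with any HTTP client, e.g. urllib.request or an OpenAI SDK
# configured with base_url=BASE_URL and a default vendor header.
```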
Advanced Features
Multi-turn Conversation
To maintain a continuous conversation, simply pass the entire history of the chat in the `messages` array.
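A minimal sketch of the state you need to keep: every prior turn stays in the `messages` list, and each completed exchange is appended before the next request.

```python
# Keep the full conversation in one list; the model has no memory between
# calls, so the entire history is sent with every request.
history = [{"role": "system", "content": "You are a helpful assistant."}]

def add_turn(history: list, user_text: str, assistant_text: str) -> list:
    """Record one completed exchange so the model sees full context next call."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})
    return history

add_turn(history, "What is the capital of France?", "Paris.")
add_turn(history, "And what river runs through it?", "The Seine.")
# `history` now holds 5 messages; pass the whole list as `messages`.
```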
Streaming Responses
For real-time applications like chatbots, you can stream the response as it’s being generated. Set `stream=True` in your request.
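With `stream=True`, the API sends partial chunks and you concatenate the deltas to rebuild the full message. The sketch below assembles text from OpenAI-style chunks; the sample chunks are simulated stand-ins for a live stream.

```python
def collect_stream(chunks) -> str:
    """Concatenate the content deltas from a stream of chat chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:  # role-only or final chunks may carry no content
            parts.append(delta)
    return "".join(parts)

# Simulated chunks in the shape the API streams back with stream=True:
fake_chunks = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo!"}}]},
    {"choices": [{"delta": {}}]},
]
print(collect_stream(fake_chunks))  # -> Hello!
```

In a real application you would print each delta as it arrives rather than buffering the whole reply.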
Function Calling / Tool Use
You can enable models to use tools or call functions to interact with external systems. This is a multi-step process:
- You send a request with a list of available tools.
- The model responds with a request to call one or more of those tools.
- You execute the tools in your code.
- You send the tool results back to the model, which then generates a final, natural-language response.
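Steps 2–4 of that loop can be sketched as follows. The `get_weather` tool is a hypothetical local function; the tool-call and tool-result message shapes follow the OpenAI chat-completions schema.

```python
import json

# Step 1's advertised tools map to local implementations on your side.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real weather lookup

TOOLS = {"get_weather": get_weather}

def run_tool_calls(tool_calls: list) -> list:
    """Execute the calls the model requested (step 3) and build the
    role="tool" result messages to send back (step 4)."""
    results = []
    for call in tool_calls:
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": fn(**args),
        })
    return results

# A tool-call response in the shape the model returns (step 2):
model_calls = [{
    "id": "call_1",
    "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
}]
tool_messages = run_tool_calls(model_calls)
# Append `tool_messages` to the conversation and call the API again for the
# final natural-language answer.
```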
Vision (Multimodal Completions)
Many models support multimodal inputs, allowing you to include images in your requests. This is useful for tasks like image description, analysis, and visual Q&A. This feature is not exclusive to any single provider; models like `gpt-4o`, `claude-3-5-sonnet-20240620`, and `gemini-1.5-pro-latest` all have vision capabilities.
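A sketch of building a mixed text-and-image user message, following the OpenAI multimodal content-part schema (the image bytes here are a truncated placeholder, not a valid PNG):

```python
import base64

def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a user message mixing text and an inline base64 data-URL image."""
    data_url = f"data:{mime};base64," + base64.b64encode(image_bytes).decode()
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }

# Placeholder bytes for illustration; read a real file in practice.
msg = image_message("Describe this image.", b"\x89PNG...")
```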
Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| `model` | string | ID of the model to use. | Required |
| `messages` | array | A list of messages comprising the conversation so far. | Required |
| `max_tokens` | integer | The maximum number of tokens to generate in the chat completion. | null |
| `temperature` | number | What sampling temperature to use, between 0 and 2. | 1 |
| `top_p` | number | An alternative to sampling with temperature, called nucleus sampling. | 1 |
| `n` | integer | How many chat completion choices to generate for each input message. | 1 |
| `stream` | boolean | If set, partial message deltas will be sent, like in ChatGPT. | false |
| `stop` | string or array | Up to 4 sequences where the API will stop generating further tokens. | null |
| `presence_penalty` | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far. | 0 |
| `frequency_penalty` | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far. | 0 |
| `logit_bias` | map | Modify the likelihood of specified tokens appearing in the completion. | null |
| `user` | string | A unique identifier representing your end-user, which can help monitor and detect abuse. | null |
| `tool_choice` | string or object | Controls if and how the model uses tools. | none |
| `tools` | array | A list of tools the model may call. | null |
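A request body combining several of the parameters above (all values here are illustrative, not recommendations):

```python
# Illustrative chat-completions payload exercising the optional parameters.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Write a haiku about rivers."}],
    "max_tokens": 64,        # cap the completion length
    "temperature": 0.7,      # lower values are more deterministic
    "n": 1,                  # one completion choice
    "stop": ["\n\n"],        # stop at the first blank line
    "presence_penalty": 0.5, # nudge the model toward new topics
    "user": "user-1234",     # stable end-user id for abuse monitoring
}
```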
Native Provider APIs
For advanced use cases that require parameters or features not available in the OpenAI-compatible API, you can make requests directly to the native provider endpoints. You must include the `vendor` header in these requests.
YouRouter forwards the entire request body (and all headers except `Authorization`) to the upstream provider. See the Request Forwarding guide for more details.
Gemini (Google)
Generate Content
Endpoint: `POST /v1/projects/cognition/locations/us/publishers/google/models/{model}:generateContent`
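A sketch of a native request body, shaped per Google's `generateContent` API; note that it differs from the OpenAI schema (`contents` with `parts`, and the assistant role is called `model`). Values are illustrative.

```python
# Native Gemini request body (shape per Google's generateContent API).
body = {
    "contents": [
        {"role": "user", "parts": [{"text": "Explain quantum entanglement briefly."}]},
    ],
    "generationConfig": {
        "temperature": 0.7,
        "maxOutputTokens": 256,
    },
}
# POST this body to the endpoint above (with {model} filled in, e.g.
# gemini-1.5-pro-latest) and the `vendor` header set as described earlier.
```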
Safety Settings
You can configure content safety thresholds by including the `safetySettings` object in your request. Refer to the official Google AI documentation for a full list of categories and thresholds.
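For example, a request body with a `safetySettings` block might look like the following (category and threshold names per Google's API; consult the official list for the full set):

```python
# Request body with per-category safety thresholds attached.
body = {
    "contents": [{"role": "user", "parts": [{"text": "Hello"}]}],
    "safetySettings": [
        {"category": "HARM_CATEGORY_HARASSMENT",
         "threshold": "BLOCK_ONLY_HIGH"},
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
         "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    ],
}
```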
Claude (Anthropic)
Messages API
Endpoint: `POST /v1/messages`
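A minimal native call might be assembled as follows. The headers and body shape follow Anthropic's Messages API (`max_tokens` is required, and the system prompt is a top-level field rather than a message); the exact `vendor` header casing is an assumption.

```python
import json

headers = {
    "Content-Type": "application/json",
    "anthropic-version": "2023-06-01",  # required API version header
    "vendor": "anthropic",              # routes the native request upstream
}
body = {
    "model": "claude-3-5-sonnet-20240620",
    "max_tokens": 1024,                 # required by the Messages API
    "system": "You are a concise assistant.",
    "messages": [{"role": "user", "content": "Hello, Claude"}],
}
payload = json.dumps(body)
# POST `payload` with `headers` to the /v1/messages endpoint above.
```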
Tool Use with Claude
You can equip Claude with a set of tools, and it will intelligently decide when to use them to answer a user’s request. This process involves a multi-step conversation, mirroring the OpenAI-compatible flow: your code executes the tool and sends the result back to Claude in a follow-up request.
Best Practices
- Routing: For production applications, use the `auto` routing mode for high availability. For specific model versions or features, use manual routing. See the Router Guide for details.
- Error Handling: Network issues and provider outages can occur. Implement robust error handling with retries and exponential backoff, especially for long-running tasks.
- Streaming for UX: For any user-facing application, use streaming to provide a responsive, real-time experience.
- System Prompts: A well-crafted system prompt is crucial for guiding the model’s behavior, tone, and personality. Test and refine your prompts thoroughly.
- Token Management: Always be mindful of token limits for both the input context and the output generation. Monitor the `usage` data returned in the API response to track costs and avoid unexpected truncation.
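The retry-with-backoff recommendation above can be sketched as a small helper; the flaky function below simulates transient failures for illustration.

```python
import random
import time

def with_retries(fn, max_attempts: int = 4, base_delay: float = 0.5):
    """Call `fn`, retrying on exceptions with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each attempt; jitter avoids synchronized retries.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))

# Example: a call that fails twice before succeeding.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient network error")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # -> ok
```

In production, retry only on transient errors (timeouts, 429s, 5xx responses) rather than on every exception.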