- OpenAI-Compatible API: The recommended method for most use cases, providing a unified interface for all models.
- Native Provider APIs: For advanced use cases requiring provider-specific features not exposed through the unified API.
OpenAI-Compatible API
This is the simplest and most flexible way to use YouRouter. It allows you to use the familiar OpenAI SDKs and switch between different models and providers with minimal code changes.
Basic Usage
The following example shows how to send a basic chat completion request. You can change the `model` field and the `vendor` header to target different models and providers.
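As a minimal sketch using only the standard library, a request can be assembled as follows. The base URL and API key here are placeholders, and the exact casing of the `vendor` header is an assumption; substitute your deployment's values.

```python
import json

# Hypothetical base URL and API key -- substitute your own deployment values.
BASE_URL = "https://yourouter.example.com/v1"
API_KEY = "sk-your-key"

def build_chat_request(model: str, vendor: str, messages: list) -> dict:
    """Assemble the URL, headers, and JSON body for a chat completion call.

    The `vendor` header routes the request to a specific provider; the body
    follows the OpenAI chat-completions schema.
    """
    return {
        "url": f"{BASE_URL}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
            "vendor": vendor,
        },
        "body": json.dumps({"model": model, "messages": messages}),
    }

req = build_chat_request(
    "gpt-4o",
    "openai",
    [{"role": "user", "content": "Hello!"}],
)
# Send `req` with any HTTP client, e.g. urllib.request or an OpenAI SDK
# configured with base_url=BASE_URL and a default vendor header.
```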
Advanced Features
Multi-turn Conversation
To maintain a continuous conversation, simply pass the entire history of the chat in the `messages` array.
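A minimal sketch of the state you need to keep: every prior turn stays in the `messages` list, and each completed exchange is appended before the next request.

```python
# Keep the full conversation in one list; the model has no memory between
# calls, so the entire history is sent with every request.
history = [{"role": "system", "content": "You are a helpful assistant."}]

def add_turn(history: list, user_text: str, assistant_text: str) -> list:
    """Record one completed exchange so the model sees full context next call."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})
    return history

add_turn(history, "What is the capital of France?", "Paris.")
add_turn(history, "And what river runs through it?", "The Seine.")
# `history` now holds 5 messages; pass the whole list as `messages`.
```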
Streaming Responses
For real-time applications like chatbots, you can stream the response as it’s being generated. Set `stream=True` in your request.
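With `stream=True`, the API sends partial chunks and you concatenate the deltas to rebuild the full message. The sketch below assembles text from OpenAI-style chunks; the sample chunks are simulated stand-ins for a live stream.

```python
def collect_stream(chunks) -> str:
    """Concatenate the content deltas from a stream of chat chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:  # role-only or final chunks may carry no content
            parts.append(delta)
    return "".join(parts)

# Simulated chunks in the shape the API streams back with stream=True:
fake_chunks = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo!"}}]},
    {"choices": [{"delta": {}}]},
]
print(collect_stream(fake_chunks))  # -> Hello!
```

In a real application you would print each delta as it arrives rather than buffering the whole reply.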
Function Calling / Tool Use
You can enable models to use tools or call functions to interact with external systems. This is a multi-step process:
- You send a request with a list of available tools.
- The model responds with a request to call one or more of those tools.
- You execute the tools in your code.
- You send the tool results back to the model, which then generates a final, natural-language response.
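Steps 2–4 of that loop can be sketched as follows. The `get_weather` tool is a hypothetical local function; the tool-call and tool-result message shapes follow the OpenAI chat-completions schema.

```python
import json

# Step 1's advertised tools map to local implementations on your side.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real weather lookup

TOOLS = {"get_weather": get_weather}

def run_tool_calls(tool_calls: list) -> list:
    """Execute the calls the model requested (step 3) and build the
    role="tool" result messages to send back (step 4)."""
    results = []
    for call in tool_calls:
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": fn(**args),
        })
    return results

# A tool-call response in the shape the model returns (step 2):
model_calls = [{
    "id": "call_1",
    "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
}]
tool_messages = run_tool_calls(model_calls)
# Append `tool_messages` to the conversation and call the API again for the
# final natural-language answer.
```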
Vision (Multimodal Completions)
Many models support multimodal inputs, allowing you to include images in your requests. This is useful for tasks like image description, analysis, and visual Q&A. This feature is not exclusive to any single provider; models like `gpt-4o`, `claude-3-5-sonnet-20240620`, and `gemini-1.5-pro-latest` all have vision capabilities.
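A sketch of building a mixed text-and-image user message, following the OpenAI multimodal content-part schema (the image bytes here are a truncated placeholder, not a valid PNG):

```python
import base64

def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a user message mixing text and an inline base64 data-URL image."""
    data_url = f"data:{mime};base64," + base64.b64encode(image_bytes).decode()
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }

# Placeholder bytes for illustration; read a real file in practice.
msg = image_message("Describe this image.", b"\x89PNG...")
```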
Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| `model` | string | ID of the model to use. | Required |
| `messages` | array | A list of messages comprising the conversation so far. | Required |
| `max_tokens` | integer | The maximum number of tokens to generate in the chat completion. | null |
| `temperature` | number | What sampling temperature to use, between 0 and 2. | 1 |
| `top_p` | number | An alternative to sampling with temperature, called nucleus sampling. | 1 |
| `n` | integer | How many chat completion choices to generate for each input message. | 1 |
| `stream` | boolean | If set, partial message deltas will be sent, like in ChatGPT. | false |
| `stop` | string or array | Up to 4 sequences where the API will stop generating further tokens. | null |
| `presence_penalty` | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far. | 0 |
| `frequency_penalty` | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far. | 0 |
| `logit_bias` | map | Modify the likelihood of specified tokens appearing in the completion. | null |
| `user` | string | A unique identifier representing your end-user, which can help monitor and detect abuse. | null |
| `tool_choice` | string or object | Controls if and how the model uses tools. | none |
| `tools` | array | A list of tools the model may call. | null |
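A request body combining several of the parameters above (all values here are illustrative, not recommendations):

```python
# Illustrative chat-completions payload exercising the optional parameters.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Write a haiku about rivers."}],
    "max_tokens": 64,        # cap the completion length
    "temperature": 0.7,      # lower values are more deterministic
    "n": 1,                  # one completion choice
    "stop": ["\n\n"],        # stop at the first blank line
    "presence_penalty": 0.5, # nudge the model toward new topics
    "user": "user-1234",     # stable end-user id for abuse monitoring
}
```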
Native Provider APIs
For advanced use cases that require parameters or features not available in the OpenAI-compatible API, you can make requests directly to the native provider endpoints. You must include the `vendor` header in these requests.
YouRouter forwards the entire request body (and all headers except `Authorization`) to the upstream provider. See the Request Forwarding guide for more details.
Gemini (Google)
Generate Content
Endpoint: `POST /v1/projects/cognition/locations/us/publishers/google/models/{model}:generateContent`
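A sketch of a native request body, shaped per Google's `generateContent` API; note that it differs from the OpenAI schema (`contents` with `parts`, and the assistant role is called `model`). Values are illustrative.

```python
# Native Gemini request body (shape per Google's generateContent API).
body = {
    "contents": [
        {"role": "user", "parts": [{"text": "Explain quantum entanglement briefly."}]},
    ],
    "generationConfig": {
        "temperature": 0.7,
        "maxOutputTokens": 256,
    },
}
# POST this body to the endpoint above (with {model} filled in, e.g.
# gemini-1.5-pro-latest) and the `vendor` header set as described earlier.
```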
Safety Settings
You can configure content safety thresholds by including the `safetySettings` object in your request. Refer to the official Google AI documentation for a full list of categories and thresholds.
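For example, a request body with a `safetySettings` block might look like the following (category and threshold names per Google's API; consult the official list for the full set):

```python
# Request body with per-category safety thresholds attached.
body = {
    "contents": [{"role": "user", "parts": [{"text": "Hello"}]}],
    "safetySettings": [
        {"category": "HARM_CATEGORY_HARASSMENT",
         "threshold": "BLOCK_ONLY_HIGH"},
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
         "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    ],
}
```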
Claude (Anthropic)
Messages API
Endpoint: `POST /v1/messages`
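A minimal native call might be assembled as follows. The headers and body shape follow Anthropic's Messages API (`max_tokens` is required, and the system prompt is a top-level field rather than a message); the exact `vendor` header casing is an assumption.

```python
import json

headers = {
    "Content-Type": "application/json",
    "anthropic-version": "2023-06-01",  # required API version header
    "vendor": "anthropic",              # routes the native request upstream
}
body = {
    "model": "claude-3-5-sonnet-20240620",
    "max_tokens": 1024,                 # required by the Messages API
    "system": "You are a concise assistant.",
    "messages": [{"role": "user", "content": "Hello, Claude"}],
}
payload = json.dumps(body)
# POST `payload` with `headers` to the /v1/messages endpoint above.
```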
Tool Use with Claude
You can equip Claude with a set of tools, and it will intelligently decide when to use them to answer a user’s request. This process involves a multi-step conversation, mirroring the OpenAI-compatible flow: your code executes the tool and sends the result back to Claude in a follow-up request.
Best Practices
- Routing: For production applications, use the `auto` routing mode for high availability. For specific model versions or features, use manual routing. See the Router Guide for details.
- Error Handling: Network issues and provider outages can occur. Implement robust error handling with retries and exponential backoff, especially for long-running tasks.
- Streaming for UX: For any user-facing application, use streaming to provide a responsive, real-time experience.
- System Prompts: A well-crafted system prompt is crucial for guiding the model’s behavior, tone, and personality. Test and refine your prompts thoroughly.
- Token Management: Always be mindful of token limits for both the input context and the output generation. Monitor the `usage` data returned in the API response to track costs and avoid unexpected truncation.
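The retry-with-backoff recommendation above can be sketched as a small helper; the flaky function below simulates transient failures for illustration.

```python
import random
import time

def with_retries(fn, max_attempts: int = 4, base_delay: float = 0.5):
    """Call `fn`, retrying on exceptions with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each attempt; jitter avoids synchronized retries.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))

# Example: a call that fails twice before succeeding.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient network error")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # -> ok
```

In production, retry only on transient errors (timeouts, 429s, 5xx responses) rather than on every exception.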