Overview

Embeddings are numerical vector representations of text that capture semantic meaning. This allows you to perform mathematical operations on text, such as measuring the “distance” or “similarity” between different pieces of text. They are a foundational component for a wide range of AI applications, including semantic search, clustering, recommendations, anomaly detection, and classification. YouRouter provides access to leading embedding models through a simple, unified API.

Usage

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key-here",
    base_url="https://api.yourouter.ai/v1"
)

response = client.embeddings.create(
    input="The quick brown fox jumps over the lazy dog",
    model="text-embedding-ada-002" # A popular and efficient model
)

embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")
# Example Output: Embedding dimensions: 1536

Batch Processing

For efficiency, you can pass an array of strings to the input parameter to generate multiple embeddings in a single request.

response = client.embeddings.create(
    input=[
        "First sentence to embed.",
        "Another sentence for batch processing."
    ],
    model="text-embedding-ada-002"
)

for i, data in enumerate(response.data):
    print(f"Embedding for input {i+1}: {len(data.embedding)} dimensions")

Parameters

input (string or array, required)
  The input text or texts to embed. Can be a single string or an array of strings for batch processing.

model (string, required)
  The ID of the embedding model to use (e.g., text-embedding-ada-002).

encoding_format (string, default: "float")
  The format to return the embeddings in. Can be float or base64; base64 is useful for reducing the JSON payload size. See the decoding sketch after this list.

user (string, optional)
  A unique identifier representing your end-user, which can help to monitor and detect abuse.
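
When you request base64, the embedding field of each result arrives as a base64 string rather than a list of floats. The following is a minimal decoding sketch; it assumes the provider follows the common convention (used by the OpenAI API) of base64-encoding a little-endian float32 array, and that your SDK returns the raw string when encoding_format is set explicitly:

import base64
import numpy as np

response = client.embeddings.create(
    input="The quick brown fox jumps over the lazy dog",
    model="text-embedding-ada-002",
    encoding_format="base64"
)

# Assumption: the base64 payload encodes a float32 array.
raw = base64.b64decode(response.data[0].embedding)
embedding = np.frombuffer(raw, dtype=np.float32)
print(f"Embedding dimensions: {len(embedding)}")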

Use Cases

Semantic Search

Instead of keyword matching, semantic search finds results that are contextually related to the user’s query, even if they don’t share the exact same words. This is achieved by comparing the vector embedding of the query with the embeddings of your documents.

import numpy as np
from scipy.spatial.distance import cosine

# Assume 'client' is an initialized OpenAI client

# 1. Example documents to search through
documents = [
    "The sky is blue and beautiful.",
    "The sun is the star at the center of the Solar System.",
    "Artificial intelligence will reshape our world."
]

# 2. Generate embeddings for all documents in a single batched request
#    (in a real app, you would compute these once and store them)
doc_embeddings = [
    item.embedding
    for item in client.embeddings.create(
        input=documents, model="text-embedding-ada-002"
    ).data
]

# 3. User's search query
query = "What is the future of AI?"

# 4. Generate embedding for the query
query_embedding = client.embeddings.create(input=query, model="text-embedding-ada-002").data[0].embedding

# 5. Calculate cosine similarity between the query and each document
# (1 - cosine distance is cosine similarity)
similarities = [1 - cosine(query_embedding, doc_emb) for doc_emb in doc_embeddings]

# 6. Find the most similar document
most_similar_index = np.argmax(similarities)

print(f"Query: '{query}'")
print(f"Most similar document: '{documents[most_similar_index]}'")
# Expected Output: Most similar document: 'Artificial intelligence will reshape our world.'

Classification & Clustering

Embeddings are powerful features for machine learning models. You can use them to train a classifier (e.g., for sentiment analysis or topic categorization) or to group similar items together using clustering algorithms.
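
As an illustrative sketch of the clustering idea (using scikit-learn's KMeans, which is an assumption about your ML stack rather than part of the YouRouter API), you could group a handful of embedded sentences like this:

import numpy as np
from sklearn.cluster import KMeans

texts = [
    "The stock market rallied today.",
    "Investors are optimistic about earnings.",
    "The recipe calls for two cups of flour.",
    "Bake the cake at 350 degrees for 30 minutes."
]

# Embed all texts in a single batched request.
response = client.embeddings.create(input=texts, model="text-embedding-ada-002")
vectors = np.array([item.embedding for item in response.data])

# Group the embeddings into two clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)
for text, label in zip(texts, kmeans.labels_):
    print(f"Cluster {label}: {text}")

The finance sentences and the baking sentences should land in separate clusters, since their embeddings are far apart in vector space.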

Best Practices

  • Preprocessing: While modern embedding models are robust, basic preprocessing such as removing irrelevant characters or normalizing text can still be beneficial for some applications. However, avoid aggressive stemming or stop-word removal, as it can strip away important context.
  • Batching: When you need to embed multiple pieces of text, always pass an array of strings to generate the embeddings in a single request. This significantly reduces latency by minimizing the number of network round-trips.
  • Caching: If your application frequently embeds the same text (e.g., popular search queries or document titles), implement a caching layer (such as Redis or an in-memory cache) to store and retrieve embeddings. This reduces API calls, lowers costs, and improves performance. A minimal sketch follows this list.
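
A minimal caching sketch using a plain in-process dictionary (the get_embedding helper and its key scheme are illustrative, not part of any API; a production system would more likely use Redis or similar):

embedding_cache = {}

def get_embedding(text, model="text-embedding-ada-002"):
    # Return a cached vector if this (model, text) pair was embedded before.
    key = (model, text)
    if key not in embedding_cache:
        response = client.embeddings.create(input=text, model=model)
        embedding_cache[key] = response.data[0].embedding
    return embedding_cache[key]

# The second call is served from the cache, with no API request.
vec1 = get_embedding("What is the future of AI?")
vec2 = get_embedding("What is the future of AI?")
assert vec1 is vec2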

Scaling Up: Vector Databases

For any application with more than a few thousand embeddings, performing a brute-force similarity search (like the example above) becomes slow and inefficient. This is where vector databases come in. Vector databases are specialized systems designed to store and search through millions or even billions of vector embeddings at high speed. They use sophisticated indexing algorithms (such as HNSW or IVF) to perform Approximate Nearest Neighbor (ANN) searches, providing an excellent balance between speed and accuracy. Popular choices include:
  • Cloud-based: Pinecone, Zilliz Cloud
  • Open-source / Self-hosted: Weaviate, Milvus, Chroma, Qdrant
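
As an illustrative sketch of the workflow, here is the earlier semantic search rebuilt on Chroma's Python client (chosen arbitrarily from the list above; the collection name and IDs are placeholders):

import chromadb

# An in-memory Chroma instance; production deployments would use a persistent server.
chroma = chromadb.Client()
collection = chroma.create_collection(name="docs")

# Store documents alongside embeddings generated via YouRouter.
docs = [
    "The sky is blue and beautiful.",
    "Artificial intelligence will reshape our world."
]
embeddings = [
    item.embedding
    for item in client.embeddings.create(input=docs, model="text-embedding-ada-002").data
]
collection.add(ids=["doc-0", "doc-1"], documents=docs, embeddings=embeddings)

# Embed the query, then let the vector database handle the ANN search.
query_vec = client.embeddings.create(
    input="What is the future of AI?", model="text-embedding-ada-002"
).data[0].embedding
results = collection.query(query_embeddings=[query_vec], n_results=1)
print(results["documents"][0][0])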