Overview

Embeddings are numerical vector representations of text that capture semantic meaning. This allows you to perform mathematical operations on text, such as measuring the “distance” or “similarity” between different pieces of text. They are a foundational component for a wide range of AI applications, including semantic search, clustering, recommendations, anomaly detection, and classification. YouRouter provides access to leading embedding models through a simple, unified API.

Usage

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["YOUROUTER_API_KEY"],
    base_url="https://api.yourouter.ai/v1"
)

response = client.embeddings.create(
    input="The quick brown fox jumps over the lazy dog",
    model="text-embedding-ada-002"
)

embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")

Batch Processing

For efficiency, you can pass an array of strings to the input parameter to generate multiple embeddings in a single request.
response = client.embeddings.create(
    input=[
        "First sentence to embed.",
        "Another sentence for batch processing."
    ],
    model="text-embedding-ada-002"
)

for i, data in enumerate(response.data):
    print(f"Embedding for input {i+1}: {len(data.embedding)} dimensions")

Parameters

  • input (string or array, required): The input text or texts to embed. Can be a single string or an array of strings for batch processing.
  • model (string, required): The ID of the embedding model to use (e.g., text-embedding-ada-002).
  • encoding_format (string, default: "float"): The format to return the embeddings in. Can be float or base64. Base64 is useful for reducing JSON payload size.
  • user (string): A unique identifier representing your end-user, which can help to monitor and detect abuse.
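
If you request encoding_format="base64", each embedding arrives as a string rather than a list of numbers, so it has to be decoded before use. The sketch below shows one way to do that with the standard library. It assumes the payload is a packed array of little-endian float32 values, which is the convention of the OpenAI embeddings API; this page does not state the byte layout, so verify it against a float response first. The helper name decode_base64_embedding is hypothetical.

```python
import base64
import struct


def decode_base64_embedding(b64_string):
    """Decode a base64 string into a list of floats.

    Assumption: the bytes are little-endian float32 values (4 bytes each),
    as in the OpenAI embeddings API. Not confirmed by this page.
    """
    raw = base64.b64decode(b64_string)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", raw))


# Round-trip check with a made-up vector whose values are exact in float32.
sample = [0.5, -1.25, 2.0]
encoded = base64.b64encode(struct.pack("<3f", *sample)).decode("ascii")
decoded = decode_base64_embedding(encoded)
```

With a real response you would pass the string found in response.data[0].embedding to the helper.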

Use Cases

Semantic Search

Instead of keyword matching, semantic search finds results that are contextually related to the user’s query, even if they don’t share the exact same words. This is achieved by comparing the vector embedding of the query with the embeddings of your documents.
import numpy as np
from scipy.spatial.distance import cosine

documents = [
    "The sky is blue and beautiful.",
    "The sun is the star at the center of the Solar System.",
    "Artificial intelligence will reshape our world."
]

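# Embed all documents in a single batched request instead of one call per document.
doc_response = client.embeddings.create(input=documents, model="text-embedding-ada-002")
# Sort by the returned index so each embedding lines up with its document.
doc_embeddings = [item.embedding for item in sorted(doc_response.data, key=lambda d: d.index)]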

query = "What is the future of AI?"
query_embedding = client.embeddings.create(input=query, model="text-embedding-ada-002").data[0].embedding

similarities = [1 - cosine(query_embedding, doc_emb) for doc_emb in doc_embeddings]
most_similar_index = np.argmax(similarities)

print(f"Query: '{query}'")
print(f"Most similar document: '{documents[most_similar_index]}'")
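
Search results are usually shown as a ranked list rather than a single best match. The following is a minimal sketch of top-k ranking by cosine similarity using only the standard library. The function names and the three-dimensional toy vectors are made up for illustration; real embeddings from the API have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return (document index, similarity) pairs for the k best matches."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:k]]


# Toy vectors standing in for real document and query embeddings.
toy_docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
toy_query = [1.0, 0.2, 0.0]
ranked = top_k(toy_query, toy_docs, k=2)
```

Here ranked lists document 0 first and document 2 second, because the query points mostly along the first axis.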

Classification & Clustering

Embeddings are powerful features for machine learning models. You can use them to train a classifier (e.g., for sentiment analysis or topic categorization) or to group similar items together using clustering algorithms.
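
As one illustration of using embeddings as features, here is a sketch of a nearest-centroid classifier: average the embeddings of each labeled class, then assign new text to the class whose average is most similar. This is only one simple option, chosen because it needs no extra libraries; the two-dimensional vectors, labels, and function names are all made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def fit_centroids(embeddings, labels):
    """Average the embeddings of each label into one centroid per class."""
    grouped = {}
    for vec, label in zip(embeddings, labels):
        grouped.setdefault(label, []).append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(centroids, vec):
    """Return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine_similarity(centroids[label], vec))


# Toy 2-D vectors standing in for embeddings of labeled training texts.
train_vecs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_labels = ["positive", "positive", "negative", "negative"]
centroids = fit_centroids(train_vecs, train_labels)
prediction = predict(centroids, [0.7, 0.3])
```

For larger or noisier datasets, a trained model such as logistic regression over the same embedding features is a common next step.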

Best Practices

While modern embedding models are robust, for some applications, basic preprocessing like removing irrelevant characters or normalizing text can still be beneficial. However, avoid aggressive stemming or stop-word removal, as it can strip away important context.
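
A light-touch cleanup along these lines might look like the sketch below. The specific steps (Unicode NFKC normalization, replacing control characters, collapsing whitespace) are one reasonable reading of "basic preprocessing", not a prescribed recipe, and the function name light_clean is hypothetical. Words, casing, and punctuation are deliberately left alone.

```python
import re
import unicodedata


def light_clean(text):
    """Normalize text without removing words, casing, or punctuation."""
    # Fold compatibility characters (e.g. non-breaking spaces) to plain forms.
    text = unicodedata.normalize("NFKC", text)
    # Replace control characters (including newlines and tabs) with spaces.
    text = "".join(" " if unicodedata.category(ch) == "Cc" else ch for ch in text)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


cleaned = light_clean("  Hello,\u00a0 world!\x00\n\nSecond   line. ")
```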
When you need to embed multiple pieces of text, always use the batching capability by passing an array of strings. This significantly reduces latency by minimizing the number of network round-trips.
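
For large collections you will likely need to split the input into several batched requests. The sketch below shows the chunking logic in isolation. The batch size of 100 is an arbitrary placeholder, since this page does not state a per-request limit, and embed_batch is a hypothetical callable that you would implement with client.embeddings.create.

```python
def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(texts, embed_batch, batch_size=100):
    """Embed texts in batches; embed_batch maps a list of strings to vectors."""
    vectors = []
    for batch in chunked(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


# Stand-in for a real API call so the batching logic can be checked offline.
calls = []

def fake_embed_batch(batch):
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]

vectors = embed_all(["a", "bb", "ccc", "dddd", "eeeee"], fake_embed_batch, batch_size=2)
```

Five inputs with a batch size of 2 result in three requests instead of five.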
If your application frequently embeds the same text (e.g., popular search queries or document titles), implement a caching layer (like Redis or an in-memory cache) to store and retrieve embeddings. This reduces API calls, lowers costs, and improves performance.
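
A minimal in-memory version of such a cache is sketched below. The class name EmbeddingCache and the embed_fn callable are hypothetical; in practice embed_fn would wrap client.embeddings.create, and the dictionary could be swapped for Redis or another shared store. The key includes the model ID because the same text produces different vectors under different models.

```python
import hashlib


class EmbeddingCache:
    """Cache embeddings in memory, keyed by model ID and input text."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # callable: (model, text) -> vector
        self._store = {}
        self.misses = 0  # number of times embed_fn was actually called

    @staticmethod
    def _key(model, text):
        # Hash model and text together so keys stay short and uniform.
        return hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()

    def get(self, model, text):
        key = self._key(model, text)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(model, text)
        return self._store[key]


# Stand-in embedding function so the cache can be exercised offline.
cache = EmbeddingCache(lambda model, text: [float(len(text))])
first = cache.get("text-embedding-ada-002", "popular query")
second = cache.get("text-embedding-ada-002", "popular query")
```

The second lookup is served from the cache, so the embedding function runs only once.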

Scaling Up: Vector Databases

For any application with more than a few thousand embeddings, performing a brute-force similarity search (like the semantic search example above) becomes slow and inefficient, because every query must be compared against every stored vector.

This is where vector databases come in. Vector databases are specialized systems designed to store and search through millions or even billions of vector embeddings at high speed. They use indexing algorithms (like HNSW or IVF) to perform Approximate Nearest Neighbor (ANN) searches, which trade a small amount of accuracy for a large gain in speed. Popular choices include:
  • Cloud-based: Pinecone, Zilliz Cloud
  • Open-source / Self-hosted: Weaviate, Milvus, Chroma, Qdrant
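
To give a feel for how an IVF-style index avoids scanning every vector, here is a toy sketch: vectors are grouped under their nearest centroid, and a query only searches the groups belonging to the nprobe closest centroids. It is a simplified illustration, not how any of the products above is implemented. The centroids are hand-picked here (real systems typically learn them with k-means), the function names are made up, and Euclidean distance is used, which gives the same ranking as cosine similarity when vectors have unit length.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, vec, n=1):
    """Indices of the n centroids closest to vec."""
    order = sorted(range(len(centroids)), key=lambda i: squared_distance(centroids[i], vec))
    return order[:n]


def build_ivf(vectors, centroids):
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vec_id, vec in enumerate(vectors):
        lists[nearest(centroids, vec)[0]].append(vec_id)
    return lists


def ivf_search(query, vectors, centroids, lists, nprobe=1, k=1):
    """Search only the nprobe closest lists; return ids of the k best hits."""
    candidates = [vid for c in nearest(centroids, query, nprobe) for vid in lists[c]]
    candidates.sort(key=lambda vid: squared_distance(vectors[vid], query))
    return candidates[:k]


# Two well-separated toy clusters and hand-picked centroids (an assumption).
vectors = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids = [[0.3, 0.3], [10.3, 10.3]]
lists = build_ivf(vectors, centroids)
hits = ivf_search([9.5, 10.2], vectors, centroids, lists, nprobe=1, k=2)
```

With nprobe=1 the query is compared against three vectors instead of six. The search is approximate because a true nearest neighbor that sits in an unprobed list is missed; raising nprobe recovers accuracy at the cost of speed.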