Overview
Embeddings are numerical vector representations of text that capture semantic meaning. This allows you to perform mathematical operations on text, such as measuring the “distance” or “similarity” between different pieces of text. They are a foundational component for a wide range of AI applications, including semantic search, clustering, recommendations, anomaly detection, and classification. YouRouter provides access to leading embedding models through a simple, unified API.

Usage
Batch Processing
For efficiency, you can pass an array of strings to the `input` parameter to generate multiple embeddings in a single request.
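For illustration, a batch request body might be assembled like this. The model name and field layout follow the common OpenAI-compatible shape and are assumptions here, not a statement of YouRouter's exact schema:

```python
import json

# Hypothetical request body for a batch embedding call; the model name
# and field names are illustrative assumptions.
payload = {
    "model": "text-embedding-ada-002",
    "input": [
        "The food was delicious.",
        "The service was slow.",
        "Great atmosphere and friendly staff.",
    ],
}

# One embedding is returned per input string, so a response to this
# request would carry len(payload["input"]) == 3 embedding objects.
body = json.dumps(payload)
print(len(payload["input"]))  # 3
```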
Parameters

- `input`: The input text or texts to embed. Can be a single string or an array of strings for batch processing.
- `model`: The ID of the embedding model to use (e.g., `text-embedding-ada-002`).
- `encoding_format`: The format to return the embeddings in. Can be `float` or `base64`. Base64 is useful for reducing JSON payload size.
- `user`: A unique identifier representing your end-user, which can help to monitor and detect abuse.
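In OpenAI-compatible APIs, the base64 format typically encodes the raw little-endian float32 bytes of the vector; that convention is assumed in this round-trip sketch (the four-dimensional vector is made up):

```python
import base64
import struct

# A made-up 4-dimensional embedding, standing in for an API response.
original = [0.25, -0.5, 0.125, 1.0]

# Server side (simulated): pack as little-endian float32, then base64-encode.
encoded = base64.b64encode(struct.pack(f"<{len(original)}f", *original)).decode()

# Client side: decode the base64 string back into a list of floats.
raw = base64.b64decode(encoded)
decoded = list(struct.unpack(f"<{len(raw) // 4}f", raw))

print(decoded)  # [0.25, -0.5, 0.125, 1.0]
```

These sample values are exactly representable in float32, so the round trip is lossless; arbitrary floats would round to float32 precision.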
Use Cases
Semantic Search
Instead of keyword matching, semantic search finds results that are contextually related to the user’s query, even if they don’t share the exact same words. This is achieved by comparing the vector embedding of the query with the embeddings of your documents.

Classification & Clustering
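Semantic search and clustering both reduce to comparing embedding vectors, most often with cosine similarity. A minimal brute-force sketch over made-up, toy-sized vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Tiny made-up embeddings standing in for real model output.
documents = {
    "doc_pizza":   [0.9, 0.1, 0.0],
    "doc_weather": [0.1, 0.9, 0.2],
    "doc_taxes":   [0.0, 0.1, 0.95],
}
query = [0.8, 0.2, 0.1]  # e.g. the embedded query "italian food"

# Rank every document by similarity to the query (brute force).
ranked = sorted(documents, key=lambda d: cosine_similarity(query, documents[d]),
                reverse=True)
print(ranked[0])  # doc_pizza
```

This linear scan is fine for small collections; the vector-database section below covers what to do when it stops scaling.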
Embeddings are powerful features for machine learning models. You can use them to train a classifier (e.g., for sentiment analysis or topic categorization) or to group similar items together using clustering algorithms.

Best Practices
Preprocessing
While modern embedding models are robust, for some applications, basic preprocessing like removing irrelevant characters or normalizing text can still be beneficial. However, avoid aggressive stemming or stop-word removal, as it can strip away important context.
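As an illustration of the light-touch cleanup meant here (unicode normalization and whitespace collapsing, not stemming or stop-word removal), one possible helper:

```python
import re
import unicodedata

def light_clean(text: str) -> str:
    # Normalize unicode so visually identical strings embed identically.
    text = unicodedata.normalize("NFKC", text)
    # Replace control characters (category "C") with spaces; keep letters
    # and punctuation so the model still sees full context.
    text = "".join(ch if unicodedata.category(ch)[0] != "C" else " " for ch in text)
    # Collapse runs of whitespace; the words themselves are left untouched.
    return re.sub(r"\s+", " ", text).strip()

print(light_clean("Hello,\n\n   world!\t"))  # "Hello, world!"
```

Note that words, casing, and punctuation are all preserved; only layout noise is stripped.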
Batching for Efficiency
When you need to embed multiple pieces of text, always use the batching capability by passing an array of strings. This significantly reduces latency by minimizing the number of network round-trips.
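If a provider caps the number of inputs per request, the texts can be split into fixed-size batches client-side. A small sketch (the cap of 100 below is a made-up example, not a documented YouRouter limit):

```python
def batched(texts, batch_size=100):
    # Yield consecutive slices of at most batch_size items each.
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

texts = [f"document {i}" for i in range(250)]
batches = list(batched(texts))
print([len(b) for b in batches])  # [100, 100, 50]
```

Embedding these 250 texts then takes 3 requests instead of 250 single-text round-trips.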
Caching
If your application frequently embeds the same text (e.g., popular search queries or document titles), implement a caching layer (like Redis or an in-memory cache) to store and retrieve embeddings. This reduces API calls, lowers costs, and improves performance.
Scaling Up: Vector Databases
For any application with more than a few thousand embeddings, performing a brute-force similarity search (like the example above) becomes slow and inefficient. This is where vector databases come in. Vector databases are specialized systems designed to store and search through millions or even billions of vector embeddings at high speed. They use sophisticated indexing algorithms (like HNSW or IVF) to perform Approximate Nearest Neighbor (ANN) searches, providing an excellent balance between speed and accuracy. Popular choices include:

- Cloud-based: Pinecone, Zilliz Cloud
- Open-source / Self-hosted: Weaviate, Milvus, Chroma, Qdrant