Embeddings
Supported embedding models, dimension differences, and re-indexing requirements.
Orchid uses pluggable embedding models to convert documents and queries into dense vectors stored in the vector backend (Qdrant or ChromaDB). The embedding model is configured once at the deployment level — all agents in a deployment share the same model and collection dimensions.
Supported models and dimensions
| Model | Dimensions | Notes |
|---|---|---|
| `ollama/nomic-embed-text` | 768 | Local Ollama; default for the demo |
| `text-embedding-3-small` | 1536 | OpenAI |
| `gemini/gemini-embedding-001` | 3072 | Google Gemini |
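Because each collection is sized by its model's output dimension, the table can double as a startup sanity check. The sketch below is hypothetical (`EXPECTED_DIMENSIONS` and `needs_reindex` are illustrative names, not part of Orchid) and assumes you can read the existing collection's vector size from your backend.

```python
# Hypothetical lookup: expected output dimensions, copied from the table above.
EXPECTED_DIMENSIONS: dict[str, int] = {
    "ollama/nomic-embed-text": 768,
    "text-embedding-3-small": 1536,
    "gemini/gemini-embedding-001": 3072,
}


def needs_reindex(configured_model: str, collection_dimension: int) -> bool:
    """Return True if the configured model's vectors cannot go into the collection.

    Hypothetical check, not part of Orchid: it only compares sizes, so it
    cannot detect a switch between two models that share a dimension.
    """
    expected = EXPECTED_DIMENSIONS.get(configured_model)
    if expected is None:
        raise KeyError(f"unknown dimension for model {configured_model!r}")
    return expected != collection_dimension
```

For example, a collection created for `ollama/nomic-embed-text` (768) would be flagged if the deployment is reconfigured to `text-embedding-3-small`.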
Set the embedding model:

```yaml
defaults:
  rag:
    embedding_model: "ollama/nomic-embed-text"
```

Any LiteLLM-compatible embedding model string is accepted. The `LiteLLMEmbedder` in `orchid_ai/rag/embeddings.py` delegates the actual call to LiteLLM.
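To show roughly what delegating to LiteLLM involves, here is a minimal sketch of an embedder wrapper. `SketchEmbedder` and its `embed_fn` hook are hypothetical and are not the interface of Orchid's `LiteLLMEmbedder`; the default path assumes LiteLLM is installed and calls `litellm.embedding(model=..., input=[...])`, which returns an OpenAI-style response whose `data` entries carry the vectors.

```python
from typing import Callable, Optional, Sequence

# (model, texts) -> one vector per text
EmbedFn = Callable[[str, list[str]], list[list[float]]]


def _litellm_embed(model: str, texts: list[str]) -> list[list[float]]:
    # Imported lazily so the sketch can be exercised without LiteLLM installed.
    import litellm

    response = litellm.embedding(model=model, input=texts)
    # Entries may be plain dicts or objects depending on the LiteLLM version.
    return [
        item["embedding"] if isinstance(item, dict) else item.embedding
        for item in response.data
    ]


class SketchEmbedder:
    """Hypothetical stand-in for an embedder that delegates to LiteLLM."""

    def __init__(self, model: str, embed_fn: Optional[EmbedFn] = None) -> None:
        self.model = model
        self._embed_fn = embed_fn or _litellm_embed

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        if not texts:
            return []
        vectors = self._embed_fn(self.model, list(texts))
        if len(vectors) != len(texts):
            raise RuntimeError("embedding count does not match input count")
        return vectors

    def embed_query(self, query: str) -> list[float]:
        return self.embed([query])[0]
```

The injectable `embed_fn` is a design choice for the sketch only: it lets ingestion code be tested offline with a fake embedder, while in a deployment the model string would be whatever `embedding_model` is set to.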
Switching models requires re-indexing
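Vector store collections are created with a fixed vector dimension. If you change `embedding_model` after ingesting documents and the new model has a different output dimension, inserts and queries against the existing collections will fail. Even when the dimensions happen to match, vectors from different models live in different embedding spaces, so similarity scores between old and new vectors are meaningless. In either case, drop and recreate the collections, then re-index all documents from scratch. This applies to both Qdrant and ChromaDB.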
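The failure mode and the fix can be illustrated with a toy, in-memory collection. `ToyCollection` and `reindex` are hypothetical; they only mimic the fixed-size check that Qdrant and ChromaDB apply, not either backend's client API.

```python
from typing import Callable

Embed = Callable[[str], list[float]]


class ToyCollection:
    """Hypothetical in-memory collection whose vector size is fixed at creation."""

    def __init__(self, dimension: int) -> None:
        self.dimension = dimension
        self.points: dict[str, list[float]] = {}

    def upsert(self, doc_id: str, vector: list[float]) -> None:
        # Real backends reject vectors whose length differs from the collection's size.
        if len(vector) != self.dimension:
            raise ValueError(
                f"vector has {len(vector)} dims, collection expects {self.dimension}"
            )
        self.points[doc_id] = vector


def reindex(documents: dict[str, str], embed: Embed) -> ToyCollection:
    """Drop-and-recreate: size a fresh collection from the new model, then re-embed everything."""
    if not documents:
        raise ValueError("nothing to re-index")
    vectors = {doc_id: embed(text) for doc_id, text in documents.items()}
    dimension = len(next(iter(vectors.values())))
    fresh = ToyCollection(dimension)  # the old collection is simply discarded
    for doc_id, vector in vectors.items():
        fresh.upsert(doc_id, vector)
    return fresh
```

Sizing the new collection from the new model's actual output, rather than hard-coding a number, keeps the drop-and-recreate step correct even if the configured model changes again.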
How embeddings are generated
Embedding is handled inside the vector store repository rather than by the caller. Documents passed to `ingest_document()` are chunked first, and each chunk is then embedded on the fly, so you never need to call the embedder directly.
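A minimal sketch of the ingest path described above, assuming a simple word-window chunker. `SketchRepository`, `chunk_words`, and the chunk size and overlap values are hypothetical; Orchid's actual chunking strategy is not described in this section.

```python
from dataclasses import dataclass, field
from typing import Callable

# Batch embedder: one vector per input chunk.
Embed = Callable[[list[str]], list[list[float]]]


def chunk_words(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Hypothetical chunker: windows of `size` words, each overlapping the previous by `overlap`."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once a window would add no words beyond the previous window's end.
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i : i + size]) for i in starts]


@dataclass
class SketchRepository:
    """Hypothetical repository: callers hand over raw text; chunking and embedding happen inside."""

    embed: Embed
    chunk_size: int = 200
    chunk_overlap: int = 20
    rows: list[tuple[str, str, list[float]]] = field(default_factory=list)

    def ingest_document(self, doc_id: str, text: str) -> int:
        """Chunk, embed all chunks in one batch, store; returns the number of chunks."""
        chunks = chunk_words(text, self.chunk_size, self.chunk_overlap)
        if not chunks:
            return 0
        vectors = self.embed(chunks)  # the caller never touches the embedder
        for i, (chunk, vector) in enumerate(zip(chunks, vectors)):
            self.rows.append((f"{doc_id}#{i}", chunk, vector))
        return len(chunks)
```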
On the retrieval side, the query string is embedded once per retrieval call, and that single vector is used to search all configured namespaces (the domain namespace and the uploads namespace) in parallel. Results are merged by score, and the top-k are returned.
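The retrieval flow (one query embedding, parallel per-namespace searches, merge by score, global top-k) can be sketched with `asyncio`. The namespace names, `search_namespace`, and the use of cosine similarity are assumptions for illustration; in Orchid the scoring is done by the vector backend.

```python
import asyncio
import heapq
import math
from typing import Callable

Vector = list[float]
Hit = tuple[float, str, str]  # (score, namespace, chunk_id)


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


async def search_namespace(
    namespace: str, index: dict[str, Vector], query_vec: Vector, k: int
) -> list[Hit]:
    # Hypothetical stand-in for a call to the vector backend for one namespace.
    await asyncio.sleep(0)
    scored = [(cosine(query_vec, vec), namespace, cid) for cid, vec in index.items()]
    return heapq.nlargest(k, scored)


async def retrieve(
    query: str,
    embed_query: Callable[[str], Vector],
    namespaces: dict[str, dict[str, Vector]],
    k: int = 5,
) -> list[Hit]:
    query_vec = embed_query(query)  # embedded exactly once per call
    per_namespace = await asyncio.gather(
        *(search_namespace(ns, idx, query_vec, k) for ns, idx in namespaces.items())
    )
    merged = [hit for hits in per_namespace for hit in hits]
    return heapq.nlargest(k, merged)  # merge by score, keep the global top-k
```

Merging raw scores like this assumes every namespace is scored with the same embedding model and metric, which holds here because the model is set once per deployment.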
Hybrid search
When `rag.retrieval.strategy: hybrid` is set, Orchid runs a dense vector search alongside a sparse (BM25 or SPLADE) pass and fuses the results with Reciprocal Rank Fusion. The sparse encoder is independent of the dense embedding model, so you can pair any dense model with the built-in BM25 encoder. Switching the dense model still requires re-indexing the dense collections, but the sparse index does not need to be rebuilt.
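Reciprocal Rank Fusion combines ranked lists using only ranks: each document scores the sum of 1 / (k + rank) over the lists it appears in. The sketch below assumes the commonly used constant `k = 60`; the value Orchid uses is not stated here, and the function name is illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs into one list sorted by fused score.

    Ranks are 1-based. k=60 is an assumed constant, not necessarily Orchid's.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Example: a dense ranking and a sparse ranking of the same corpus.
dense = ["a", "b", "c"]
sparse = ["b", "d", "a"]
fused = reciprocal_rank_fusion([dense, sparse])
```

Because only ranks matter, RRF sidesteps the fact that cosine similarities and BM25 scores are on unrelated scales; a document ranked well by both passes (here `b`) rises to the top.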