orchid-api
FastAPI server exposing the Orchid framework over HTTP with SSE streaming.
orchid-api is the HTTP layer over the Orchid framework. It imports orchid-ai as a dependency and exposes endpoints for chat management, SSE-streamed message handling, document upload and RAG ingestion, and identity bridging. All agent logic, graph building, and persistence live in the orchid library — this package adds only the FastAPI plumbing and the operator-facing surface. The API does not bundle storage or RAG backends; install the corresponding plugin packages alongside it.
Installation and startup
pip install orchid-ai orchid-api orchid-storage-postgres orchid-rag-qdrant
ORCHID_CONFIG=orchid.yml uvicorn orchid_api.main:app --port 8000
For local development with hot-reload:
ORCHID_CONFIG=examples/basketball/orchid.yml uvicorn orchid_api.main:app --port 8000 --reload
Router map
Endpoints are split across domain-scoped routers. New endpoints always go into the appropriate router, never into main.py.
chats — Session CRUD
POST /chats creates a new chat session. GET /chats lists sessions for the authenticated user. DELETE /chats/{id} removes a session and its messages. These are the lightweight bookkeeping operations; message content lives in the messages and streaming routers.
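As a rough illustration, the three session operations map onto plain HTTP requests as sketched here. The sketch uses only the Python standard library. The base URL assumes a local server on port 8000, the token is a placeholder, and the helper names are hypothetical. Request and response bodies are left out because their schema is not described in this section.

```python
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: local server on port 8000


def chat_request(method: str, path: str, token: str) -> Request:
    """Build (but do not send) an authenticated request for a chats endpoint."""
    return Request(
        f"{BASE_URL}{path}",
        method=method,
        headers={"Authorization": f"Bearer {token}"},
    )


def create_chat(token: str) -> Request:
    # POST /chats creates a new session.
    return chat_request("POST", "/chats", token)


def list_chats(token: str) -> Request:
    # GET /chats lists sessions for the authenticated user.
    return chat_request("GET", "/chats", token)


def delete_chat(chat_id: str, token: str) -> Request:
    # DELETE /chats/{id} removes a session and its messages.
    return chat_request("DELETE", f"/chats/{chat_id}", token)
```

Each returned object can be passed to urllib.request.urlopen to perform the call against a running server.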
messages — Send and upload
POST /chats/{id}/messages sends a message and returns the full assistant response (non-streaming, multipart/form-data). POST /chats/{id}/upload accepts a file upload and indexes it into the chat's RAG scope. GET /chats/{id}/messages returns the message history for a session.
streaming — SSE-streamed send
POST /chats/{id}/messages/stream opens a Server-Sent Events stream and emits a rich event vocabulary: assistant.delta (token-by-token), supervisor.routing_decision, mini_agent.{decomposed,started,finished,aggregated}, tool_call.requires_approval, and assistant.complete. The frontend and orchid-mcp gateway both consume this endpoint.
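A client consuming this stream has to split the response into events and react to the event names listed above. The following is a minimal sketch of that parsing step, assuming standard SSE framing ("event:" and "data:" fields, blank line between events). The payload shape is an assumption: the sketch pretends each assistant.delta carries JSON with a "text" field, which this section does not confirm.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def _field_value(line: str, prefix: str) -> str:
    """Return the value after 'prefix', dropping one optional leading space."""
    value = line[len(prefix):]
    return value[1:] if value.startswith(" ") else value


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from raw SSE lines."""
    event: str = "message"  # SSE default when no "event:" field is sent
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event = _field_value(line, "event:")
        elif line.startswith("data:"):
            data.append(_field_value(line, "data:"))


def collect_reply(lines: Iterable[str]) -> str:
    """Concatenate assistant.delta payloads until assistant.complete arrives."""
    parts: List[str] = []
    for event, data in iter_sse_events(lines):
        if event == "assistant.delta":
            # Assumption: the delta payload is JSON with a "text" field.
            parts.append(json.loads(data)["text"])
        elif event == "assistant.complete":
            break
        # Other events (routing decisions, mini_agent.*, approvals) are ignored here.
    return "".join(parts)
```

In a real client the lines would come from the streaming HTTP response body, and a tool_call.requires_approval event would typically be surfaced to the user before calling the resume endpoint.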
resume — HITL approval
POST /chats/{id}/resume resumes a graph that paused on a requires_approval: true tool call. The caller passes an approve or deny decision; the supervisor continues from the interrupt point.
sharing — Promote RAG scope
POST /chats/{id}/share promotes chat-scoped RAG data to user scope, making it available across all chats for that user.
session — MCP cache warm-up
POST /session/warm triggers proactive MCP capability warming for passthrough and oauth servers for the authenticated user. The frontend calls this once after login. See MCP.
mcp_auth — Outbound MCP OAuth
GET /mcp/auth/servers lists OAuth-capable MCP servers and their per-user auth status. GET /mcp/auth/servers/{name}/authorize generates an authorization URL. GET /mcp/auth/callback handles the IdP redirect. DELETE /mcp/auth/servers/{name}/token revokes a stored token.
auth_info, auth_exchange, auth_identity — Identity bridging
GET /auth-info advertises the deployment's auth posture. POST /auth/exchange-code and POST /auth/refresh-token perform server-side token exchange so orchid-mcp and frontends never hold client_secret. POST /auth/resolve-identity proxies upstream token resolution via the pluggable OrchidIdentityResolver.
admin — RAG indexing
POST /index triggers document indexing for a given namespace. Used by admin scripts and the CLI when targeting a running API rather than running the library in-process.
diagnostics — Health check
GET /health returns the API's readiness status. Used by container orchestrators and the orchid-mcp startup probe.
AppContext, lifespan, and identity resolution
At startup the FastAPI lifespan function calls lifecycle.setup_orchid(), which:
- Loads OrchidAgentsConfig from ORCHID_CONFIG.
- Calls build_graph() to wire the LangGraph supervisor and agent nodes.
- Initialises the configured OrchidChatStorage backend.
- Warms MCP capability caches for all auth.mode: none servers.
The result is stored in AppContext (orchid_api/context.py) as app_ctx.orchid — a single Orchid facade instance. All routers access it via from ..context import app_ctx. There are no module-level globals.
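The startup flow can be pictured with the sketch below. It is not the package's actual code: setup_orchid here is a placeholder that only records the documented steps, app_ctx is a simplified stand-in for the object in orchid_api/context.py, and both the async signature and the teardown behaviour are assumptions.

```python
from contextlib import asynccontextmanager
from types import SimpleNamespace

# Simplified stand-in for orchid_api.context.app_ctx.
app_ctx = SimpleNamespace(orchid=None)


async def setup_orchid() -> SimpleNamespace:
    """Placeholder for lifecycle.setup_orchid(); only names the documented steps."""
    steps = ["load_config", "build_graph", "init_storage", "warm_mcp"]
    return SimpleNamespace(steps=steps)


@asynccontextmanager
async def lifespan(app):
    # Startup: build the facade once and publish it on the shared context.
    app_ctx.orchid = await setup_orchid()
    try:
        yield  # the application serves requests while suspended here
    finally:
        # Shutdown (assumed): drop the reference when the app stops.
        app_ctx.orchid = None
```

With FastAPI, a function of this shape is registered by passing it as FastAPI(lifespan=lifespan), so routers can rely on app_ctx.orchid being populated before the first request is served.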
Identity resolution is handled once per request by the get_auth_context FastAPI dependency in auth.py. It resolves the incoming Bearer token into an OrchidAuthContext using the operator-supplied OrchidIdentityResolver subclass (configured via IDENTITY_RESOLVER_CLASS). The resolved context flows through the entire request; no other code re-resolves or re-exchanges tokens.
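IDENTITY_RESOLVER_CLASS holds a dotted path such as myproject.identity.MyIdentityResolver. How orchid-api turns that string into a class is not shown here; a common way to do it, sketched under that assumption, is a small importlib helper. The function name load_resolver_class is hypothetical.

```python
import importlib


def load_resolver_class(dotted_path: str) -> type:
    """Import 'package.module.ClassName' and return the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    # Raises AttributeError if the module does not define the class.
    return getattr(module, class_name)
```

A caller would typically read the path from the environment, for example load_resolver_class(os.environ["IDENTITY_RESOLVER_CLASS"]), and instantiate the result once at startup.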
Key environment variables
See Configuration for the full reference. The most commonly set variables:
# Core
ORCHID_CONFIG=orchid.yml # Path to agents.yaml / orchid.yml
LITELLM_MODEL=openai/gpt-4o-mini # Default LLM for all agents
QDRANT_URL=http://qdrant:6333 # Vector store
CHAT_DB_DSN=postgresql+asyncpg://user:pass@db:5432/orchid
# Auth
IDENTITY_RESOLVER_CLASS=myproject.identity.MyIdentityResolver
DEV_AUTH_BYPASS=true # Skip auth in local dev
# LangSmith tracing (optional)
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=lsv2_…
LANGCHAIN_PROJECT=orchid-dev
LangSmith tracing
Set LANGCHAIN_TRACING_V2=true and LANGCHAIN_API_KEY to enable LangSmith. All LangGraph traces — supervisor routing decisions, agent invocations, tool calls — will appear in your LangSmith project. The tracing.py module initialises this at startup with no code changes required.
OrchidAuthContext is obtained once
The get_auth_context dependency in auth.py resolves the Bearer token exactly once per request. Routers receive a fully populated OrchidAuthContext and pass it downstream — they never call the identity resolver themselves, never initiate OAuth flows, and never re-read the Authorization header. This single-resolution pattern keeps auth logic in one place and avoids subtle double-resolution bugs.
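The single-resolution pattern can be illustrated without FastAPI. In this sketch AuthContext is a simplified stand-in for OrchidAuthContext, CountingResolver is a hypothetical resolver that counts its calls, and handle_request and load_history are made-up names playing the role of a router and downstream code. The real dependency's signature and error handling may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuthContext:
    """Simplified stand-in for OrchidAuthContext."""
    user_id: str
    token: str


class CountingResolver:
    """Hypothetical resolver that records how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def resolve(self, token: str) -> AuthContext:
        self.calls += 1
        return AuthContext(user_id=f"user-for-{token}", token=token)


def get_auth_context(authorization: Optional[str], resolver: CountingResolver) -> AuthContext:
    """Parse the Authorization header and resolve the token once."""
    scheme, _, token = (authorization or "").partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise PermissionError("missing or malformed Bearer token")
    return resolver.resolve(token.strip())


def load_history(auth: AuthContext) -> str:
    # Uses the already-resolved context; never touches the header or resolver.
    return f"history for {auth.user_id}"


def handle_request(authorization: Optional[str], resolver: CountingResolver) -> str:
    auth = get_auth_context(authorization, resolver)  # resolved once, at the edge
    return load_history(auth)  # downstream code receives the context
```

Because only get_auth_context sees the raw header, the resolver's call count stays at one per request no matter how many downstream functions consume the context.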