Agents Configuration
Detailed reference for every agents.yaml property — defaults, supervisor, agents, tools, skills, guardrails, events.
Index
version
defaults
supervisor
agents[]
tools[]
skills[]
guardrails
mcp_gateway
events
defaults
Top-level defaults inherited by every agent unless explicitly overridden per-agent. Think of this as the global agent template.
defaults.llm
| |
|---|
| Short | Default LLM settings for all agents. |
| Detailed | Every agent that does not specify its own llm block inherits these values. This is the most common place to set the primary model and fallback. |
defaults.llm.model
| |
|---|
| Short | Default LLM model for all agents. |
| Detailed | LiteLLM provider/model-name format. Used for agent reasoning, tool calling, and summarisation when no per-agent override exists. |
| Default | gemini/gemini-2.5-flash |
| Available values | Any LiteLLM-compatible model string |
defaults.llm.temperature
| |
|---|
| Short | Sampling temperature. |
| Detailed | Controls randomness. Lower values (0.0–0.3) produce more deterministic, repeatable outputs. Higher values (0.7–1.0) increase creativity and variation. For tool-calling agents, keep this low to ensure consistent JSON formatting. |
| Default | 0.2 |
| Available values | 0.0 to 2.0 (provider-dependent) |
⚠ Tool-calling temperature
High temperatures can produce malformed tool-call JSON and hallucinated function names. Keep temperature at or below 0.3 for agents that rely on structured tool calls.
defaults.llm.fallback_model
| |
|---|
| Short | Fallback model when the primary fails. |
| Detailed | When the primary model returns a 503, rate-limit error, or timeout, Orchid automatically retries with this fallback model. The fallback is tried once per request. |
| Default | null |
| Available values | Any LiteLLM-compatible model string, or null to disable |
⚠ Always set in production
Configure a fallback model for every production deployment. Pair a cloud-hosted primary with a local Ollama model so the service degrades gracefully during provider outages instead of returning errors to users.
defaults.llm.retry_attempts
| |
|---|
| Short | Retry count on transient LLM errors. |
| Detailed | When > 0, transient errors (network timeouts, 5xx responses) are retried with exponential backoff. 0 means no automatic retry — the error surfaces to the user immediately. |
| Default | 0 |
| Available values | 0 or any positive integer |
defaults:
  llm:
    model: gemini/gemini-2.5-flash
    temperature: 0.2
    fallback_model: ollama/llama3.2
    retry_attempts: 2
---
defaults:
  llm:
    model: openai/gpt-4o
    temperature: 0.1
    fallback_model: groq/llama-3.3-70b-versatile
---
defaults.rag
| |
|---|
| Short | Default RAG settings for all agents. |
| Detailed | Retrieval-Augmented Generation configuration that applies globally unless overridden per-agent. |
defaults.rag.k
| |
|---|
| Short | Number of chunks retrieved per query. |
| Detailed | The top-k chunks returned by vector similarity search. Higher values surface more documents but increase prompt size and cost. Lower values improve precision but may miss relevant context. |
| Default | 5 |
⚠ Tuning retrieval count
Raise k for knowledge-base agents that must surface multiple relevant documents per query (e.g. catalog search, document Q&A). Lower it for precision-focused agents to keep prompts concise and reduce hallucination from noisy context.
defaults.rag.enabled
| |
|---|
| Short | Enable RAG context retrieval. |
| Detailed | Master switch. When disabled, no vector retrieval runs and no RAG context is injected into prompts. Useful for agents that rely purely on tools or static prompts. |
| Default | true |
| Available values | true, false |
defaults.rag.rag_ttl
| |
|---|
| Short | Cache TTL for RAG results in seconds. |
| Detailed | When > 0, repeated queries within the TTL window reuse the previous retrieval result without hitting the vector database. 0 disables caching — every query triggers a fresh retrieval. |
| Default | 0 |
⚠ Cache freshness vs cost
Set a non-zero TTL (e.g. 300–600 seconds) for agents with stable knowledge bases to reduce Qdrant load and latency. Set to 0 for real-time data agents where documents change frequently.
defaults.rag.max_context_chars
| |
|---|
| Short | Maximum characters of RAG context injected into prompts. |
| Detailed | A hard cap on the RAG context block size. Even if k retrieves 10 chunks, the total injected text is truncated to this limit. Prevents oversized prompts from consuming the LLM's context window. |
| Default | 3000 |
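As a rough illustration of how the four top-level RAG knobs above combine, the fragment below raises k for a stable knowledge base, caches retrievals for five minutes, and widens the context cap to match. The values are illustrative assumptions, not recommendations from the Orchid maintainers.

```yaml
defaults:
  rag:
    enabled: true
    k: 8                     # retrieve more chunks per query (default 5)
    rag_ttl: 300             # reuse identical retrievals for 300 seconds
    max_context_chars: 6000  # raise the cap so fewer of the 8 chunks are cut off
```

With a chunk_size of 1000 characters, 8 chunks can total roughly 8000 characters, so even this cap may truncate the tail of the retrieved context. Size k and max_context_chars together.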
defaults.rag.ingestion
Document ingestion settings that control how documents are split and processed before embedding.
defaults.rag.ingestion.strategy
| |
|---|
| Short | Chunking strategy name. |
| Detailed | The algorithm used to split documents into chunks before embedding. Each strategy balances semantic coherence with retrieval granularity differently. |
| Default | recursive (inherited when null) |
| Available values | recursive, semantic, hierarchical, headered |
recursive — Splits text recursively by separators (paragraphs, sentences, words) until chunks fit chunk_size. Best general-purpose choice.
semantic — Uses an embedding model to detect semantic boundaries and split at natural topic transitions. Higher quality but slower and more expensive.
hierarchical — Creates parent-child chunk relationships. Parent chunks provide broad context; child chunks enable precise retrieval. Requires parent_chunk_size > 0.
headered — Splits at Markdown/HTML headers, preserving document structure. Ideal for well-structured documentation.
defaults.rag.ingestion.chunk_size
| |
|---|
| Short | Text chunk size in characters. |
| Default | 1000 |
defaults.rag.ingestion.chunk_overlap
| |
|---|
| Short | Character overlap between consecutive chunks. |
| Default | 200 |
defaults.rag.ingestion.parent_chunk_size
| |
|---|
| Short | Parent chunk size for hierarchical chunking. |
| Detailed | When > 0, enables hierarchical parent-child layout. Parent chunks are embedded separately and stored in chunk metadata. Child chunks are used for precise retrieval; parent chunks provide broader context. |
| Default | 0 (disabled) |
defaults.rag.ingestion.parent_chunk_overlap
| |
|---|
| Short | Overlap for parent chunks. |
| Default | 200 |
defaults.rag.ingestion.post_processors
| |
|---|
| Short | Post-processing pipeline applied after chunking. |
| Detailed | Ordered list of post-processor names that transform chunks after initial splitting. |
| Default | [] |
| Available values | contextual_headers, entity_extraction |
contextual_headers — Prepends a contextually-aware header to each chunk describing what document it came from.
entity_extraction — Extracts named entities and relationships for GraphRAG. Requires retrieval.graph.enabled: true.
defaults:
  rag:
    ingestion:
      strategy: hierarchical
      chunk_size: 1000
      chunk_overlap: 200
      parent_chunk_size: 4000
      parent_chunk_overlap: 400
      post_processors:
        - contextual_headers
defaults.rag.retrieval
Query retrieval settings that control how user queries are transformed and matched against the vector store.
defaults.rag.retrieval.strategy
| |
|---|
| Short | Retrieval strategy name. |
| Detailed | The algorithm used to match queries against embedded chunks. Different strategies optimise for different query types and content characteristics. |
| Default | simple (inherited when null) |
| Available values | simple, multi_query, hyde, hybrid, graph_rag |
simple — Cosine similarity between query embedding and chunk embeddings. Fast, baseline quality.
multi_query — Generates multiple paraphrased versions of the query and retrieves for each, then deduplicates. Improves recall for ambiguous or paraphrased questions.
hyde — Generates a hypothetical answer to the query, embeds that answer, and retrieves chunks similar to the hypothetical answer. Excellent for dense technical knowledge where exact keyword matches are unreliable.
hybrid — Combines dense vector similarity with sparse keyword matching (BM25 or SPLADE). Best when exact keyword matches matter alongside semantic similarity.
graph_rag — Traverses an entity-relationship graph extracted during ingestion. Requires the entity_extraction post-processor and retrieval.graph.enabled: true.
⚠ Strategy selection
Start with simple. Switch to multi_query when single-query retrieval misses paraphrased or ambiguous questions. Use hyde for domains where hypothetical answers improve recall (dense technical knowledge). Use hybrid when exact keyword matches matter alongside semantic similarity.
defaults.rag.retrieval.query_transformers
| |
|---|
| Short | Ordered list of query transformer names. |
| Detailed | Pre-strategy and strategy-level transformers that rewrite or expand queries before retrieval. Pre-strategy transformers (e.g. reformulate) run at turn entry. Strategy-level transformers (e.g. multi_query, hyde, decompose) are forwarded to the active strategy. |
| Default | null (inherits from defaults, effectively []) |
| Available values | reformulate, multi_query, hyde, decompose |
defaults.rag.retrieval.metadata_filters
| |
|---|
| Short | Metadata filter expressions applied to all retrievals. |
| Detailed | An operator mini-language for filtering retrieved chunks by metadata fields. Supports equality, range, and boolean operators. |
| Default | {} |
defaults.rag.retrieval.exclude_dynamic
| |
|---|
| Short | Exclude dynamically-injected tool output from retrieval. |
| Detailed | When true, adds a dynamic: {"not": true} clause to prevent re-retrieving chunks that were dynamically injected by tool calls in previous turns. Prevents circular retrieval of tool-generated content. |
| Default | false |
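A minimal sketch of query_transformers and exclude_dynamic used together, assuming the baseline simple strategy. How each strategy handles a forwarded transformer is not described in this section, so treat the combination as something to verify rather than a confirmed setup.

```yaml
defaults:
  rag:
    retrieval:
      strategy: simple
      query_transformers:
        - reformulate         # pre-strategy: runs at turn entry
        - decompose           # strategy-level: forwarded to the active strategy
      exclude_dynamic: true   # skip chunks injected by earlier tool calls
```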
defaults.rag.retrieval.hyde
HyDE-specific retrieval knobs.
defaults.rag.retrieval.hyde.n_hypothetical
| |
|---|
| Short | Number of hypothetical answers generated per query. |
| Detailed | Classic HyDE uses 1 hypothetical answer. Increasing this grows recall at the cost of additional LLM calls (one per hypothetical answer). Each hypothetical answer is embedded separately and retrieval results are merged. |
| Default | 1 |
defaults:
  rag:
    retrieval:
      strategy: hyde
      hyde:
        n_hypothetical: 3
defaults.rag.retrieval.hybrid
Hybrid retrieval knobs (sparse + dense).
defaults.rag.retrieval.hybrid.sparse_encoder
| |
|---|
| Short | Sparse encoder type for keyword matching. |
| Default | bm25 |
| Available values | bm25, splade |
defaults.rag.retrieval.hybrid.sparse_weight
| |
|---|
| Short | Weight of the sparse signal in linear fusion. |
| Detailed | Only used when fusion is linear. 0.0 = pure dense vectors. 1.0 = pure sparse keywords. 0.4–0.5 is a good starting point for most domains. |
| Default | 0.4 |
defaults.rag.retrieval.hybrid.fusion
| |
|---|
| Short | Fusion method for combining sparse and dense rankings. |
| Default | rrf |
| Available values | rrf, linear |
rrf — Reciprocal Rank Fusion. Needs no weight tuning: each chunk's score is the sum, over the dense and sparse rankings, of 1 / (k + rank), where k is rrf_k (default 60). Good default choice.
linear — Weighted linear combination using sparse_weight. More tunable but requires calibration.
defaults.rag.retrieval.hybrid.rrf_k
| |
|---|
| Short | RRF constant k. |
| Detailed | The constant used in the Reciprocal Rank Fusion formula. Default 60 follows Cormack et al. Lower values emphasise top-ranked documents more heavily. |
| Default | 60 |
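The hybrid knobs above might be combined as in the following sketch. It uses linear fusion so that sparse_weight takes effect; the values are illustrative.

```yaml
defaults:
  rag:
    retrieval:
      strategy: hybrid
      hybrid:
        sparse_encoder: bm25
        fusion: linear
        sparse_weight: 0.4   # closer to 0.0 favours dense vectors, closer to 1.0 favours keywords
        # rrf_k is presumably ignored here, since it belongs to the rrf fusion method
```

For comparison, under fusion: rrf with rrf_k: 60, and assuming the per-ranking terms are summed as in the standard RRF formulation, a chunk ranked 1st by dense search and 3rd by sparse search would score 1/61 + 1/63 ≈ 0.0323.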
defaults.rag.retrieval.graph
GraphRAG-specific retrieval knobs.
defaults.rag.retrieval.graph.enabled
| |
|---|
| Short | Enable graph entity extraction during ingestion. |
| Detailed | When enabled, the entity_extraction post-processor extracts entities and relationships from chunks and builds a knowledge graph. This graph is then traversed at retrieval time. |
| Default | false |
defaults.rag.retrieval.graph.max_hops
| |
|---|
| Short | Maximum BFS depth from seed entities. |
| Detailed | How many relationship hops to traverse from each seed entity found in the query. Higher values surface more connected context but increase retrieval time and noise. |
| Default | 2 |
defaults.rag.retrieval.graph.fuse_with_vectors
| |
|---|
| Short | Merge graph context with vector hits. |
| Detailed | When true, the retrieved subgraph is serialised and appended alongside standard vector retrieval results. When false, only graph context is returned (no vector hits). |
| Default | true |
defaults.rag.retrieval.graph.relation_filter
| |
|---|
| Short | Restrict graph traversal to specific edge labels. |
| Detailed | When non-empty, only traverse edges with these relationship types. Useful for domain-specific graphs where only certain relation types are relevant. |
| Default | [] |
defaults:
  rag:
    ingestion:
      post_processors:
        - entity_extraction   # graph_rag requires this post-processor
    retrieval:
      strategy: graph_rag
      graph:
        enabled: true
        max_hops: 3
        relation_filter:
          - works_for
          - manages
defaults.rag.retrieval.transformer_prompts
Override prompts for the built-in query transformers.
defaults.rag.retrieval.transformer_prompts.multi_query
| |
|---|
| Short | Override prompt for the multi-query transformer. |
| Detailed | Replaces the default prompt that asks the LLM to generate paraphrased query variants. Use this to tailor paraphrasing style to your domain. |
| Default | null (module-level default) |
defaults.rag.retrieval.transformer_prompts.hyde
defaults.rag.retrieval.transformer_prompts.hyde.single
| |
|---|
| Short | HyDE prompt for a single hypothetical answer. |
| Default | null (module-level default) |
defaults.rag.retrieval.transformer_prompts.hyde.multi
| |
|---|
| Short | HyDE prompt for multiple hypothetical answers. |
| Detailed | Uses a {n} placeholder that is replaced with the value of n_hypothetical. |
| Default | null (module-level default) |
defaults.rag.retrieval.transformer_prompts.decompose
| |
|---|
| Short | Override prompt for the decompose transformer. |
| Detailed | Replaces the default prompt that breaks complex queries into sub-queries. |
| Default | null (module-level default) |
defaults.rag.retrieval.transformer_prompts.reformulate
| |
|---|
| Short | Override prompt for the reformulate transformer. |
| Detailed | Replaces the default prompt that reformulates the query for better retrieval. |
| Default | null (module-level default) |
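A hedged sketch of two prompt overrides follows. The prompt wording is invented for illustration. Only the {n} placeholder is documented above; whether the templates need further placeholders, for example for the query text, is not stated here and should be checked against the built-in defaults.

```yaml
defaults:
  rag:
    retrieval:
      transformer_prompts:
        multi_query: |
          Rewrite the user's question in several different ways, using
          the vocabulary of our product documentation.
        hyde:
          multi: |
            Write {n} short, plausible answers to the user's question,
            each in the style of an internal knowledge-base article.
```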
defaults.cache_enabled
| |
|---|
| Short | Enable global in-memory LLM response cache. |
| Detailed | Activates LangChain's InMemoryCache via set_llm_cache(). Identical prompts (same model, messages, temperature) return cached results without an LLM call. Cache lives for the process lifetime and is lost on restart. |
| Default | false |
| Available values | true, false |
⚠ Cache scope and invalidation
The cache key includes the full prompt text, model string, and temperature. Changing any of these invalidates the cache entry. There is no explicit cache invalidation API — restart the process to clear. Do not enable if your agents produce time-sensitive or user-specific outputs that must vary per call.
defaults:
  cache_enabled: true
supervisor
The supervisor is the central orchestrator that routes queries to agents, manages multi-turn conversation state, and synthesises final responses.
supervisor.assistant_name
| |
|---|
| Short | Display name for the AI assistant. |
| Detailed | Used in supervisor prompts and shown in the UI. Customise to match your product branding. |
| Default | "AI assistant" |
supervisor:
  assistant_name: "Orchid Helpdesk"
---
supervisor:
  assistant_name: "Acme Support Bot"
---
supervisor.fallback_model
| |
|---|
| Short | Fallback LLM for the supervisor. |
| Detailed | Overrides defaults.llm.fallback_model specifically for supervisor operations (routing, synthesis, sequential advance). Useful when the supervisor needs a different fallback than agents. |
| Default | null (inherits defaults.llm.fallback_model) |
supervisor.streaming_enabled
| |
|---|
| Short | Enable SSE streaming for responses. |
| Detailed | When enabled, the API returns text/event-stream responses with tokens arriving as they are generated. When disabled, responses are buffered and returned as complete JSON. |
| Default | true |
| Available values | true, false |
supervisor.routing_system_prompt
| |
|---|
| Short | Custom system prompt for the routing phase. |
| Detailed | Replaces the default template that tells the supervisor how to classify queries and select agents. Use this to inject domain-specific routing instructions. |
| Default | null (built-in template) |
supervisor.synthesis_system_prompt
| |
|---|
| Short | Custom system prompt for the synthesis phase. |
| Detailed | Replaces the default template that tells the supervisor how to combine agent outputs into a coherent final response. |
| Default | null (built-in template) |
supervisor.sequential_advance_prompt
| |
|---|
| Short | Custom handoff prompt for sequential multi-agent flows. |
| Detailed | Used when agents are chained sequentially (one after another). Replaces the default template that tells the supervisor how to pass state between agents. |
| Default | null (built-in template) |
supervisor.history_max_turns
| |
|---|
| Short | Maximum conversation exchange pairs retained. |
| Detailed | The supervisor keeps the most recent N user/assistant exchange pairs in context (up to 2 × N messages). Older turns are dropped or summarised depending on history_summary_enabled. |
| Default | 20 |
supervisor.history_max_chars
| |
|---|
| Short | Maximum characters per message before truncation. |
| Detailed | Individual messages longer than this are truncated with a ... suffix. Prevents a single oversized message from consuming the entire context window. |
| Default | 1000 |
supervisor.routing_model
| |
|---|
| Short | Cheaper/faster LLM for routing and advance phases. |
| Detailed | When set, the supervisor uses this model for routing decisions and sequential handoffs instead of the primary model. Saves cost and latency because routing requires less reasoning power than synthesis. |
| Default | null (uses supervisor's main model) |
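The routing-related settings above could be combined as in this sketch. The agent names in the prompt (billing, support) are hypothetical, and the prompt text is only an example of domain-specific routing instructions. Because a custom routing_system_prompt replaces the built-in template, compare it against that template before relying on it.

```yaml
supervisor:
  routing_model: gemini/gemini-2.5-flash   # cheaper model for routing and handoffs
  fallback_model: ollama/llama3.2          # supervisor-only fallback
  streaming_enabled: true
  routing_system_prompt: |
    Route billing and invoice questions to the billing agent.
    Route everything else to the support agent.
```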
supervisor.history_summary_enabled
| |
|---|
| Short | Enable sliding-window summarization. |
| Detailed | When enabled, conversation history beyond history_summary_recent_turns is compressed via a cheap LLM call into a summary. The summary plus the recent verbatim turns are sent to the model. Dramatically reduces token usage for long-running conversations. |
| Default | true |
| Available values | true, false |
⚠ When to enable
Enable for long-running chats with token-priced LLMs where context accumulates over many turns. Disable for short-form workflows where keeping the full verbatim history is cheaper than the summarization LLM call.
supervisor.history_summary_model
| |
|---|
| Short | Model used for history summarization. |
| Detailed | A cheap, fast model is recommended for summarization (e.g. gemini/gemini-2.5-flash or an Ollama model). Falls back to the supervisor's main model when not set. |
| Default | null |
supervisor.history_summary_recent_turns
| |
|---|
| Short | Number of recent turns kept verbatim. |
| Detailed | The most recent N exchange pairs are kept in full text. Everything older is summarised. Set this high enough to preserve the immediate conversation context. |
| Default | 10 |
supervisor.skip_synthesis_when_single_agent
| |
|---|
| Short | Skip synthesis when only one agent ran. |
| Detailed | When enabled (default), if exactly one agent produced a substantive text response, that text is returned directly without running the supervisor synthesis LLM call. Saves 5–15 seconds and one LLM call per single-agent turn. |
| Default | true |
| Available values | true, false |
⚠ When to disable
Leave enabled to save 5–15 s and one LLM call on every single-agent turn. Disable only if the supervisor must always rewrite or augment the agent's raw output regardless of routing.
supervisor:
  assistant_name: "Helpdesk Bot"
  history_max_turns: 30
  history_summary_enabled: true
  history_summary_model: ollama/llama3.2
  history_summary_recent_turns: 15
  skip_synthesis_when_single_agent: true
supervisor.memory
Conversation memory configuration. Controls how past conversation context is summarized, persisted, and retrieved beyond the current LangGraph state. Three strategies are available (see Chat Summarization).
| |
|---|
| Short | Conversation memory strategy and configuration. |
| Detailed | A nested block that controls incremental running summaries, structured JSON entity extraction, and Qdrant-backed semantic retrieval of past turns. Default is strategy: "none" (no memory, backward-compatible). |
| Default | {strategy: "none", structured_output: true, ...} |
supervisor.memory.strategy
| |
|---|
| Short | Memory strategy selection. |
| Detailed | none — no memory (backward-compatible). running_summary — stateful incremental compression (avoids O(n²) re-compute). rag_augmented — adds Qdrant semantic retrieval of past turns on top of running summary. |
| Default | "none" |
| Available values | "none", "running_summary", "rag_augmented" |
supervisor.memory.summary_recent_turns
| |
|---|
| Short | Recent turns kept verbatim when using memory-based summarization. |
| Detailed | When memory is active, the most recent N exchange pairs are preserved in full text alongside the incremental summary. Independent of supervisor.history_summary_recent_turns. |
| Default | 10 |
supervisor.memory.summary_model
| |
|---|
| Short | LLM model for summary extension calls in the memory pipeline. |
| Detailed | A cheap, fast model recommended (e.g. gemini/gemini-2.5-flash-lite). Falls back to supervisor.history_summary_model, then the supervisor's main model. |
| Default | null |
supervisor.memory.summary_prompt
| |
|---|
| Short | Custom compression prompt. |
| Detailed | When set, overrides the default compression/extension prompt used by the memory system. null uses the built-in defaults (structured JSON extraction or narrative compression depending on structured_output). |
| Default | null |
supervisor.memory.persist_summary
| |
|---|
| Short | Persist running summaries to chat storage. |
| Detailed | When true, summaries are stored in the conversation_summaries table (SQLite/PostgreSQL) for cross-invocation reuse. When false, summaries are computed fresh each turn (ephemeral, no disk write). |
| Default | true |
| Available values | true, false |
supervisor.memory.structured_output
| |
|---|
| Short | Enable structured JSON entity extraction in summaries. |
| Detailed | When true, the LLM produces JSON with topics, entities, actions, decisions, questions, and preferences. Falls back to narrative-only on JSON parse failure. When false, produces a flat paragraph summary. |
| Default | true |
| Available values | true, false |
✓ Entity deduplication
When structured_output: true, entities mentioned across multiple turns are automatically deduplicated by name. New details are appended to the existing entity record rather than creating duplicates.
supervisor.memory.rag_namespace
| |
|---|
| Short | Qdrant namespace for conversation memory embeddings. |
| Detailed | Reserved namespace in Qdrant where conversation turns are stored as embeddings. Uses OrchidRAGScope for hierarchical tenant isolation. Only relevant when strategy: "rag_augmented". |
| Default | "__memory__" |
supervisor.memory.rag_k
| |
|---|
| Short | Number of semantically relevant past turns to retrieve. |
| Detailed | How many past conversation turns to retrieve from Qdrant via semantic search on each new user query. Higher values surface more context at the cost of token budget. Only relevant when strategy: "rag_augmented". |
| Default | 5 |
supervisor.memory.rag_similarity_threshold
| |
|---|
| Short | Minimum similarity score for RAG-retrieved turns. |
| Detailed | Results below this score are discarded. Range 0.0–1.0. Lower values include more turns (potentially noisy). Higher values are stricter. Only relevant when strategy: "rag_augmented". |
| Default | 0.5 |
supervisor.memory.store_turns
| |
|---|
| Short | Automatically embed and store each conversation turn in Qdrant. |
| Detailed | When true, each user message and assistant response is embedded and stored in the __memory__ Qdrant namespace for future retrieval. Only relevant when strategy: "rag_augmented". |
| Default | true |
| Available values | true, false |
supervisor.memory.truncation_strategy
| |
|---|
| Short | How messages exceeding max_chars are truncated. |
| Detailed | hard — content[:max_chars] + "…" (current behavior). middle — keeps first 40% and last 40%, with …[truncated]… marker. llm — asks LLM to summarize; falls back to middle on failure. semantic — reserved for embedding-based selection; falls back to middle. |
| Default | "hard" |
| Available values | "hard", "middle", "llm", "semantic" |
supervisor.memory.truncation_max_chars
| |
|---|
| Short | Character limit for message truncation. |
| Detailed | Individual messages longer than this are truncated using truncation_strategy. Overrides supervisor.history_max_chars when memory is enabled. |
| Default | 1000 |
# Full memory config example
supervisor:
  memory:
    strategy: "rag_augmented"
    summary_recent_turns: 10
    structured_output: true
    persist_summary: true
    rag_k: 5
    rag_similarity_threshold: 0.5
    store_turns: true
    truncation_strategy: "middle"
    truncation_max_chars: 1000
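For comparison, a lighter setup without Qdrant retrieval might look like the sketch below. It uses only keys documented in this section, with illustrative values. The rag_* keys and store_turns are left out because they are only relevant when strategy is rag_augmented.

```yaml
supervisor:
  memory:
    strategy: "running_summary"
    summary_model: gemini/gemini-2.5-flash-lite
    summary_recent_turns: 6
    structured_output: false   # flat paragraph summary instead of JSON
    persist_summary: false     # recompute each turn, no disk write
```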
agents[]
Agent definitions. Each key becomes an agent name. The name is used for routing, logging, and namespace addressing.
agents[].name
| |
|---|
| Short | Agent name (set automatically). |
| Detailed | The dictionary key in YAML or the filename stem in Markdown mode. Read-only — set by the loader, not by the user. |
| Default | "" |
agents[].description
| |
|---|
| Short | Human-readable purpose for supervisor routing. |
| Detailed | The supervisor uses this description to decide whether to route a query to this agent. Be concise and specific: describe what the agent does and what types of queries it handles. |
| Default | Required |
YAML mode:
agents:
  basketball:
    description: "Answers questions about NBA players, teams, and statistics."
Markdown mode (frontmatter):
---
description: "Answers questions about NBA players, teams, and statistics."
---
agents[].prompt
| |
|---|
| Short | System prompt for the agent. |
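| Detailed | The core instructions injected into the agent's agentic loop. In YAML this is a string (use a block scalar for multi-line text); in Markdown mode it is the body of the file after the frontmatter. |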
| Default | Required |
YAML mode:
agents:
  basketball:
    prompt: |
      You are a basketball expert. Use the provided tools to look up
      player stats, team rosters, and game schedules. Be concise.
Markdown mode (the prompt is the body after the frontmatter):
---
# frontmatter goes here
---
You are a basketball expert. Use the provided tools to look up
player stats, team rosters, and game schedules. Be concise.
agents[].class
| |
|---|
| Short | Dotted import path to a custom OrchidAgent subclass. |
| Detailed | When omitted, the agent uses GenericAgent. Custom subclasses can override run(), summarise(), or add bespoke tool-call logic. The class is resolved at runtime via importlib. |
| Default | null (uses GenericAgent) |
| Available values | Any dotted Python path to an OrchidAgent subclass |
agents:
  support:
    class: myapp.agents.support.SupportAgent
agents[].parallel_tools
| |
|---|
| Short | Dispatch independent tool calls in parallel. |
| Detailed | When enabled, the agent partitions its tool_calls into a parallel batch (dispatched via asyncio.gather) and a sequential tail. Per-tool safety is resolved from parallel_safe on the tool config, MCP readOnlyHint, or the built-in tool registry. |
| Default | false |
| Available values | true, false |
⚠ Parallel safety
Enable when an agent consistently makes multiple independent read-only tool calls per turn. Keep disabled for write operations or any tool chain where order guarantees matter — parallel dispatch removes sequencing. Read-only tools with parallel_safe: true (or MCP readOnlyHint: true) run in parallel; all others run sequentially.
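A sketch of an agent that opts in to parallel dispatch while marking one tool as unsafe. The agent, server, tool, and environment-variable names are hypothetical, and the required description and prompt fields are omitted for brevity.

```yaml
agents:
  research:
    parallel_tools: true
    mcp_servers:
      - name: search               # hypothetical server
        url: "${SEARCH_MCP_URL}"   # hypothetical environment variable
        tools:
          - name: web_search       # read-only: eligible for the parallel batch
            parallel_safe: true
          - name: save_note        # writes state: kept in the sequential tail
            parallel_safe: false
```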
agents[].llm
Per-agent LLM override. Same structure as defaults.llm. When any field is set, it overrides the corresponding default.
agents:
  creative:
    llm:
      model: anthropic/claude-sonnet-4-20250514
      temperature: 0.8
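To make the cascade concrete, the sketch below overrides only temperature on one agent. It assumes overrides merge field by field, as the description of agents[].llm suggests, so the remaining fields would come from defaults.llm. The agent name is hypothetical.

```yaml
defaults:
  llm:
    model: gemini/gemini-2.5-flash
    temperature: 0.2
    fallback_model: ollama/llama3.2

agents:
  summariser:
    llm:
      temperature: 0.0   # only this field is overridden

# Effective settings for "summariser" under the field-level merge assumption:
#   model: gemini/gemini-2.5-flash
#   temperature: 0.0
#   fallback_model: ollama/llama3.2
```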
agents[].rag
Per-agent RAG override. Same structure as defaults.rag. All fields cascade: unset fields inherit from defaults.rag.
agents:
  knowledge:
    rag:
      namespace: docs
      k: 10
      enabled: true
agents[].rag.namespace
| |
|---|
| Short | Qdrant collection namespace for this agent. |
| Detailed | Documents indexed for this agent are stored in this namespace. Different agents can share a namespace (common knowledge base) or use separate ones (isolated domains). |
| Default | "" (uses agent name as namespace) |
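The shared versus isolated layouts described above could be expressed as follows. The agent and namespace names are hypothetical.

```yaml
agents:
  faq:
    rag:
      namespace: docs        # shared knowledge base
  onboarding:
    rag:
      namespace: docs        # same namespace, so the same documents
  legal:
    rag:
      namespace: contracts   # isolated domain
```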
agents[].rag.payload_indexes
| |
|---|
| Short | Explicit Qdrant payload index declarations. |
| Detailed | Map of field_name -> qdrant_schema_type for metadata fields you want to filter on. Types: keyword, integer, float, bool, datetime, text, geo. |
| Default | {} |
agents:
  catalog:
    rag:
      payload_indexes:
        category: keyword
        price: float
        in_stock: bool
agents[].mcp_servers[]
MCP server connections for this agent. Each entry defines a remote tool provider.
agents[].mcp_servers[].name
| |
|---|
| Short | Unique identifier for this MCP server. |
| Default | Required |
agents[].mcp_servers[].type
| |
|---|
| Short | Server type. |
| Default | local |
| Available values | local, remote |
agents[].mcp_servers[].transport
| |
|---|
| Short | Transport protocol. |
| Default | streamable_http |
| Available values | streamable_http, sse |
agents[].mcp_servers[].url
| |
|---|
| Short | MCP server URL. |
| Detailed | Supports ${ENV_VAR} interpolation for runtime configuration. |
| Default | Required |
agents:
  sales:
    mcp_servers:
      - name: crm
        type: remote
        transport: streamable_http
        url: "${CRM_MCP_URL}"
agents[].mcp_servers[].auth
agents[].mcp_servers[].auth.mode
| |
|---|
| Short | Authentication mode for this MCP server. |
| Detailed | Determines how the agent authenticates to the MCP server. |
| Default | none |
| Available values | none, passthrough, oauth |
none — No authentication headers are sent. Use for local servers on private networks.
passthrough — Forwards the graph's OrchidAuthContext bearer token unchanged. Use when the MCP server trusts the same identity provider.
oauth — Per-user OAuth 2.0 via MCP 2025-03-26 spec. On the first 401, Orchid discovers the server's OAuth metadata (RFC 9728), fetches the authorization server metadata (RFC 8414), and performs dynamic client registration (RFC 7591). No client_id or client_secret lives in config — everything is discovered at runtime.
⚠ Choosing an auth mode
Use none for local MCP servers that need no credentials. Use passthrough when the MCP server shares the same identity provider. Use oauth when each user must independently authorize the MCP server; Orchid discovers everything from the server's 401 response automatically.
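A minimal sketch of two servers with different auth modes, assuming auth is a nested block with a mode key as the property path indicates. The local server name and address are hypothetical.

```yaml
agents:
  sales:
    mcp_servers:
      - name: crm
        type: remote
        url: "${CRM_MCP_URL}"
        auth:
          mode: oauth        # each user authorizes; no client_id or client_secret in config
      - name: scratchpad     # hypothetical local server
        type: local
        url: "http://localhost:8765/mcp"   # hypothetical address
        auth:
          mode: none         # private network, no auth headers sent
```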
agents[].mcp_servers[].tools
| |
|---|
| Short | Tool allow-list or wildcard. |
| Detailed | List of OrchidToolConfig entries defining which tools this agent may call. Use "*" or ["*"] to discover all tools at runtime. Individual tools can override parallel_safe, inject_to_rag, requires_approval, and rag settings. |
| Default | [] |
agents:
  sales:
    mcp_servers:
      - name: crm
        url: https://crm.example.com/mcp
        tools:
          - name: search_contacts
            inject_to_rag: true
            rag_ttl: 300
            requires_approval: false
            parallel_safe: true
          - name: delete_contact
            requires_approval: true
agents[].mcp_servers[].prompts
| |
|---|
| Short | MCP prompt names to load. |
| Detailed | Pre-configured prompts exposed by the MCP server that the agent can reference. Use "*" to discover all prompts at runtime. |
| Default | [] |
agents[].mcp_servers[].resources
| |
|---|
| Short | MCP resource URIs to load. |
| Detailed | Static resources (documents, schemas, etc.) exposed by the MCP server. Use "*" to discover all resources at runtime. |
| Default | [] |
agents[].mcp_servers[].tool_call_strategy
| |
|---|
| Short | How tools from this server are dispatched. |
| Detailed | Strategy name registered in the OrchidToolCallStrategy registry. |
| Default | all |
| Available values | all, sequential, llm_decides, or any custom registered strategy |
all — Call every tool concurrently, collect all results.
sequential — Call tools in order, chaining previous_results forward.
llm_decides — Ask the LLM which tools to call and with what arguments. Falls back to all on failure.
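The difference between all and sequential can be sketched with plain asyncio. This is an illustration only, not Orchid's OrchidToolCallStrategy implementation: the `Tool` signature, the two dispatch functions, and the sample tools `lookup` and `enrich` are all hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical tool signature: each tool receives the results gathered so far.
Tool = Callable[[dict[str, Any]], Awaitable[Any]]


async def dispatch_all(tools: dict[str, Tool]) -> dict[str, Any]:
    """Call every tool concurrently and collect all results."""
    names = list(tools)
    results = await asyncio.gather(*(tools[name]({}) for name in names))
    return dict(zip(names, results))


async def dispatch_sequential(tools: dict[str, Tool]) -> dict[str, Any]:
    """Call tools in order, passing earlier results to later tools."""
    previous_results: dict[str, Any] = {}
    for name, tool in tools.items():
        # Pass a copy so a tool cannot mutate the accumulated results.
        previous_results[name] = await tool(dict(previous_results))
    return previous_results


async def lookup(previous_results: dict[str, Any]) -> str:
    return "contact-42"


async def enrich(previous_results: dict[str, Any]) -> str:
    return f"enriched:{previous_results.get('lookup', 'nothing')}"


TOOLS: dict[str, Tool] = {"lookup": lookup, "enrich": enrich}
```

With all, `enrich` never sees the output of `lookup`; with sequential it does. That is the practical reason to pick sequential when one tool depends on another.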
agents[].mcp_servers[].discover_all_tools
| |
|---|
| Short | Auto-discovered flag for tools. |
| Detailed | Set automatically by the wildcard validator when tools is "*" or ["*"]. Do not set manually. |
| Default | false |
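A wildcard validator of this kind might look roughly like the following. The function name `normalise_tools` is hypothetical, and clearing the explicit tool list when the wildcard is used is an assumption of this sketch rather than documented behaviour.

```python
from typing import Union


def normalise_tools(value: Union[str, list]) -> tuple[list, bool]:
    """Return (tools, discover_all_tools) for a raw `tools` value."""
    if value == "*" or value == ["*"]:
        # Wildcard: nothing is listed explicitly; tools are discovered at runtime.
        return [], True
    if isinstance(value, str):
        raise ValueError(f'tools must be a list or "*", got {value!r}')
    return list(value), False
```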
agents[].mcp_servers[].discover_all_prompts
| |
|---|
| Short | Auto-discovered flag for prompts. |
| Default | false |
agents[].mcp_servers[].discover_all_resources
| |
|---|
| Short | Auto-discovered flag for resources. |
| Default | false |
agents[].execution_hints
Hints for the supervisor when routing.
agents[].execution_hints.parallel_safe
| |
|---|
| Short | Mark this agent safe to run in parallel. |
| Detailed | Hint to the supervisor that this agent has no side effects and can be dispatched concurrently with other agents in multi-agent flows. |
| Default | true |
agents[].tools
| |
|---|
| Short | Built-in tool names available to this agent. |
| Detailed | Must match keys in the top-level tools: section or the built-in tool registry. These are Python functions invoked directly in-process (not via MCP). |
| Default | [] |
agents[].skills
| |
|---|
| Short | Per-agent skill definitions. |
| Detailed | Multi-step workflows that this agent can execute. Each skill is a named sequence of tool calls or agent invocations. Unlike orchestrator-level skills (top-level skills:), these run within a single agent and do not involve the supervisor. |
| Default | {} |
agents:
  support:
    skills:
      escalate:
        description: "Escalate a ticket through the support hierarchy"
        steps:
          - tool: create_ticket
            arguments:
              priority: high
          - agent: manager
            instruction: "A high-priority ticket needs your attention"
agents[].guardrails
| |
|---|
| Short | Per-agent guardrail chains. |
| Detailed | Input and output guardrails that apply only to this agent, in addition to any global guardrails defined at the top level. |
| Default | {} (no guardrails) |
agents[].children
| |
|---|
| Short | Sub-agent configurations nested under this agent. |
| Detailed | Creates a hierarchical agent tree with one level of nesting. The parent agent can route to its children. Children inherit defaults from the parent and can define their own overrides. The graph builder, MCP inventory, and auth registry only handle agents[].children[] — deeper nesting is not supported. Mini-agents are forbidden on child agents. |
| Default | null |
agents:
  support:
    description: "Top-level support router"
    children:
      billing:
        description: "Handles billing questions"
        prompt: "You are a billing specialist..."
      technical:
        description: "Handles technical issues"
        prompt: "You are a technical support engineer..."
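The two structural rules for children (one level of nesting only, no mini-agents on child agents) could be checked against a parsed config like this. It is a sketch over plain dictionaries; the function name `validate_agent_tree` is hypothetical and Orchid's own validation may differ.

```python
def validate_agent_tree(agents: dict) -> None:
    """Raise ValueError if a child agent nests further or enables mini_agent."""
    for name, agent in agents.items():
        for child_name, child in (agent.get("children") or {}).items():
            path = f"{name}.{child_name}"
            if child.get("children"):
                raise ValueError(f"{path}: nesting deeper than one level is not supported")
            if child.get("mini_agent"):
                raise ValueError(f"{path}: mini_agent is only allowed on top-level agents")
```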
agents[].mini_agent
Opt-in mini-agent (self-clone) configuration. Only allowed on top-level agents.
agents[].mini_agent.enabled
| |
|---|
| Short | Enable mini-agent decomposition. |
| Detailed | When enabled, complex requests are decomposed into independent sub-tasks, each handled by a cloned mini-agent running in parallel. The results are then aggregated into a final response. Adds one extra LLM call (decomposer) per turn. |
| Default | false |
| Available values | true, false |
⚠Cost vs speed trade-off
Enable when a single complex user request can be decomposed into independent sub-tasks that do not share state. The decomposer adds one extra LLM call per turn; only opt in when the parallelism speedup outweighs that cost. Nesting is not supported — only top-level agents can enable mini-agents.
agents[].mini_agent.max_count
| |
|---|
| Short | Maximum number of parallel mini-agents. |
| Detailed | The decomposer produces at most this many sub-tasks. Range is enforced by Pydantic validation. |
| Default | 3 |
| Available values | 2 to 8 |
agents[].mini_agent.decomposer_model
| |
|---|
| Short | LLM model for the decomposer step. |
| Detailed | A cheap, fast model is recommended. Falls back to the parent agent's llm.model when not set. |
| Default | null |
agents[].mini_agent.timeout_seconds
| |
|---|
| Short | Hard timeout per mini-agent. |
| Detailed | If a mini-agent does not complete within this time, it is cancelled and its result is omitted from aggregation. |
| Default | 60 |
| Available values | 5 to 600 |
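The timeout behaviour (cancel the slow mini-agent, omit its result from aggregation) maps naturally onto `asyncio.wait_for`. The following is a sketch under that assumption; `run_with_timeout` and the stand-in `mini_agent` coroutine are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable


async def run_with_timeout(tasks: list[Awaitable[Any]], timeout_seconds: float) -> list[Any]:
    """Run tasks in parallel and drop any that exceed timeout_seconds."""
    timed_out = object()  # sentinel so a legitimate None result is kept

    async def guarded(task: Awaitable[Any]) -> Any:
        try:
            # wait_for cancels the awaited task when the timeout expires.
            return await asyncio.wait_for(task, timeout=timeout_seconds)
        except asyncio.TimeoutError:
            return timed_out

    results = await asyncio.gather(*(guarded(task) for task in tasks))
    return [result for result in results if result is not timed_out]


async def mini_agent(name: str, delay: float) -> str:
    """Stand-in for a mini-agent that takes `delay` seconds to answer."""
    await asyncio.sleep(delay)
    return name
```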
agents[].mini_agent.tool_allowlist_mode
| |
|---|
| Short | Tool exposure mode for mini-agents. |
| Detailed | Controls which tools cloned mini-agents may access. |
| Default | strict |
| Available values | strict, parent_full, inferred |
strict — Every tool name in allowed_tools must exist in the parent's inventory. Fails validation if a name is unknown.
parent_full — Ignores allowed_tools entirely. Mini-agents get access to the parent's full tool set.
inferred — strict behaviour, but an empty allowed_tools falls back to the parent's full inventory with a warning.
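The three modes can be summarised as a small resolution function. This is a sketch of the rules as stated above, with a hypothetical name `resolve_tools`; returning an empty list for strict with an empty allowed_tools is an assumption, since the reference does not say what happens in that case.

```python
import warnings


def resolve_tools(mode: str, allowed_tools: list[str], parent_tools: list[str]) -> list[str]:
    """Return the tool names a mini-agent may use under the given mode."""
    if mode not in ("strict", "parent_full", "inferred"):
        raise ValueError(f"unknown tool_allowlist_mode: {mode!r}")
    if mode == "parent_full":
        # allowed_tools is ignored entirely.
        return list(parent_tools)
    if mode == "inferred" and not allowed_tools:
        warnings.warn("allowed_tools is empty; falling back to the parent's full inventory")
        return list(parent_tools)
    # strict, or inferred with a non-empty list: every name must be known.
    unknown = [name for name in allowed_tools if name not in parent_tools]
    if unknown:
        raise ValueError(f"unknown tools in allowed_tools: {unknown}")
    return list(allowed_tools)
```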
agents[].mini_agent.stream_inner_tokens
| |
|---|
| Short | Stream mini-agent tokens to SSE. |
| Detailed | When enabled, tokens generated by inner mini-agents propagate to the SSE stream. When disabled, only lifecycle events (start, complete, error) surface. |
| Default | false |
agents[].mini_agent.decomposer_prompt
| |
|---|
| Short | Custom decomposer prompt. |
| Detailed | Replaces the built-in template that instructs the LLM how to break a query into sub-tasks. |
| Default | null |
agents[].mini_agent.aggregator_prompt
| |
|---|
| Short | Custom aggregator prompt. |
| Detailed | Replaces the built-in template that instructs the LLM how to combine mini-agent results into a final response. |
| Default | null |
agents[].mini_agent.system_prompt_template
| |
|---|
| Short | Template for each mini-agent's system prompt. |
| Detailed | Supports placeholders: {parent_prompt} (the parent agent's prompt), {instruction} (the decomposed sub-task), {tool_list} (available tools). |
| Default | null |
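To show how the three placeholders fit together, here is a sketch that renders a template with `str.format`. Only the placeholder names come from the reference; the template text, the function name `render_system_prompt`, and the use of `str.format` as the substitution mechanism are assumptions.

```python
# Hypothetical template; only the three placeholder names come from the reference.
SYSTEM_PROMPT_TEMPLATE = (
    "{parent_prompt}\n\n"
    "Your sub-task: {instruction}\n"
    "Tools you may call: {tool_list}"
)


def render_system_prompt(parent_prompt: str, instruction: str, tools: list[str]) -> str:
    """Fill the mini-agent system prompt for one decomposed sub-task."""
    return SYSTEM_PROMPT_TEMPLATE.format(
        parent_prompt=parent_prompt,
        instruction=instruction,
        tool_list=", ".join(tools),
    )
```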
agents:
  research:
    mini_agent:
      enabled: true
      max_count: 5
      timeout_seconds: 45
      tool_allowlist_mode: parent_full
      stream_inner_tokens: true
agents[].prompt_sections
Customisable templates for the agentic-loop system prompt assembly.
agents[].prompt_sections.prior_results_header
| |
|---|
| Short | Header for prior-turn tool results. |
| Default | `"\n--- Previous Tool Results (from prior turns) ---"` |
agents[].prompt_sections.mcp_prompt_template
| |
|---|
| Short | Template for rendered MCP prompts. |
| Detailed | Placeholders: {name}, {text}. |
| Default | `"\n--- MCP Prompt: {name} ---\n{text}"` |
agents[].prompt_sections.skipped_prompt_template
| |
|---|
| Short | Template for MCP prompts requiring arguments. |
| Detailed | Shown when a prompt requires arguments that were not provided. Placeholders: {name}, {description}, {required_args}. |
| Default | `"\n[Available prompt: {name}] {description} (requires: {required_args})"` |
agents[].prompt_sections.resources_header
| |
|---|
| Short | Header for MCP resources block. |
| Default | `"\n--- Available Resources ---"` |
agents[].prompt_sections.resource_template
| |
|---|
| Short | Template for each MCP resource. |
| Detailed | Placeholders: {name}, {content}. |
| Default | `"\n[{name}]\n{content}"` |
agents[].prompt_sections.rag_header
| |
|---|
| Short | Header for RAG context block. |
| Default | `"\n--- Background Knowledge (RAG) ---"` |
agents[].prompt_sections.prior_results_max_chars
| |
|---|
| Short | Character cap on prior tool-results JSON. |
| Default | 4000 |
agents[].prompt_sections.resource_max_chars
| |
|---|
| Short | Character cap per MCP resource body. |
| Default | 2000 |
agents[].prompt_sections.summarise_history_reminder
| |
|---|
| Short | Reminder block for summarise prompt when history is present. |
| Default | Built-in reminder text |
agents[].prompt_sections.summarise_prior_results_header
| |
|---|
| Short | Header for prior results in summarise prompt. |
| Default | `"\n--- Previous Tool Results (from prior turns) ---\n"` |
agents[].prompt_sections.summarise_rag_section_header
| |
|---|
| Short | Header for RAG block in summarise user message. |
| Default | `"Background knowledge (from RAG):\n"` |
agents[].prompt_sections.summarise_user_template
| |
|---|
| Short | User-content template for summarise call. |
| Detailed | Placeholders: {query}, {rag_section}, {mcp_data}. |
| Default | `"User query: {query}\n{rag_section}Live data (from API):\n{mcp_data}"` |
agents[].prompt_sections.summarise_prior_results_max_chars
| |
|---|
| Short | Max characters of prior results in summarise prompt. |
| Default | 4000 |
tools[]
Global built-in tool declarations. Each key is a tool name; the value defines its handler and metadata.
tools[].handler
| |
|---|
| Short | Dotted import path to the Python function. |
| Detailed | The function implementing this tool. The parameter schema is auto-extracted from its signature unless parameters is explicitly defined. Sync handlers are automatically wrapped with asyncio.to_thread. |
| Default | Required |
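Resolving a dotted import path and wrapping a synchronous function could be done along these lines. This is a sketch, not Orchid's loader; `load_handler` is a hypothetical name, and `json.dumps` merely stands in for a real tool function.

```python
import asyncio
import importlib
import inspect
from typing import Any, Awaitable, Callable


def load_handler(dotted_path: str) -> Callable[..., Awaitable[Any]]:
    """Import `package.module.function` and return an awaitable callable."""
    module_path, _, attr = dotted_path.rpartition(".")
    func = getattr(importlib.import_module(module_path), attr)
    if inspect.iscoroutinefunction(func):
        return func  # already async, use as-is

    async def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Run the blocking function in a worker thread so the event loop stays free.
        return await asyncio.to_thread(func, *args, **kwargs)

    return wrapper


# Usage: a sync standard-library function becomes awaitable.
handler = load_handler("json.dumps")
```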
tools[].description
| |
|---|
| Short | Tool description for LLM invocation. |
| Detailed | Shown to the LLM in the tool schema. Be specific about what the tool does, what inputs it expects, and what it returns. |
| Default | "" |
tools[].parameters
| |
|---|
| Short | Parameter declarations. |
| Detailed | When omitted, parameters are auto-extracted from the Python function signature. Framework-injected params (query, context, auth_context, **kwargs) are filtered out. YAML declarations take precedence over auto-extraction. |
| Default | {} |
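Auto-extraction from a signature can be approximated with `inspect.signature`. The sketch below filters the framework-injected names listed above; the function name `extract_parameters`, the type mapping, and the decision to also skip `*args` are assumptions made for illustration.

```python
import inspect

# Names injected by the framework, per the reference above.
FRAMEWORK_PARAMS = {"query", "context", "auth_context"}
# Assumed mapping from Python annotations to the documented type names.
TYPE_NAMES = {str: "string", int: "int", float: "float", bool: "bool"}


def extract_parameters(func) -> dict[str, dict]:
    """Build a parameter schema from a function signature."""
    params: dict[str, dict] = {}
    for name, param in inspect.signature(func).parameters.items():
        if name in FRAMEWORK_PARAMS:
            continue
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue  # *args / **kwargs carry no schema
        has_default = param.default is not inspect.Parameter.empty
        params[name] = {
            "type": TYPE_NAMES.get(param.annotation, "string"),
            "required": not has_default,
            "default": param.default if has_default else None,
        }
    return params


def format_date(date_str: str, format: str = "long", context=None, **kwargs) -> str:
    """Example handler; the body is irrelevant to schema extraction."""
    return date_str
```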
tools[].parameters[].type
| |
|---|
| Short | Parameter type. |
| Default | string |
| Available values | string, int, float, bool |
tools[].parameters[].description
| |
|---|
| Short | Parameter description. |
| Default | "" |
tools[].parameters[].required
| |
|---|
| Short | Whether the parameter is required. |
| Default | true |
tools[].parameters[].default
| |
|---|
| Short | Default value when not provided. |
| Default | null |
tools[].inject_to_rag
| |
|---|
| Short | Store this tool's results in RAG. |
| Detailed | When enabled, the tool's output is embedded and stored in the agent's RAG namespace. Subsequent queries can retrieve this output as context. |
| Default | false |
tools[].rag_ttl
| |
|---|
| Short | Cache TTL for RAG-stored tool results. |
| Detailed | How long (in seconds) the tool's RAG-injected output remains retrievable. null uses the agent's default rag_ttl. |
| Default | null |
tools[].requires_approval
| |
|---|
| Short | Require human approval before execution. |
| Detailed | When enabled, the tool call is paused and a HITL (human-in-the-loop) request is sent to the frontend. The user must approve before the tool executes. |
| Default | false |
tools[].parallel_safe
| |
|---|
| Short | Declare the tool safe for parallel dispatch. |
| Detailed | For built-in tools, null resolves to false (sequential). Set true for pure read-only, side-effect-free handlers. Only consulted when the agent has parallel_tools: true. |
| Default | null |
tools:
  format_date:
    handler: myapp.tools.dates.format_date
    description: "Format a date string into a human-readable form"
    parameters:
      date_str:
        type: string
        description: "ISO 8601 date string"
        required: true
      format:
        type: string
        description: "Output format (e.g. 'long', 'short')"
        required: false
        default: long
    inject_to_rag: false
    requires_approval: false
    parallel_safe: true
skills[]
Orchestrator-level (cross-agent) skill definitions. These are multi-step workflows that can invoke multiple agents in sequence.
skills[].description
| |
|---|
| Short | Human-readable skill purpose. |
| Default | "" |
skills[].steps
| |
|---|
| Short | Ordered list of agent invocations. |
| Detailed | Each step names an agent and provides an instruction. The supervisor routes each step through the named agent, passing the instruction as the query. |
| Default | Required |
skills[].steps[].agent
| |
|---|
| Short | Agent name to invoke. |
| Default | Required |
skills[].steps[].instruction
| |
|---|
| Short | Hint passed to the agent. |
| Detailed | The instruction is sent to the agent as the user query for that step. Referencing prior step results via template variables is planned for a future version. |
| Default | "" |
skills:
  onboarding:
    description: "Walk a new user through account setup"
    steps:
      - agent: greeter
        instruction: "Welcome the user and explain what we do"
      - agent: account_setup
        instruction: "Guide the user through creating their profile"
      - agent: preferences
        instruction: "Ask about notification and privacy preferences"
guardrails
Global guardrail chains applied to every request. Per-agent guardrails can augment or override these.
guardrails.input
| |
|---|
| Short | Input guardrail rules. |
| Detailed | Applied to the user's raw query before any agent processes it. Chains are evaluated in order; the first failing rule triggers its fail_action. |
| Default | [] |
guardrails.output
| |
|---|
| Short | Output guardrail rules. |
| Detailed | Applied to agent responses before delivery to the user. |
| Default | [] |
guardrails.input[].type / guardrails.output[].type
| |
|---|
| Short | Guardrail type name. |
| Detailed | Must match a registered guardrail implementation. Built-ins include content_safety, pii_detection, prompt_injection, max_length, topic_restriction. |
| Default | Required |
guardrails.input[].fail_action / guardrails.output[].fail_action
| |
|---|
| Short | Action on guardrail failure. |
| Default | block |
| Available values | block, warn, redact, log |
block — Reject the message entirely.
warn — Allow but append a warning.
redact — Mask sensitive content.
log — Record the violation but take no action.
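Ordered evaluation with a per-rule fail_action can be sketched as follows. The `Rule` dataclass, the `check` callables, and `first_failure` are hypothetical; in particular, stopping at the first failing rule follows the description of guardrails.input above, and whether warn or log failures let later rules run is not specified in this reference.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    type: str
    check: Callable[[str], bool]  # returns True when the text passes
    fail_action: str = "block"


def first_failure(text: str, chain: list[Rule]) -> Optional[tuple[str, str]]:
    """Return (type, fail_action) of the first failing rule, or None if all pass."""
    for rule in chain:
        if not rule.check(text):
            return rule.type, rule.fail_action
    return None


# Two toy rules standing in for real guardrail implementations.
CHAIN = [
    Rule("max_length", lambda text: len(text) <= 10),
    Rule("no_secret", lambda text: "secret" not in text, fail_action="redact"),
]
```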
guardrails.input[].config / guardrails.output[].config
| |
|---|
| Short | Guardrail constructor kwargs. |
| Detailed | Passed as keyword arguments to the guardrail class constructor. Schema depends on the guardrail type. |
| Default | {} |
guardrails:
  input:
    - type: content_safety
      fail_action: block
      config:
        threshold: 0.8
    - type: prompt_injection
      fail_action: block
  output:
    - type: pii_detection
      fail_action: redact
mcp_gateway
MCP gateway exposure configuration. Controls how Orchid exposes its own capabilities to upstream MCP hosts.
mcp_gateway.tools
| |
|---|
| Short | Tool title/description overrides. |
| Detailed | Map of canonical tool name -> override config. Used to customise how tools appear to upstream MCP hosts without changing the underlying tool implementation. |
| Default | {} |
mcp_gateway.tools[].title
| |
|---|
| Short | Override title for the tool. |
| Default | null (keeps gateway default) |
mcp_gateway.tools[].description
| |
|---|
| Short | Override description for the tool. |
| Default | null (keeps gateway default) |
mcp_gateway.prompts
| |
|---|
| Short | MCP prompt templates exposed by the gateway. |
| Detailed | Pre-canned prompts that upstream hosts can request. Each prompt has a handle, optional arguments, and a template body. |
| Default | [] |
mcp_gateway.prompts[].name
| |
|---|
| Short | Unique prompt handle. |
| Detailed | Must match ^[a-zA-Z_][a-zA-Z0-9_-]*$. Used by upstream hosts to reference the prompt. |
| Default | Required |
mcp_gateway.prompts[].title
| |
|---|
| Short | Display title. |
| Default | null |
mcp_gateway.prompts[].description
| |
|---|
| Short | Prompt description. |
| Default | null |
mcp_gateway.prompts[].arguments
| |
|---|
| Short | Arguments accepted by the prompt. |
| Default | [] |
mcp_gateway.prompts[].arguments[].name
| |
|---|
| Short | Argument name. |
| Detailed | Must match ^[a-zA-Z_][a-zA-Z0-9_-]*$. |
| Default | Required |
mcp_gateway.prompts[].arguments[].description
| |
|---|
| Short | Argument description. |
| Default | null |
mcp_gateway.prompts[].arguments[].required
| |
|---|
| Short | Whether the argument is required. |
| Default | false |
mcp_gateway.prompts[].template
| |
|---|
| Short | Prompt body template. |
| Detailed | Uses {{arg_name}} syntax for argument substitution. Rendered at request time with the provided arguments. |
| Default | Required |
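The name pattern and the {{arg_name}} substitution can be combined into a small renderer. This is a sketch only: `render_prompt` is a hypothetical name, and rendering a missing optional argument as an empty string is an assumption, since the reference does not say how omitted arguments are handled.

```python
import re

# Pattern quoted in the reference for prompt and argument names.
NAME_RE = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_-]*$")
PLACEHOLDER_RE = re.compile(r"\{\{([a-zA-Z_][a-zA-Z0-9_-]*)\}\}")


def render_prompt(template: str, declared: dict[str, bool], values: dict[str, str]) -> str:
    """Render a template; `declared` maps argument name to its required flag."""
    for name, required in declared.items():
        if not NAME_RE.match(name):
            raise ValueError(f"invalid argument name: {name!r}")
        if required and name not in values:
            raise ValueError(f"missing required argument: {name}")
    # Omitted optional arguments render as an empty string in this sketch.
    return PLACEHOLDER_RE.sub(lambda match: str(values.get(match.group(1), "")), template)
```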
mcp_gateway:
  tools:
    orchid_ask:
      title: "Ask Orchid"
      description: "Send a question to the Orchid multi-agent system"
  prompts:
    - name: summarise_thread
      title: "Summarise Conversation"
      description: "Produces a bullet-point summary of the current chat thread"
      arguments:
        - name: max_points
          description: "Maximum bullet points"
          required: false
      template: |
        Summarise the following conversation in at most {{max_points}} bullet points.
        Focus on decisions made and action items.
events
Pollen + Bloom event-driven activation layer. null or absent = disabled (zero overhead).
events.enabled
| |
|---|
| Short | Master switch for the event layer. |
| Detailed | When false, no producers, processors, queues, or schedulers are started. The event system has zero runtime cost when disabled. |
| Default | false |
| Available values | true, false |
⚠Zero-cost when disabled
When events.enabled is false (or the events key is absent), no background tasks, threads, or connections are created. There is absolutely no runtime overhead from the event system when it is not in use.
events.store
| |
|---|
| Short | Event storage backend. |
| Detailed | Required when enabled: true. Stores event state, trigger history, and schedule metadata. |
| Default | null |
events.store.class
| |
|---|
| Short | Dotted import path for the event store. |
| Default | Required when enabled: true |
events.store.extra_args
| |
|---|
| Short | Additional constructor kwargs. |
| Default | {} |
events.queue
| |
|---|
| Short | Signal queue backend configuration. |
| Detailed | Required when enabled: true. Buffers signals between producers and processors. |
| Default | null |
events.queue.class
| |
|---|
| Short | Dotted import path for the queue backend. |
| Default | Required when enabled: true |
events.queue.notify_enabled
| |
|---|
| Short | Enable queue notifications. |
| Default | true |
events.queue.poll_interval_ms
| |
|---|
| Short | Poll interval in milliseconds. |
| Default | 200 |
| Minimum | 10 |
events.queue.lease_seconds
| |
|---|
| Short | Message lease duration. |
| Detailed | How long a processor has exclusive access to a message before it becomes available for re-processing. |
| Default | 30 |
| Minimum | 1 |
events.queue.max_attempts
| |
|---|
| Short | Maximum processing attempts. |
| Default | 5 |
| Minimum | 1 |
events.queue.dead_letter_table
| |
|---|
| Short | Dead letter table name. |
| Detailed | Messages that exceed max_attempts are moved to this table for later inspection. |
| Default | signal_queue_dead_letter |
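How lease_seconds, max_attempts, and the dead letter table interact can be shown with an in-memory model. This is a teaching sketch, not the SQLite-backed queue: the `Message` and `LeaseQueue` classes are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(eq=False)
class Message:
    payload: Any
    attempts: int = 0
    leased_until: float = 0.0


@dataclass
class LeaseQueue:
    lease_seconds: float = 30
    max_attempts: int = 5
    pending: list[Message] = field(default_factory=list)
    dead_letter: list[Message] = field(default_factory=list)

    def put(self, payload: Any) -> None:
        self.pending.append(Message(payload))

    def lease(self, now: float) -> Optional[Message]:
        """Hand out the next message that is not currently leased."""
        for msg in list(self.pending):
            if msg.leased_until > now:
                continue  # another processor still holds the lease
            if msg.attempts >= self.max_attempts:
                # Out of attempts: park it for inspection instead of retrying.
                self.pending.remove(msg)
                self.dead_letter.append(msg)
                continue
            msg.attempts += 1
            msg.leased_until = now + self.lease_seconds
            return msg
        return None

    def ack(self, msg: Message) -> None:
        """Processing succeeded; the message leaves the queue."""
        self.pending.remove(msg)
```

A message that is leased but never acknowledged becomes visible again once its lease expires, which is what makes retries happen without any explicit failure report.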
events.scheduler
| |
|---|
| Short | Scheduler backend for cron-based triggers. |
| Detailed | Optional. Required only if you use cron schedules. Typically an APScheduler wrapper. |
| Default | null |
events.scheduler.class
| |
|---|
| Short | Dotted import path for the scheduler. |
| Default | Required when schedules are used |
events.scheduler.extra_args
| |
|---|
| Short | Additional constructor kwargs. |
| Default | {} |
events.producers
| |
|---|
| Short | Signal producer configurations. |
| Detailed | Producers emit signals into the queue. Each producer runs independently and may poll external systems (webhooks, message buses, file watchers). |
| Default | [] |
events.producers[].class
| |
|---|
| Short | Dotted import path for the producer. |
| Default | Required |
events.producers[].extra_args
| |
|---|
| Short | Additional constructor kwargs. |
| Default | {} |
events.processors
| |
|---|
| Short | Signal processor configurations. |
| Detailed | Required when enabled: true. Workers that consume signals from the queue and execute triggers. |
| Default | [] |
events.processors[].class
| |
|---|
| Short | Dotted import path for the processor. |
| Default | Required |
events.processors[].concurrency
| |
|---|
| Short | Worker concurrency. |
| Default | 4 |
| Minimum | 1 |
events.processors[].poll_interval_ms
| |
|---|
| Short | Processor poll interval. |
| Default | 200 |
| Minimum | 10 |
events.processors[].lease_seconds
| |
|---|
| Short | Message lease duration. |
| Default | 30 |
| Minimum | 1 |
events.processors[].max_attempts
| |
|---|
| Short | Maximum processing attempts. |
| Default | 5 |
| Minimum | 1 |
events.processors[].drain_timeout_seconds
| |
|---|
| Short | Drain timeout on shutdown. |
| Detailed | How long to wait for in-flight messages to complete before force-stopping. |
| Default | 10.0 |
| Minimum | > 0 |
events.middleware
| |
|---|
| Short | Processing middleware. |
| Detailed | Applied to signals before they reach processors. Can transform, filter, or enrich signals. |
| Default | [] |
events.middleware[].class
| |
|---|
| Short | Dotted import path for the middleware. |
| Default | Required |
events.middleware[].extra_args
| |
|---|
| Short | Additional constructor kwargs. |
| Default | {} |
events.ingestion
| |
|---|
| Short | Webhook source registry. |
| Detailed | Defines valid inbound webhook sources with validation rules. |
| Default | {} |
events.ingestion.sources
| |
|---|
| Short | Registered webhook sources. |
| Default | [] |
events.ingestion.sources[].id
| |
|---|
| Short | Unique source identifier. |
| Default | Required |
events.ingestion.sources[].validator
| |
|---|
| Short | Validator configuration. |
| Detailed | Validates incoming webhook signatures (HMAC, bearer token, mTLS). |
| Default | Required |
events.ingestion.sources[].validator.class
| |
|---|
| Short | Dotted import path for the validator. |
| Default | Required |
events.ingestion.sources[].validator.secret_ref
| |
|---|
| Short | Secret reference. |
| Detailed | Name of the secret the validator uses, e.g. an HMAC key name or certificate thumbprint. |
| Default | null |
events.ingestion.sources[].validator.extra_args
| |
|---|
| Short | Additional constructor kwargs. |
| Default | {} |
events.ingestion.sources[].allowed_types
| |
|---|
| Short | Signal types accepted from this source. |
| Detailed | Empty list means all types are accepted. |
| Default | [] |
events.schedules
| |
|---|
| Short | Cron/interval schedule definitions. |
| Default | [] |
events.schedules[].id
| |
|---|
| Short | Unique schedule identifier. |
| Default | Required |
events.schedules[].cron
| |
|---|
| Short | Cron expression. |
| Detailed | Standard 5-field cron: min hour day month dow. Mutually exclusive with interval_seconds. |
| Default | null |
events.schedules[].interval_seconds
| |
|---|
| Short | Interval between runs in seconds. |
| Detailed | Mutually exclusive with cron. Must be > 0. |
| Default | null |
| Minimum | > 0 |
events.schedules[].trigger_id
| |
|---|
| Short | Target trigger ID. |
| Detailed | Must reference a trigger defined in events.triggers with signal: cron. |
| Default | Required |
events.schedules[].identity
| |
|---|
| Short | Identity claim for scheduled runs. |
| Detailed | Discriminated union on mode: service_account, addressed_to_user, or act_as_user. |
| Default | Required |
events.schedules[].enabled
| |
|---|
| Short | Whether this schedule is active. |
| Default | true |
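The constraints on cron and interval_seconds can be expressed as a small validating dataclass. This is a sketch that uses a plain dataclass as a stand-in for Orchid's actual model; the class name `Schedule` is hypothetical, and rejecting a schedule that sets neither field is an assumption, since the reference only states that the two are mutually exclusive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Schedule:
    id: str
    trigger_id: str
    cron: Optional[str] = None
    interval_seconds: Optional[float] = None
    enabled: bool = True

    def __post_init__(self) -> None:
        if self.cron is not None and self.interval_seconds is not None:
            raise ValueError(f"{self.id}: cron and interval_seconds are mutually exclusive")
        if self.cron is None and self.interval_seconds is None:
            # Assumption: a schedule with neither field has nothing to fire on.
            raise ValueError(f"{self.id}: set either cron or interval_seconds")
        if self.cron is not None and len(self.cron.split()) != 5:
            raise ValueError(f"{self.id}: cron must have 5 fields (min hour day month dow)")
        if self.interval_seconds is not None and self.interval_seconds <= 0:
            raise ValueError(f"{self.id}: interval_seconds must be > 0")
```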
events.triggers
| |
|---|
| Short | Trigger definitions. |
| Detailed | Map signals to agent activations. Each trigger has match conditions, emission configuration, and a retry policy. |
| Default | [] |
events.triggers[].id
| |
|---|
| Short | Unique trigger identifier. |
| Default | Required |
events.triggers[].on
| |
|---|
| Short | Match conditions. |
| Default | Required |
events.triggers[].on.signal
| |
|---|
| Short | Signal name to match. |
| Detailed | "cron" is reserved for time-driven triggers fired by schedules. |
| Default | Required |
events.triggers[].on.cron
| |
|---|
| Short | Cron expression for time-driven triggers. |
| Detailed | Required when signal == "cron". Rejected for non-cron signals. |
| Default | null |
events.triggers[].on.when
| |
|---|
| Short | JMESPath boolean expression. |
| Detailed | Evaluated against the signal envelope. Only matches when the expression returns true. |
| Default | null |
events.triggers[].emits
| |
|---|
| Short | Emission configuration. |
| Default | Required |
events.triggers[].emits.agent
| |
|---|
| Short | Agent to activate. |
| Default | Required |
events.triggers[].emits.prompt_template
| |
|---|
| Short | Prompt template for the agent. |
| Detailed | Sent to the agent as the user query when the trigger fires. Can use template variables from the signal envelope. |
| Default | Required |
events.triggers[].emits.identity
| |
|---|
| Short | Identity claim. |
| Detailed | Determines who the trigger runs as. service_account runs as a system identity. addressed_to_user and act_as_user resolve a real user from the signal envelope. |
| Default | Required |
events.triggers[].emits.respect_chat_binding
| |
|---|
| Short | Respect chat binding from signal. |
| Detailed | When true, the trigger's output is appended to the chat specified in the signal envelope. Requires a non-service-account identity. |
| Default | false |
events.triggers[].emits.proactive_chat
| |
|---|
| Short | Create a new chat for the user. |
| Detailed | When true, a new chat session is created for the resolved user. Requires a non-service-account identity. |
| Default | false |
events.triggers[].emits.visibility
| |
|---|
| Short | Visibility override. |
| Detailed | actor = visible to the triggering user only. addressed = visible to addressed users. tenant = visible to all users in the tenant. admin = admin-only. null = computed from identity mode. |
| Default | null |
events.triggers[].retry
| |
|---|
| Short | Retry policy. |
| Default | {} |
events.triggers[].retry.max
| |
|---|
| Short | Maximum retry attempts. |
| Default | 0 |
| Minimum | 0 |
events.triggers[].retry.backoff
| |
|---|
| Short | Backoff strategy. |
| Default | exponential |
| Available values | fixed, linear, exponential |
events.triggers[].retry.jitter
| |
|---|
| Short | Add jitter to backoff. |
| Default | true |
events.triggers[].retry.initial_delay_seconds
| |
|---|
| Short | Initial delay before first retry. |
| Default | 1.0 |
| Minimum | > 0 |
events.triggers[].retry.max_delay_seconds
| |
|---|
| Short | Maximum delay between retries. |
| Default | 300.0 |
| Minimum | > 0, must be >= initial_delay_seconds |
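The retry fields combine into a delay calculation along these lines. The exact formulas are assumptions (linear as initial × attempt, exponential as doubling, and "full jitter" drawn uniformly between 0 and the capped delay); the reference names the strategies but does not define them, and `retry_delay` is a hypothetical name.

```python
import random


def retry_delay(
    attempt: int,
    backoff: str = "exponential",
    initial_delay_seconds: float = 1.0,
    max_delay_seconds: float = 300.0,
    jitter: bool = True,
) -> float:
    """Delay in seconds before retry number `attempt` (1-based)."""
    if backoff == "fixed":
        delay = initial_delay_seconds
    elif backoff == "linear":
        delay = initial_delay_seconds * attempt
    elif backoff == "exponential":
        delay = initial_delay_seconds * 2 ** (attempt - 1)
    else:
        raise ValueError(f"unknown backoff strategy: {backoff!r}")
    delay = min(delay, max_delay_seconds)  # never exceed the configured cap
    if jitter:
        # Assumed "full jitter": pick a random point up to the capped delay.
        delay = random.uniform(0, delay)
    return delay
```

Jitter spreads retries from many failed executions over time so they do not all hit the downstream system at the same moment.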
events.triggers[].parallelism
| |
|---|
| Short | Concurrency scope. |
| Detailed | Controls how many concurrent executions of this trigger are allowed. |
| Default | per_user |
| Available values | per_user, per_tenant, unbounded |
per_user — One concurrent execution per user.
per_tenant — One concurrent execution per tenant.
unbounded — No concurrency limit.
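One way to picture the three scopes is as a key under which executions are serialised. This is a sketch with a hypothetical function `concurrency_key`; how Orchid actually enforces the limit (for example one lock per key) is not described in this reference.

```python
from typing import Optional


def concurrency_key(
    scope: str, trigger_id: str, user_id: str, tenant_id: str
) -> Optional[tuple[str, str, str]]:
    """Executions sharing a key run one at a time; None means no limit."""
    if scope == "per_user":
        return (trigger_id, "user", user_id)
    if scope == "per_tenant":
        return (trigger_id, "tenant", tenant_id)
    if scope == "unbounded":
        return None
    raise ValueError(f"unknown parallelism scope: {scope!r}")
```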
events:
  enabled: true
  store:
    class: orchid_ai.events.stores.sqlite.SQLiteEventStore
  queue:
    class: orchid_ai.events.queues.sqlite.SQLiteSignalQueue
  processors:
    - class: orchid_ai.events.processors.default.DefaultProcessor
      concurrency: 4
  schedules:
    - id: daily_digest
      cron: "0 7 * * 1-5"
      trigger_id: morning_briefing
      identity:
        mode: service_account
        name: scheduler
  triggers:
    - id: morning_briefing
      on:
        signal: cron
      emits:
        agent: digest
        prompt_template: "Generate the morning briefing for {{user.name}}"
        identity:
          mode: addressed_to_user
          service_account: scheduler
          user_id_from: signal.user_id
      retry:
        max: 3
        backoff: exponential
Load Modes Summary
| Mode | Root File | Agent Configs | Detection |
|---|---|---|---|
| YAML | orchid.yml | agents.yaml | .yml or .yaml extension |
| MD | orchid.md | agents/*.md | .md extension |
| Hybrid | orchid.yml | agents/*.md | AGENTS_CONFIG_PATH points to a directory |
Hot-Reload (MD Only)
The on-demand config watcher detects file changes via SHA-256 hashing — no background threads, no fs-notify libraries.
- OrchidConfigWatcher tracks orchid.md + agents/*.md by hash.
- Orchid.reload_config() calls watcher.reload_if_changed() and rebuilds the graph.
- Graph rebuild is serialised via asyncio.Lock; existing requests complete with the old config.
- The API middleware polls at most every ORCHID_RELOAD_INTERVAL seconds (default 30, set to 0 to disable).
# Enable hot-reload with 10-second polling:
ORCHID_RELOAD_INTERVAL=10
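Hash-based change detection of the kind described for the config watcher can be sketched in a few lines. This is not the OrchidConfigWatcher source: the class name `HashWatcher` is hypothetical, and only the tracked file set (orchid.md plus agents/*.md), the SHA-256 comparison, and the `reload_if_changed` method name follow the description above.

```python
import hashlib
from pathlib import Path


class HashWatcher:
    """Detect changes by comparing SHA-256 digests on demand (no background thread)."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self._digests = self._snapshot()

    def _snapshot(self) -> dict[str, str]:
        """Map each tracked file path to the hex digest of its contents."""
        files = [self.root / "orchid.md", *sorted((self.root / "agents").glob("*.md"))]
        return {
            str(path): hashlib.sha256(path.read_bytes()).hexdigest()
            for path in files
            if path.is_file()
        }

    def reload_if_changed(self) -> bool:
        """Return True when any tracked file was added, removed, or modified."""
        current = self._snapshot()
        changed = current != self._digests
        self._digests = current
        return changed
```

Because the comparison only runs when `reload_if_changed` is called, the cost is paid at the polling interval rather than continuously.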