Multi-LLM Support

Provider agnosticism via LiteLLM: configure any model per agent, swap at runtime.

Orchid is provider-agnostic. Agents, the supervisor, and the embedding pipeline each accept a LiteLLM-style model string — switch between Anthropic, OpenAI, Google, Ollama, Groq, or any other LiteLLM provider by editing a single line.

Setting the model per agent

Model strings follow LiteLLM's provider/model-name convention:

defaults:
  llm:
    model: "gemini/gemini-2.5-flash"
    temperature: 0.2
    fallback_model: "ollama/llama3.2"

agents:
  stats:
    llm:
      model: "openai/gpt-4o"
      temperature: 0.1
  analysis:
    # inherits defaults.llm

The stats agent would live in agents/stats.md with its own frontmatter:

---
llm:
  model: "openai/gpt-4o"
  temperature: 0.1
---

Stats agent prompt body...

The analysis agent inherits defaults from the root config.
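The inheritance rule can be pictured as a simple overlay of the agent's llm block on top of defaults.llm. The sketch below is illustrative only: resolve_llm_config is a hypothetical helper, not part of Orchid, and it assumes a key-by-key merge (so an agent that overrides model still inherits fallback_model), which the framework may or may not do.

```python
from __future__ import annotations

from typing import Any, Mapping


def resolve_llm_config(
    defaults: Mapping[str, Any],
    agent_overrides: Mapping[str, Any] | None,
) -> dict[str, Any]:
    """Overlay an agent's llm block on defaults.llm (hypothetical helper)."""
    merged = dict(defaults)              # start from the root defaults
    merged.update(agent_overrides or {})  # agent keys win, key by key
    return merged


defaults_llm = {
    "model": "gemini/gemini-2.5-flash",
    "temperature": 0.2,
    "fallback_model": "ollama/llama3.2",
}

# stats overrides model and temperature; analysis overrides nothing.
stats_llm = resolve_llm_config(
    defaults_llm, {"model": "openai/gpt-4o", "temperature": 0.1}
)
analysis_llm = resolve_llm_config(defaults_llm, None)
```

Under this assumed merge, stats_llm keeps the fallback_model from the defaults, while analysis_llm is identical to defaults.llm.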

Each agent can use a different provider. The optional fallback_model is tried automatically when the primary model fails with an HTTP 503, a rate-limit error, or a timeout.
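The fallback behaviour amounts to one retry against a second model when the first call fails with a transient error. The following is a minimal sketch of that idea, not Orchid's implementation: complete_with_fallback, RateLimited, and ServiceUnavailable are hypothetical names, and the completion call is injected so the example runs without any provider installed.

```python
import asyncio
from typing import Awaitable, Callable, Optional


# Stand-ins for provider errors (assumption for this sketch); a real
# integration would catch the provider library's own exception types.
class RateLimited(Exception):
    pass


class ServiceUnavailable(Exception):
    pass


RETRYABLE = (RateLimited, ServiceUnavailable, asyncio.TimeoutError)


async def complete_with_fallback(
    call: Callable[[str, str], Awaitable[str]],
    model: str,
    fallback_model: Optional[str],
    prompt: str,
) -> str:
    """Try the primary model; on a transient error, try the fallback once."""
    try:
        return await call(model, prompt)
    except RETRYABLE:
        if fallback_model is None:
            raise  # no fallback configured, surface the original error
        return await call(fallback_model, prompt)


async def fake_call(model: str, prompt: str) -> str:
    # Simulate the primary provider returning a 503.
    if model.startswith("gemini/"):
        raise ServiceUnavailable("503")
    return f"{model}: {prompt}"


result = asyncio.run(
    complete_with_fallback(
        fake_call, "gemini/gemini-2.5-flash", "ollama/llama3.2", "hi"
    )
)
```

Here the simulated primary fails, so result comes from the fallback model. Errors outside RETRYABLE propagate unchanged.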

Simple completions vs agentic tool-calling

Two LLM usage patterns appear inside the framework — each with a distinct interface:

Simple completions (summarisation, routing, history compression): use self.summarise() inherited from OrchidAgent. This delegates to the injected BaseChatModel via chat_model.ainvoke(messages). Custom agents should always use this path for synthesising responses.

summary = await self.summarise(
    query, mcp_data, rag_data,
    system_prompt=MY_PROMPT,
    conversation_history=history,
)

Agentic tool-calling loops (where you need tool_calls on the response object): use litellm directly with a lazy import inside the method. The full response object is required to inspect which tools the LLM chose to call. This is acceptable because tool-calling inherently depends on the OpenAI function-calling protocol.

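async def _my_tool_loop(self, messages, tools):
    import litellm  # lazy import: needed for the tool_calls response shape

    response = await litellm.acompletion(
        model=self.model_id,
        messages=messages,
        tools=tools,
    )
    message = response.choices[0].message
    if message.tool_calls:
        for call in message.tool_calls:
            # handle tool calls: call.function.name holds the tool name,
            # call.function.arguments holds its arguments as a JSON string
            ...
    return message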

Never import litellm at module level in consumer agents. Use self.summarise() or self._chat_model.ainvoke() for everything that doesn't require the raw tool_calls field.
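Once a response carries tool_calls, each call has to be dispatched and its result sent back as a tool message. A minimal sketch of that step follows, assuming the OpenAI-style shape (id, function.name, function.arguments as a JSON string). run_tool_calls and the registry dict are hypothetical, and SimpleNamespace stands in for a real response object.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable, Dict, Iterable, List


def run_tool_calls(
    tool_calls: Iterable[Any],
    registry: Dict[str, Callable[..., Any]],
) -> List[Dict[str, str]]:
    """Execute each requested tool and build the follow-up tool messages."""
    results = []
    for call in tool_calls:
        handler = registry[call.function.name]
        # Arguments arrive as a JSON-encoded string, not a dict.
        kwargs = json.loads(call.function.arguments or "{}")
        output = handler(**kwargs)
        results.append(
            {
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(output),
            }
        )
    return results


# Stand-in for one entry of response.choices[0].message.tool_calls.
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="add", arguments='{"a": 2, "b": 3}'),
)
tool_messages = run_tool_calls([fake_call], {"add": lambda a, b: a + b})
```

The returned messages would be appended to the conversation before the next completion call, so the model can read the tool results.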

build_chat_model factory

The graph builder creates LangChain BaseChatModel instances via build_chat_model(model_string) from orchid_ai/llm_factory.py. This returns a provider-specific integration class (e.g. ChatGoogleGenerativeAI, ChatOpenAI) when the provider package is installed, falling back to ChatLiteLLM as a universal adapter.
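The selection logic described above can be sketched as a lookup followed by an installation check. This is not the code in orchid_ai/llm_factory.py: pick_integration, the provider table, and the package names (including langchain_litellm as the home of ChatLiteLLM) are assumptions made for illustration, and the sketch only picks a class rather than instantiating it.

```python
from __future__ import annotations

import importlib.util
from typing import Callable

# Assumed mapping from LiteLLM provider prefix to (package, class name).
PROVIDER_INTEGRATIONS = {
    "openai": ("langchain_openai", "ChatOpenAI"),
    "gemini": ("langchain_google_genai", "ChatGoogleGenerativeAI"),
    "anthropic": ("langchain_anthropic", "ChatAnthropic"),
}
UNIVERSAL_ADAPTER = ("langchain_litellm", "ChatLiteLLM")


def _installed(package: str) -> bool:
    """True if the top-level package can be imported in this environment."""
    return importlib.util.find_spec(package) is not None


def pick_integration(
    model_string: str,
    is_installed: Callable[[str], bool] = _installed,
) -> tuple[str, str]:
    """Pick a provider-specific class when available, else the adapter."""
    provider, _, _model_name = model_string.partition("/")
    candidate = PROVIDER_INTEGRATIONS.get(provider)
    if candidate is not None and is_installed(candidate[0]):
        return candidate
    return UNIVERSAL_ADAPTER


# With the OpenAI package present, the dedicated class is chosen;
# unknown providers or missing packages fall back to the adapter.
with_pkg = pick_integration("openai/gpt-4o", is_installed=lambda p: True)
without_pkg = pick_integration("openai/gpt-4o", is_installed=lambda p: False)
unknown = pick_integration("groq/llama-3.1-8b", is_installed=lambda p: True)
```

Injecting is_installed keeps the selection rule testable without installing any provider package.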
