--- sidebar_position: 11 title: "Plugin LLM Access" description: "Run any LLM call from inside a plugin via ctx.llm — chat or structured, sync or async. Host-owned auth, fail-closed trust gate, optional JSON Schema validation." --- # Plugin LLM Access `ctx.llm` is the supported way for a plugin to make an LLM call. Chat completion, structured extraction, sync, async, with or without images — same surface, same trust gate, same host-owned credentials. Plugins reach for this when they need to do something that involves the model but isn't part of the agent's conversation. A hook that rewrites a tool error into something a non-engineer can read. A gateway adapter that translates an inbound message before queuing it. A slash command that summarises a long paste. A scheduled job that scores yesterday's activity and writes one line to a status board. A pre-filter that decides whether a message is worth waking the agent up for at all. These are jobs the agent shouldn't be in the loop on. They want one LLM call, a typed answer, and to be done. ## The smallest possible call ```python result = ctx.llm.complete(messages=[{"role": "user", "content": "ping"}]) return result.text ``` That's the whole API in one line. No keys, no provider config, no SDK initialisation. The plugin runs against whatever provider and model the user is currently using — when they switch providers, the plugin follows them automatically. ## A more complete chat example ```python result = ctx.llm.complete( messages=[ {"role": "system", "content": "Rewrite errors as one short sentence a non-engineer can act on."}, {"role": "user", "content": traceback_text}, ], max_tokens=64, purpose="hooks.error-rewrite", ) return result.text ``` `purpose` is a free-form audit string — it shows up in `agent.log` and in `result.audit` so operators can see which plugin made which call. Optional but recommended for anything that fires often. ## Structured output When the plugin needs a typed answer, switch to the structured lane: ```python result = ctx.llm.complete_structured( instructions="Score this support reply for urgency (0–1) and pick a category.", input=[{"type": "text", "text": message_body}], json_schema=TRIAGE_SCHEMA, purpose="support.triage", temperature=0.0, max_tokens=128, ) if result.parsed["urgency"] > 0.8: await dispatch_to_oncall(result.parsed["category"], message_body) ``` The host requests JSON output from the provider, parses it locally as a fallback, validates against your schema if `jsonschema` is installed, and hands back a Python object on `result.parsed`. If the model couldn't produce valid JSON, `result.parsed` is `None` and `result.text` carries the raw response. ## What this lane gives you * **One call, four shapes.** `complete()` for chat, `complete_structured()` for typed JSON, `acomplete()` and `acomplete_structured()` for asyncio. Same arguments, same result objects. * **Host-owned credentials.** OAuth tokens, refresh flows, the credential pool, per-task aux overrides — every credential concept Hermes already has applies. The plugin never sees a token; the host attributes the call back through `result.audit`. * **Bounded.** Single sync or async call. No streaming, no tool loops, no conversation state to manage. State the input, get the result, return. * **Fail-closed trust.** A plugin you've never configured cannot pick its own provider, model, agent, or stored credential. The default posture is "use what the user is using." Operators opt in to specific overrides, per plugin, in `config.yaml`. ## Quick start Two complete plugins below — one chat, one structured. Both ship inside a single `register(ctx)` function and need zero outside configuration to run against whatever model the user has active. ### Chat completion — `/tldr` ```python def register(ctx): ctx.register_command( name="tldr", handler=lambda raw: _tldr(ctx, raw), description="Summarise the supplied text in one paragraph.", args_hint="", ) def _tldr(ctx, raw_args: str) -> str: text = raw_args.strip() if not text: return "Usage: /tldr " result = ctx.llm.complete( messages=[ {"role": "system", "content": "Summarise the user's text in one tight paragraph. No preamble."}, {"role": "user", "content": text}, ], max_tokens=256, temperature=0.3, purpose="tldr", ) return result.text ``` `result.text` is the model's response; `result.usage` carries token counts; `result.provider` and `result.model` carry attribution. ### Structured extraction — `/paste-to-tasks` ```python def register(ctx): ctx.register_command( name="paste-to-tasks", handler=lambda raw: _paste_to_tasks(ctx, raw), description="Turn freeform meeting notes into structured tasks.", args_hint="", ) _TASKS_SCHEMA = { "type": "object", "properties": { "tasks": { "type": "array", "items": { "type": "object", "properties": { "owner": {"type": "string"}, "action": {"type": "string"}, "due": {"type": "string", "description": "ISO date or empty"}, }, "required": ["action"], }, }, }, "required": ["tasks"], } def _paste_to_tasks(ctx, raw_args: str) -> str: if not raw_args.strip(): return "Usage: /paste-to-tasks " result = ctx.llm.complete_structured( instructions=( "Extract concrete action items from these meeting notes. " "One task per actionable line. If no owner is named, leave 'owner' blank." ), input=[{"type": "text", "text": raw_args}], json_schema=_TASKS_SCHEMA, schema_name="meeting.tasks", purpose="paste-to-tasks", temperature=0.0, max_tokens=512, ) if result.parsed is None: return f"Couldn't parse a response. Raw output:\n{result.text}" lines = [f"- [{t.get('owner') or '?'}] {t['action']}" for t in result.parsed["tasks"]] return "\n".join(lines) or "(no tasks found)" ``` A third worked example, this time with image input, lives in the [`hermes-example-plugins`](https://github.com/NousResearch/hermes-example-plugins/tree/main/plugin-llm-example) repo (companion repo for reference plugins — not bundled with hermes-agent itself). For the async surface (`acomplete()` / `acomplete_structured()` with `asyncio.gather()`), see [`plugin-llm-async-example`](https://github.com/NousResearch/hermes-example-plugins/tree/main/plugin-llm-async-example) in the same repo. ## When to use which | You want… | Reach for | |---|---| | A free-form text response (translation, summary, rewrite, generation) | `complete()` | | A multi-turn prompt (system + few-shot examples + user) | `complete()` | | A typed dict back, validated against a schema | `complete_structured()` | | Image-or-text input with a typed dict back | `complete_structured()` | | The same call from async code (gateway adapters, async hooks) | `acomplete()` / `acomplete_structured()` | Everything else — provider selection, model resolution, auth, fallback, timeout, vision routing — is the same across all four. ## API surface `ctx.llm` is an instance of `agent.plugin_llm.PluginLlm`. ### `complete()` ```python result = ctx.llm.complete( messages=[{"role": "user", "content": "Hi"}], provider=None, # optional, gated — Hermes provider id (e.g. "openrouter") model=None, # optional, gated — whatever string that provider expects temperature=None, max_tokens=None, timeout=None, # seconds agent_id=None, # optional, gated profile=None, # optional, gated — explicit auth-profile name purpose="optional-audit-string", ) # → PluginLlmCompleteResult(text, provider, model, agent_id, usage, audit) ``` Plain chat completion. `messages` is the standard OpenAI shape — a list of `{"role": "...", "content": "..."}` dicts. Multi-turn prompts (system + few-shot user/assistant pairs + final user) work exactly as they would with the OpenAI SDK. `provider=` and `model=` are independent and follow the same shape as the host's main config (`model.provider` + `model.model`). Set just `model=` to use the user's active provider with a different model on it. Set both to switch providers entirely. Either argument without operator opt-in raises `PluginLlmTrustError`. ### `complete_structured()` ```python result = ctx.llm.complete_structured( instructions="What you want extracted.", input=[ {"type": "text", "text": "..."}, {"type": "image", "data": b"...", "mime_type": "image/png"}, {"type": "image", "url": "https://..."}, ], json_schema={...}, # optional — triggers parsed result + validation json_mode=False, # set True without a schema to ask for JSON anyway schema_name=None, # optional human-readable schema name system_prompt=None, provider=None, # optional, gated model=None, # optional, gated temperature=None, max_tokens=None, timeout=None, agent_id=None, profile=None, purpose=None, ) # → PluginLlmStructuredResult(text, provider, model, agent_id, # usage, parsed, content_type, audit) ``` Inputs are typed text or image blocks (raw bytes get base64 encoded as a `data:` URL automatically). When `json_schema` or `json_mode=True` is supplied, the host requests JSON output via `response_format`, parses it locally as a fallback, and validates against your schema if `jsonschema` is installed. * `result.content_type == "json"` — `result.parsed` is a Python object that matches your schema. * `result.content_type == "text"` — parsing or validation failed; inspect `result.text` for the raw model response. ### Async ```python result = await ctx.llm.acomplete(messages=...) result = await ctx.llm.acomplete_structured(instructions=..., input=...) ``` Same arguments and result types as their sync counterparts. Use these from gateway adapters, async hooks, or any plugin code already running on an asyncio loop. ### Result attributes ```python @dataclass class PluginLlmCompleteResult: text: str # the assistant's response provider: str # e.g. "openrouter", "anthropic" model: str # whatever the provider returned for this call agent_id: str # whose model/auth was used usage: PluginLlmUsage # tokens + cache + cost estimate audit: Dict[str, Any] # plugin_id, purpose, profile @dataclass class PluginLlmStructuredResult(PluginLlmCompleteResult): parsed: Optional[Any] # JSON object when content_type == "json" content_type: str # "json" or "text" # audit also carries schema_name when supplied ``` `usage` carries `input_tokens`, `output_tokens`, `total_tokens`, `cache_read_tokens`, `cache_write_tokens`, and `cost_usd` when the provider returns those fields. ## Trust gate The default behaviour is fail-closed. With no `plugins.entries` config block, a plugin can: * run any of the four methods against the user's active provider and model, * set request-shaping arguments (`temperature`, `max_tokens`, `timeout`, `system_prompt`, `purpose`, `messages`, `instructions`, `input`, `json_schema`), …and that's it. `provider=`, `model=`, `agent_id=`, and `profile=` arguments raise `PluginLlmTrustError` until the operator opts in. **Most plugins never need this section.** A plugin that just calls `ctx.llm.complete(messages=...)` with no overrides runs against whatever the user has active and works zero-config. The block below is only relevant when a plugin specifically wants to pin to a different model or provider than the user. ```yaml plugins: entries: my-plugin: llm: # Allow this plugin to choose a different Hermes provider # (must be one Hermes already knows about — same names as # `hermes model` and config.yaml model.provider). allow_provider_override: true # Optionally restrict which providers. Use ["*"] for any. allowed_providers: - openrouter - anthropic # Allow this plugin to ask for a specific model. allow_model_override: true # Optionally restrict which models. Use ["*"] for any. # Models are matched literally against whatever string the # plugin sends — Hermes does not look anything up. allowed_models: - openai/gpt-4o-mini - anthropic/claude-3-5-haiku # Allow cross-agent calls (rare). allow_agent_id_override: false # Allow the plugin to request a specific stored auth profile # (e.g. a different OAuth account on the same provider). allow_profile_override: false ``` The plugin id is the manifest `name:` field for flat plugins, or the path-derived key for nested plugins (`image_gen/openai`, `memory/honcho`, etc.). ### What the gate enforces | Override | Default | Config key | | --------------- | ------- | -------------------------------- | | `provider=` | denied | `allow_provider_override: true` | | ↳ allowlist | — | `allowed_providers: [...]` | | `model=` | denied | `allow_model_override: true` | | ↳ allowlist | — | `allowed_models: [...]` | | `agent_id=` | denied | `allow_agent_id_override: true` | | `profile=` | denied | `allow_profile_override: true` | Each override is independently gated. Granting `allow_model_override` does **not** also grant `allow_provider_override` — a plugin trusted to pick a model is still pinned to the user's active provider unless it gets the provider gate as well. ### What the gate does NOT need to enforce * Request-shaping arguments — `temperature`, `max_tokens`, `timeout`, `system_prompt`, `purpose`, `messages`, `instructions`, `input`, `json_schema`, `schema_name`, `json_mode` — are always allowed; they don't pick credentials or routes. * The default deny posture means an unconfigured plugin can still do useful work — it just runs against the active provider and model. Operators only need to think about `plugins.entries` for plugins that want finer routing. ## What the host owns A complete list of the things `ctx.llm` does for the plugin so you don't have to: * **Provider resolution.** Reads `model.provider` + `model.model` from the user's config (or the explicit overrides when trusted). * **Auth.** Pulls API keys, OAuth tokens, or refresh tokens from `~/.hermes/auth.json` / env, including the credential pool when one is configured. The plugin never sees them. * **Vision routing.** When image input is supplied and the user's active text model is text-only, the host falls back to the configured vision model automatically. * **Fallback chain.** If the user's primary provider 5xxs or 429s, the request goes through Hermes' usual aggregator-aware fallback before it returns an error to the plugin. * **Timeout.** Honours your `timeout=` argument, falling back to `auxiliary..timeout` config or the global aux default. * **JSON shaping.** Sends `response_format` to the provider when you ask for JSON, then re-parses locally from a code-fenced response if the provider returned one. * **Schema validation.** Validates against your `json_schema` when `jsonschema` is installed; logs a debug line and skips strict validation otherwise. * **Audit log.** Each call writes one INFO line to `agent.log` with the plugin id, provider/model, purpose, and token totals. ## What the plugin owns * **Request shape.** `messages` for chat, `instructions` + `input` for structured. The plugin builds the prompt; the host runs it. * **Schema.** Whatever shape you want back. The host doesn't infer it for you. * **Error handling.** `complete_structured()` raises `ValueError` on empty inputs and on schema-validation failure. `PluginLlmTrustError` fires when the trust gate denies an override. Anything else (provider 5xx, no credentials configured, timeout) raises whatever `auxiliary_client.call_llm()` raises. * **Cost.** Every call runs against the user's paid provider. Don't loop on `complete()` for every gateway message without thinking about token spend. ## Where this fits in the plugin surface Existing `ctx.*` methods extend an existing Hermes subsystem: | `ctx.register_tool` | adds a tool the agent can call | | `ctx.register_platform` | wires a new gateway adapter | | `ctx.register_image_gen_provider` | replaces an image-gen backend | | `ctx.register_memory_provider` | replaces the memory backend | | `ctx.register_context_engine` | replaces the context compressor | | `ctx.register_hook` | observes a lifecycle event | `ctx.llm` is the first surface that lets a plugin run the same model the user is talking to, *out of band*, without any of the above. That's its only job. If your plugin needs to register a tool the agent invokes, use `register_tool`. If it needs to react to a lifecycle event, use `register_hook`. If it needs to make its own model call — for any reason, structured or not — `ctx.llm`. ## Reference * Implementation: [`agent/plugin_llm.py`](https://github.com/NousResearch/hermes-agent/blob/main/agent/plugin_llm.py) * Tests: [`tests/agent/test_plugin_llm.py`](https://github.com/NousResearch/hermes-agent/blob/main/tests/agent/test_plugin_llm.py) * Reference plugins (companion repo): * [`plugin-llm-example`](https://github.com/NousResearch/hermes-example-plugins/tree/main/plugin-llm-example) — sync structured extraction with image input * [`plugin-llm-async-example`](https://github.com/NousResearch/hermes-example-plugins/tree/main/plugin-llm-async-example) — async with `asyncio.gather()` * Auxiliary client (the engine under the hood): see [Provider Runtime](/docs/developer-guide/provider-runtime).