---
sidebar_position: 11
title: "Plugin LLM Access"
description: "Run any LLM call from inside a plugin via ctx.llm: chat or structured, sync or async. Host-owned auth, fail-closed trust gate, optional JSON Schema validation."
---

# Plugin LLM Access

`ctx.llm` is the supported way for a plugin to make an LLM call.
Chat completion, structured extraction, sync, async, with or without
images: same surface, same trust gate, same host-owned credentials.

Plugins reach for this when they need to do something that involves
the model but isn't part of the agent's conversation. A hook that
rewrites a tool error into something a non-engineer can read. A
gateway adapter that translates an inbound message before queuing
it. A slash command that summarises a long paste. A scheduled job
that scores yesterday's activity and writes one line to a status
board. A pre-filter that decides whether a message is worth waking
the agent up for at all.

These are jobs the agent shouldn't be in the loop on. They want one
LLM call, a typed answer, and to be done.

## The smallest possible call

```python
result = ctx.llm.complete(messages=[{"role": "user", "content": "ping"}])
return result.text
```

That's the whole API in one call. No keys, no provider config, no
SDK initialisation. The plugin runs against whatever provider and
model the user is currently using; when they switch providers, the
plugin follows them automatically.

## A more complete chat example

```python
result = ctx.llm.complete(
    messages=[
        {"role": "system", "content": "Rewrite errors as one short sentence a non-engineer can act on."},
        {"role": "user", "content": traceback_text},
    ],
    max_tokens=64,
    purpose="hooks.error-rewrite",
)
return result.text
```

`purpose` is a free-form audit string. It shows up in `agent.log`
and in `result.audit` so operators can see which plugin made which
call. It is optional but recommended for anything that fires often.

## Structured output

When the plugin needs a typed answer, switch to the structured lane:

```python
result = ctx.llm.complete_structured(
    instructions="Score this support reply for urgency (0–1) and pick a category.",
    input=[{"type": "text", "text": message_body}],
    json_schema=TRIAGE_SCHEMA,
    purpose="support.triage",
    temperature=0.0,
    max_tokens=128,
)

if result.parsed["urgency"] > 0.8:
    await dispatch_to_oncall(result.parsed["category"], message_body)
```

The host requests JSON output from the provider, parses it locally
as a fallback, validates against your schema if `jsonschema` is
installed, and hands back a Python object on `result.parsed`. If the
model couldn't produce valid JSON, `result.parsed` is `None` and
`result.text` carries the raw response.
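
The snippet above uses `TRIAGE_SCHEMA` without defining it and indexes `result.parsed` directly. A minimal sketch of what such a schema and a `None`-safe routing step might look like follows. The field names, the category values, and the `route()` helper are assumptions made for illustration; none of them are part of Hermes.

```python
from typing import Any, Optional

# Hypothetical schema for the triage example. The field names and the
# category enum are illustrative assumptions, not a Hermes-defined shape.
TRIAGE_SCHEMA = {
    "type": "object",
    "properties": {
        "urgency": {"type": "number", "minimum": 0, "maximum": 1},
        "category": {"type": "string", "enum": ["billing", "outage", "other"]},
    },
    "required": ["urgency", "category"],
}


def route(parsed: Optional[Any], threshold: float = 0.8) -> str:
    """Decide what to do with a structured result's `parsed` payload."""
    if not isinstance(parsed, dict):
        # result.parsed is None when the model did not return usable JSON.
        return "manual-review"
    urgency = parsed.get("urgency")
    if not isinstance(urgency, (int, float)):
        # Schema validation may have been skipped, so check the field anyway.
        return "manual-review"
    return "oncall" if urgency > threshold else "queue"
```

Calling `route(result.parsed)` before touching any keys keeps the plugin from raising on a malformed reply; what each of the three outcomes triggers is up to the plugin.
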

## What this lane gives you

* **One call, four shapes.** `complete()` for chat,
  `complete_structured()` for typed JSON, `acomplete()` and
  `acomplete_structured()` for asyncio. Same arguments, same result
  objects.
* **Host-owned credentials.** OAuth tokens, refresh flows, the
  credential pool, per-task aux overrides: every credential
  concept Hermes already has applies. The plugin never sees a
  token; the host attributes the call back through `result.audit`.
* **Bounded.** Single sync or async call. No streaming, no tool
  loops, no conversation state to manage. State the input, get the
  result, return.
* **Fail-closed trust.** A plugin you've never configured cannot
  pick its own provider, model, agent, or stored credential. The
  default posture is "use what the user is using." Operators opt in
  to specific overrides, per plugin, in `config.yaml`.

## Quick start

Two complete plugins follow: one chat, one structured. Both ship
inside a single `register(ctx)` function and need zero outside
configuration to run against whatever model the user has active.

### Chat completion: `/tldr`

```python
def register(ctx):
    ctx.register_command(
        name="tldr",
        handler=lambda raw: _tldr(ctx, raw),
        description="Summarise the supplied text in one paragraph.",
        args_hint="<text>",
    )


def _tldr(ctx, raw_args: str) -> str:
    text = raw_args.strip()
    if not text:
        return "Usage: /tldr <text to summarise>"
    result = ctx.llm.complete(
        messages=[
            {"role": "system",
             "content": "Summarise the user's text in one tight paragraph. No preamble."},
            {"role": "user", "content": text},
        ],
        max_tokens=256,
        temperature=0.3,
        purpose="tldr",
    )
    return result.text
```

`result.text` is the model's response; `result.usage` carries token
counts; `result.provider` and `result.model` carry attribution.

### Structured extraction: `/paste-to-tasks`

```python
def register(ctx):
    ctx.register_command(
        name="paste-to-tasks",
        handler=lambda raw: _paste_to_tasks(ctx, raw),
        description="Turn freeform meeting notes into structured tasks.",
        args_hint="<text>",
    )


_TASKS_SCHEMA = {
    "type": "object",
    "properties": {
        "tasks": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "owner": {"type": "string"},
                    "action": {"type": "string"},
                    "due": {"type": "string", "description": "ISO date or empty"},
                },
                "required": ["action"],
            },
        },
    },
    "required": ["tasks"],
}


def _paste_to_tasks(ctx, raw_args: str) -> str:
    if not raw_args.strip():
        return "Usage: /paste-to-tasks <meeting notes>"
    result = ctx.llm.complete_structured(
        instructions=(
            "Extract concrete action items from these meeting notes. "
            "One task per actionable line. If no owner is named, leave 'owner' blank."
        ),
        input=[{"type": "text", "text": raw_args}],
        json_schema=_TASKS_SCHEMA,
        schema_name="meeting.tasks",
        purpose="paste-to-tasks",
        temperature=0.0,
        max_tokens=512,
    )
    if result.parsed is None:
        return f"Couldn't parse a response. Raw output:\n{result.text}"
    lines = [f"- [{t.get('owner') or '?'}] {t['action']}" for t in result.parsed["tasks"]]
    return "\n".join(lines) or "(no tasks found)"
```

A third worked example, this time with image input, lives in the
[`hermes-example-plugins`](https://github.com/NousResearch/hermes-example-plugins/tree/main/plugin-llm-example)
repo (a companion repo for reference plugins, not bundled with
hermes-agent itself). For the async surface (`acomplete()` /
`acomplete_structured()` with `asyncio.gather()`), see
[`plugin-llm-async-example`](https://github.com/NousResearch/hermes-example-plugins/tree/main/plugin-llm-async-example)
in the same repo.

## When to use which

| You want… | Reach for |
|---|---|
| A free-form text response (translation, summary, rewrite, generation) | `complete()` |
| A multi-turn prompt (system + few-shot examples + user) | `complete()` |
| A typed dict back, validated against a schema | `complete_structured()` |
| Image-or-text input with a typed dict back | `complete_structured()` |
| The same call from async code (gateway adapters, async hooks) | `acomplete()` / `acomplete_structured()` |

Everything else (provider selection, model resolution, auth, fallback,
timeout, vision routing) is the same across all four.

## API surface

`ctx.llm` is an instance of `agent.plugin_llm.PluginLlm`.

### `complete()`

```python
result = ctx.llm.complete(
    messages=[{"role": "user", "content": "Hi"}],
    provider=None,        # optional, gated; Hermes provider id (e.g. "openrouter")
    model=None,           # optional, gated; whatever string that provider expects
    temperature=None,
    max_tokens=None,
    timeout=None,         # seconds
    agent_id=None,        # optional, gated
    profile=None,         # optional, gated; explicit auth-profile name
    purpose="optional-audit-string",
)
# → PluginLlmCompleteResult(text, provider, model, agent_id, usage, audit)
```

Plain chat completion. `messages` is the standard OpenAI shape: a
list of `{"role": "...", "content": "..."}` dicts. Multi-turn
prompts (system + few-shot user/assistant pairs + final user) work
exactly as they would with the OpenAI SDK.

`provider=` and `model=` are independent and follow the same shape
as the host's main config (`model.provider` + `model.model`). Set
just `model=` to use the user's active provider with a different
model on it. Set both to switch providers entirely. Either argument
without operator opt-in raises `PluginLlmTrustError`.
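
A plugin that prefers a pinned model but should still work on an unconfigured install can catch the trust error and retry without the override. The sketch below is one way to do that, under stated assumptions: `complete_with_fallback` is a hypothetical helper, and the `PluginLlmTrustError` class is a local stand-in so the example runs outside Hermes. In a real plugin you would import the host's own exception class instead.

```python
class PluginLlmTrustError(Exception):
    """Local stand-in for the host's exception, so this sketch is runnable."""


def complete_with_fallback(llm, messages, preferred_model=None, **kwargs):
    """Try a pinned model first; fall back to the user's active model if denied."""
    if preferred_model is not None:
        try:
            return llm.complete(messages=messages, model=preferred_model, **kwargs)
        except PluginLlmTrustError:
            # The operator has not granted allow_model_override (or the model
            # is not on the allowlist), so retry with no override at all.
            pass
    return llm.complete(messages=messages, **kwargs)
```

The fallback call passes no gated arguments, so it takes the zero-config path and runs against whatever the user has active.
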

### `complete_structured()`

```python
result = ctx.llm.complete_structured(
    instructions="What you want extracted.",
    input=[
        {"type": "text", "text": "..."},
        {"type": "image", "data": b"...", "mime_type": "image/png"},
        {"type": "image", "url": "https://..."},
    ],
    json_schema={...},    # optional; triggers parsed result + validation
    json_mode=False,      # set True without a schema to ask for JSON anyway
    schema_name=None,     # optional human-readable schema name
    system_prompt=None,
    provider=None,        # optional, gated
    model=None,           # optional, gated
    temperature=None,
    max_tokens=None,
    timeout=None,
    agent_id=None,
    profile=None,
    purpose=None,
)
# → PluginLlmStructuredResult(text, provider, model, agent_id,
#                             usage, parsed, content_type, audit)
```

Inputs are typed text or image blocks (raw bytes get base64 encoded
as a `data:` URL automatically). When `json_schema` or
`json_mode=True` is supplied, the host requests JSON output via
`response_format`, parses it locally as a fallback, and validates
against your schema if `jsonschema` is installed.

* `result.content_type == "json"`: `result.parsed` is a Python
  object that matches your schema.
* `result.content_type == "text"`: parsing or validation failed;
  inspect `result.text` for the raw model response.
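
For reference, the `data:` URL form that raw image bytes are turned into follows RFC 2397. A minimal sketch of that encoding is shown here, which can be handy when logging or testing what an image block becomes. `to_data_url` is a hypothetical helper written for this page, not the host's actual function.

```python
import base64


def to_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 `data:` URL (RFC 2397)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Two bytes are enough to see the shape of the result.
print(to_data_url(b"hi", "image/png"))  # data:image/png;base64,aGk=
```

Plugins do not need to do this themselves; passing `data` and `mime_type` in an image block is enough.
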

### Async

```python
result = await ctx.llm.acomplete(messages=...)
result = await ctx.llm.acomplete_structured(instructions=..., input=...)
```

Same arguments and result types as their sync counterparts. Use
these from gateway adapters, async hooks, or any plugin code
already running on an asyncio loop.
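
One place the async methods pay off is fanning out several independent calls. The sketch below caps the number of in-flight requests with a semaphore. It assumes only the `acomplete()` signature documented above; `summarise_all`, the prompt wording, and the concurrency limit are illustrative choices.

```python
import asyncio


async def summarise_all(llm, texts, max_concurrency=4):
    """Summarise many texts concurrently, capping in-flight requests."""
    gate = asyncio.Semaphore(max_concurrency)

    async def one(text):
        async with gate:
            result = await llm.acomplete(
                messages=[{"role": "user", "content": f"Summarise: {text}"}],
                max_tokens=128,
                purpose="batch-summarise",
            )
            return result.text

    # asyncio.gather returns results in the same order as its inputs.
    return await asyncio.gather(*(one(t) for t in texts))
```

Inside a plugin this would be called as `await summarise_all(ctx.llm, texts)`. The cap matters because every call is billed to the user's provider and counts against its rate limits.
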

### Result attributes

```python
@dataclass
class PluginLlmCompleteResult:
    text: str                    # the assistant's response
    provider: str                # e.g. "openrouter", "anthropic"
    model: str                   # whatever the provider returned for this call
    agent_id: str                # whose model/auth was used
    usage: PluginLlmUsage        # tokens + cache + cost estimate
    audit: Dict[str, Any]        # plugin_id, purpose, profile

@dataclass
class PluginLlmStructuredResult(PluginLlmCompleteResult):
    parsed: Optional[Any]        # JSON object when content_type == "json"
    content_type: str            # "json" or "text"
    # audit also carries schema_name when supplied
```

`usage` carries `input_tokens`, `output_tokens`, `total_tokens`,
`cache_read_tokens`, `cache_write_tokens`, and `cost_usd` when the
provider returns those fields.
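
A plugin that makes several calls per job may want to keep a running total of these fields. The sketch below accumulates token counts and cost across results. It assumes that a field the provider did not report is either missing or `None`, which this page does not specify, and `UsageTotals` is a hypothetical helper rather than a Hermes class.

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Running totals over the `usage` objects of several results."""

    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0

    def add(self, usage) -> None:
        # Assumption: unreported fields are absent or None, so treat both as 0.
        self.input_tokens += getattr(usage, "input_tokens", None) or 0
        self.output_tokens += getattr(usage, "output_tokens", None) or 0
        self.cost_usd += getattr(usage, "cost_usd", None) or 0.0

    def over_budget(self, max_usd: float) -> bool:
        """True once the accumulated cost estimate exceeds `max_usd`."""
        return self.cost_usd > max_usd
```

A scheduled job could call `totals.add(result.usage)` after each call and stop early when `totals.over_budget(...)` returns true.
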

## Trust gate

The default behaviour is fail-closed. With no `plugins.entries`
config block, a plugin can:

* run any of the four methods against the user's active provider
  and model,
* set request-shaping arguments (`temperature`, `max_tokens`,
  `timeout`, `system_prompt`, `purpose`, `messages`, `instructions`,
  `input`, `json_schema`, `schema_name`, `json_mode`),

…and that's it. `provider=`, `model=`, `agent_id=`, and `profile=`
arguments raise `PluginLlmTrustError` until the operator opts in.

**Most plugins never need this section.** A plugin that just calls
`ctx.llm.complete(messages=...)` with no overrides runs against
whatever the user has active and works zero-config. The block below
is only relevant when a plugin specifically wants to pin to a
different model or provider than the user.

```yaml
plugins:
  entries:
    my-plugin:
      llm:
        # Allow this plugin to choose a different Hermes provider
        # (must be one Hermes already knows about; same names as
        # `hermes model` and config.yaml model.provider).
        allow_provider_override: true

        # Optionally restrict which providers. Use ["*"] for any.
        allowed_providers:
          - openrouter
          - anthropic

        # Allow this plugin to ask for a specific model.
        allow_model_override: true

        # Optionally restrict which models. Use ["*"] for any.
        # Models are matched literally against whatever string the
        # plugin sends; Hermes does not look anything up.
        allowed_models:
          - openai/gpt-4o-mini
          - anthropic/claude-3-5-haiku

        # Allow cross-agent calls (rare).
        allow_agent_id_override: false

        # Allow the plugin to request a specific stored auth profile
        # (e.g. a different OAuth account on the same provider).
        allow_profile_override: false
```

The plugin id is the manifest `name:` field for flat plugins, or the
path-derived key for nested plugins (`image_gen/openai`,
`memory/honcho`, etc.).

### What the gate enforces

| Override | Default | Config key |
| --- | --- | --- |
| `provider=` | denied | `allow_provider_override: true` |
| ↳ allowlist | n/a | `allowed_providers: [...]` |
| `model=` | denied | `allow_model_override: true` |
| ↳ allowlist | n/a | `allowed_models: [...]` |
| `agent_id=` | denied | `allow_agent_id_override: true` |
| `profile=` | denied | `allow_profile_override: true` |

Each override is independently gated. Granting `allow_model_override`
does **not** also grant `allow_provider_override`: a plugin trusted
to pick a model is still pinned to the user's active provider unless
it gets the provider gate as well.
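
The rules in the table can be read as a small pure function. The sketch below is an illustrative model of them, not Hermes' implementation: `check_overrides` is a hypothetical name, and treating a missing allowlist as "no further restriction" is an assumption drawn from the word "optionally" in the config comments.

```python
def check_overrides(entry: dict, requested: dict) -> list:
    """Return the names of requested overrides that the gate would deny.

    `entry` mirrors plugins.entries.<plugin-id>.llm from config.yaml.
    `requested` maps override names ("provider", "model", "agent_id",
    "profile") to the values the plugin passed, or None if not passed.
    """
    denied = []
    for name, value in requested.items():
        if value is None:
            continue  # the plugin did not ask for this override
        if not entry.get(f"allow_{name}_override", False):
            denied.append(name)  # fail-closed default
            continue
        # Assumption: an absent allowlist means no further restriction.
        allowlist = entry.get(f"allowed_{name}s")
        if allowlist is not None and "*" not in allowlist and value not in allowlist:
            denied.append(name)  # literal match against the allowlist failed
    return denied
```

With `{"allow_model_override": True}` as the entry, a request for both `model=` and `provider=` comes back as `["provider"]`, which is the independence property described above.
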

### What the gate does NOT need to enforce

* Request-shaping arguments (`temperature`, `max_tokens`,
  `timeout`, `system_prompt`, `purpose`, `messages`, `instructions`,
  `input`, `json_schema`, `schema_name`, `json_mode`) are always
  allowed; they don't pick credentials or routes.
* The default deny posture means an unconfigured plugin can still do
  useful work; it just runs against the active provider and model.
  Operators only need to think about `plugins.entries` for plugins
  that want finer routing.

## What the host owns

A complete list of the things `ctx.llm` does for the plugin so you
don't have to:

* **Provider resolution.** Reads `model.provider` + `model.model`
  from the user's config (or the explicit overrides when trusted).
* **Auth.** Pulls API keys, OAuth tokens, or refresh tokens from
  `~/.hermes/auth.json` / env, including the credential pool when
  one is configured. The plugin never sees them.
* **Vision routing.** When image input is supplied and the user's
  active text model is text-only, the host falls back to the
  configured vision model automatically.
* **Fallback chain.** If the user's primary provider 5xxs or 429s,
  the request goes through Hermes' usual aggregator-aware fallback
  before it returns an error to the plugin.
* **Timeout.** Honours your `timeout=` argument, falling back to
  `auxiliary.<task>.timeout` config or the global aux default.
* **JSON shaping.** Sends `response_format` to the provider when
  you ask for JSON, then re-parses locally from a code-fenced
  response if the provider returned one.
* **Schema validation.** Validates against your `json_schema` when
  `jsonschema` is installed; logs a debug line and skips strict
  validation otherwise.
* **Audit log.** Each call writes one INFO line to `agent.log` with
  the plugin id, provider/model, purpose, and token totals.
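
To make the JSON-shaping fallback concrete, here is a rough sketch of what re-parsing from a code-fenced response can look like. It illustrates the idea only and makes no claim about how the host does it; `parse_json_lenient` and the regular expression are assumptions. The same approach is useful if a plugin wants to salvage JSON from `result.text` itself.

```python
import json
import re

# Build the fence marker programmatically so this example can itself sit
# inside a Markdown code block.
_TICKS = "`" * 3
_FENCE = re.compile(_TICKS + r"(?:json)?\s*\n(.*?)" + _TICKS, re.DOTALL)


def parse_json_lenient(text: str):
    """Parse JSON from a raw reply, unwrapping a Markdown code fence if present.

    Returns the decoded object, or None when no valid JSON can be found.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    match = _FENCE.search(text)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
```

Trying the plain parse first keeps the common case cheap; the fence search only runs when the model wrapped its answer in prose or Markdown.
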

## What the plugin owns

* **Request shape.** `messages` for chat, `instructions` + `input`
  for structured. The plugin builds the prompt; the host runs it.
* **Schema.** Whatever shape you want back. The host doesn't infer
  it for you.
* **Error handling.** `complete_structured()` raises `ValueError` on
  empty inputs and on schema-validation failure. `PluginLlmTrustError`
  fires when the trust gate denies an override. Anything else
  (provider 5xx, no credentials configured, timeout) raises whatever
  `auxiliary_client.call_llm()` raises.
* **Cost.** Every call runs against the user's paid provider. Don't
  loop on `complete()` for every gateway message without thinking
  about token spend.
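
One inexpensive way to act on the cost point is to avoid repeating identical calls. The sketch below memoises replies by prompt content with a small bounded cache. `CachedCompleter` is a hypothetical wrapper, and the choice of SHA-256 keys and first-in-first-out eviction is an assumption for illustration. Extra keyword arguments are not part of the cache key, so it suits calls whose sampling settings never change.

```python
import hashlib


class CachedCompleter:
    """Memoise identical prompts so repeated inputs don't trigger repeat calls."""

    def __init__(self, llm, max_entries=256):
        self._llm = llm
        self._max = max_entries
        self._cache = {}  # dicts keep insertion order, so this acts as a FIFO

    def complete_text(self, system: str, user: str, **kwargs) -> str:
        key = hashlib.sha256(f"{system}\x00{user}".encode("utf-8")).hexdigest()
        if key in self._cache:
            return self._cache[key]
        result = self._llm.complete(
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": user},
            ],
            **kwargs,
        )
        if len(self._cache) >= self._max:
            self._cache.pop(next(iter(self._cache)))  # evict the oldest entry
        self._cache[key] = result.text
        return result.text
```

A gateway pre-filter could hold one `CachedCompleter(ctx.llm)` for its lifetime, so a burst of identical messages costs a single call.
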

## Where this fits in the plugin surface

The other `ctx.*` methods each extend an existing Hermes subsystem:

| Method | What it does |
|---|---|
| `ctx.register_tool` | adds a tool the agent can call |
| `ctx.register_platform` | wires a new gateway adapter |
| `ctx.register_image_gen_provider` | replaces an image-gen backend |
| `ctx.register_memory_provider` | replaces the memory backend |
| `ctx.register_context_engine` | replaces the context compressor |
| `ctx.register_hook` | observes a lifecycle event |

`ctx.llm` is the first surface that lets a plugin run the same
model the user is talking to, *out of band*, without any of the
above. That's its only job. If your plugin needs to register a
tool the agent invokes, use `register_tool`. If it needs to react
to a lifecycle event, use `register_hook`. If it needs to make its
own model call, for any reason, structured or not, use `ctx.llm`.

## Reference

* Implementation: [`agent/plugin_llm.py`](https://github.com/NousResearch/hermes-agent/blob/main/agent/plugin_llm.py)
* Tests: [`tests/agent/test_plugin_llm.py`](https://github.com/NousResearch/hermes-agent/blob/main/tests/agent/test_plugin_llm.py)
* Reference plugins (companion repo):
  * [`plugin-llm-example`](https://github.com/NousResearch/hermes-example-plugins/tree/main/plugin-llm-example): sync structured extraction with image input
  * [`plugin-llm-async-example`](https://github.com/NousResearch/hermes-example-plugins/tree/main/plugin-llm-async-example): async with `asyncio.gather()`
* Auxiliary client (the engine under the hood): see
  [Provider Runtime](/developer-guide/provider-runtime).