docs: context engine plugin system + unified hermes plugins UI

New page:
- developer-guide/context-engine-plugin.md — full guide for building
  context engine plugins (ABC contract, lifecycle, tools, registration)

Updated pages (11 files):
- plugins.md — plugin types table, composite UI documentation with
  screenshot-style example, provider plugin config format
- cli-commands.md — hermes plugins section rewritten for composite UI
  with provider plugin config keys documented
- context-compression-and-caching.md — new 'Pluggable Context Engine'
  section explaining the ABC, config-driven selection, resolution order
- configuration.md — new 'Context Engine' config section with examples
- architecture.md — context_engine.py and plugins/context_engine/ added
  to directory trees, plugin system description updated
- memory-provider-plugin.md — cross-reference tip to context engines
- memory-providers.md — hermes plugins as alternative setup path
- agent-loop.md — context_engine.py added to file reference table
- overview.md — plugins description expanded to cover all 3 types
- build-a-hermes-plugin.md — tip box linking to specialized plugin guides
- sidebars.ts — context-engine-plugin added to Extending category
This commit is contained in:
Teknium 2026-04-10 19:01:41 -07:00 committed by Teknium
parent 436dfd5ab5
commit 79198eb3a0
12 changed files with 312 additions and 11 deletions

View file

@ -226,7 +226,8 @@ After each turn:
|------|---------|
| `run_agent.py` | AIAgent class — the complete agent loop (~9,200 lines) |
| `agent/prompt_builder.py` | System prompt assembly from memory, skills, context files, personality |
| `agent/context_compressor.py` | Conversation compression algorithm |
| `agent/context_engine.py` | ContextEngine ABC — pluggable context management |
| `agent/context_compressor.py` | Default engine — lossy summarization algorithm |
| `agent/prompt_caching.py` | Anthropic prompt caching markers and cache metrics |
| `agent/auxiliary_client.py` | Auxiliary LLM client for side tasks (vision, summarization) |
| `model_tools.py` | Tool schema collection, `handle_function_call()` dispatch |

View file

@ -62,7 +62,8 @@ hermes-agent/
├── agent/ # Agent internals
│ ├── prompt_builder.py # System prompt assembly
│ ├── context_compressor.py # Conversation compression algorithm
│ ├── context_engine.py # ContextEngine ABC (pluggable)
│ ├── context_compressor.py # Default engine — lossy summarization
│ ├── prompt_caching.py # Anthropic prompt caching
│ ├── auxiliary_client.py # Auxiliary LLM for side tasks (vision, summarization)
│ ├── model_metadata.py # Model context lengths, token estimation
@ -123,6 +124,7 @@ hermes-agent/
├── acp_adapter/ # ACP server (VS Code / Zed / JetBrains)
├── cron/ # Scheduler (jobs.py, scheduler.py)
├── plugins/memory/ # Memory provider plugins
├── plugins/context_engine/ # Context engine plugins
├── environments/ # RL training environments (Atropos)
├── skills/ # Bundled skills (always available)
├── optional-skills/ # Official optional skills (install explicitly)
@ -227,7 +229,7 @@ Long-running process with 14 platform adapters, unified session routing, user au
### Plugin System
Three discovery sources: `~/.hermes/plugins/` (user), `.hermes/plugins/` (project), and pip entry points. Plugins register tools, hooks, and CLI commands through a context API. Memory providers are a specialized plugin type under `plugins/memory/`.
Three discovery sources: `~/.hermes/plugins/` (user), `.hermes/plugins/` (project), and pip entry points. Plugins register tools, hooks, and CLI commands through a context API. Two specialized plugin types exist: memory providers (`plugins/memory/`) and context engines (`plugins/context_engine/`). Both are single-select — only one of each can be active at a time, configured via `hermes plugins` or `config.yaml`.
→ [Plugin Guide](/docs/guides/build-a-hermes-plugin), [Memory Provider Plugin](./memory-provider-plugin.md)

View file

@ -3,10 +3,37 @@
Hermes Agent uses a dual compression system and Anthropic prompt caching to
manage context window usage efficiently across long conversations.
Source files: `agent/context_compressor.py`, `agent/prompt_caching.py`,
`gateway/run.py` (session hygiene), `run_agent.py` (search for `_compress_context`)
Source files: `agent/context_engine.py` (ABC), `agent/context_compressor.py` (default engine),
`agent/prompt_caching.py`, `gateway/run.py` (session hygiene), `run_agent.py` (search for `_compress_context`)
## Pluggable Context Engine
Context management is built on the `ContextEngine` ABC (`agent/context_engine.py`). The built-in `ContextCompressor` is the default implementation, but plugins can replace it with alternative engines (e.g., Lossless Context Management).
```yaml
context:
engine: "compressor" # default — built-in lossy summarization
engine: "lcm" # example — plugin providing lossless context
```
The engine is responsible for:
- Deciding when compaction should fire (`should_compress()`)
- Performing compaction (`compress()`)
- Optionally exposing tools the agent can call (e.g., `lcm_grep`)
- Tracking token usage from API responses
Selection is config-driven via `context.engine` in `config.yaml`. The resolution order:
1. Check `plugins/context_engine/<name>/` directory
2. Check general plugin system (`register_context_engine()`)
3. Fall back to built-in `ContextCompressor`
Plugin engines are **never auto-activated** — the user must explicitly set `context.engine` to the plugin's name. The default `"compressor"` always uses the built-in.
Configure via `hermes plugins` → Provider Plugins → Context Engine, or edit `config.yaml` directly.
For building a context engine plugin, see [Context Engine Plugins](/docs/developer-guide/context-engine-plugin).
## Dual Compression System
Hermes has two separate compression layers that operate independently:

View file

@ -0,0 +1,189 @@
---
sidebar_position: 9
title: "Context Engine Plugins"
description: "How to build a context engine plugin that replaces the built-in ContextCompressor"
---
# Building a Context Engine Plugin
Context engine plugins replace the built-in `ContextCompressor` with an alternative strategy for managing conversation context. For example, a Lossless Context Management (LCM) engine that builds a knowledge DAG instead of lossy summarization.
## How it works
The agent's context management is built on the `ContextEngine` ABC (`agent/context_engine.py`). The built-in `ContextCompressor` is the default implementation. Plugin engines must implement the same interface.
Only **one** context engine can be active at a time. Selection is config-driven:
```yaml
# config.yaml
context:
engine: "compressor" # default built-in
engine: "lcm" # activates a plugin engine named "lcm"
```
Plugin engines are **never auto-activated** — the user must explicitly set `context.engine` to the plugin's name.
## Directory structure
Each context engine lives in `plugins/context_engine/<name>/`:
```
plugins/context_engine/lcm/
├── __init__.py # exports the ContextEngine subclass
├── plugin.yaml # metadata (name, description, version)
└── ... # any other modules your engine needs
```
## The ContextEngine ABC
Your engine must implement these **required** methods:
```python
from agent.context_engine import ContextEngine
class LCMEngine(ContextEngine):
@property
def name(self) -> str:
"""Short identifier, e.g. 'lcm'. Must match config.yaml value."""
return "lcm"
def update_from_response(self, usage: dict) -> None:
"""Called after every LLM call with the usage dict.
Update self.last_prompt_tokens, self.last_completion_tokens,
self.last_total_tokens from the response.
"""
def should_compress(self, prompt_tokens: int = None) -> bool:
"""Return True if compaction should fire this turn."""
def compress(self, messages: list, current_tokens: int = None) -> list:
"""Compact the message list and return a new (possibly shorter) list.
The returned list must be a valid OpenAI-format message sequence.
"""
```
### Class attributes your engine must maintain
The agent reads these directly for display and logging:
```python
last_prompt_tokens: int = 0
last_completion_tokens: int = 0
last_total_tokens: int = 0
threshold_tokens: int = 0 # when compression triggers
context_length: int = 0 # model's full context window
compression_count: int = 0 # how many times compress() has run
```
### Optional methods
These have sensible defaults in the ABC. Override as needed:
| Method | Default | Override when |
|--------|---------|--------------|
| `on_session_start(session_id, **kwargs)` | No-op | You need to load persisted state (DAG, DB) |
| `on_session_end(session_id, messages)` | No-op | You need to flush state, close connections |
| `on_session_reset()` | Resets token counters | You have per-session state to clear |
| `update_model(model, context_length, ...)` | Updates context_length + threshold | You need to recalculate budgets on model switch |
| `get_tool_schemas()` | Returns `[]` | Your engine provides agent-callable tools (e.g., `lcm_grep`) |
| `handle_tool_call(name, args, **kwargs)` | Returns error JSON | You implement tool handlers |
| `should_compress_preflight(messages)` | Returns `False` | You can do a cheap pre-API-call estimate |
| `get_status()` | Standard token/threshold dict | You have custom metrics to expose |
## Engine tools
Context engines can expose tools the agent calls directly. Return schemas from `get_tool_schemas()` and handle calls in `handle_tool_call()`:
```python
def get_tool_schemas(self):
return [{
"name": "lcm_grep",
"description": "Search the context knowledge graph",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search query"}
},
"required": ["query"],
},
}]
def handle_tool_call(self, name, args, **kwargs):
if name == "lcm_grep":
results = self._search_dag(args["query"])
return json.dumps({"results": results})
return json.dumps({"error": f"Unknown tool: {name}"})
```
Engine tools are injected into the agent's tool list at startup and dispatched automatically — no registry registration needed.
## Registration
### Via directory (recommended)
Place your engine in `plugins/context_engine/<name>/`. The `__init__.py` must export a `ContextEngine` subclass. The discovery system finds and instantiates it automatically.
### Via general plugin system
A general plugin can also register a context engine:
```python
def register(ctx):
engine = LCMEngine(context_length=200000)
ctx.register_context_engine(engine)
```
Only one engine can be registered. A second plugin attempting to register is rejected with a warning.
## Lifecycle
```
1. Engine instantiated (plugin load or directory discovery)
2. on_session_start() — conversation begins
3. update_from_response() — after each API call
4. should_compress() — checked each turn
5. compress() — called when should_compress() returns True
6. on_session_end() — session boundary (CLI exit, /reset, gateway expiry)
```
`on_session_reset()` is called on `/new` or `/reset` to clear per-session state without a full shutdown.
## Configuration
Users select your engine via `hermes plugins` → Provider Plugins → Context Engine, or by editing `config.yaml`:
```yaml
context:
engine: "lcm" # must match your engine's name property
```
The `compression` config block (`compression.threshold`, `compression.protect_last_n`, etc.) is specific to the built-in `ContextCompressor`. Your engine should define its own config format if needed, reading from `config.yaml` during initialization.
## Testing
```python
from agent.context_engine import ContextEngine
def test_engine_satisfies_abc():
engine = YourEngine(context_length=200000)
assert isinstance(engine, ContextEngine)
assert engine.name == "your-name"
def test_compress_returns_valid_messages():
engine = YourEngine(context_length=200000)
msgs = [{"role": "user", "content": "hello"}]
result = engine.compress(msgs)
assert isinstance(result, list)
assert all("role" in m for m in result)
```
See `tests/agent/test_context_engine.py` for the full ABC contract test suite.
## See also
- [Context Compression and Caching](/docs/developer-guide/context-compression-and-caching) — how the built-in compressor works
- [Memory Provider Plugins](/docs/developer-guide/memory-provider-plugin) — analogous single-select plugin system for memory
- [Plugins](/docs/user-guide/features/plugins) — general plugin system overview

View file

@ -8,6 +8,10 @@ description: "How to build a memory provider plugin for Hermes Agent"
Memory provider plugins give Hermes Agent persistent, cross-session knowledge beyond the built-in MEMORY.md and USER.md. This guide covers how to build one.
:::tip
Memory providers are one of two **provider plugin** types. The other is [Context Engine Plugins](/docs/developer-guide/context-engine-plugin), which replace the built-in context compressor. Both follow the same pattern: single-select, config-driven, managed via `hermes plugins`.
:::
## Directory Structure
Each memory provider lives in `plugins/memory/<name>/`: