Fix variable name breakage (run_agent, hermes_constants, etc.) where
import rewriter changed 'import X' to 'import hermes_agent.Y' but
test code still referenced 'X' as a variable name.
Fix package-vs-module confusion (cli.auth, cli.models, cli.ui) where
single files became directories.
Fix hardcoded file paths in tests pointing to old locations.
Fix tool registry to discover tools in subpackage directories.
Fix stale import in hermes_agent/tools/__init__.py.
Part of #14182, #14183
Rewrite all import statements, patch() targets, sys.modules keys,
importlib.import_module() strings, and subprocess -m references to use
hermes_agent.* paths.
Strip sys.path.insert hacks from production code (rely on editable install).
Update COMPONENT_PREFIXES for logger filtering.
Fix 3 hardcoded getLogger() calls to use __name__.
Update transport and tool registry discovery paths.
Update plugin module path strings.
Add legacy process-name patterns for gateway PID detection.
Add main() to skills_sync for console_script entry point.
Fix _get_bundled_dir() path traversal after move.
Part of #14182, #14183
* feat(plugins): pluggable image_gen backends + OpenAI provider
Adds a ImageGenProvider ABC so image generation backends register as
bundled plugins under `plugins/image_gen/<name>/`. The plugin scanner
gains three primitives to make this work generically:
- `kind:` manifest field (`standalone` | `backend` | `exclusive`).
Bundled `kind: backend` plugins auto-load — no `plugins.enabled`
incantation. User-installed backends stay opt-in.
- Path-derived keys: `plugins/image_gen/openai/` gets key
`image_gen/openai`, so a future `tts/openai` cannot collide.
- Depth-2 recursion into category namespaces (parent dirs without a
`plugin.yaml` of their own).
Includes `OpenAIImageGenProvider` as the first consumer (gpt-image-1.5
default, plus gpt-image-1, gpt-image-1-mini, DALL-E 3/2). Base64
responses save to `$HERMES_HOME/cache/images/`; URL responses pass
through.
FAL stays in-tree for this PR — a follow-up ports it into
`plugins/image_gen/fal/` so the in-tree `image_generation_tool.py`
slims down. The dispatch shim in `_handle_image_generate` only fires
when `image_gen.provider` is explicitly set to a non-FAL value, so
existing FAL setups are untouched.
- 41 unit tests (scanner recursion, kind parsing, gate logic,
registry, OpenAI payload shapes)
- E2E smoke verified: bundled plugin autoloads, registers, and
`_handle_image_generate` routes to OpenAI when configured
* fix(image_gen/openai): don't send response_format to gpt-image-*
The live API rejects it: 'Unknown parameter: response_format'
(verified 2026-04-21 with gpt-image-1.5). gpt-image-* models return
b64_json unconditionally, so the parameter was both unnecessary and
actively broken.
* feat(image_gen/openai): gpt-image-2 only, drop legacy catalog
gpt-image-2 is the latest/best OpenAI image model (released 2026-04-21)
and there's no reason to expose the older gpt-image-1.5 / gpt-image-1 /
dall-e-3 / dall-e-2 alongside it — slower, lower quality, or awkward
(dall-e-2 squares only). Trim the catalog down to a single model.
Live-verified end-to-end: landscape 1536x1024 render of a Moog-style
synth matches prompt exactly, 2.4MB PNG saved to cache.
* feat(image_gen/openai): expose gpt-image-2 as three quality tiers
Users pick speed/fidelity via the normal model picker instead of a
hidden quality knob. All three tier IDs resolve to the single underlying
gpt-image-2 API model with a different quality parameter:
gpt-image-2-low ~15s fast iteration
gpt-image-2-medium ~40s default
gpt-image-2-high ~2min highest fidelity
Live-measured on OpenAI's API today: 15.4s / 40.8s / 116.9s for the
same 1024x1024 prompt.
Config:
image_gen.openai.model: gpt-image-2-high
# or
image_gen.model: gpt-image-2-low
# or env var for scripts/tests
OPENAI_IMAGE_MODEL=gpt-image-2-medium
Live-verified end-to-end with the low tier: 18.8s landscape render of a
golden retriever in wildflowers, vision-confirmed exact match.
* feat(tools_config): plugin image_gen providers inject themselves into picker
'hermes tools' → Image Generation now shows plugin-registered backends
alongside Nous Subscription and FAL.ai without tools_config.py needing
to know about them. OpenAI appears as a third option today; future
backends appear automatically as they're added.
Mechanism:
- ImageGenProvider gains an optional get_setup_schema() hook
(name, badge, tag, env_vars). Default derived from display_name.
- tools_config._plugin_image_gen_providers() pulls the schemas from
every registered non-FAL plugin provider.
- _visible_providers() appends those rows when rendering the Image
Generation category.
- _configure_provider() handles the new image_gen_plugin_name marker:
writes image_gen.provider and routes to the plugin's list_models()
catalog for the model picker.
- _toolset_needs_configuration_prompt('image_gen') stops demanding a
FAL key when any plugin provider reports is_available().
FAL is skipped in the plugin path because it already has hardcoded
TOOL_CATEGORIES rows — when it gets ported to a plugin in a follow-up
PR the hardcoded rows go away and it surfaces through the same path
as OpenAI.
Verified live: picker shows Nous Subscription / FAL.ai / OpenAI.
Picking OpenAI prompts for OPENAI_API_KEY, then shows the
gpt-image-2-low/medium/high model picker sourced from the plugin.
397 tests pass across plugins/, tools_config, registry, and picker.
* fix(image_gen): close final gaps for plugin-backend parity with FAL
Two small places that still hardcoded FAL:
- hermes_cli/setup.py status line: an OpenAI-only setup showed
'Image Generation: missing FAL_KEY'. Now probes plugin providers
and reports '(OpenAI)' when one is_available() — or falls back to
'missing FAL_KEY or OPENAI_API_KEY' if nothing is configured.
- image_generate tool schema description: said 'using FAL.ai, default
FLUX 2 Klein 9B'. Rewrote provider-neutral — 'backend and model are
user-configured' — and notes the 'image' field can be a URL or an
absolute path, which the gateway delivers either way via
extract_local_files().
Plugins now require explicit consent to load. Discovery still finds every
plugin — user-installed, bundled, and pip — so they all show up in
`hermes plugins` and `/plugins`, but the loader only instantiates
plugins whose name appears in `plugins.enabled` in config.yaml. This
removes the previous ambient-execution risk where a newly-installed or
bundled plugin could register hooks, tools, and commands on first run
without the user opting in.
The three-state model is now explicit:
enabled — in plugins.enabled, loads on next session
disabled — in plugins.disabled, never loads (wins over enabled)
not enabled — discovered but never opted in (default for new installs)
`hermes plugins install <repo>` prompts "Enable 'name' now? [y/N]"
(defaults to no). New `--enable` / `--no-enable` flags skip the prompt
for scripted installs. `hermes plugins enable/disable` manage both lists
so a disabled plugin stays explicitly off even if something later adds
it to enabled.
Config migration (schema v20 → v21): existing user plugins already
installed under ~/.hermes/plugins/ (minus anything in plugins.disabled)
are auto-grandfathered into plugins.enabled so upgrades don't silently
break working setups. Bundled plugins are NOT grandfathered — even
existing users have to opt in explicitly.
Also: HERMES_DISABLE_BUNDLED_PLUGINS env var removed (redundant with
opt-in default), cmd_list now shows bundled + user plugins together with
their three-state status, interactive UI tags bundled entries
[bundled], docs updated across plugins.md and built-in-plugins.md.
Validation: 442 plugin/config tests pass. E2E: fresh install discovers
disk-cleanup but does not load it; `hermes plugins enable disk-cleanup`
activates hooks; migration grandfathers existing user plugins correctly
while leaving bundled plugins off.
The original name was cute but non-obvious; disk-cleanup says what it
does. Plugin directory, script, state path, log lines, slash command,
and test module all renamed. No user-visible state exists yet, so no
migration path is needed.
New website page "Built-in Plugins" documents the <repo>/plugins/<name>/
source, how discovery interacts with user/project plugins, the
HERMES_DISABLE_BUNDLED_PLUGINS escape hatch, disk-cleanup's hook
behaviour and deletion rules, and guidance on when a plugin belongs
bundled vs. user-installable. Added to the Features → Core sidebar next
to the main Plugins page, with a cross-reference from plugins.md.
Rewires @LVT382009's disk-guardian (PR #12212) from a skill-plus-script
into a plugin that runs entirely via hooks — no agent compliance needed.
- post_tool_call hook auto-tracks files created by write_file / terminal
/ patch when they match test_/tmp_/*.test.* patterns under HERMES_HOME
- on_session_end hook runs cmd_quick cleanup when test files were
auto-tracked during the turn; stays quiet otherwise
- /disk-guardian slash command keeps status / dry-run / quick / deep /
track / forget for manual use
- Deterministic cleanup rules, path safety, atomic writes, and audit
logging preserved from the original contribution
- Protect well-known top-level state dirs (logs/, memories/, sessions/,
cron/, cache/, etc.) from empty-dir removal so fresh installs don't
get gutted on first session end
The plugin system gains a bundled-plugin discovery path (<repo>/plugins/
<name>/) alongside user/project/entry-point sources. Memory and
context_engine subdirs are skipped — they keep their own discovery
paths. HERMES_DISABLE_BUNDLED_PLUGINS=1 suppresses the scan; the test
conftest sets it by default so existing plugin tests stay clean.
Co-authored-by: LVT382009 <levantam.98.2324@gmail.com>
Cuts shard-3 local runtime in half by neutralizing real wall-clock
waits across three classes of slow test:
## 1. Retry backoff mocks
- tests/run_agent/conftest.py (NEW): autouse fixture mocks
jittered_backoff to 0.0 so the `while time.time() < sleep_end`
busy-loop exits immediately. No global time.sleep mock (would
break threading tests).
- test_anthropic_error_handling, test_413_compression,
test_run_agent_codex_responses, test_fallback_model: per-file
fixtures mock time.sleep / asyncio.sleep for retry / compression
paths.
- test_retaindb_plugin: cap the retaindb module's bound time.sleep
to 0.05s via a per-test shim (background writer-thread retries
sleep 2s after errors; tests don't care about exact duration).
Plus replace arbitrary time.sleep(N) waits with short polling
loops bounded by deadline.
## 2. Subprocess sleeps in production code
- test_update_gateway_restart: mock time.sleep. Production code
does time.sleep(3) after `systemctl restart` to verify the
service survived. Tests mock subprocess.run \u2014 nothing actually
restarts \u2014 so the wait is dead time.
## 3. Network / IMDS timeouts (biggest single win)
- tests/conftest.py: add AWS_EC2_METADATA_DISABLED=true plus
AWS_METADATA_SERVICE_TIMEOUT=1 and ATTEMPTS=1. boto3 falls back
to IMDS (169.254.169.254) when no AWS creds are set. Any test
hitting has_aws_credentials() / resolve_aws_auth_env_var() (e.g.
test_status, test_setup_copilot_acp, anything that touches
provider auto-detect) burned ~2-4s waiting for that to time out.
- test_exit_cleanup_interrupt: explicitly mock
resolve_runtime_provider which was doing real network auto-detect
(~4s). Tests don't care about provider resolution \u2014 the agent
is already mocked.
- test_timezone: collapse the 3-test "TZ env in subprocess" suite
into 2 tests by checking both injection AND no-leak in the same
subprocess spawn (was 3 \u00d7 3.2s, now 2 \u00d7 4s).
## Validation
| Test | Before | After |
|---|---|---|
| test_anthropic_error_handling (8 tests) | ~80s | ~15s |
| test_413_compression (14 tests) | ~18s | 2.3s |
| test_retaindb_plugin (67 tests) | ~13s | 1.3s |
| test_status_includes_tavily_key | 4.0s | 0.05s |
| test_setup_copilot_acp_skips_same_provider_pool_step | 8.0s | 0.26s |
| test_update_gateway_restart (5 tests) | ~18s total | ~0.35s total |
| test_exit_cleanup_interrupt (2 tests) | 8s | 1.5s |
| **Matrix shard 3 local** | **108s** | **50s** |
No behavioral contract changed \u2014 tests still verify retry happens,
service restart logic runs, etc.; they just don't burn real seconds
waiting for it.
Supersedes PR #11779 (those changes are included here).
First pass of test-suite reduction to address flaky CI and bloat.
Removed tests that fall into these change-detector patterns:
1. Source-grep tests (tests/gateway/test_feishu.py, test_email.py): tests
that call inspect.getsource() on production modules and grep for string
literals. Break on any refactor/rename even when behavior is correct.
2. Platform enum tautologies (every gateway/test_X.py): assertions like
`Platform.X.value == 'x'` duplicated across ~9 adapter test files.
3. Toolset/PLATFORM_HINTS/setup-wizard registry-presence checks: tests that
only verify a key exists in a dict. Data-layout tests, not behavior.
4. Argparse wiring tests (test_argparse_flag_propagation, test_subparser_routing
_fallback): tests that do parser.parse_args([...]) then assert args.field.
Tests Python's argparse, not our code.
5. Pure dispatch tests (test_plugins_cmd.TestPluginsCommandDispatch): patch
cmd_X, call plugins_command with matching action, assert mock called.
Tests the if/elif chain, not behavior.
6. Kwarg-to-mock verification (test_auxiliary_client ~45 tests,
test_web_tools_config, test_gemini_cloudcode, test_retaindb_plugin): tests
that mock the external API client, call our function, and assert exact
kwargs. Break on refactor even when behavior is preserved.
7. Schedule-internal "function-was-called" tests (acp/test_server scheduling
tests): tests that patch own helper method, then assert it was called.
Kept behavioral tests throughout: error paths (pytest.raises), security
tests (path traversal, SSRF, redaction), message alternation invariants,
provider API format conversion, streaming logic, memory contract, real
config load/merge tests.
Net reduction: 169 tests removed. 38 empty classes cleaned up.
Collected before: 12,522 tests
Collected after: 12,353 tests
Port missing features from the hindsight-hermes external integration
package into the native plugin. Only touches plugin files — no core
changes.
Features:
- Tags on retain/recall (tags, recall_tags, recall_tags_match)
- Recall config (recall_max_tokens, recall_max_input_chars, recall_types,
recall_prompt_preamble)
- Retain controls (retain_every_n_turns, auto_retain, auto_recall,
retain_async via aretain_batch, retain_context)
- Bank config via Banks API (bank_mission, bank_retain_mission)
- Structured JSON retain with per-message timestamps
- Full session accumulation with document_id for dedup
- Custom post_setup() wizard with curses picker
- Mode-aware dep install (hindsight-client for cloud, hindsight-all for local)
- local_external mode and openai_compatible LLM provider
- OpenRouter support with auto base URL
- Auto-upgrade of hindsight-client to >=0.4.22 on session start
- Comprehensive debug logging across all operations
- 46 unit tests
- Updated README and website docs
Based on PR #5413 spec by MaheshtheDev (Mahesh Sanikommu).
Changes:
- Add search_mode config (hybrid/memories/documents) passed to SDK
- Add {identity} template support in container_tag for profile-scoped containers
- Add SUPERMEMORY_CONTAINER_TAG env var override (priority over config)
- Add multi-container mode: enable_custom_container_tags, custom_containers,
custom_container_instructions in supermemory.json
- Dynamic tool schemas when multi-container enabled (optional container_tag param)
- Whitelist validation for custom container tags in tool calls
- Simplify get_config_schema() to only prompt for API key during setup
- Defer container_tag sanitization to initialize() (after template resolution)
- Add custom_id support to documents.add calls
- Update README with multi-container docs, search_mode, identity template,
support links (Discord, email)
- Update memory-providers.md with new features and multi-container example
- Update memory-provider-plugin.md with minimal vs full schema guidance
- Add 12 new tests covering identity template, search_mode, multi-container,
config schema, and env var override
Consolidated salvage from PRs #5301 (qaqcvc), #5339 (lance0),
#5058 and #5098 (maymuneth).
Mem0 API v2 compatibility (#5301):
- All reads use filters={user_id: ...} instead of bare user_id= kwarg
- All writes use filters with user_id + agent_id for attribution
- Response unwrapping for v2 dict format {results: [...]}
- Split _read_filters() vs _write_filters() — reads are user-scoped
only for cross-session recall, writes include agent_id
- Preserved 'hermes-user' default (no breaking change for existing users)
- Omitted run_id scoping from #5301 — cross-session memory is Mem0's
core value, session-scoping reads would defeat that purpose
Memory prefetch context fencing (#5339):
- Wraps prefetched memory in <memory-context> fenced blocks with system
note marking content as recalled context, NOT user input
- Sanitizes provider output to strip fence-escape sequences, preventing
injection where memory content breaks out of the fence
- API-call-time only — never persisted to session history
Secret redaction (#5058, #5098):
- Added prefix patterns for Groq (gsk_), Matrix (syt_), RetainDB
(retaindb_), Hindsight (hsk-), Mem0 (mem0_), ByteRover (brv_)