fix(security): redact credentials before persistence in session capture

Two-layer redaction at the persistence boundary so credentials never reach
state.db, session_*.json, or compression:

1. agent/chat_completion_helpers.py :: build_assistant_message
   - Redact assistant content before the message dict is constructed
     (catches PATs / API keys the model inlines into natural language)
   - Redact tool_call.function.arguments at the same site (catches secrets
     inlined into tool args, e.g. terminal command=curl -H 'Authorization: ...')
   Tool execution uses the raw API response object, not this dict, so
   redacting the persisted shape is safe.

2. run_agent.py :: _save_session_log
   - Add _redact_message_content() static helper that handles both string
     content and OpenAI/Anthropic multimodal list-of-parts (image parts
     pass through untouched, only text/content fields are redacted)
   - Apply to every message + the cached system prompt before writing
     session_*.json

Both layers respect HERMES_REDACT_SECRETS via redact_sensitive_text —
no-op when disabled.

Tests (TestSaveSessionLogRedactsSecrets, 4 cases):
  - api key in tool content
  - api key in user message
  - api key in system prompt
  - multimodal list-of-parts (image part preserved, text redacted)
Tests use an autouse fixture to force _REDACT_ENABLED=True because the
hermetic conftest defaults the env var to false.

Salvaged from PR #24758 by @vgocoder (build_assistant_message + session_log)
+ PR #19855 by @liuhao1024 (multimodal list helper, system_prompt redaction).
Kept only the redaction concern from #19855; its unrelated whatsapp npm
timeout + PATCH_SCHEMA changes are out of scope and dropped.

Refs #19798 (PAT leak via assistant inline mention), #19845 (session capture
credential leak).

Co-authored-by: liuhao1024 <liuhao03@bilibili.com>
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
This commit is contained in:
vgocoder 2026-05-24 17:57:50 -07:00 committed by Teknium
parent 243ebc7a61
commit dcc163ee28
3 changed files with 133 additions and 1 deletions

View file

@ -124,6 +124,7 @@ from agent.memory_manager import StreamingContextScrubber, build_memory_context_
from agent.think_scrubber import StreamingThinkScrubber
from agent.retry_utils import jittered_backoff
from agent.error_classifier import classify_api_error, FailoverReason
from agent.redact import redact_sensitive_text
from agent.prompt_builder import (
DEFAULT_AGENT_IDENTITY, PLATFORM_HINTS,
MEMORY_GUIDANCE, SESSION_SEARCH_GUIDANCE, SKILLS_GUIDANCE,
@ -1546,6 +1547,36 @@ class AIAgent:
content = re.sub(r'(</think>)\n+', r'\1\n', content)
return content.strip()
@staticmethod
def _redact_message_content(content):
"""Apply secret redaction to message content (str or list-of-parts).
Handles both plain-string content and the OpenAI/Anthropic multimodal
shape where ``content`` is a list of ``{"type": "text", "text": ...}``
/ ``{"type": "image_url", ...}`` / ``{"type": "input_text", "content": ...}``
parts. Image / binary parts are left untouched; only text fields are
passed through ``redact_sensitive_text``.
Respects ``HERMES_REDACT_SECRETS`` via ``redact_sensitive_text``
when disabled the helper is effectively a no-op.
"""
if content is None:
return content
if isinstance(content, str):
return redact_sensitive_text(content)
if isinstance(content, list):
redacted = []
for part in content:
if isinstance(part, dict):
part = dict(part)
if isinstance(part.get("text"), str):
part["text"] = redact_sensitive_text(part["text"])
if isinstance(part.get("content"), str):
part["content"] = redact_sensitive_text(part["content"])
redacted.append(part)
return redacted
return content
def _save_session_log(self, messages: List[Dict[str, Any]] = None):
"""Optional per-session JSON snapshot writer.
@ -1581,6 +1612,14 @@ class AIAgent:
if msg.get("role") == "assistant" and msg.get("content"):
msg = dict(msg)
msg["content"] = self._clean_session_content(msg["content"])
# Defence-in-depth: redact credentials from every message
# content before persistence. Catches PATs / API keys / Bearer
# tokens that may have leaked into assistant responses, tool
# output, or user paste. Respects HERMES_REDACT_SECRETS via
# redact_sensitive_text — no-op when disabled. (#19798, #19845)
if "content" in msg:
msg = dict(msg)
msg["content"] = self._redact_message_content(msg.get("content"))
cleaned.append(msg)
# Guard: never overwrite a larger session log with fewer messages.
@ -1606,7 +1645,7 @@ class AIAgent:
"platform": self.platform,
"session_start": self.session_start.isoformat(),
"last_updated": datetime.now().isoformat(),
"system_prompt": self._cached_system_prompt or "",
"system_prompt": redact_sensitive_text(self._cached_system_prompt or ""),
"tools": self.tools or [],
"message_count": len(cleaned),
"messages": cleaned,