Commit graph

15 commits

Author SHA1 Message Date
Teknium
de76f4dbcf
fix(secrets): only apply external secrets once per HERMES_HOME per process (#32271)
`load_hermes_dotenv()` is called at module-import time from cli.py,
hermes_cli/main.py, run_agent.py, trajectory_compressor.py, gateway/run.py,
tui_gateway/server.py, acp_adapter/entry.py, and a few others. Each call
triggered `_apply_external_secret_sources()`, which re-parsed config,
re-fetched from Bitwarden Secrets Manager (its own 300s cache mostly absorbed
this), re-ran the ASCII sanitization sweep, and reprinted

  Bitwarden Secrets Manager: applied N secret(s) (...)

to stderr. Users saw the status line 3-5x per CLI startup.

Guard the function with a process-level set of HERMES_HOME paths that have
already had external secrets applied. Subsequent calls for the same home_path
are no-ops. `reset_secret_source_cache()` lets tests (and any future
long-running consumer that wants to refresh after a config change) force a
re-pull.
2026-05-25 15:18:55 -07:00
Teknium
0219b0408a
perf(cli): cut hermes startup 63% — flip head-to-head vs codex (#31968)
* perf(bitwarden): persist secret-fetch cache across CLI invocations

Every `hermes` invocation paid a ~380ms tax for `bws secret list` to
Bitwarden Secrets Manager because the existing cache was in-process only.
Back-to-back `hermes chat -q`, gateway-spawned agents, and cron-launched
runs all re-fetched.

Adds a disk-persisted L2 cache at `<hermes_home>/cache/bws_cache.json`
(mode 0600, never contains the access token — only the SHA-256
fingerprint prefix). Same TTL as the in-process cache. Read on miss,
write on bws success, ignored on key mismatch / corruption / expiry.

Measured on a startup profile:
  load_hermes_dotenv() cold: 372ms → warm (disk cache hit): 20ms

End-to-end `hermes --version` cold→warm: 666ms → ~295ms.

In a hermes-vs-codex benchmark across 11 single- and multi-turn tasks
(framework overhead = wall − llm − tool_exec, median over 3 trials):

  cohort               before    after    saved
  single-turn (median)  2.96s    2.31s   -0.65s
  multi-turn  (5-turn)  9.40s    8.95s   -0.45s (≈0.3s/turn)

Hermes now wins head-to-head on 6/11 tasks vs codex (was 4/11 before).
The remaining ~0.6s single-turn delta is mostly Python's own import
cost in hermes_cli.main, which is a separate optimization.

* perf(cli): lazy-load model catalog + dedupe config.yaml reads at startup

Two import-time wins on top of the bws disk-cache fix:

1. Lazy-load `hermes_cli.models._PROVIDER_MODELS` via PEP 562
   module-level `__getattr__`. The catalog is ~55ms of work that was
   eagerly imported on every CLI invocation (line 4557 `if not
   _is_termux_startup_environment(): from hermes_cli.models import
   _PROVIDER_MODELS`). Audit showed every internal call site already
   does its own function-local import; only test code reads
   `hermes_cli.main._PROVIDER_MODELS` as a module attribute, and
   __getattr__ keeps that working transparently. First access triggers
   the import once and caches the result on the module via
   `globals()[name] = ...`, so subsequent reads are dict lookups.

2. Dedupe the double config.yaml read in the top-of-module bootstrap.
   Previously: one raw yaml.safe_load for the `security.redact_secrets`
   bridge, then a separate full `load_config()` (with deep-merge) for
   `network.force_ipv4`. Both keys come from the same file. Merged
   into one raw yaml load.

Combined with the bws cache fix in the previous commit:

  hermes --version wall time:
    original (cold):           666 ms
    after bws fix (warm):      295 ms
    after lazy-load + dedupe:  228 ms   (-67 ms additional, -66% from original)

Tests:
  - tests/hermes_cli/test_api_key_providers.py: 173/173 pass
    (lazy __getattr__ correctly handles
     `from hermes_cli.main import _PROVIDER_MODELS`)
  - tests/test_ipv4_preference.py + tests/hermes_cli/test_redact_config_bridge.py +
    tests/agent/test_redact.py: 93/93 pass (dedupe preserves both bridges)
  - tests/test_bitwarden_secrets.py + env_loader tests: 49/49 pass
2026-05-25 03:06:39 -07:00
Hasan Ali
d7c5d5dee5
fix: avoid persisting borrowed credential secrets (#31416) 2026-05-25 00:32:08 -07:00
Teknium
bc3f1f4f34
feat(secrets/bitwarden): EU Cloud + self-hosted server URL support (#31378)
Closes #31370.

bws defaults to the US identity endpoint, so EU Cloud and self-hosted
machine-account tokens fail with [400 Bad Request] {"error":"invalid_client"}
during 'hermes secrets bitwarden setup'. The token is valid — it's just
being checked against the wrong region.

Add a Bitwarden region step to the wizard between the access-token and
project-list steps:

  Step 1  Install bws
  Step 2  Provide access token
  Step 3  Pick region   <-- new (US / EU / self-hosted-custom-URL)
  Step 4  Pick project  (now talks to the right endpoint)
  Step 5  Test fetch

Region is stored in config.yaml as secrets.bitwarden.server_url and
plumbed into every bws subprocess as BWS_SERVER_URL (project list,
secret list, test fetch, and the env_loader startup pull).

Also:
- Non-interactive: 'hermes secrets bitwarden setup --server-url ...'
- Pre-existing BWS_SERVER_URL in the shell is detected and reused
- Cache key includes server_url so EU/US fetches don't collide
- 'hermes secrets bitwarden status' shows the configured region
- 'invalid_client' / '400 Bad Request' from bws now triggers a hint
  pointing at the region setting instead of looking like a bad token
2026-05-24 02:19:57 -07:00
Yuan Li
75643a6154 fix(env): strip null bytes from .env before python-dotenv loads
Null bytes in API key values (introduced by copy-paste) crash
    os.environ[k] = v with ValueError: embedded null byte, preventing
    hermes from starting at all.
2026-05-23 17:17:05 -07:00
Teknium
c25f9d1d36
feat(secrets): label detected credentials with their source (Bitwarden) (#30364)
When Bitwarden Secrets Manager supplies a provider key, 'hermes model'
and the setup wizard show 'credentials ✓' with no hint of where the
key came from — identical to the .env case. Users assume the integration
isn't wired up and re-enter the key (or hit Enter and cancel).

env_loader now tracks which env vars were injected by an external secret
source and exposes get_secret_source() / format_secret_source_suffix() so
the provider flows can render 'Anthropic credentials: sk-ant-... ✓
(from Bitwarden)' instead of an unlabeled checkmark.

Wired into _prompt_api_key (kimi, z.ai, minimax, opencode, ...), the
Anthropic provider flow, the Bedrock flow, and the GitHub Copilot token
display.

Future secret sources (Vault, 1Password, etc.) drop in by setting their
own label in _SECRET_SOURCES; format_secret_source_suffix() has a generic
fallback so no call sites need updating.
2026-05-22 03:32:58 -07:00
Teknium
552e9c7881
feat(secrets): Bitwarden Secrets Manager integration with lazy bws install (#30035)
* feat(secrets): Bitwarden Secrets Manager integration with lazy bws install

Pull API keys from Bitwarden Secrets Manager at process startup
instead of storing them all in plaintext in ~/.hermes/.env.  One
bootstrap token (BWS_ACCESS_TOKEN) replaces N per-provider keys, and
rotating a credential becomes a single change in the Bitwarden web
app.

Bitwarden defaults to source of truth: secrets pulled from BSM
overwrite any matching env vars on startup so rotations actually
take effect.  Set secrets.bitwarden.override_existing: false in
config.yaml to invert.

The bws binary is auto-downloaded into ~/.hermes/bin/bws on first
use (pinned to v2.0.0, SHA-256 verified against the GitHub release
checksum file).  No apt, brew, or sudo required.

New surfaces:
  hermes secrets bitwarden setup    — interactive wizard
  hermes secrets bitwarden status   — config + binary + token state
  hermes secrets bitwarden sync     — dry-run fetch / --apply exports
  hermes secrets bitwarden disable  — flip enabled: false
  hermes secrets bitwarden install  — just download the binary

Failures (missing binary, bad token, no network) never block Hermes
startup — they emit a one-line warning to stderr and continue with
whatever credentials .env already had.

Docs: website/docs/user-guide/secrets/{index,bitwarden}.md
Tests: tests/test_bitwarden_secrets.py (26 tests, hermetic — bws
       subprocess and HTTP downloads fully mocked)

* chore(infographic): add bitwarden-secrets-manager bento-grid retro-pop-grid

Generated for PR #30035 — Bitwarden Secrets Manager integration.
Style picked via pick_pr_infographic_style.py rotation:
  layout: bento-grid
  style:  retro-pop-grid
  aspect: 1:1 square

Saved at infographic/bitwarden-secrets-manager/infographic.png
2026-05-21 14:10:34 -07:00
Teknium
cc38282b04 feat(cross-platform): psutil for PID/process management + Windows footgun checker
## Why

Hermes supports Linux, macOS, and native Windows, but the codebase grew up
POSIX-first and has accumulated patterns that silently break (or worse,
silently kill!) on Windows:

- `os.kill(pid, 0)` as a liveness probe — on Windows this maps to
  CTRL_C_EVENT and broadcasts Ctrl+C to the target's entire console
  process group (bpo-14484, open since 2012).
- `os.killpg` — doesn't exist on Windows at all (AttributeError).
- `os.setsid` / `os.getuid` / `os.geteuid` — same.
- `signal.SIGKILL` / `signal.SIGHUP` / `signal.SIGUSR1` — module-attr
  errors at runtime on Windows.
- `open(path)` / `open(path, "r")` without explicit encoding= — inherits
  the platform default, which is cp1252/mbcs on Windows (UTF-8 on POSIX),
  causing mojibake round-tripping between hosts.
- `wmic` — removed from Windows 10 21H1+.

This commit does three things:

1. Makes `psutil` a core dependency and migrates critical callsites to it.
2. Adds a grep-based CI gate (`scripts/check-windows-footguns.py`) that
   blocks new instances of any of the above patterns.
3. Fixes every existing instance in the codebase so the baseline is clean.

## What changed

### 1. psutil as a core dependency (pyproject.toml)

Added `psutil>=5.9.0,<8` to core deps. psutil is the canonical
cross-platform answer for "is this PID alive" and "kill this process
tree" — its `pid_exists()` uses `OpenProcess + GetExitCodeProcess` on
Windows (NOT a signal call), and its `Process.children(recursive=True)`
+ `.kill()` combo replaces `os.killpg()` portably.

### 2. `gateway/status.py::_pid_exists`

Rewrote to call `psutil.pid_exists()` first, falling back to the
hand-rolled ctypes `OpenProcess + WaitForSingleObject` dance on Windows
(and `os.kill(pid, 0)` on POSIX) only if psutil is somehow missing —
e.g. during the scaffold phase of a fresh install before pip finishes.

### 3. `os.killpg` migration to psutil (7 callsites, 5 files)

- `tools/code_execution_tool.py`
- `tools/process_registry.py`
- `tools/tts_tool.py`
- `tools/environments/local.py` (3 sites kept as-is, suppressed with
  `# windows-footgun: ok` — the pgid semantics psutil can't replicate,
  and the calls are already Windows-guarded at the outer branch)
- `gateway/platforms/whatsapp.py`

### 4. `scripts/check-windows-footguns.py` (NEW, 500 lines)

Grep-based checker with 11 rules covering every Windows cross-platform
footgun we've hit so far:

1. `os.kill(pid, 0)` — the silent killer
2. `os.setsid` without guard
3. `os.killpg` (recommends psutil)
4. `os.getuid` / `os.geteuid` / `os.getgid`
5. `os.fork`
6. `signal.SIGKILL`
7. `signal.SIGHUP/SIGUSR1/SIGUSR2/SIGALRM/SIGCHLD/SIGPIPE/SIGQUIT`
8. `subprocess` shebang script invocation
9. `wmic` without `shutil.which` guard
10. Hardcoded `~/Desktop` (OneDrive trap)
11. `asyncio.add_signal_handler` without try/except
12. `open()` without `encoding=` on text mode

Features:
- Triple-quoted-docstring aware (won't flag prose inside docstrings)
- Trailing-comment aware (won't flag mentions in `# os.kill(pid, 0)` comments)
- Guard-hint aware (skips lines with `hasattr(os, ...)`,
  `shutil.which(...)`, `if platform.system() != 'Windows'`, etc.)
- Inline suppression with `# windows-footgun: ok — <reason>`
- `--list` to print all rules with fixes
- `--all` / `--diff <ref>` / staged-files (default) modes
- Scans 380 files in under 2 seconds

### 5. CI integration

A GitHub Actions workflow that runs the checker on every PR and push is
staged at `/tmp/hermes-stash/windows-footguns.yml` — not included in this
commit because the GH token on the push machine lacks `workflow` scope.
A maintainer with `workflow` permissions should add it as
`.github/workflows/windows-footguns.yml` in a follow-up. Content:

```yaml
name: Windows footgun check
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: {python-version: "3.11"}
      - run: python scripts/check-windows-footguns.py --all
```

### 6. CONTRIBUTING.md — "Cross-Platform Compatibility" expansion

Expanded from 5 to 16 rules, each with message, example, and fix.
Recommends psutil as the preferred API for PID / process-tree operations.

### 7. Baseline cleanup (91 → 0 findings)

- 14 `open()` sites → added `encoding='utf-8'` (internal logs/caches) or
  `encoding='utf-8-sig'` (user-editable files that Notepad may BOM)
- 23 POSIX-only callsites in systemd helpers, pty_bridge, and plugin
  tool subprocess management → annotated with
  `# windows-footgun: ok — <reason>`
- 7 `os.killpg` sites → migrated to psutil (see §3 above)

## Verification

```
$ python scripts/check-windows-footguns.py --all
✓ No Windows footguns found (380 file(s) scanned).

$ python -c "from gateway.status import _pid_exists; import os
> print('self:', _pid_exists(os.getpid())); print('bogus:', _pid_exists(999999))"
self: True
bogus: False
```

Proof-of-repro that `os.kill(pid, 0)` was actually killing processes
before this fix — see commit `1cbe39914` and bpo-14484. This commit
removes the last hand-rolled ctypes path from the hot liveness-check
path and defers to the best-maintained cross-platform answer.
2026-05-08 14:27:40 -07:00
Teknium
b61d9b297a refactor: consolidate symlink-safe atomic replace into shared helper
Extract the islink/realpath guard from the 16743 fix into a single
atomic_replace() helper in utils.py, then migrate every os.replace()
call site in the codebase to use it.

The original PR #16777 correctly identified and fixed the bug, but
only patched 9 of ~24 call sites. The same bug class (managed
deployments that symlink state files silently losing the link on
every write) still existed at auth.json, sessions file, gateway
config, env_loader, webhook subscriptions, debug store, model
catalog, pairing, google OAuth, nous rate guard, and more.

Rather than add another 10+ copies of the same three-line guard,
consolidate into atomic_replace(tmp, target) which:
- resolves symlinks via os.path.realpath before os.replace
- returns the resolved real path so callers can re-apply permissions
- is a drop-in replacement for os.replace at the use sites

Changes:
- utils.py: new atomic_replace() helper + atomic_json_write /
  atomic_yaml_write now call it instead of inlining the guard
- 16 files: all os.replace() call sites migrated to atomic_replace()
  - agent/{google_oauth, nous_rate_guard, shell_hooks}.py
  - cron/jobs.py
  - gateway/{pairing, session, platforms/telegram}.py
  - hermes_cli/{auth, config, debug, env_loader, model_catalog, webhook}.py
  - tools/{memory_tool, skill_manager_tool, skills_sync}.py

Tests: tests/test_atomic_replace_symlinks.py pins the invariant for
atomic_replace + atomic_json_write + atomic_yaml_write, covers plain
files, first-time creates, broken symlinks, and permission preservation.

Refs #16743
Builds on #16777 by @vominh1919.
2026-04-28 04:58:22 -07:00
hharry11
83cb9a03ee fix(cli): ensure project .env is sanitized before loading 2026-04-22 05:51:44 -07:00
Teknium
fdd0ecaf13
fix(env_loader): warn when non-ASCII stripped from credential env vars (#13300)
Load-time sanitizer silently removed non-ASCII codepoints from any
env var ending in _API_KEY / _TOKEN / _SECRET / _KEY, turning
copy-paste artifacts (Unicode lookalikes, ZWSP, NBSP) into opaque
provider-side API_KEY_INVALID errors.

Warn once per key to stderr with the offending codepoints (U+XXXX)
and guidance to re-copy from the provider dashboard.
2026-04-20 22:14:03 -07:00
Teknium
da528a8207 fix: detect and strip non-ASCII characters from API keys (#6843)
API keys containing Unicode lookalike characters (e.g. ʋ U+028B instead
of v) cause UnicodeEncodeError when httpx encodes the Authorization
header as ASCII.  This commonly happens when users copy-paste keys from
PDFs, rich-text editors, or web pages with decorative fonts.

Three layers of defense:

1. **Save-time validation** (hermes_cli/config.py):
   _check_non_ascii_credential() strips non-ASCII from credential values
   when saving to .env, with a clear warning explaining the issue.

2. **Load-time sanitization** (hermes_cli/env_loader.py):
   _sanitize_loaded_credentials() strips non-ASCII from credential env
   vars (those ending in _API_KEY, _TOKEN, _SECRET, _KEY) after dotenv
   loads them, so the rest of the codebase never sees non-ASCII keys.

3. **Runtime recovery** (run_agent.py):
   The UnicodeEncodeError recovery block now also sanitizes self.api_key
   and self._client_kwargs['api_key'], fixing the gap where message/tool
   sanitization succeeded but the API key still caused httpx to fail on
   the Authorization header.

Also: hermes_logging.py RotatingFileHandler now explicitly sets
encoding='utf-8' instead of relying on locale default (defensive
hardening for ASCII-locale systems).
2026-04-14 20:20:31 -07:00
Mil Wang (from Dev Box)
e469f3f3db fix: sanitize .env before loading to prevent token duplication (#8908)
When .env files become corrupted (e.g. concatenated KEY=VALUE pairs on
a single line due to concurrent writes or encoding issues), both
python-dotenv and load_env() would parse the entire concatenated string
as a single value. This caused bot tokens to appear duplicated up to 8×,
triggering InvalidToken errors from the Telegram API.

Root cause: _sanitize_env_lines() — which correctly splits concatenated
lines — was only called during save_env_value() writes, not during reads.

Fix:
- load_env() now calls _sanitize_env_lines() before parsing
- env_loader.load_hermes_dotenv() sanitizes the .env file on disk
  before python-dotenv reads it, so os.getenv() also returns clean values
- Added tests reproducing the exact corruption pattern from #8908

Closes #8908
2026-04-13 04:35:37 -07:00
Teknium
8bb1d15da4
chore: remove ~100 unused imports across 55 files (#3016)
Automated cleanup via pyflakes + autoflake with manual review.

Changes:
- Removed unused stdlib imports (os, sys, json, pathlib.Path, etc.)
- Removed unused typing imports (List, Dict, Any, Optional, Tuple, Set, etc.)
- Removed unused internal imports (hermes_cli.auth, hermes_cli.config, etc.)
- Fixed cli.py: removed 8 shadowed banner imports (imported from hermes_cli.banner
  then immediately redefined locally — only build_welcome_banner is actually used)
- Added noqa comments to imports that appear unused but serve a purpose:
  - Re-exports (gateway/session.py SessionResetPolicy, tools/terminal_tool.py
    is_interrupted/_interrupt_event)
  - SDK presence checks in try/except (daytona, fal_client, discord)
  - Test mock targets (auxiliary_client.py Path, mcp_config.py get_hermes_home)

Zero behavioral changes. Full test suite passes (6162/6162, 2 pre-existing
streaming test failures unrelated to this change).
2026-03-25 15:02:03 -07:00
teknium1
f24c00a5bf fix(config): reload .env over stale shell overrides
Hermes startup entrypoints now load ~/.hermes/.env and project fallback env files with user config taking precedence over stale shell-exported values. This makes model/provider/base URL changes in .env actually take effect after restarting Hermes. Adds a shared env loader plus regression coverage, and reproduces the original bug case where OPENAI_BASE_URL and HERMES_INFERENCE_PROVIDER remained stuck on old shell values before import.
2026-03-15 06:46:28 -07:00