fix: auto-detect D-Bus session bus for systemctl --user on headless servers (#1601)

* fix: Anthropic OAuth compatibility — Claude Code identity fingerprinting

Anthropic routes OAuth/subscription requests based on Claude Code's
identity markers. Without them, requests get intermittent 500 errors
(~25% failure rate observed). This matches what pi-ai (clawdbot) and
OpenCode both implement for OAuth compatibility.

Changes (OAuth tokens only — API key users unaffected):

1. Headers: user-agent 'claude-cli/2.1.2 (external, cli)' + x-app 'cli'
2. System prompt: prepend 'You are Claude Code, Anthropic's official CLI'
3. System prompt sanitization: replace Hermes/Nous references
4. Tool names: prefix with 'mcp_' (Claude Code convention for non-native tools)
5. Tool name stripping: remove 'mcp_' prefix from response tool calls

Before: 9/12 OK, 1 hard fail, 4 needed retries (~25% error rate)
After: 16/16 OK, 0 failures, 0 retries (0% error rate)

* fix: auto-detect DBUS_SESSION_BUS_ADDRESS for systemctl --user on headless servers

On SSH sessions to headless servers, DBUS_SESSION_BUS_ADDRESS and
XDG_RUNTIME_DIR may not be set even when the user's systemd instance
is running via linger. This causes 'systemctl --user' to fail with
'Failed to connect to bus: No medium found', breaking gateway
restart/start/stop as a service and falling back to foreground mode.

Add _ensure_user_systemd_env() that detects the standard D-Bus socket
at /run/user/<UID>/bus and sets the env vars before any systemctl --user
call. Called from _systemctl_cmd() so all existing call sites benefit
automatically with zero changes.

Fixes: gateway restart falling back to foreground on headless servers

* fix: show linger guidance when gateway restart fails during update and gateway restart

When systemctl --user restart fails during 'hermes update' or
'hermes gateway restart', check linger status and tell the user
exactly what to run (sudo -S -p '' loginctl enable-linger) instead of
silently falling back to foreground mode.

Also applies _ensure_user_systemd_env() to the raw systemctl calls
in cmd_update so they work properly on SSH sessions where D-Bus
env vars are missing.
This commit is contained in:
Teknium 2026-03-16 17:45:48 -07:00 committed by GitHub
parent ce430fed4c
commit 60e38e82ec
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
3 changed files with 136 additions and 2 deletions

View file

@ -2307,8 +2307,9 @@ def cmd_update(args):
try:
from gateway.status import get_running_pid, remove_pid_file
from hermes_cli.gateway import (
get_service_name, get_launchd_plist_path, is_macos,
get_service_name, get_launchd_plist_path, is_macos, is_linux,
refresh_launchd_plist_if_needed,
_ensure_user_systemd_env, get_systemd_linger_status,
)
import signal as _signal
@ -2318,6 +2319,7 @@ def cmd_update(args):
has_launchd_service = False
try:
_ensure_user_systemd_env()
check = subprocess.run(
["systemctl", "--user", "is-active", _gw_service_name],
capture_output=True, text=True, timeout=5,
@ -2366,7 +2368,20 @@ def cmd_update(args):
print("✓ Gateway restarted.")
else:
print(f"⚠ Gateway restart failed: {restart.stderr.strip()}")
print(" Try manually: hermes gateway restart")
# Check if linger is the issue
if is_linux():
linger_ok, _detail = get_systemd_linger_status()
if linger_ok is not True:
import getpass
_username = getpass.getuser()
print()
print(" Linger must be enabled for the gateway user service to function.")
print(f" Run: sudo loginctl enable-linger {_username}")
print()
print(" Then restart the gateway:")
print(" hermes gateway restart")
else:
print(" Try manually: hermes gateway restart")
elif has_launchd_service:
# Refresh the plist first (picks up --replace and other
# changes from the update we just pulled).