mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
fix(gateway): detect legacy hermes.service + mark --replace SIGTERM as planned (#11909)
* fix(gateway): detect legacy hermes.service units from pre-rename installs Older Hermes installs used a different service name (hermes.service) before the rename to hermes-gateway.service. When both units remain installed, they fight over the same bot token — after PR #5646's signal-recovery change, this manifests as a 30-second SIGTERM flap loop between the two services. Detection is an explicit allowlist (no globbing) plus an ExecStart content check, so profile units (hermes-gateway-<profile>.service) and unrelated third-party services named 'hermes' are never matched. Wired into systemd_install, systemd_status, gateway_setup wizard, and the main hermes setup flow — anywhere we already warn about scope conflicts now also warns about legacy units. * feat(gateway): add migrate-legacy command + install-time removal prompt - New hermes_cli.gateway.remove_legacy_hermes_units() removes legacy unit files with stop → disable → unlink → daemon-reload. Handles user and system scopes separately; system scope returns path list when not running as root so the caller can tell the user to re-run with sudo. - New 'hermes gateway migrate-legacy' subcommand (with --dry-run and -y) routes to remove_legacy_hermes_units via gateway_command dispatch. - systemd_install now offers to remove legacy units BEFORE installing the new hermes-gateway.service, preventing the SIGTERM flap loop that hits users who still have pre-rename hermes.service around. Profile units (hermes-gateway-<profile>.service) remain untouched in all paths — the legacy allowlist is explicit (_LEGACY_SERVICE_NAMES) and the ExecStart content check further narrows matches. * fix(gateway): mark --replace SIGTERM as planned so target exits 0 PR #5646 made SIGTERM exit the gateway with code 1 so systemd's Restart=on-failure revives it after unexpected kills. But when a user has two gateway units fighting for the same bot token (e.g. legacy hermes.service + hermes-gateway.service from a pre-rename install), the --replace takeover itself becomes the 'unexpected' SIGTERM — the loser exits 1, systemd revives it 30s later, and the cycle flaps indefinitely. Before calling terminate_pid(), --replace now writes a short-lived marker file naming the target PID + start_time. The target's shutdown_signal_handler consumes the marker and, when it names this process, leaves _signal_initiated_shutdown=False so the final exit code stays 0. Staleness defences: - PID + start_time combo prevents PID reuse matching an old marker - Marker older than 60s is treated as stale and discarded - Marker is unlinked on first read even if it doesn't match this process - Replacer clears the marker post-loop + on permission-denied give-up
This commit is contained in:
parent
38436eb4e3
commit
07db20c72d
8 changed files with 1273 additions and 2 deletions
|
|
@ -6442,6 +6442,31 @@ For more help on a command:
|
|||
# gateway setup
|
||||
gateway_subparsers.add_parser("setup", help="Configure messaging platforms")
|
||||
|
||||
# gateway migrate-legacy
|
||||
gateway_migrate_legacy = gateway_subparsers.add_parser(
|
||||
"migrate-legacy",
|
||||
help="Remove legacy hermes.service units from pre-rename installs",
|
||||
description=(
|
||||
"Stop, disable, and remove legacy Hermes gateway unit files "
|
||||
"(e.g. hermes.service) left over from older installs. Profile "
|
||||
"units (hermes-gateway-<profile>.service) and unrelated "
|
||||
"third-party services are never touched."
|
||||
),
|
||||
)
|
||||
gateway_migrate_legacy.add_argument(
|
||||
"--dry-run",
|
||||
dest="dry_run",
|
||||
action="store_true",
|
||||
help="List what would be removed without doing it",
|
||||
)
|
||||
gateway_migrate_legacy.add_argument(
|
||||
"-y",
|
||||
"--yes",
|
||||
dest="yes",
|
||||
action="store_true",
|
||||
help="Skip the confirmation prompt",
|
||||
)
|
||||
|
||||
gateway_parser.set_defaults(func=cmd_gateway)
|
||||
|
||||
# =========================================================================
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue