fix(docker): tee supervised gateway stdout to docker logs

Follow-up to #33583 (the gateway-run-supervised redirect).

Before this fix, the supervised gateway's stdout (most visibly the
"Hermes Gateway Starting…" rich-console banner) was swallowed by
`s6-log` into the rotated file at
`${HERMES_HOME}/logs/gateways/<profile>/current` and never reached
`docker logs`. Operational signal lived in two places:

  * **docker logs** — saw stderr (Python `logging` defaults to
    stderr), so warnings/errors were visible.
  * **the rotated file** — saw stdout (rich banners, `print()`
    output, third-party libs that wrote to fd 1).

This was surprising for users coming from the pre-s6 image, where
`docker run … gateway run` produced a single unified stream in
`docker logs`. They'd see partial output, conclude something was
broken, and dig around for the missing pieces.

Fix: add the `1` s6-log action directive before the file destination
so each line is forwarded to s6-log's stdout — which propagates up
the s6-supervise pipeline to /init's stdout = container stdout =
`docker logs`. The file destination is preserved as a second
destination, so the rotated log (with ISO 8601 timestamps) still
exists for `hermes logs` and for survival across container restarts.

Trade-off considered: timestamps. Putting `T` between `1` and the
file destination (not before `1`) means:

  * docker logs sees raw lines — Python's logging formatter has its
    own timestamps, and `docker logs --timestamps` adds another
    layer when desired. No double-stamping in the common reading
    path.
  * The persisted file gets s6-log's ISO 8601 timestamp so even
    output that lacked a Python-logger timestamp (rich banners,
    third-party raw prints) is correlatable in `current`.

Verification:

  * New unit-test assertion in `test_service_manager.py` locks the
    `s6-log 1` directive into the rendered run-script. Mutation-
    tested by reverting to the pre-fix script (no `1`); the assert
    catches it cleanly.
  * New docker-harness test `test_supervised_gateway_stdout_reaches_docker_logs`
    builds the image, runs `docker run … gateway run`, and asserts
    the unique `⚕` banner glyph reaches `docker logs`. Also verifies
    the rotated file still contains the banner (no regression on
    the existing file destination). Mutation-tested end-to-end: built
    a deliberately-broken image without the `1` directive and the
    test failed exactly as designed, citing the banner present in
    `current` but absent from `docker logs`.
  * `website/docs/user-guide/docker.md` gains a new `:::note Where
    gateway logs go` admonition documenting both destinations and
    the audit-log file at `${HERMES_HOME}/logs/container-boot.log`.

Existing functionality preserved: every other docker-harness test
still passes against the new image. Unit-test sweep across
`tests/hermes_cli/` (5561 tests) is green.
This commit is contained in:
Ben Barclay 2026-05-28 13:06:11 +10:00 committed by Ben Barclay
parent 912e6e2274
commit b345323195
4 changed files with 118 additions and 1 deletions

View file

@ -327,3 +327,69 @@ def test_dashboard_supervised_when_env_set(
assert _svstat_wants_up(container_name, "dashboard"), (
f"dashboard slot not up: {_svstat(container_name, 'dashboard')!r}"
)
def test_supervised_gateway_stdout_reaches_docker_logs(
built_image: str, container_name: str,
) -> None:
"""The supervised gateway's stdout — including the rich-console
startup banner must reach ``docker logs``, not just the rotated
log file under ``${HERMES_HOME}/logs/gateways/<profile>/current``.
Without the ``1`` action directive in ``_render_log_run``, s6-log
swallows the gateway's stdout into the file and ``docker logs``
only sees stderr (Python ``logging`` defaults to stderr). That's
a poor user experience: the iconic "Hermes Gateway Starting…"
banner with the symbol is the most visible "yes, your gateway
started" signal, and forcing users to ``docker exec`` + ``tail``
the log file just to see it is friction users don't expect.
With the ``1`` directive, s6-log forwards every line to its own
stdout (which propagates up through the s6-supervise pipeline to
/init's stdout = container stdout = ``docker logs``) AND also
writes a timestamped copy to the rotated file. Best of both.
We assert by looking for the literal banner glyph (````) a
distinctive character that won't appear in stderr-routed
Python-logging output, so its presence in ``docker logs`` proves
the stdout-tee is working.
"""
subprocess.run(
["docker", "run", "-d", "--name", container_name, built_image,
"gateway", "run"],
check=True, capture_output=True, timeout=30,
)
# Banner is printed during gateway startup — give it time to
# initialize past the imports + config-load phase.
time.sleep(8)
logs = subprocess.run(
["docker", "logs", container_name],
capture_output=True, text=True, timeout=10,
)
combined = logs.stdout + logs.stderr
# The banner ⚕ symbol is the load-bearing assertion — it's unique
# to gateway startup stdout output and won't appear in stderr
# (Python logging) or s6 boot messages.
assert "" in combined or "Hermes Gateway Starting" in combined, (
"Supervised gateway's stdout banner did not reach docker logs. "
"This means the `1` action directive in _render_log_run isn't "
"forwarding stdout to /init. "
f"docker logs (last 2000 chars):\n{combined[-2000:]}\n"
f"file contents:\n{_sh(container_name, 'cat /opt/data/logs/gateways/default/current').stdout}"
)
# Cross-check: the same banner must also be in the rotated log
# file (we kept the file destination, just added stdout). The
# file version has s6-log's ISO 8601 timestamp prefix; the
# docker logs version is raw.
file_contents = _sh(
container_name, "cat /opt/data/logs/gateways/default/current",
).stdout
assert "" in file_contents or "Hermes Gateway Starting" in file_contents, (
"Banner also missing from rotated log file — the file "
"destination may have been dropped by the new s6-log script. "
f"File contents:\n{file_contents}"
)