hermes-agent/plugins/google_meet
Teknium 1cbe399149 fix(windows): os.kill(pid, 0) is NOT a no-op on Windows — route through new _pid_exists helper
On Windows, Python's ``os.kill(pid, 0)`` is NOT a no-op. CPython's
implementation (``Modules/posixmodule.c::os_kill_impl``) treats sig=0
as ``CTRL_C_EVENT`` because the two integer values collide at the C
layer, and routes it through ``GenerateConsoleCtrlEvent(0, pid)`` —
which sends a Ctrl+C to the ENTIRE console process group containing
the target PID, not just the PID itself. Any caller that wanted to
check "is PID X alive" via the classic POSIX ``os.kill(pid, 0)``
idiom was silently killing that process (and often unrelated
processes in the same console group) on Windows. Long-standing
Python Windows quirk; see bpo-14484 (open since 2012).

This manifested in Hermes as: every ``hermes gateway status``
invocation would read the gateway's PID from the PID file, call
``os.kill(pid, 0)`` via ``gateway.status.get_running_pid()`` as a
"liveness check", and instantly terminate the gateway it was trying
to report on. No shutdown log, no traceback, no atexit hook fire,
no exit-diag entry — just silent termination of the detached pythonw
process. "Bot answered one message then stopped typing" was the
characteristic end-user symptom because `os.kill(pid, 0)` fires
mid-response-send and kills the gateway between logs.

Reproduction (verified in this branch before the fix):

  $ hermes gateway start       # gateway alive, PID 37520
  $ hermes gateway status      # reports "No gateway process detected"
  $ tasklist /FI "PID eq 37520"  # INFO: No tasks are running
                                 # — gateway terminated silently

Root-cause fix is a new ``gateway.status._pid_exists(pid)`` helper:

- On Windows: Win32 ``OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION |
  SYNCHRONIZE, False, pid)`` + ``WaitForSingleObject(handle, 0)``
  via ctypes. Zero signal delivery, zero console-group side effects.
  Pins ctypes return types to avoid DWORD-vs-signed-int parse bugs
  on WAIT_TIMEOUT (0x102). Distinguishes ERROR_INVALID_PARAMETER
  (PID gone) from ERROR_ACCESS_DENIED (alive but another user).
- On POSIX: the canonical ``os.kill(pid, 0)`` idiom that actually is
  a no-op there.

Then patch every ``os.kill(pid, 0)`` liveness-check callsite to
route through ``_pid_exists`` instead. Total 14 callsites across
11 files; every single one was a latent silent-kill on Windows:

  gateway/run.py:2810      — /restart watcher (inline subprocess)
  gateway/run.py:15195     — --replace wait loop
  gateway/status.py:572    — acquire_gateway_runtime_lock stale check
  gateway/status.py:828    — get_running_pid (THE killer for status)
  gateway/platforms/whatsapp.py:111
  hermes_cli/gateway.py:228, 522, 1012  — gateway-related drain loops
  hermes_cli/kanban_db.py:2826         — _pid_alive was claiming to
                                         be cross-platform but used
                                         os.kill(pid, 0) on Windows
  hermes_cli/main.py:5792        — CLI process-kill polling
  hermes_cli/profiles.py:782     — profile stop wait loop
  plugins/google_meet/process_manager.py:74
  tools/browser_tool.py:1215, 1255  — browser daemon ownership probes
  tools/mcp_tool.py:1255, 3374     — MCP stdio orphan tracking

The watcher source in gateway/run.py:2810 is a multi-line string
that gets spawned as an inline ``python -c "..."`` subprocess, so
it can't import gateway.status. The fix for that callsite inlines
the same ctypes probe directly into the watcher source.

Tested on Windows 10 with the hermes gateway + Telegram bot:
- gateway start → alive
- 5 consecutive ``hermes gateway status`` invocations → gateway
  alive after every one, same PID reported each time (37520, 21952)
- gateway.log shows uninterrupted operation; no spurious shutdown
  entries; cron ticker and kanban dispatcher still running on
  their 60-second cadence
- bot continues answering Telegram messages throughout

Ships alongside an exit-path diagnostic wrapper in
``hermes_cli/gateway.py::run_gateway()`` that captures every way
``asyncio.run(start_gateway(...))`` can return (success, SystemExit,
KeyboardInterrupt, BaseException, atexit) with full traceback to
``logs/gateway-exit-diag.log``. This was used to prove the gateway
was being hard-killed externally (no exit event fired) and should
be kept for future Windows debugging.

Refs: https://bugs.python.org/issue14484
See also: references/windows-subprocess-sigint-storm.md in
the hermes-agent skill.
2026-05-08 12:34:27 -07:00
..
node fix(security): bind Meet node server to localhost and restrict token file to owner read 2026-05-04 01:42:59 -07:00
realtime feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364) 2026-04-27 06:22:25 -07:00
__init__.py feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364) 2026-04-27 06:22:25 -07:00
audio_bridge.py feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364) 2026-04-27 06:22:25 -07:00
cli.py feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364) 2026-04-27 06:22:25 -07:00
meet_bot.py feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364) 2026-04-27 06:22:25 -07:00
plugin.yaml feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364) 2026-04-27 06:22:25 -07:00
process_manager.py fix(windows): os.kill(pid, 0) is NOT a no-op on Windows — route through new _pid_exists helper 2026-05-08 12:34:27 -07:00
README.md feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364) 2026-04-27 06:22:25 -07:00
SKILL.md feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364) 2026-04-27 06:22:25 -07:00
tools.py feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364) 2026-04-27 06:22:25 -07:00

google_meet plugin

Let the hermes agent join a Google Meet call, transcribe it, optionally speak in it, and do the followup work afterwards.

What ships

Version What Status
v1 Transcribe-only: Playwright joins Meet, scrapes captions to transcript file ✓ ships by default
v2 Realtime duplex audio: bot speaks in-call via OpenAI Realtime + BlackHole/PulseAudio null-sink ✓ opt in with mode='realtime'
v3 Remote node host: run the bot on a different machine than the gateway ✓ opt in with node='<name>'

Architecture

┌─ gateway (Linux box, where hermes runs) ────────────────────────────┐
│                                                                      │
│   agent → meet_join(url, mode='realtime', node='my-mac')             │
│         │                                                            │
│         └─ NodeClient ─── ws ────┐                                   │
│                                  │                                   │
└──────────────────────────────────┼───────────────────────────────────┘
                                   │ wss (token auth)
                                   ▼
┌─ node host (user's Mac, signed-in Chrome lives here) ───────────────┐
│                                                                      │
│   NodeServer (from `hermes meet node run`)                           │
│     │                                                                │
│     ├─ start_bot → process_manager.start() → spawns meet_bot         │
│     │                                                                │
│     └─ meet_bot (Playwright)                                         │
│        ├─ Chromium → meet.google.com                                 │
│        ├─ caption scraper → transcript.txt                           │
│        └─ (realtime mode only) RealtimeSpeaker thread                │
│             ↓                                                        │
│           OpenAI Realtime WS → speaker.pcm                           │
│             ↓                                                        │
│           paplay → null-sink ← Chrome fake mic                       │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘

Without v3: the whole right column runs on the gateway machine. Without v2: the "realtime" path is skipped; transcribe runs alone.

Files

Path Purpose
plugin.yaml manifest
__init__.py register(ctx) — registers 5 tools + on_session_end hook + hermes meet CLI
meet_bot.py Playwright bot subprocess (standalone, python -m plugins.google_meet.meet_bot)
process_manager.py local bot lifecycle + enqueue_say
tools.py agent-facing tools + node-routing helper
cli.py hermes meet setup / auth / join / status / transcript / say / stop / node ...
audio_bridge.py v2: PulseAudio null-sink (Linux) + BlackHole probe (macOS)
realtime/openai_client.py v2: RealtimeSession + RealtimeSpeaker (file-queue → OpenAI Realtime WS → PCM)
node/protocol.py v3: message envelope + validation
node/registry.py v3: $HERMES_HOME/workspace/meetings/nodes.json
node/server.py v3: NodeServer (runs on host machine)
node/client.py v3: NodeClient (used by tool handlers + CLI on gateway)
node/cli.py v3: hermes meet node {run,list,approve,remove,status,ping}
SKILL.md agent usage guide

Local quick start

hermes plugins enable google_meet
hermes meet install                                      # pip + Chromium
hermes meet setup                                        # preflight
hermes meet auth                                         # optional
hermes meet join https://meet.google.com/abc-defg-hij    # transcribe

Realtime mode

Linux (preferred, most automated):

hermes meet install --realtime                     # installs pulseaudio-utils
echo 'OPENAI_API_KEY=sk-...' >> ~/.hermes/.env
hermes meet join https://meet.google.com/abc-defg-hij --mode realtime
# then from the agent or CLI:
hermes meet say "Good morning everyone, I'm the note-taker bot."

macOS:

hermes meet install --realtime     # runs: brew install blackhole-2ch ffmpeg
# then — manually! — open System Settings → Sound → Input → BlackHole 2ch
echo 'OPENAI_API_KEY=sk-...' >> ~/.hermes/.env
hermes meet join https://meet.google.com/abc-defg-hij --mode realtime

On macOS, hermes will not switch your system audio input automatically — the user has to do it. This is deliberate: switching default input on a whim would be a surprising side effect.

Remote node host

On the node machine (e.g. user's Mac with a signed-in Chrome):

pip install playwright websockets
python -m playwright install chromium
hermes plugins enable google_meet
hermes meet node run --display-name my-mac --host 0.0.0.0 --port 18789
# prints the bearer token on first run; copy it

On the gateway:

hermes meet node approve my-mac ws://<mac-ip>:18789 <token>
hermes meet node ping my-mac
# now any meet_* tool call accepts node='my-mac' (or 'auto')

Safety

  • URL gate: only https://meet.google.com/abc-defg-hij, /new, /lookup/<id>.
  • No calendar scanning, no auto-dial, no auto-consent announcement.
  • Node server uses bearer-token auth; no key exchange, no TLS termination built in — run it on a LAN or behind a reverse proxy you trust.
  • One active meeting per (gateway, node) pair. A second meet_join leaves the first.
  • meet_say refuses unless the active meeting was started with mode='realtime'.

Out of scope

  • Calendar scanning — deliberately not implemented. Join URLs must be explicit.
  • Multi-tenant node sharing — a node serves one gateway at a time.
  • Windows — audio bridging isn't tested; register() no-ops on Windows.
  • System audio input switching on macOS — user responsibility, not the bot's.