hermes-agent/plugins/google_meet
Teknium 3dfb357001 feat(cross-platform): psutil for PID/process management + Windows footgun checker
## Why

Hermes supports Linux, macOS, and native Windows, but the codebase grew up
POSIX-first and has accumulated patterns that silently break (or worse,
silently kill!) on Windows:

- `os.kill(pid, 0)` as a liveness probe — on Windows this maps to
  CTRL_C_EVENT and broadcasts Ctrl+C to the target's entire console
  process group (bpo-14484, open since 2012).
- `os.killpg` — doesn't exist on Windows at all (AttributeError).
- `os.setsid` / `os.getuid` / `os.geteuid` — same.
- `signal.SIGKILL` / `signal.SIGHUP` / `signal.SIGUSR1` — module-attr
  errors at runtime on Windows.
- `open(path)` / `open(path, "r")` without explicit encoding= — inherits
  the platform default, which is cp1252/mbcs on Windows (UTF-8 on POSIX),
  causing mojibake round-tripping between hosts.
- `wmic` — removed from Windows 10 21H1+.

This commit does three things:

1. Makes `psutil` a core dependency and migrates critical callsites to it.
2. Adds a grep-based CI gate (`scripts/check-windows-footguns.py`) that
   blocks new instances of any of the above patterns.
3. Fixes every existing instance in the codebase so the baseline is clean.

## What changed

### 1. psutil as a core dependency (pyproject.toml)

Added `psutil>=5.9.0,<8` to core deps. psutil is the canonical
cross-platform answer for "is this PID alive" and "kill this process
tree" — its `pid_exists()` uses `OpenProcess + GetExitCodeProcess` on
Windows (NOT a signal call), and its `Process.children(recursive=True)`
+ `.kill()` combo replaces `os.killpg()` portably.

### 2. `gateway/status.py::_pid_exists`

Rewrote to call `psutil.pid_exists()` first, falling back to the
hand-rolled ctypes `OpenProcess + WaitForSingleObject` dance on Windows
(and `os.kill(pid, 0)` on POSIX) only if psutil is somehow missing —
e.g. during the scaffold phase of a fresh install before pip finishes.

### 3. `os.killpg` migration to psutil (7 callsites, 5 files)

- `tools/code_execution_tool.py`
- `tools/process_registry.py`
- `tools/tts_tool.py`
- `tools/environments/local.py` (3 sites kept as-is, suppressed with
  `# windows-footgun: ok` — the pgid semantics psutil can't replicate,
  and the calls are already Windows-guarded at the outer branch)
- `gateway/platforms/whatsapp.py`

### 4. `scripts/check-windows-footguns.py` (NEW, 500 lines)

Grep-based checker with 11 rules covering every Windows cross-platform
footgun we've hit so far:

1. `os.kill(pid, 0)` — the silent killer
2. `os.setsid` without guard
3. `os.killpg` (recommends psutil)
4. `os.getuid` / `os.geteuid` / `os.getgid`
5. `os.fork`
6. `signal.SIGKILL`
7. `signal.SIGHUP/SIGUSR1/SIGUSR2/SIGALRM/SIGCHLD/SIGPIPE/SIGQUIT`
8. `subprocess` shebang script invocation
9. `wmic` without `shutil.which` guard
10. Hardcoded `~/Desktop` (OneDrive trap)
11. `asyncio.add_signal_handler` without try/except
12. `open()` without `encoding=` on text mode

Features:
- Triple-quoted-docstring aware (won't flag prose inside docstrings)
- Trailing-comment aware (won't flag mentions in `# os.kill(pid, 0)` comments)
- Guard-hint aware (skips lines with `hasattr(os, ...)`,
  `shutil.which(...)`, `if platform.system() != 'Windows'`, etc.)
- Inline suppression with `# windows-footgun: ok — <reason>`
- `--list` to print all rules with fixes
- `--all` / `--diff <ref>` / staged-files (default) modes
- Scans 380 files in under 2 seconds

### 5. CI integration

A GitHub Actions workflow that runs the checker on every PR and push is
staged at `/tmp/hermes-stash/windows-footguns.yml` — not included in this
commit because the GH token on the push machine lacks `workflow` scope.
A maintainer with `workflow` permissions should add it as
`.github/workflows/windows-footguns.yml` in a follow-up. Content:

```yaml
name: Windows footgun check
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: {python-version: "3.11"}
      - run: python scripts/check-windows-footguns.py --all
```

### 6. CONTRIBUTING.md — "Cross-Platform Compatibility" expansion

Expanded from 5 to 16 rules, each with message, example, and fix.
Recommends psutil as the preferred API for PID / process-tree operations.

### 7. Baseline cleanup (91 → 0 findings)

- 14 `open()` sites → added `encoding='utf-8'` (internal logs/caches) or
  `encoding='utf-8-sig'` (user-editable files that Notepad may BOM)
- 23 POSIX-only callsites in systemd helpers, pty_bridge, and plugin
  tool subprocess management → annotated with
  `# windows-footgun: ok — <reason>`
- 7 `os.killpg` sites → migrated to psutil (see §3 above)

## Verification

```
$ python scripts/check-windows-footguns.py --all
✓ No Windows footguns found (380 file(s) scanned).

$ python -c "from gateway.status import _pid_exists; import os
> print('self:', _pid_exists(os.getpid())); print('bogus:', _pid_exists(999999))"
self: True
bogus: False
```

Proof-of-repro that `os.kill(pid, 0)` was actually killing processes
before this fix — see commit `1cbe39914` and bpo-14484. This commit
removes the last hand-rolled ctypes path from the hot liveness-check
path and defers to the best-maintained cross-platform answer.
2026-05-08 12:57:33 -07:00
..
node fix(security): bind Meet node server to localhost and restrict token file to owner read 2026-05-04 01:42:59 -07:00
realtime feat(cross-platform): psutil for PID/process management + Windows footgun checker 2026-05-08 12:57:33 -07:00
__init__.py feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364) 2026-04-27 06:22:25 -07:00
audio_bridge.py feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364) 2026-04-27 06:22:25 -07:00
cli.py feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364) 2026-04-27 06:22:25 -07:00
meet_bot.py feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364) 2026-04-27 06:22:25 -07:00
plugin.yaml feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364) 2026-04-27 06:22:25 -07:00
process_manager.py feat(cross-platform): psutil for PID/process management + Windows footgun checker 2026-05-08 12:57:33 -07:00
README.md feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364) 2026-04-27 06:22:25 -07:00
SKILL.md feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364) 2026-04-27 06:22:25 -07:00
tools.py feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364) 2026-04-27 06:22:25 -07:00

google_meet plugin

Let the hermes agent join a Google Meet call, transcribe it, optionally speak in it, and do the followup work afterwards.

What ships

Version What Status
v1 Transcribe-only: Playwright joins Meet, scrapes captions to transcript file ✓ ships by default
v2 Realtime duplex audio: bot speaks in-call via OpenAI Realtime + BlackHole/PulseAudio null-sink ✓ opt in with mode='realtime'
v3 Remote node host: run the bot on a different machine than the gateway ✓ opt in with node='<name>'

Architecture

┌─ gateway (Linux box, where hermes runs) ────────────────────────────┐
│                                                                      │
│   agent → meet_join(url, mode='realtime', node='my-mac')             │
│         │                                                            │
│         └─ NodeClient ─── ws ────┐                                   │
│                                  │                                   │
└──────────────────────────────────┼───────────────────────────────────┘
                                   │ wss (token auth)
                                   ▼
┌─ node host (user's Mac, signed-in Chrome lives here) ───────────────┐
│                                                                      │
│   NodeServer (from `hermes meet node run`)                           │
│     │                                                                │
│     ├─ start_bot → process_manager.start() → spawns meet_bot         │
│     │                                                                │
│     └─ meet_bot (Playwright)                                         │
│        ├─ Chromium → meet.google.com                                 │
│        ├─ caption scraper → transcript.txt                           │
│        └─ (realtime mode only) RealtimeSpeaker thread                │
│             ↓                                                        │
│           OpenAI Realtime WS → speaker.pcm                           │
│             ↓                                                        │
│           paplay → null-sink ← Chrome fake mic                       │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘

Without v3: the whole right column runs on the gateway machine. Without v2: the "realtime" path is skipped; transcribe runs alone.

Files

Path Purpose
plugin.yaml manifest
__init__.py register(ctx) — registers 5 tools + on_session_end hook + hermes meet CLI
meet_bot.py Playwright bot subprocess (standalone, python -m plugins.google_meet.meet_bot)
process_manager.py local bot lifecycle + enqueue_say
tools.py agent-facing tools + node-routing helper
cli.py hermes meet setup / auth / join / status / transcript / say / stop / node ...
audio_bridge.py v2: PulseAudio null-sink (Linux) + BlackHole probe (macOS)
realtime/openai_client.py v2: RealtimeSession + RealtimeSpeaker (file-queue → OpenAI Realtime WS → PCM)
node/protocol.py v3: message envelope + validation
node/registry.py v3: $HERMES_HOME/workspace/meetings/nodes.json
node/server.py v3: NodeServer (runs on host machine)
node/client.py v3: NodeClient (used by tool handlers + CLI on gateway)
node/cli.py v3: hermes meet node {run,list,approve,remove,status,ping}
SKILL.md agent usage guide

Local quick start

hermes plugins enable google_meet
hermes meet install                                      # pip + Chromium
hermes meet setup                                        # preflight
hermes meet auth                                         # optional
hermes meet join https://meet.google.com/abc-defg-hij    # transcribe

Realtime mode

Linux (preferred, most automated):

hermes meet install --realtime                     # installs pulseaudio-utils
echo 'OPENAI_API_KEY=sk-...' >> ~/.hermes/.env
hermes meet join https://meet.google.com/abc-defg-hij --mode realtime
# then from the agent or CLI:
hermes meet say "Good morning everyone, I'm the note-taker bot."

macOS:

hermes meet install --realtime     # runs: brew install blackhole-2ch ffmpeg
# then — manually! — open System Settings → Sound → Input → BlackHole 2ch
echo 'OPENAI_API_KEY=sk-...' >> ~/.hermes/.env
hermes meet join https://meet.google.com/abc-defg-hij --mode realtime

On macOS, hermes will not switch your system audio input automatically — the user has to do it. This is deliberate: switching default input on a whim would be a surprising side effect.

Remote node host

On the node machine (e.g. user's Mac with a signed-in Chrome):

pip install playwright websockets
python -m playwright install chromium
hermes plugins enable google_meet
hermes meet node run --display-name my-mac --host 0.0.0.0 --port 18789
# prints the bearer token on first run; copy it

On the gateway:

hermes meet node approve my-mac ws://<mac-ip>:18789 <token>
hermes meet node ping my-mac
# now any meet_* tool call accepts node='my-mac' (or 'auto')

Safety

  • URL gate: only https://meet.google.com/abc-defg-hij, /new, /lookup/<id>.
  • No calendar scanning, no auto-dial, no auto-consent announcement.
  • Node server uses bearer-token auth; no key exchange, no TLS termination built in — run it on a LAN or behind a reverse proxy you trust.
  • One active meeting per (gateway, node) pair. A second meet_join leaves the first.
  • meet_say refuses unless the active meeting was started with mode='realtime'.

Out of scope

  • Calendar scanning — deliberately not implemented. Join URLs must be explicit.
  • Multi-tenant node sharing — a node serves one gateway at a time.
  • Windows — audio bridging isn't tested; register() no-ops on Windows.
  • System audio input switching on macOS — user responsibility, not the bot's.