hermes-agent/website/docs/guides/operate-teams-meeting-pipeline.md
Teknium 242da9db96 docs(teams-pipeline): cron renewal recipe, sidebar wiring, skill rewrite
Fifth and final slice: polish on top of @dlkakbs's docs + skill. Three
things ship here:

1. Subscription renewal cron recipe (the #1 operational footgun).

   Microsoft Graph webhook subscriptions expire at 72 hours max and
   don't auto-renew. The shipped operator runbook mentioned
   `maintain-subscriptions --dry-run` as a "daily or periodic check"
   but never told operators how to actually automate it. Without a
   scheduled job, any production deployment silently stops ingesting
   meetings three days after go-live.

   Adds an "Automating subscription renewal (REQUIRED for production)"
   section to website/docs/guides/operate-teams-meeting-pipeline.md
   with three concrete options and copy-pasteable configs:

   - Option 1: Hermes cron (`hermes cron add --schedule "0 */12 * * *"
     --script-only --command "hermes teams-pipeline maintain-subscriptions"`)
   - Option 2: systemd service + timer (12h cadence, Persistent=true
     so missed runs catch up after reboots)
   - Option 3: plain crontab with a wrapper that sources .env for
     credentials

   Go-Live Checklist gains a bolded mandatory item for the schedule
   being in place, with a cross-link to the section.

   website/docs/user-guide/messaging/teams-meetings.md adds a
   `:::warning` admonition right after the manual `subscribe`
   examples so anyone who creates a subscription manually is told
   the same day that it will silently expire in 72 hours.

2. Sidebar wiring. Shela's new docs pages (teams-meetings.md and
   operate-teams-meeting-pipeline.md) weren't in website/sidebars.ts,
   so they were orphaned URLs — reachable only if someone knew the
   path. Wired teams-meetings into Messaging Platforms next to the
   existing teams entry, and operate-teams-meeting-pipeline into
   Guides & Tutorials next to microsoft-graph-app-registration from
   PR #21922. Adjacent placement keeps the related pages discoverable
   from each other.

3. SKILL.md rewrite (v1.0.0 → v1.1.0).

   The original skill had five Turkish-only trigger phrases, which work
   in a Turkish-speaking session but don't match English-language
   requests. Rewrote the skill to:

   - Describe triggers by intent instead of exact phrases, with
     explicit "works in any language" framing and example phrases
     in both English and Turkish.
   - Add a Decision Tree section covering the three most common user
     asks (missing summary, setup verification, re-run request) and
     the specific CLI command sequence for each.
   - Add a dedicated "Critical pitfall: Graph subscriptions expire
     in 72 hours" section that tells the agent exactly what to do
     when a user reports "worked yesterday, nothing today" — the
     most common operational failure mode.
   - Expand the command reference into three labeled groups (Status
     and inspection / Re-running and debugging / Subscription
     management) so the agent can reach for the right command
     without scanning.
   - Add cross-links to all four related docs pages (Azure app
     registration, webhook listener setup, full pipeline setup,
     operator runbook).

Validation:
- npm run build: all new pages route, anchor to
  #automating-subscription-renewal-required-for-production resolves
  from both the runbook TOC and the teams-meetings.md admonition.
- scripts/run_tests.sh on the relevant test suites (607 tests): all
  pass.
2026-05-08 12:41:41 -07:00

---
title: "Operate the Teams Meeting Pipeline"
description: "Runbook, go-live checklist, and operator worksheet for the Microsoft Teams meeting pipeline"
---
# Operate the Teams Meeting Pipeline
Use this guide after you have already enabled the feature from [Teams Meetings](/docs/user-guide/messaging/teams-meetings).
This page covers:
- operator CLI flows
- routine subscription maintenance
- failure triage
- go-live checks
- rollout worksheet
## Core Operator Commands
### Validate the config snapshot
```bash
hermes teams-pipeline validate
```
Use this first after any config change.
### Inspect token health
```bash
hermes teams-pipeline token-health
hermes teams-pipeline token-health --force-refresh
```
Use `--force-refresh` when you suspect stale auth state.
### Inspect subscriptions
```bash
hermes teams-pipeline subscriptions
```
### Renew near-expiry subscriptions
```bash
hermes teams-pipeline maintain-subscriptions
hermes teams-pipeline maintain-subscriptions --dry-run
```
### Automating subscription renewal (REQUIRED for production)
**Microsoft Graph subscriptions expire in at most 72 hours.** If nothing renews them, meeting notifications silently stop after 3 days and the pipeline looks "broken." This is the most common operational failure for Graph webhook integrations.
You MUST run `maintain-subscriptions` on a schedule. Pick one of these three options:
#### Option 1: Hermes cron (recommended if you already run the Hermes gateway)
Hermes ships a built-in cron scheduler. Add a script-only cron job that runs every 12 hours. That gives six renewal opportunities inside each 72-hour expiry window:
```bash
hermes cron add \
  --name "teams-pipeline-maintain-subscriptions" \
  --schedule "0 */12 * * *" \
  --script-only \
  --command "hermes teams-pipeline maintain-subscriptions"
```
Verify it was registered and inspect the next run time:
```bash
hermes cron list
hermes cron show teams-pipeline-maintain-subscriptions
```
#### Option 2: systemd timer (recommended for Linux production deployments)
Create `/etc/systemd/system/hermes-teams-pipeline-maintain.service`:
```ini
[Unit]
Description=Hermes Teams pipeline subscription maintenance
# After= only orders the units; Wants= actually pulls in network-online.target.
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
User=hermes
EnvironmentFile=/etc/hermes/env
ExecStart=/usr/local/bin/hermes teams-pipeline maintain-subscriptions
```
And `/etc/systemd/system/hermes-teams-pipeline-maintain.timer`:
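```ini
[Unit]
Description=Run Hermes Teams pipeline subscription maintenance every 12 hours

[Timer]
# Fire at 00:00 and 12:00 every day.
OnCalendar=*-*-* 00/12:00:00
# Persistent= only takes effect with OnCalendar=: if the machine was off at a
# scheduled time, the missed run starts as soon as the timer is active again.
Persistent=true
# Also run once shortly after each boot.
OnBootSec=5min

[Install]
WantedBy=timers.target
```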
Enable:
```bash
sudo systemctl daemon-reload
sudo systemctl enable --now hermes-teams-pipeline-maintain.timer
systemctl list-timers hermes-teams-pipeline-maintain.timer
```
#### Option 3: Plain crontab
```cron
0 */12 * * * /usr/local/bin/hermes teams-pipeline maintain-subscriptions >> /var/log/hermes/teams-pipeline-maintain.log 2>&1
```
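The entry above calls `hermes` directly. It therefore only works if cron's minimal environment already contains the `MSGRAPH_*` credentials, and it usually does not. The simplest fix is to point the crontab entry at a small wrapper script. That script sources `~/.hermes/.env` for the cron user and then runs the same command. Also make sure `/var/log/hermes/` exists and is writable by that user, or the redirection fails.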
#### Verifying renewal is working
Once the schedule is in place, confirm that renewal works after the first scheduled run:
```bash
hermes teams-pipeline subscriptions                     # expirationDateTime should have moved later
hermes teams-pipeline maintain-subscriptions --dry-run  # should show "0 expiring soon" most of the time
```
Suppose your Graph webhook ever seems to "stop working" roughly 72 hours after a subscription was created or last renewed. Check this first: did the renewal job actually run?
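For the crontab option, you can answer that question without reading logs by hand. Check that the job's log file was written recently. The crontab entry appends to it on every run, whether or not the command succeeds. A fresh log therefore proves the job ran, not that renewal succeeded. A minimal sketch follows, assuming the log path from Option 3. The function name and the 13-hour default are hypothetical choices.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if LOG_FILE exists and was modified within the
# last MAX_AGE_HOURS (default 13, slightly longer than the 12-hour cadence).
renewal_log_is_fresh() {
  local log_file="$1"
  local max_age_hours="${2:-13}"
  [[ -f "$log_file" ]] || return 1
  # find prints the path only if its mtime is less than N minutes old.
  [[ -n "$(find "$log_file" -mmin "-$(( max_age_hours * 60 ))" -print)" ]]
}

# Example: warn (and exit non-zero) if renewal looks stale.
# renewal_log_is_fresh /var/log/hermes/teams-pipeline-maintain.log \
#   || { echo "teams-pipeline renewal has not run in 13h" >&2; exit 1; }
```

Wire the example line into whatever monitoring you already have. When it passes, inspect the log content itself to confirm the renewals succeeded.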
### Inspect recent jobs
```bash
hermes teams-pipeline list
hermes teams-pipeline list --status failed
hermes teams-pipeline show <job-id>
```
### Replay a stored job
```bash
hermes teams-pipeline run <job-id>
```
### Dry-run meeting artifact fetches
```bash
hermes teams-pipeline fetch --meeting-id <meeting-id>
hermes teams-pipeline fetch --join-web-url "<join-url>"
```
## Routine Runbook
### After first setup
Run these in order:
```bash
hermes teams-pipeline validate
hermes teams-pipeline token-health --force-refresh
hermes teams-pipeline subscriptions
```
Then trigger or wait for a real meeting event and confirm:
```bash
hermes teams-pipeline list
hermes teams-pipeline show <job-id>
```
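A small function can wrap the three first-setup checks so they always run in the same order and stop at the first failure. A minimal sketch follows. The function name is hypothetical, and the function assumes each `hermes teams-pipeline` subcommand exits non-zero when its check fails. The executable is a parameter, so you can exercise the function with a stub.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the post-setup checks in order, stop at first failure.
# $1 is the hermes executable (defaults to "hermes" on PATH).
teams_pipeline_first_setup_checks() {
  local hermes="${1:-hermes}"
  local -a steps=(
    "validate"
    "token-health --force-refresh"
    "subscriptions"
  )
  local step
  for step in "${steps[@]}"; do
    echo "==> hermes teams-pipeline $step"
    # $step is deliberately unquoted so "token-health --force-refresh" becomes two args.
    if ! "$hermes" teams-pipeline $step; then
      echo "FAILED: hermes teams-pipeline $step" >&2
      return 1
    fi
  done
  echo "all first-setup checks passed"
}
```

This only automates the three commands. You still confirm the real meeting event by hand with `list` and `show`.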
### Daily or periodic checks
- run `hermes teams-pipeline maintain-subscriptions --dry-run`
- inspect `hermes teams-pipeline list --status failed`
- verify the Teams delivery target is still the correct chat or channel
### Before changing webhook URLs or delivery targets
- update the public notification URL or Teams target config
- run `hermes teams-pipeline validate`
- renew or recreate affected subscriptions
- confirm new events land in the expected sink
## Failure Triage
### No jobs are being created
Check:
- `msgraph_webhook` is enabled
- the public notification URL points to `/msgraph/webhook`
- the client state in the subscription matches `MSGRAPH_WEBHOOK_CLIENT_STATE`
- subscriptions still exist remotely and are not expired
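You can test whether the public URL is reachable and handles Graph's subscription validation handshake by imitating it with `curl`. Graph POSTs to the notification URL with a `validationToken` query parameter. It expects a `200 OK` whose plain-text body is that token, within 10 seconds. A minimal sketch follows. The function name and probe token format are hypothetical. It assumes the Hermes listener answers validation requests the way Graph requires.

```bash
#!/usr/bin/env bash
# Hypothetical helper: imitate Graph's validation request against a
# notification URL. Returns 0 only if the endpoint echoes the token back.
probe_graph_webhook() {
  local url="$1"
  local token="hermes-probe-$$-$RANDOM"   # URL-safe, needs no encoding
  local sep='?'
  [[ "$url" == *\?* ]] && sep='&'
  local body
  # Graph allows 10 seconds for the reply, so use the same budget.
  body="$(curl -sS --max-time 10 -H 'Content-Type: text/plain' --data '' \
    "${url}${sep}validationToken=${token}")" || return 1
  if [[ "$body" == "$token" ]]; then
    echo "ok: $url echoed the validation token"
  else
    echo "unexpected response from $url: ${body:0:200}" >&2
    return 1
  fi
}

# Example (use your real public notification URL):
# probe_graph_webhook "https://hermes.example.com/msgraph/webhook"
```

Run it from a machine outside your network. A success from inside the network does not show that Graph can reach the endpoint.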
### Jobs stay in retry or fail before summarization
Check:
- transcript permissions and availability
- recording permissions and artifact availability
- `ffmpeg` availability if recording fallback is enabled
- Graph token health
### Summaries are produced but not delivered to Teams
Check:
- `platforms.teams.enabled: true`
- `delivery_mode` is set to the intended mode (`incoming_webhook` or `graph`)
- `incoming_webhook_url` for webhook mode
- `chat_id` or `team_id` plus `channel_id` for Graph mode
- Teams auth config if Graph posting is used
### Duplicate or unexpected replays
Check:
- whether you manually replayed a job with `hermes teams-pipeline run`
- whether the sink record already exists for that meeting
- whether you intentionally enabled a resend path in your local config
## Go-Live Checklist
- [ ] Graph credentials are present and correct
- [ ] `msgraph_webhook` is enabled and reachable from the public internet
- [ ] `MSGRAPH_WEBHOOK_CLIENT_STATE` is set and matches subscriptions
- [ ] transcript subscription is created
- [ ] recording subscription is created if STT fallback is required
- [ ] `ffmpeg` is installed if recording fallback is enabled
- [ ] Teams outbound delivery target is configured and verified
- [ ] Notion and Linear sinks are configured only if actually needed
- [ ] `hermes teams-pipeline validate` returns an OK snapshot
- [ ] `hermes teams-pipeline token-health --force-refresh` succeeds
- [ ] **`maintain-subscriptions` is scheduled** (Hermes cron, systemd timer, or crontab — see [Automating subscription renewal](#automating-subscription-renewal-required-for-production)). Without this, Graph subscriptions silently expire within 72 hours.
- [ ] a real end-to-end meeting event has produced a stored job
- [ ] at least one summary has reached the intended delivery sink
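You can check the `ffmpeg` item, and the presence of `hermes` itself, mechanically. A minimal sketch follows; the function name is hypothetical. Run it as the same user, and ideally in the same environment, as the pipeline and the renewal job. Cron and systemd often use a shorter `PATH` than an interactive shell.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which named commands are on PATH.
# Returns 1 if any are missing.
check_required_commands() {
  local cmd missing=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found:   $cmd -> $(command -v "$cmd")"
    else
      echo "MISSING: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: ffmpeg is only required when recording fallback is enabled.
# check_required_commands hermes ffmpeg
```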
## Delivery-Mode Decision Guide
| Mode | Use when | Tradeoff |
|------|----------|----------|
| `incoming_webhook` | you only need simple posting into Teams | simplest setup, less control |
| `graph` | you need channel or chat posting through Graph | more control, more auth and target config |
## Operator Worksheet
Fill this out before rollout:

| Item | Value |
|------|-------|
| Public notification URL | |
| Graph tenant ID | |
| Graph client ID | |
| Webhook client state | |
| Transcript resource subscription | |
| Recording resource subscription | |
| Teams delivery mode | |
| Teams chat ID or team/channel | |
| Notion database ID | |
| Linear team ID | |
| Store path override, if any | |
| Owner for daily checks | |
## Change Review Worksheet
Use this before changing the deployment:

| Question | Answer |
|----------|--------|
| Are we changing the public webhook URL? | |
| Are we rotating Graph credentials? | |
| Are we changing Teams delivery mode? | |
| Are we moving to a new Teams chat or channel? | |
| Do subscriptions need to be recreated or renewed? | |
| Do we need a fresh end-to-end verification run? | |
## Related Docs
- [Teams Meetings setup](/docs/user-guide/messaging/teams-meetings)
- [Microsoft Teams bot setup](/docs/user-guide/messaging/teams)