hermes-agent/website/docs/user-guide/features/deliverable-mode.md
Teknium f2fdb9a178
feat(gateway): deliverable mode — ship artifacts as native uploads from any agent surface (#27813)
The agent can now produce a chart, PDF, spreadsheet, or any other supported
file type and have it land in Slack / Discord / Telegram / WhatsApp / etc.
as a native attachment, just by mentioning the absolute path in its
response. Same primitive works for kanban-worker completions: workers
attach artifacts via kanban_complete(artifacts=[...]) and the gateway
notifier uploads them alongside the completion message.

Changes:

- gateway/platforms/base.py: extract_local_files now covers PDFs, docx,
  spreadsheets (xlsx/csv/json/yaml), presentations (pptx), archives
  (zip/tar/gz), audio (mp3/wav/...), and html — not just images and video.
  Image/video extensions still embed inline; everything else routes to
  send_document via the existing dispatch partition in gateway/run.py.

- tools/kanban_tools.py + hermes_cli/kanban_db.py: kanban_complete gains
  an explicit ``artifacts`` parameter. The handler stashes it in
  metadata.artifacts (for downstream workers) and the kernel promotes
  it onto the completed-event payload so the notifier can find it
  without a second SQL round-trip.

- gateway/run.py: _kanban_notifier_watcher now calls a new helper
  _deliver_kanban_artifacts after sending the completion text. The
  helper reads payload.artifacts (preferred), falls back to scanning
  the payload summary and task.result with extract_local_files, then
  partitions images / videos / documents and uploads each via
  send_multiple_images / send_video / send_document.

- website/docs/user-guide/features/deliverable-mode.md + sidebars.ts:
  user-facing docs page covering the extension list, the kanban
  artifacts pattern, and the MCP-for-connector-breadth recommendation.

Tests:

- tests/gateway/test_extract_local_files.py: 7 new test cases
  (documents, spreadsheets, presentations, audio, archives, html,
  chart-pdf canonical case). 44 passing, 0 regressions.
- tests/tools/test_kanban_tools.py: 4 new cases covering the artifacts
  arg shape (list / string / merge with existing metadata / type
  rejection). 17 passing.
- tests/hermes_cli/test_kanban_notify.py: 2 new cases covering full
  notifier → artifact-upload path and missing-file silent-skip. 12
  passing.
- E2E (real files, real kanban kernel, real BasePlatformAdapter):
  worker calls kanban_complete(artifacts=[png,pdf,csv]) → metadata +
  event payload land → notifier helper partitions correctly →
  send_multiple_images called once with the PNG, send_document called
  twice with PDF + CSV.

What's NOT in this PR (deferred to follow-ups):

- Ad-hoc "research this for two hours, ping the thread when done"
  slash command — covered today by kanban subscriptions; a dedicated
  slash command can ride a follow-up PR if needed.
- Setup-wizard prompt for recommended MCP servers (Notion, GitHub,
  Linear, etc.) — docs page lists them; UI is a separate change.

Plan and rationale captured in ~/.hermes/docs/perplexity-computer-parity.pdf
(local doc, not shipped).
2026-05-18 02:14:43 -07:00


---
title: Deliverable Mode (Artifacts in Chat)
sidebar_label: Deliverable Mode
description: How the agent ships generated charts, PDFs, spreadsheets, and other files as native attachments in messaging platforms.
---
# Deliverable Mode
When Hermes Agent runs inside a messaging gateway (Slack, Discord, Telegram,
WhatsApp, Signal, etc.), it can deliver generated files directly into the
chat as native attachments rather than as paths the user has to copy.

A chart shows up as an inline image. A PDF report shows up as a file
download. A spreadsheet uploads as `.xlsx`. The agent does not need to
write a `MEDIA:` tag or do anything special: it generates the file
and mentions its absolute path in the response. The gateway picks the path
out of the text, removes it from the visible message, and uploads the
file natively.
## How it works
Three pieces fit together:
1. **The agent has tools that produce files.** `execute_code` for charts via
matplotlib, the `latex-pdf-report` skill for PDFs, the `powerpoint` skill
for decks, `image_generate` for images, `text_to_speech` for audio, and so
on.
2. **The gateway scans agent responses for file paths.** Any absolute path
(`/tmp/...`) or home-relative path (`~/...`) ending in a supported
extension gets extracted. Paths inside code blocks and inline code are
ignored so code samples are never mutilated.
3. **The gateway dispatches by file type.** Images embed inline where the
platform supports it; videos embed inline; audio routes to voice/audio
attachments; everything else uploads as a file attachment.
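
The scanning step in item 2 can be illustrated with a minimal sketch. This is not the gateway's actual `extract_local_files` implementation: the function name `extract_paths_sketch`, the regular expressions, and the shortened extension list are all assumptions made for illustration. The sketch masks fenced blocks and inline code first, so only paths written as plain text are extracted and removed from the visible message.

```python
import re

# Hypothetical, shortened extension list; the supported set is larger.
SUPPORTED_EXTS = ("png", "jpg", "pdf", "csv", "xlsx", "mp4", "mp3")

FENCE = "`" * 3  # built programmatically so this sample nests in Markdown

# Fenced blocks are tried first, then single-line inline code spans.
CODE_RE = re.compile(FENCE + r".*?" + FENCE + r"|`[^`\n]*`", re.DOTALL)

# Absolute (/...) or home-relative (~/...) path ending in a supported
# extension. The lookbehind keeps a match from starting mid-path or mid-URL.
PATH_RE = re.compile(
    r"(?<![\w/.~:-])(~?/[\w./-]+\.(?:" + "|".join(SUPPORTED_EXTS) + r"))\b",
    re.IGNORECASE,
)


def extract_paths_sketch(text: str) -> tuple[list[str], str]:
    """Return (paths, visible_text) for one agent reply."""
    # Overwrite code with spaces of equal length so offsets still line up
    # with the original text and paths inside code can never match.
    masked = CODE_RE.sub(lambda m: " " * len(m.group(0)), text)
    matches = list(PATH_RE.finditer(masked))
    paths = [m.group(1) for m in matches]
    visible = text
    # Cut matched spans from right to left so earlier offsets stay valid.
    for m in reversed(matches):
        visible = visible[: m.start(1)] + visible[m.end(1):]
    return paths, visible
```

Masking with same-length padding, rather than deleting the code spans, is a deliberate choice in this sketch: match offsets computed on the masked copy can be applied directly to the original reply.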
## Supported file extensions
| Category | Extensions | Delivery |
|---|---|---|
| Images | `.png .jpg .jpeg .gif .webp .bmp .tiff .svg` | Inline embed |
| Video | `.mp4 .mov .avi .mkv .webm` | Inline embed (where supported) |
| Audio | `.mp3 .wav .ogg .m4a .flac` | Voice / audio attachment |
| Documents | `.pdf .docx .doc .odt .rtf .txt .md` | File upload |
| Data | `.xlsx .xls .csv .tsv .json .xml .yaml .yml` | File upload |
| Presentations | `.pptx .ppt .odp` | File upload |
| Archives | `.zip .tar .gz .tgz .bz2 .7z` | File upload |
| Web | `.html .htm` | File upload |

`.py`, `.log`, and other source-file extensions are intentionally excluded so
the agent doesn't auto-ship arbitrary source files; if you want to send code
to the user, use a code block.
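
The table can also be read as a lookup from extension to delivery mode. The sketch below encodes it that way; the mode labels (`inline_image`, `file_upload`, and so on) and the helper `delivery_for` are hypothetical names chosen for this example, not identifiers from the gateway code. Extensions absent from the table, such as `.py`, map to `None`, meaning the file is not shipped.

```python
from pathlib import PurePosixPath
from typing import Optional

# Delivery mode -> extensions, copied from the table above.
# The mode names are illustrative labels, not gateway identifiers.
_TABLE = {
    "inline_image": ".png .jpg .jpeg .gif .webp .bmp .tiff .svg",
    "inline_video": ".mp4 .mov .avi .mkv .webm",
    "audio": ".mp3 .wav .ogg .m4a .flac",
    "file_upload": (
        ".pdf .docx .doc .odt .rtf .txt .md "
        ".xlsx .xls .csv .tsv .json .xml .yaml .yml "
        ".pptx .ppt .odp "
        ".zip .tar .gz .tgz .bz2 .7z "
        ".html .htm"
    ),
}

# Flatten into a single extension -> mode dictionary.
DELIVERY_BY_EXT = {
    ext: mode for mode, exts in _TABLE.items() for ext in exts.split()
}


def delivery_for(path: str) -> Optional[str]:
    """Return the delivery mode for a path, or None if it is excluded."""
    # Only the final suffix counts, so "report.tar.gz" is looked up as ".gz".
    suffix = PurePosixPath(path).suffix.lower()
    return DELIVERY_BY_EXT.get(suffix)
```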
## Encouraging the agent to produce artifacts
The agent doesn't reach for artifacts by default; it has to be told to.
There are two ways to nudge it:

**Per-session:** ask explicitly ("send me the comparison as a chart",
"return the data as a CSV") or write your own custom-instructions /
personality entry that biases toward artifact-style replies on
messaging platforms.

**Project-level:** add the bias to `AGENTS.md` / `CLAUDE.md` /
`.cursorrules` in a project the agent works from, or to your global
custom instructions in `~/.hermes/config.yaml` under `agent.custom_instructions`.

The mechanic the agent has to use is simple: render the file to an
absolute path (e.g. `/tmp/q3-revenue.png`) and mention that path as
plain text in the reply. The gateway does the rest. Paths inside
fenced code blocks or backticks are ignored so code samples are never
mutilated.
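
As a rough illustration of that mechanic from the agent's side, the sketch below writes a CSV to an absolute path and then builds a reply that mentions the path as plain text. The helper names `write_csv_artifact` and `build_reply`, the temp-directory prefix, and the sample rows are placeholders invented for this example; any tool that leaves a file at an absolute path would serve the same purpose.

```python
import csv
import os
import tempfile


def write_csv_artifact(rows, filename="q3-revenue.csv"):
    """Write rows to a fresh temp directory and return the absolute path."""
    out_dir = tempfile.mkdtemp(prefix="hermes-artifact-")
    path = os.path.abspath(os.path.join(out_dir, filename))
    with open(path, "w", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerows(rows)
    return path


def build_reply(path):
    """Mention the path as plain text, outside backticks and code fences."""
    return f"Here is the Q3 revenue breakdown: {path}"


# Placeholder data, for illustration only.
artifact = write_csv_artifact([["quarter", "revenue"], ["Q3", 1200]])
reply = build_reply(artifact)
```

Had the reply wrapped the path in backticks, the gateway would treat it as a code sample and leave it in the message instead of uploading the file.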
## Kanban: artifacts ride completion notifications
If you use Hermes' kanban multi-agent workflow, workers can attach
deliverable files to their `kanban_complete` call:
```python
kanban_complete(
    summary="rendered Q3 revenue chart and report",
    artifacts=[
        "/tmp/q3-revenue.png",
        "/tmp/q3-report.pdf",
    ],
)
```
When the gateway notifier delivers the "task completed" message to whoever
subscribed to the task in Slack/Telegram/etc., it also uploads each artifact
as a native attachment to that chat. The human gets the deliverable and the
summary in one place.

Files that don't exist on disk when the notifier runs are silently skipped.
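
The notifier-side handling can be pictured as a partition step that drops missing files before anything is uploaded. The sketch below is an assumption about the shape of that logic, not the gateway's code: `partition_artifacts`, its `exists` parameter, and the abbreviated extension sets are hypothetical. The three groups follow the image / video / document split described in the change notes.

```python
import os

# Abbreviated extension sets, for illustration only.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}
VIDEO_EXTS = {".mp4", ".mov", ".webm"}


def partition_artifacts(paths, exists=os.path.isfile):
    """Group artifact paths for upload, silently skipping missing files.

    `exists` is injectable so the logic can be exercised without
    touching the filesystem.
    """
    groups = {"images": [], "videos": [], "documents": []}
    for raw in paths:
        path = os.path.expanduser(raw)  # resolve ~/... to an absolute path
        if not exists(path):
            continue  # missing on disk: skip without raising
        ext = os.path.splitext(path)[1].lower()
        if ext in IMAGE_EXTS:
            groups["images"].append(path)
        elif ext in VIDEO_EXTS:
            groups["videos"].append(path)
        else:
            groups["documents"].append(path)
    return groups
```

Each group would then be handed to the matching upload call; a worker that lists a file it never wrote still gets its completion message delivered, just without that attachment.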
## Connecting more services with MCP
Beyond the artifact-delivery pipeline, the agent can reach into other
services via MCP (Model Context Protocol). The MCP ecosystem ships
community servers for most popular tools; install whichever you need:

| Service | What it unlocks |
|---|---|
| **Notion** | Read/write pages and databases, query the workspace |
| **GitHub** | Issues, PRs, comments, repo search beyond the gh CLI |
| **Linear** | Tickets, projects, cycles |
| **Slack** | Workspace-wide search, read other channels |
| **Gmail** | Inbox triage, send mail, label management |
| **Salesforce** | Leads, opportunities, account data |
| **Snowflake / BigQuery** | SQL against data warehouses |
| **Google Drive** | File search, contents, share management |

Install MCP servers via `~/.hermes/config.yaml` under the `mcp_servers`
section. See [MCP integration](./mcp.md) for the full setup guide.
## Comparison to Perplexity Computer in Slack
Perplexity Computer's Slack integration is built around the same idea:
the agent generates a deliverable (chart, PDF, slide deck) and posts it
back into the thread as a native attachment. Hermes Agent's deliverable
mode provides the same user-facing pattern locally:
- Generation happens in the user's own venv / sandbox (no remote tenant).
- Files land in the chat via the same Slack `files.uploadV2` API.
- Connector breadth comes via MCP rather than a curated catalog of 400
  hosted integrations; install the ones you actually use.

OAuth tokens stay on the user's machine in `auth.json` / `.env`. No hosted
token storage. No multi-tenant microVM. Same end result.