mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-04 07:31:58 +00:00
feat(gateway): deliverable mode — ship artifacts as native uploads from any agent surface (#27813)
The agent can now produce a chart, PDF, spreadsheet, or any other supported file type and have it land in Slack / Discord / Telegram / WhatsApp / etc. as a native attachment, just by mentioning the absolute path in its response. Same primitive works for kanban-worker completions: workers attach artifacts via kanban_complete(artifacts=[...]) and the gateway notifier uploads them alongside the completion message. Changes: - gateway/platforms/base.py: extract_local_files now covers PDFs, docx, spreadsheets (xlsx/csv/json/yaml), presentations (pptx), archives (zip/tar/gz), audio (mp3/wav/...), and html — not just images and video. Image/video extensions still embed inline; everything else routes to send_document via the existing dispatch partition in gateway/run.py. - tools/kanban_tools.py + hermes_cli/kanban_db.py: kanban_complete gains an explicit ``artifacts`` parameter. The handler stashes it in metadata.artifacts (for downstream workers) and the kernel promotes it onto the completed-event payload so the notifier can find it without a second SQL round-trip. - gateway/run.py: _kanban_notifier_watcher now calls a new helper _deliver_kanban_artifacts after sending the completion text. The helper reads payload.artifacts (preferred), falls back to scanning the payload summary and task.result with extract_local_files, then partitions images / videos / documents and uploads each via send_multiple_images / send_video / send_document. - website/docs/user-guide/features/deliverable-mode.md + sidebars.ts: user-facing docs page covering the extension list, the kanban artifacts pattern, and the MCP-for-connector-breadth recommendation. Tests: - tests/gateway/test_extract_local_files.py: 7 new test cases (documents, spreadsheets, presentations, audio, archives, html, chart-pdf canonical case). 44 passing, 0 regressions. - tests/tools/test_kanban_tools.py: 4 new cases covering the artifacts arg shape (list / string / merge with existing metadata / type rejection). 17 passing. - tests/hermes_cli/test_kanban_notify.py: 2 new cases covering full notifier → artifact-upload path and missing-file silent-skip. 12 passing. - E2E (real files, real kanban kernel, real BasePlatformAdapter): worker calls kanban_complete(artifacts=[png,pdf,csv]) → metadata + event payload land → notifier helper partitions correctly → send_multiple_images called once with the PNG, send_document called twice with PDF + CSV. What's NOT in this PR (deferred to follow-ups): - Ad-hoc "research this for two hours, ping the thread when done" slash command — covered today by kanban subscriptions; a dedicated slash command can ride a follow-up PR if needed. - Setup-wizard prompt for recommended MCP servers (Notion, GitHub, Linear, etc.) — docs page lists them; UI is a separate change. Plan and rationale captured in ~/.hermes/docs/perplexity-computer-parity.pdf (local doc, not shipped).
This commit is contained in:
parent
dadc8aa255
commit
f2fdb9a178
9 changed files with 671 additions and 8 deletions
130
website/docs/user-guide/features/deliverable-mode.md
Normal file
130
website/docs/user-guide/features/deliverable-mode.md
Normal file
|
|
@ -0,0 +1,130 @@
|
|||
---
|
||||
title: Deliverable Mode (Artifacts in Chat)
|
||||
sidebar_label: Deliverable Mode
|
||||
description: How the agent ships generated charts, PDFs, spreadsheets, and other files as native attachments in messaging platforms.
|
||||
---
|
||||
|
||||
# Deliverable Mode
|
||||
|
||||
When Hermes Agent runs inside a messaging gateway (Slack, Discord, Telegram,
|
||||
WhatsApp, Signal, etc.), it can deliver generated files directly into the
|
||||
chat — not as paths the user has to copy, but as native attachments.
|
||||
|
||||
A chart shows up as an inline image. A PDF report shows up as a file
|
||||
download. A spreadsheet uploads as `.xlsx`. The agent does not need to
|
||||
write a `MEDIA:` tag or do anything special — it just generates the file
|
||||
and mentions its absolute path in the response. The gateway picks the path
|
||||
out of the text, removes it from the visible message, and uploads the
|
||||
file natively.
|
||||
|
||||
## How it works
|
||||
|
||||
Three pieces fit together:
|
||||
|
||||
1. **The agent has tools that produce files.** `execute_code` for charts via
|
||||
matplotlib, the `latex-pdf-report` skill for PDFs, the `powerpoint` skill
|
||||
for decks, `image_generate` for images, `text_to_speech` for audio, and so
|
||||
on.
|
||||
|
||||
2. **The gateway scans agent responses for file paths.** Any absolute path
|
||||
(`/tmp/...`) or home-relative path (`~/...`) ending in a supported
|
||||
extension gets extracted. Paths inside code blocks and inline code are
|
||||
ignored so code samples are never mutilated.
|
||||
|
||||
3. **The gateway dispatches by file type.** Images embed inline where the
|
||||
platform supports it; videos embed inline; audio routes to voice/audio
|
||||
attachments; everything else uploads as a file attachment.
|
||||
|
||||
## Supported file extensions
|
||||
|
||||
| Category | Extensions | Delivery |
|
||||
|---|---|---|
|
||||
| Images | `.png .jpg .jpeg .gif .webp .bmp .tiff .svg` | Inline embed |
|
||||
| Video | `.mp4 .mov .avi .mkv .webm` | Inline embed (where supported) |
|
||||
| Audio | `.mp3 .wav .ogg .m4a .flac` | Voice / audio attachment |
|
||||
| Documents | `.pdf .docx .doc .odt .rtf .txt .md` | File upload |
|
||||
| Data | `.xlsx .xls .csv .tsv .json .xml .yaml .yml` | File upload |
|
||||
| Presentations | `.pptx .ppt .odp` | File upload |
|
||||
| Archives | `.zip .tar .gz .tgz .bz2 .7z` | File upload |
|
||||
| Web | `.html .htm` | File upload |
|
||||
|
||||
`.py`, `.log`, and other source-file extensions are intentionally excluded so
|
||||
the agent doesn't auto-ship arbitrary source files; if you want to send code
|
||||
to the user, use a code block.
|
||||
|
||||
## Encouraging the agent to produce artifacts
|
||||
|
||||
The agent doesn't reach for artifacts by default — it has to know to.
|
||||
Two ways to nudge it:
|
||||
|
||||
**Per-session:** ask explicitly ("send me the comparison as a chart",
|
||||
"return the data as a CSV") or write your own custom-instructions /
|
||||
personality entry that biases toward artifact-style replies on
|
||||
messaging platforms.
|
||||
|
||||
**Project-level:** add the bias to `AGENTS.md` / `CLAUDE.md` /
|
||||
`.cursorrules` in a project the agent works from, or to your global
|
||||
custom instructions in `~/.hermes/config.yaml` under `agent.custom_instructions`.
|
||||
|
||||
The mechanic the agent has to use is simple: render the file to an
|
||||
absolute path (e.g. `/tmp/q3-revenue.png`) and mention that path as
|
||||
plain text in the reply. The gateway does the rest. Paths inside
|
||||
fenced code blocks or backticks are ignored so code samples are never
|
||||
mutilated.
|
||||
|
||||
## Kanban: artifacts ride completion notifications
|
||||
|
||||
If you use Hermes' kanban multi-agent workflow, workers can attach
|
||||
deliverable files to their `kanban_complete` call:
|
||||
|
||||
```python
|
||||
kanban_complete(
|
||||
summary="rendered Q3 revenue chart and report",
|
||||
artifacts=[
|
||||
"/tmp/q3-revenue.png",
|
||||
"/tmp/q3-report.pdf",
|
||||
],
|
||||
)
|
||||
```
|
||||
|
||||
When the gateway notifier delivers the "task completed" message to whoever
|
||||
subscribed to the task in Slack/Telegram/etc., it also uploads each artifact
|
||||
as a native attachment to that chat. The human gets the deliverable and the
|
||||
summary in one place.
|
||||
|
||||
Files that don't exist on disk when the notifier runs are silently skipped.
|
||||
|
||||
## Connecting more services with MCP
|
||||
|
||||
Beyond the artifact-delivery pipeline, the agent can reach into other
|
||||
services via MCP (Model Context Protocol). The MCP ecosystem ships
|
||||
community servers for most popular tools — install whichever you need:
|
||||
|
||||
| Service | What it unlocks |
|
||||
|---|---|
|
||||
| **Notion** | Read/write Notion pages, databases, query workspace |
|
||||
| **GitHub** | Issues, PRs, comments, repo search beyond the gh CLI |
|
||||
| **Linear** | Tickets, projects, cycles |
|
||||
| **Slack** | Workspace-wide search, read other channels |
|
||||
| **Gmail** | Inbox triage, send mail, label management |
|
||||
| **Salesforce** | Leads, opportunities, account data |
|
||||
| **Snowflake / BigQuery** | SQL against data warehouses |
|
||||
| **Google Drive** | File search, contents, share management |
|
||||
|
||||
Install MCP servers via `~/.hermes/config.yaml` under the `mcp_servers`
|
||||
section. See [MCP integration](./mcp.md) for the full setup guide.
|
||||
|
||||
## Comparison to Perplexity Computer in Slack
|
||||
|
||||
Perplexity Computer's Slack integration is built around the same idea:
|
||||
the agent generates a deliverable (chart, PDF, slide deck) and posts it
|
||||
back into the thread as a native attachment. Hermes Agent's deliverable
|
||||
mode provides the same user-facing pattern locally:
|
||||
|
||||
- Generation happens in the user's own venv / sandbox (no remote tenant).
|
||||
- Files land in the chat via the same Slack `files.uploadV2` API.
|
||||
- Connector breadth comes via MCP rather than a curated catalog of 400
|
||||
hosted integrations — install the ones you actually use.
|
||||
|
||||
OAuth tokens stay on the user's machine in `auth.json` / `.env`. No hosted
|
||||
token storage. No multi-tenant microVM. Same end result.
|
||||
Loading…
Add table
Add a link
Reference in a new issue