Mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-05-30 06:41:51 +00:00)
The agent can now produce a chart, PDF, spreadsheet, or any other
supported file type and have it land in Slack / Discord / Telegram /
WhatsApp / etc. as a native attachment, just by mentioning the absolute
path in its response.

Same primitive works for kanban-worker completions: workers attach
artifacts via kanban_complete(artifacts=[...]) and the gateway notifier
uploads them alongside the completion message.

Changes:

- gateway/platforms/base.py: extract_local_files now covers PDFs, docx,
  spreadsheets (xlsx/csv/json/yaml), presentations (pptx), archives
  (zip/tar/gz), audio (mp3/wav/...), and html, not just images and
  video. Image/video extensions still embed inline; everything else
  routes to send_document via the existing dispatch partition in
  gateway/run.py.
- tools/kanban_tools.py + hermes_cli/kanban_db.py: kanban_complete gains
  an explicit ``artifacts`` parameter. The handler stashes it in
  metadata.artifacts (for downstream workers) and the kernel promotes it
  onto the completed-event payload so the notifier can find it without a
  second SQL round-trip.
- gateway/run.py: _kanban_notifier_watcher now calls a new helper
  _deliver_kanban_artifacts after sending the completion text. The
  helper reads payload.artifacts (preferred), falls back to scanning the
  payload summary and task.result with extract_local_files, then
  partitions images / videos / documents and uploads each via
  send_multiple_images / send_video / send_document.
- website/docs/user-guide/features/deliverable-mode.md + sidebars.ts:
  user-facing docs page covering the extension list, the kanban
  artifacts pattern, and the MCP-for-connector-breadth recommendation.

Tests:

- tests/gateway/test_extract_local_files.py: 7 new test cases
  (documents, spreadsheets, presentations, audio, archives, html,
  chart-pdf canonical case). 44 passing, 0 regressions.
- tests/tools/test_kanban_tools.py: 4 new cases covering the artifacts
  arg shape (list / string / merge with existing metadata / type
  rejection). 17 passing.
- tests/hermes_cli/test_kanban_notify.py: 2 new cases covering full
  notifier → artifact-upload path and missing-file silent-skip.
  12 passing.
- E2E (real files, real kanban kernel, real BasePlatformAdapter): worker
  calls kanban_complete(artifacts=[png,pdf,csv]) → metadata + event
  payload land → notifier helper partitions correctly →
  send_multiple_images called once with the PNG, send_document called
  twice with PDF + CSV.

What's NOT in this PR (deferred to follow-ups):

- Ad-hoc "research this for two hours, ping the thread when done" slash
  command: covered today by kanban subscriptions; a dedicated slash
  command can ride a follow-up PR if needed.
- Setup-wizard prompt for recommended MCP servers (Notion, GitHub,
  Linear, etc.): docs page lists them; UI is a separate change.

Plan and rationale captured in
~/.hermes/docs/perplexity-computer-parity.pdf (local doc, not shipped).

130 lines
5.4 KiB
Markdown
---
title: Deliverable Mode (Artifacts in Chat)
sidebar_label: Deliverable Mode
description: How the agent ships generated charts, PDFs, spreadsheets, and other files as native attachments in messaging platforms.
---

# Deliverable Mode

When Hermes Agent runs inside a messaging gateway (Slack, Discord, Telegram,
WhatsApp, Signal, etc.), it can deliver generated files directly into the
chat as native attachments, not as paths the user has to copy.

A chart shows up as an inline image. A PDF report shows up as a file
download. A spreadsheet uploads as `.xlsx`. The agent does not need to
write a `MEDIA:` tag or do anything special: it just generates the file
and mentions its absolute path in the response. The gateway picks the path
out of the text, removes it from the visible message, and uploads the
file natively.

## How it works

Three pieces fit together:

1. **The agent has tools that produce files.** `execute_code` for charts via
   matplotlib, the `latex-pdf-report` skill for PDFs, the `powerpoint` skill
   for decks, `image_generate` for images, `text_to_speech` for audio, and so
   on.

2. **The gateway scans agent responses for file paths.** Any absolute path
   (`/tmp/...`) or home-relative path (`~/...`) ending in a supported
   extension gets extracted. Paths inside code blocks and inline code are
   ignored so code samples are never mutilated.

3. **The gateway dispatches by file type.** Images embed inline where the
   platform supports it; videos embed inline; audio routes to voice/audio
   attachments; everything else uploads as a file attachment.

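The scanning step can be illustrated with a minimal sketch. This is not the gateway's actual `extract_local_files` code: the function name `extract_paths_sketch`, the regular expressions, and the shortened extension list are all assumptions made for illustration. It masks fenced and inline code first, then pulls absolute and home-relative paths out of the remaining text.

```python
import re

# Hypothetical subset of the supported extensions; the real list is longer.
SUPPORTED = ("png", "jpg", "pdf", "csv", "xlsx")

# Fenced code blocks first, then inline code spans.
CODE_RE = re.compile(r"```.*?```|`[^`\n]*`", re.DOTALL)

# An absolute (/...) or home-relative (~/...) path ending in a supported extension.
PATH_RE = re.compile(
    r"(?<![\w/~.])(~?/[\w./-]+\.(?:" + "|".join(SUPPORTED) + r"))(?![\w/])",
    re.IGNORECASE,
)


def extract_paths_sketch(text: str) -> tuple[list[str], str]:
    """Return (paths, visible_text) with paths removed from the visible text."""
    # Blank out code with same-length padding so match offsets still line up
    # with the original text and paths inside code are never matched.
    masked = CODE_RE.sub(lambda m: " " * len(m.group(0)), text)
    paths, pieces, cursor = [], [], 0
    for match in PATH_RE.finditer(masked):
        paths.append(match.group(1))
        pieces.append(text[cursor:match.start(1)])
        cursor = match.end(1)
    pieces.append(text[cursor:])
    return paths, "".join(pieces).strip()
```

Masking with padding of equal length, rather than deleting the code spans, keeps the original text intact so code samples pass through unchanged.
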
## Supported file extensions

| Category | Extensions | Delivery |
|---|---|---|
| Images | `.png .jpg .jpeg .gif .webp .bmp .tiff .svg` | Inline embed |
| Video | `.mp4 .mov .avi .mkv .webm` | Inline embed (where supported) |
| Audio | `.mp3 .wav .ogg .m4a .flac` | Voice / audio attachment |
| Documents | `.pdf .docx .doc .odt .rtf .txt .md` | File upload |
| Data | `.xlsx .xls .csv .tsv .json .xml .yaml .yml` | File upload |
| Presentations | `.pptx .ppt .odp` | File upload |
| Archives | `.zip .tar .gz .tgz .bz2 .7z` | File upload |
| Web | `.html .htm` | File upload |

`.py`, `.log`, and other source-file extensions are intentionally excluded so
the agent doesn't auto-ship arbitrary source files; if you want to send code
to the user, use a code block.

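As a rough sketch of how the table above could be turned into a lookup, the snippet below maps a path's extension to a category name. The dictionary layout, the category keys, and the `classify` function are hypothetical; only the extension lists are taken from the table.

```python
from pathlib import Path

# Extension lists copied from the table; the structure itself is an assumption.
EXTENSIONS = {
    "image": {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".tiff", ".svg"},
    "video": {".mp4", ".mov", ".avi", ".mkv", ".webm"},
    "audio": {".mp3", ".wav", ".ogg", ".m4a", ".flac"},
    "document": {".pdf", ".docx", ".doc", ".odt", ".rtf", ".txt", ".md"},
    "data": {".xlsx", ".xls", ".csv", ".tsv", ".json", ".xml", ".yaml", ".yml"},
    "presentation": {".pptx", ".ppt", ".odp"},
    "archive": {".zip", ".tar", ".gz", ".tgz", ".bz2", ".7z"},
    "web": {".html", ".htm"},
}


def classify(path: str):
    """Return the category for a path, or None if the extension is excluded."""
    suffix = Path(path).suffix.lower()  # only the final suffix, e.g. ".gz"
    for category, extensions in EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None
```

Because only the final suffix is inspected, a name such as `backup.tar.gz` is classified through `.gz`, and excluded extensions such as `.py` fall through to `None`.
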
## Encouraging the agent to produce artifacts

The agent doesn't reach for artifacts by default; it has to be told to.
Two ways to nudge it:

**Per-session:** ask explicitly ("send me the comparison as a chart",
"return the data as a CSV") or write your own custom-instructions /
personality entry that biases toward artifact-style replies on
messaging platforms.

**Project-level:** add the bias to `AGENTS.md` / `CLAUDE.md` /
`.cursorrules` in a project the agent works from, or to your global
custom instructions in `~/.hermes/config.yaml` under `agent.custom_instructions`.

The mechanic the agent has to use is simple: render the file to an
absolute path (e.g. `/tmp/q3-revenue.png`) and mention that path as
plain text in the reply. The gateway does the rest. The path must not be
wrapped in backticks or placed in a fenced code block: paths there are
ignored so code samples are never mutilated.

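To make the mechanic concrete, here is a minimal sketch of what agent-side code might do: write a file to an absolute path, then mention that path as plain text. The helper name `write_csv_deliverable`, the temporary directory, and the sample rows are illustrative assumptions, not part of Hermes.

```python
import csv
import tempfile
from pathlib import Path


def write_csv_deliverable(rows, filename="comparison.csv") -> Path:
    """Write rows to a CSV in a fresh temp directory and return its absolute path."""
    out_dir = Path(tempfile.mkdtemp(prefix="deliverable-"))
    path = (out_dir / filename).resolve()
    with path.open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return path


# Illustrative data only.
path = write_csv_deliverable([("quarter", "revenue"), ("Q3", 1200)])

# The path appears as plain text: no backticks, no code fence.
reply = f"Here is the comparison as a CSV: {path}"
```

The reply is ordinary prose with the path embedded, which is the form the gateway is described as scanning for.
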
## Kanban: artifacts ride completion notifications

If you use Hermes' kanban multi-agent workflow, workers can attach
deliverable files to their `kanban_complete` call:

```python
kanban_complete(
    summary="rendered Q3 revenue chart and report",
    artifacts=[
        "/tmp/q3-revenue.png",
        "/tmp/q3-report.pdf",
    ],
)
```

When the gateway notifier delivers the "task completed" message to whoever
subscribed to the task in Slack/Telegram/etc., it also uploads each artifact
as a native attachment to that chat. The human gets the deliverable and the
summary in one place.

Files that don't exist on disk when the notifier runs are silently skipped.

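The skip-if-missing behaviour, combined with a split into images, videos, and documents, might look roughly like the sketch below. It is not the notifier's real helper: `partition_artifacts` is a hypothetical name, and treating every non-image, non-video file as a document is a simplifying assumption.

```python
from pathlib import Path

# Extension sets copied from the supported-extensions table.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".tiff", ".svg"}
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".webm"}


def partition_artifacts(artifacts):
    """Hypothetical helper: bucket files that exist, skip the ones that do not."""
    buckets = {"images": [], "videos": [], "documents": []}
    for raw in artifacts:
        path = Path(raw).expanduser()
        if not path.is_file():
            continue  # missing on disk: skipped without raising
        suffix = path.suffix.lower()
        if suffix in IMAGE_EXTS:
            buckets["images"].append(path)
        elif suffix in VIDEO_EXTS:
            buckets["videos"].append(path)
        else:
            buckets["documents"].append(path)  # assumption: catch-all bucket
    return buckets
```

Each bucket could then be handed to the matching upload call, so a worker that lists a stale path still gets its remaining files delivered.
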
## Connecting more services with MCP

Beyond the artifact-delivery pipeline, the agent can reach into other
services via MCP (Model Context Protocol). The MCP ecosystem ships
community servers for most popular tools; install whichever you need:

| Service | What it unlocks |
|---|---|
| **Notion** | Read and write pages and databases, query the workspace |
| **GitHub** | Issues, PRs, comments, repo search beyond the gh CLI |
| **Linear** | Tickets, projects, cycles |
| **Slack** | Workspace-wide search, read other channels |
| **Gmail** | Inbox triage, send mail, label management |
| **Salesforce** | Leads, opportunities, account data |
| **Snowflake / BigQuery** | SQL against data warehouses |
| **Google Drive** | File search, contents, share management |

Install MCP servers via `~/.hermes/config.yaml` under the `mcp_servers`
section. See [MCP integration](./mcp.md) for the full setup guide.

## Comparison to Perplexity Computer in Slack

Perplexity Computer's Slack integration is built around the same idea:
the agent generates a deliverable (chart, PDF, slide deck) and posts it
back into the thread as a native attachment. Hermes Agent's deliverable
mode provides the same user-facing pattern locally:

- Generation happens in the user's own venv / sandbox (no remote tenant).
- Files land in the chat via the same Slack `files.uploadV2` API.
- Connector breadth comes via MCP rather than a curated catalog of 400
  hosted integrations; install the ones you actually use.

OAuth tokens stay on the user's machine in `auth.json` / `.env`. No hosted
token storage. No multi-tenant microVM. Same end result.