diff --git a/RELEASE_v0.14.0.md b/RELEASE_v0.14.0.md new file mode 100644 index 00000000000..38d40db8c69 --- /dev/null +++ b/RELEASE_v0.14.0.md @@ -0,0 +1,477 @@ +# Hermes Agent v0.14.0 (v2026.5.16) + +**Release Date:** May 16, 2026 +**Since v0.13.0:** 808 commits · 633 merged PRs · 1393 files changed · 165,061 insertions · 545 issues closed (12 P0, 50 P1) · 215 community contributors (including co-authors) + +> The Foundation Release — Hermes Agent installs and runs anywhere now. Native Windows ships in early beta with a full PowerShell installer story, a `pip install hermes-agent` wheel lands on PyPI, lazy-deps reshape what `pip install hermes-agent` actually pulls down, the supply-chain checker scans every install/upgrade for unsafe versions, and a new OpenAI-compatible local proxy lets Codex / Aider / Cline talk to OAuth-only providers (Claude Pro, ChatGPT Pro, SuperGrok). The cold-start wave shaves ~19 seconds off `hermes` launch, browser-tool CDP calls run 180x faster, and `hermes tools` All-Platforms drops from 14s to under 1.5s. Two new messaging platforms (LINE and SimpleX Chat) and a Microsoft Graph foundation (Teams pipeline + webhook adapter) land alongside `/handoff` that finally transfers sessions live, `vision_analyze` passing pixels through to vision-capable models, `x_search` as a first-class tool, LSP semantic diagnostics on every `write_file` / `patch`, a unified pluggable `video_generate`, a `computer_use` cua-driver backend, cross-session 1-hour Claude prompt caching, a per-turn file-mutation verifier, plus 9 new optional skills. 50 P1 closures, 12 P0 closures. + +--- + +## ✨ Highlights + +- **Native Windows support (early beta)** — full PowerShell installer, native subprocess/PTY paths, taskkill-based process management, MinGit auto-install, Microsoft Store python stub detection, foreground Ctrl+C preservation, taskkill+ps2 fallback, npm prefix handling, and ~40 follow-up Windows-only fixes across CLI / gateway / TUI / curator / tools. 
Hermes finally runs natively on `cmd.exe` and PowerShell, no WSL required. ([#21561](https://github.com/NousResearch/hermes-agent/pull/21561), [#22130](https://github.com/NousResearch/hermes-agent/pull/22130), [#22752](https://github.com/NousResearch/hermes-agent/pull/22752), [#26618](https://github.com/NousResearch/hermes-agent/pull/26618), and many more) + +- **`pip install hermes-agent && hermes`** — Hermes Agent is now a real PyPI package. One command, no clone, no git, no shell installer. Wheel includes the Ink TUI bundle and shell launcher. (salvage of [#26350](https://github.com/NousResearch/hermes-agent/pull/26350)) ([#26593](https://github.com/NousResearch/hermes-agent/pull/26593)) + +- **Cold-start performance wave — ~19s off `hermes` launch** — skills cache, lazy Feishu import, no Nous HTTP at startup, plus PEP-562 lazy adapter imports (QQ, Yuanbao, Teams, Google Chat), deferred `fal_client` / `google-cloud` / `httpx` loads, models.dev disk-cache-first lookup, parallel doctor API checks, eager-skip plugin discovery on built-in subcommands, `hermes tools` All-Platforms drops from 14s to <1.5s, welcome banner skipped on `chat -q`. ([#22138](https://github.com/NousResearch/hermes-agent/pull/22138), [#22120](https://github.com/NousResearch/hermes-agent/pull/22120), [#22681](https://github.com/NousResearch/hermes-agent/pull/22681), [#22790](https://github.com/NousResearch/hermes-agent/pull/22790), [#22808](https://github.com/NousResearch/hermes-agent/pull/22808), [#22831](https://github.com/NousResearch/hermes-agent/pull/22831), [#22859](https://github.com/NousResearch/hermes-agent/pull/22859), [#22904](https://github.com/NousResearch/hermes-agent/pull/22904), [#22766](https://github.com/NousResearch/hermes-agent/pull/22766), [#25341](https://github.com/NousResearch/hermes-agent/pull/25341)) + +- **180x faster `browser_console` evaluations** — routed through the supervisor's persistent CDP WebSocket instead of spawning a fresh DevTools session per call. 
Real-world page interactions feel instant. ([#23226](https://github.com/NousResearch/hermes-agent/pull/23226)) + +- **Supply-chain advisory checker + lazy-deps framework + tiered install fallback** — every `pip install` / `hermes update` scans dependencies against an advisory list, lazy-deps replace heavy import-time loads with first-use installs, and the installer falls back through extras tiers when a wheel is rejected on the target platform. ([#24220](https://github.com/NousResearch/hermes-agent/pull/24220)) + +- **OpenAI-compatible local proxy** — `hermes proxy` exposes any OAuth-authed provider (Claude Pro, ChatGPT Pro, SuperGrok) as an OpenAI-compatible endpoint that Codex / Aider / Cline / VS Code Continue can hit. Your subscription, your tools. ([#25969](https://github.com/NousResearch/hermes-agent/pull/25969)) + +- **Cross-session 1-hour Claude prompt cache** — Anthropic / OpenRouter / Nous Portal now share a 1h prefix cache across sessions for Claude models. Fast resume, fast `/new`, lower cost on repeat work. ([#23828](https://github.com/NousResearch/hermes-agent/pull/23828)) + +- **Two new messaging platforms — LINE + SimpleX Chat** — LINE Messaging API lands as a first-class platform, SimpleX Chat salvages #2558 onto the modern adapter spec. Hermes is now on 22 platforms. ([#23197](https://github.com/NousResearch/hermes-agent/pull/23197), [#26232](https://github.com/NousResearch/hermes-agent/pull/26232)) + +- **Microsoft Graph foundation — Teams pipeline + webhook adapter** — `msgraph` auth/client foundation, webhook listener platform, Teams pipeline plugin runtime, and Teams outbound delivery via the existing adapter — Hermes can now read and post to Teams. 
(salvages of #21408–#21411) ([#21922](https://github.com/NousResearch/hermes-agent/pull/21922), [#21969](https://github.com/NousResearch/hermes-agent/pull/21969), [#22007](https://github.com/NousResearch/hermes-agent/pull/22007), [#22024](https://github.com/NousResearch/hermes-agent/pull/22024)) + +- **`/handoff` actually transfers the session live** — the agent's active session moves to a different model / persona / profile mid-conversation, with messages, tool history, and context preserved. ([#23395](https://github.com/NousResearch/hermes-agent/pull/23395)) + +- **`x_search` — first-class X (Twitter) search tool** — gated tool with OAuth-or-API-key auth, no skill needed to query the timeline. ([#26763](https://github.com/NousResearch/hermes-agent/pull/26763)) + +- **`vision_analyze` returns pixels to vision-capable models** — when the active model can see, `vision_analyze` now hands the image straight through instead of falling back to a text description. ([#22955](https://github.com/NousResearch/hermes-agent/pull/22955)) + +- **LSP semantic diagnostics on every write** — `write_file` and `patch` now run real language-server diagnostics on the post-edit file (delta-only) and surface real errors before they ship downstream. ([#24168](https://github.com/NousResearch/hermes-agent/pull/24168), [#25978](https://github.com/NousResearch/hermes-agent/pull/25978)) + +- **Per-turn file-mutation verifier footer** — after every turn that wrote files, the agent gets a verifier footer summarizing what actually changed on disk — catches silent overwrites and "wrote it but it didn't land" bugs. ([#24498](https://github.com/NousResearch/hermes-agent/pull/24498)) + +- **Unified `video_generate` with pluggable provider backends** — single tool, any backend. Drop in a new video provider as a plugin, no core changes. 
([#25126](https://github.com/NousResearch/hermes-agent/pull/25126)) + +- **`computer_use` cua-driver backend** — proper focus-safe ops, non-Anthropic provider support, refresh on `hermes update`. Computer-use is no longer locked to a single SDK. (re-salvage of #16936) ([#21967](https://github.com/NousResearch/hermes-agent/pull/21967), [#24063](https://github.com/NousResearch/hermes-agent/pull/24063)) + +- **xAI Grok OAuth provider — SuperGrok via subscription** — sign in with your xAI account, talk to Grok models from Hermes. ([#26534](https://github.com/NousResearch/hermes-agent/pull/26534)) + +- **Clarify with buttons — native inline keyboards on Telegram + Discord** — the `clarify` tool renders multi-choice prompts as platform-native buttons instead of typed responses. ([#24199](https://github.com/NousResearch/hermes-agent/pull/24199), [#25485](https://github.com/NousResearch/hermes-agent/pull/25485)) + +- **Discord channel history backfill (default on)** — Hermes reads recent channel history when joining a thread so it actually knows what's been said. ([#25984](https://github.com/NousResearch/hermes-agent/pull/25984)) + +- **Watchers skill — RSS / HTTP JSON / GitHub polling via cron `no_agent` mode** — skill recipes that wire change-detection sources directly into cron's script-only watchdog mode. ([#21881](https://github.com/NousResearch/hermes-agent/pull/21881)) + +- **Zed ACP Registry integration + uvx distribution** — Hermes is in the Zed registry, installable via `uvx` (no npm). Plus `hermes acp --setup-browser` bootstraps browser tools for registry installs. (salvage of [#25908](https://github.com/NousResearch/hermes-agent/pull/25908)) ([#26079](https://github.com/NousResearch/hermes-agent/pull/26079), [#26120](https://github.com/NousResearch/hermes-agent/pull/26120), [#26234](https://github.com/NousResearch/hermes-agent/pull/26234)) + +- **OpenRouter Pareto Code router** — wires in a new OpenRouter router with a `min_coding_score` knob. 
Pick the cheapest model that meets your quality bar. ([#22838](https://github.com/NousResearch/hermes-agent/pull/22838)) + +- **Optional codex app-server runtime for OpenAI/Codex models** — drives the OpenAI Codex CLI under the hood for OpenAI/Codex paths, with session reuse, wedge retirement, and OAuth refresh classification. ([#24182](https://github.com/NousResearch/hermes-agent/pull/24182), [#25769](https://github.com/NousResearch/hermes-agent/pull/25769)) + +- **`hermes-skills/huggingface` as a trusted default tap** — community skills index from huggingface.co/skills is available by default in the Skills Hub. ([#26219](https://github.com/NousResearch/hermes-agent/pull/26219)) + +- **9 new optional skills** — Hyperliquid (perp/spot trading via SDK + REST) (@kshitijk4poor & Hermes), Yahoo Finance market data, api-testing (REST/GraphQL debug), unified EVM multi-chain skill (folds #25291 + #2010 + base/), darwinian-evolver, osint-investigation (closes #355), pinggy-tunnel, watchers (RSS/HTTP/GitHub via cron), Notion overhaul for the Developer Platform (May 2026). ([#23582](https://github.com/NousResearch/hermes-agent/pull/23582), [#23583](https://github.com/NousResearch/hermes-agent/pull/23583), [#23590](https://github.com/NousResearch/hermes-agent/pull/23590), [#25299](https://github.com/NousResearch/hermes-agent/pull/25299), [#26760](https://github.com/NousResearch/hermes-agent/pull/26760), [#26729](https://github.com/NousResearch/hermes-agent/pull/26729), [#26765](https://github.com/NousResearch/hermes-agent/pull/26765), [#21881](https://github.com/NousResearch/hermes-agent/pull/21881), [#26612](https://github.com/NousResearch/hermes-agent/pull/26612)) + +- **API server exposes run approval events** — long-running runs surface approval requests over the API stream, no more silent stalls. 
(salvage of [#20311](https://github.com/NousResearch/hermes-agent/pull/20311)) ([#21899](https://github.com/NousResearch/hermes-agent/pull/21899)) + +- **`/subgoal` — user-added criteria appended to active `/goal`** — layer extra success criteria onto a running goal loop. The judge sees them in the prompt, no behavior change when subgoals are empty. ([#25449](https://github.com/NousResearch/hermes-agent/pull/25449)) + +- **Plugins can run any LLM call via `ctx.llm`** — plugins get a first-class hook to make their own LLM requests through the active provider/credentials, no manual wiring. Plus `tool_override` flag for replacing built-in tools. ([#23194](https://github.com/NousResearch/hermes-agent/pull/23194), [#26759](https://github.com/NousResearch/hermes-agent/pull/26759)) + +- **Brave Search (free tier) + DuckDuckGo (DDGS) as web-search providers** — two new free search backends alongside Tavily / SearXNG / Exa. ([#21337](https://github.com/NousResearch/hermes-agent/pull/21337)) + +- **Sudo brute-force block + sudo-stdin/askpass DANGEROUS classification** — closes the `sudo -S` brute-force avenue; approval gates classify stdin-fed and askpass-stripped sudo invocations as dangerous. (salvages of #22194 + #21128) ([#23736](https://github.com/NousResearch/hermes-agent/pull/23736)) + +- **Provider rename — Alibaba Cloud → Qwen Cloud, picker reorder** — matches what the world calls it. Existing config keys still work. 
([#24835](https://github.com/NousResearch/hermes-agent/pull/24835)) + + +--- + +## 🪟 Windows — Native Support (Early Beta) + +### Bootstrap & installer +- **Native Windows support (early beta)** — first-class native Windows path across CLI / gateway / TUI / tools ([#21561](https://github.com/NousResearch/hermes-agent/pull/21561)) +- **PyPI wheel packaging — `pip install hermes-agent && hermes`** (salvage of #26350) ([#26593](https://github.com/NousResearch/hermes-agent/pull/26593)) +- **Recognise Shift+Enter as a newline key** + Windows docs (salvage #21545) ([#22130](https://github.com/NousResearch/hermes-agent/pull/22130)) +- **Preserve Ctrl+C for Windows foreground runs** (@helix4u) ([#22752](https://github.com/NousResearch/hermes-agent/pull/22752)) +- **Stop spamming cwd-missing + tirith-spawn warnings on every terminal call** ([#26618](https://github.com/NousResearch/hermes-agent/pull/26618)) +- **Use `--extra all` not `--all-extras`; drop lazy-covered extras from `[all]`** ([#24515](https://github.com/NousResearch/hermes-agent/pull/24515)) + +### Windows-specific fixes (40+ across cli / tools / gateway / curator / TUI) +A long tail of native-Windows fixes shipped alongside the beta — taskkill-based subprocess management, MinGit auto-install, Microsoft Store python stub detection, npm prefix handling, native PTY paths, signal handling differences, foreground process management, ANSI sequence handling, path normalization, file-locking semantics, and many more. Full list in commit log under `fix(windows)` / `feat(windows)` / `windows`. 
+ +--- + +## 🚀 Performance Wave + +### Cold start +- **Cut ~19s from `hermes` cold start** — skills cache + lazy Feishu + no Nous HTTP at startup ([#22138](https://github.com/NousResearch/hermes-agent/pull/22138)) +- **Skip eager plugin discovery on known built-in subcommands** ([#22120](https://github.com/NousResearch/hermes-agent/pull/22120)) +- **Cache Nous auth + .env loads** — `hermes tools` All Platforms from 14s to <1.5s ([#25341](https://github.com/NousResearch/hermes-agent/pull/25341)) +- **Skip welcome banner on `chat -q` single-query mode** ([#22904](https://github.com/NousResearch/hermes-agent/pull/22904)) +- **Defer heavy google-cloud imports in google_chat to first adapter use** ([#22681](https://github.com/NousResearch/hermes-agent/pull/22681)) +- **Defer QQAdapter and YuanbaoAdapter imports via PEP 562** ([#22790](https://github.com/NousResearch/hermes-agent/pull/22790)) +- **Defer httpx import in teams to first webhook call** ([#22831](https://github.com/NousResearch/hermes-agent/pull/22831)) +- **Defer fal_client import to first generation request** ([#22859](https://github.com/NousResearch/hermes-agent/pull/22859)) +- **models.dev cache-first lookup, skip network when disk cache is fresh** ([#22808](https://github.com/NousResearch/hermes-agent/pull/22808)) +- **Parallelize API connectivity checks in `hermes doctor` and disable IMDS** ([#22766](https://github.com/NousResearch/hermes-agent/pull/22766)) + +### Runtime +- **180x faster `browser_console` evaluations** — route through supervisor's persistent CDP WebSocket ([#23226](https://github.com/NousResearch/hermes-agent/pull/23226)) +- **Tune Telegram cadence + adaptive fast-path for short replies** (salvage of #10388) ([#23587](https://github.com/NousResearch/hermes-agent/pull/23587)) +- **Accumulate length-continuation prefix via list+join** ([#26237](https://github.com/NousResearch/hermes-agent/pull/26237)) + +### Prompt caching +- **Cross-session 1h prefix cache for Claude on Anthropic / 
OpenRouter / Nous Portal** ([#23828](https://github.com/NousResearch/hermes-agent/pull/23828)) +- **Hit prefix cache in background review fork** (salvage #17276 + #25427) ([#25434](https://github.com/NousResearch/hermes-agent/pull/25434)) + +--- + +## 📦 Installation & Distribution + +### PyPI + supply-chain +- **PyPI wheel packaging — `pip install hermes-agent && hermes`** (salvage of #26350) ([#26593](https://github.com/NousResearch/hermes-agent/pull/26593)) +- **Supply-chain advisory checker + lazy-install framework + tiered install fallback** ([#24220](https://github.com/NousResearch/hermes-agent/pull/24220)) +- **Use `--extra all` not `--all-extras`; drop lazy-covered extras from `[all]`** ([#24515](https://github.com/NousResearch/hermes-agent/pull/24515)) +- **Skip browser download when system chromium exists** (@helix4u) ([#25317](https://github.com/NousResearch/hermes-agent/pull/25317)) + +### Nix +- **`extraDependencyGroups` for sealed venv extras** (@alt-glitch) ([#21817](https://github.com/NousResearch/hermes-agent/pull/21817)) +- **Refresh npm lockfile hashes** — keeps Nix flake builds reproducible + +### Docker +- **Bootstrap auth.json from env on first boot** ([#21880](https://github.com/NousResearch/hermes-agent/pull/21880)) +- **Drop manual @hermes/ink build, rely on esbuild bundle** — slimmer image + +### ACP / Zed +- **Zed ACP Registry integration** (salvage of #25908) ([#26079](https://github.com/NousResearch/hermes-agent/pull/26079)) +- **Switch to uvx distribution, drop npm launcher** ([#26120](https://github.com/NousResearch/hermes-agent/pull/26120)) +- **`hermes acp --setup-browser` bootstraps browser tools for registry installs** ([#26234](https://github.com/NousResearch/hermes-agent/pull/26234)) + +--- + +## 🏗️ Core Agent & Architecture + +### Sessions & handoff +- **`/handoff` actually transfers the session live** ([#23395](https://github.com/NousResearch/hermes-agent/pull/23395)) +- **Expose `HERMES_SESSION_ID` env var to agent tools** 
(@alt-glitch) ([#23847](https://github.com/NousResearch/hermes-agent/pull/23847)) + +### Goals (Ralph loop) +- **`/subgoal` — user-added criteria appended to active `/goal`** ([#25449](https://github.com/NousResearch/hermes-agent/pull/25449)) +- **`/goal` checklist + /subgoal user controls** ([#23456](https://github.com/NousResearch/hermes-agent/pull/23456)) — rolled back within this release window ([#23813](https://github.com/NousResearch/hermes-agent/pull/23813)); `/subgoal` returned in a simpler form via #25449 + +### Compression +- **Make `protect_first_n` configurable** ([#25447](https://github.com/NousResearch/hermes-agent/pull/25447)) + +### Verification +- **Per-turn file-mutation verifier footer** ([#24498](https://github.com/NousResearch/hermes-agent/pull/24498)) + +### Stream retry +- **Log inner cause, upstream headers, bytes/elapsed on every drop** ([#23005](https://github.com/NousResearch/hermes-agent/pull/23005)) + +--- + +## 🤖 Models & Providers + +### New providers +- **xAI Grok OAuth (SuperGrok Subscription) provider** ([#26534](https://github.com/NousResearch/hermes-agent/pull/26534)) +- **NovitaAI provider** (salvage #7219) (@kshitijk4poor) ([#25507](https://github.com/NousResearch/hermes-agent/pull/25507)) +- **NVIDIA NIM billing origin header** (salvage #25211) ([#26585](https://github.com/NousResearch/hermes-agent/pull/26585)) + +### Provider work +- **OpenRouter Pareto Code router with `min_coding_score` knob** ([#22838](https://github.com/NousResearch/hermes-agent/pull/22838)) +- **Optional codex app-server runtime for OpenAI/Codex models** ([#24182](https://github.com/NousResearch/hermes-agent/pull/24182)) +- **Codex-runtime: retire wedged sessions + post-tool watchdog + OAuth refresh classify** ([#25769](https://github.com/NousResearch/hermes-agent/pull/25769)) +- **Codex-runtime: skip unavailable plugins during migration** ([#25437](https://github.com/NousResearch/hermes-agent/pull/25437)) +- **Codex-runtime: de-dup `[plugins.X]` tables and stop leaking 
HERMES_HOME into config.toml** (#26250) (@kshitijk4poor) ([#26260](https://github.com/NousResearch/hermes-agent/pull/26260)) +- **Pass `reasoning.effort` to xAI Responses API** ([#22807](https://github.com/NousResearch/hermes-agent/pull/22807)) +- **Custom provider: prompt and persist explicit `api_mode`** ([#25068](https://github.com/NousResearch/hermes-agent/pull/25068)) +- **Rename Alibaba Cloud → Qwen Cloud, reorder picker** ([#24835](https://github.com/NousResearch/hermes-agent/pull/24835)) +- **Restore gpt-5.3-codex-spark for ChatGPT Pro** (salvage #18286 + #19530, fixes #16172) (@kshitijk4poor) ([#22991](https://github.com/NousResearch/hermes-agent/pull/22991)) +- **Inject tool-use enforcement for GLM models** ([#24715](https://github.com/NousResearch/hermes-agent/pull/24715)) +- **Use Nous Portal as model metadata authority** (@rob-maron) ([#24502](https://github.com/NousResearch/hermes-agent/pull/24502)) +- **Unified `client=hermes-client-v` tag on every Portal request** ([#24779](https://github.com/NousResearch/hermes-agent/pull/24779)) +- **Prevent stale Ollama credentials after provider switch** (@kshitijk4poor) ([#21703](https://github.com/NousResearch/hermes-agent/pull/21703)) +- **Auxiliary client: rotate pooled auth after quota failures** (salvage #22779) ([#22792](https://github.com/NousResearch/hermes-agent/pull/22792)) +- **Auxiliary client: skip providers without credentials immediately** (#25395) ([#25487](https://github.com/NousResearch/hermes-agent/pull/25487)) +- **Auth: send Nous refresh token via header** (@shannonsands) ([#21578](https://github.com/NousResearch/hermes-agent/pull/21578)) +- **MiniMax: harden OAuth dashboard and runtime** ([#24165](https://github.com/NousResearch/hermes-agent/pull/24165)) + +### OpenAI-compatible proxy +- **Local OpenAI-compatible proxy for OAuth providers** — Codex / Aider / Cline can hit Claude Pro, ChatGPT Pro, SuperGrok ([#25969](https://github.com/NousResearch/hermes-agent/pull/25969)) + +--- + +## 📱 
Messaging Platforms (Gateway) + +### New platforms +- **LINE Messaging API platform plugin** ([#23197](https://github.com/NousResearch/hermes-agent/pull/23197)) +- **SimpleX Chat platform plugin** (salvages #2558) ([#26232](https://github.com/NousResearch/hermes-agent/pull/26232)) + +### Microsoft Graph foundation +- **msgraph: add auth and client foundation** (salvage of #21408) ([#21922](https://github.com/NousResearch/hermes-agent/pull/21922)) +- **msgraph: add webhook listener platform** (salvage of #21409) ([#21969](https://github.com/NousResearch/hermes-agent/pull/21969)) +- **teams-pipeline: add plugin runtime and operator cli** (salvage of #21410) ([#22007](https://github.com/NousResearch/hermes-agent/pull/22007)) +- **teams: add pipeline outbound delivery via existing adapter** (salvage of #21411) ([#22024](https://github.com/NousResearch/hermes-agent/pull/22024)) + +### Cross-platform +- **Per-platform admin/user split for slash commands** (salvage of #4443) ([#23373](https://github.com/NousResearch/hermes-agent/pull/23373)) +- **Forensics on signal handling — non-blocking diag, per-phase timing, stale-unit warning** ([#23285](https://github.com/NousResearch/hermes-agent/pull/23285)) +- **Keep gateway running when platforms fail; add per-platform circuit breaker + `/platform`** ([#26600](https://github.com/NousResearch/hermes-agent/pull/26600)) +- **Wire `clarify` tool with inline keyboard buttons on Telegram** ([#24199](https://github.com/NousResearch/hermes-agent/pull/24199)) +- **Add `chat_id` to `hook_ctx` for message source tracking** ([#24710](https://github.com/NousResearch/hermes-agent/pull/24710)) + +### Telegram +- **Native draft streaming via `sendMessageDraft` (Bot API 9.5+)** (salvage of #3412) ([#23512](https://github.com/NousResearch/hermes-agent/pull/23512)) +- **Stream Telegram edits safely** — salvage of #22264 (@kshitijk4poor) ([#22518](https://github.com/NousResearch/hermes-agent/pull/22518)) +- **Telegram notification mode** (salvage 
#22772) ([#22793](https://github.com/NousResearch/hermes-agent/pull/22793)) +- **Telegram guest mention mode** (@kshitijk4poor) ([#22759](https://github.com/NousResearch/hermes-agent/pull/22759)) +- **Split-and-deliver oversized edits instead of silent truncation** (salvage of #19537) ([#23576](https://github.com/NousResearch/hermes-agent/pull/23576)) +- **Preserve DM topic routing via reply fallback** (salvage #22053) (@kshitijk4poor) ([#22410](https://github.com/NousResearch/hermes-agent/pull/22410)) +- **Pass `source.thread_id` explicitly on auto-reset notice** (carve-out of #7404) ([#23440](https://github.com/NousResearch/hermes-agent/pull/23440)) + +### Discord +- **Render clarify choices as buttons** ([#25485](https://github.com/NousResearch/hermes-agent/pull/25485)) +- **Channel history backfill — default on, broadened scope** ([#25984](https://github.com/NousResearch/hermes-agent/pull/25984)) +- **`thread_require_mention` for multi-bot threads** (salvage #25313) ([#25445](https://github.com/NousResearch/hermes-agent/pull/25445)) + +### Slack +- **Support `!cmd` as alternate prefix for slash commands in threads** ([#25355](https://github.com/NousResearch/hermes-agent/pull/25355)) + +### WhatsApp +- **Surface quoted reply metadata from Baileys** (#25398) ([#25489](https://github.com/NousResearch/hermes-agent/pull/25489)) + +### Feishu / Google Chat / others +- **Feishu: native update prompt cards** (@kshitijk4poor) ([#22448](https://github.com/NousResearch/hermes-agent/pull/22448)) +- **Google Chat: repair setup prompt imports** (@helix4u) ([#22038](https://github.com/NousResearch/hermes-agent/pull/22038)) +- **Google Chat: honor relay-declared sender_type** (salvage of #22107) (@kshitijk4poor) ([#22432](https://github.com/NousResearch/hermes-agent/pull/22432)) +- **LINE: use `build_source` instead of nonexistent `create_source`** ([#24717](https://github.com/NousResearch/hermes-agent/pull/24717)) +- **Add `weixin, and more` to gateway docs** (salvage of 
#21063 by @wuwuzhijing) + +--- + +## 🖥️ CLI & TUI + +### CLI +- **Show YOLO mode warning in banner and status bar** ([#26238](https://github.com/NousResearch/hermes-agent/pull/26238)) +- **Confirm prompt for destructive slash commands** (#4069) ([#22687](https://github.com/NousResearch/hermes-agent/pull/22687)) +- **`docker_extra_args` + `display.timestamps`** ([#23599](https://github.com/NousResearch/hermes-agent/pull/23599)) +- **Delegate tool: show user's actual concurrency / spawn-depth limits in description** ([#22694](https://github.com/NousResearch/hermes-agent/pull/22694)) + +### TUI +- **`/sessions` slash command for browsing and resuming previous sessions** (@austinpickett) ([#20805](https://github.com/NousResearch/hermes-agent/pull/20805)) +- **Segment turns with rule above non-first user msgs; trim ticker dead space** (@OutThisLife) ([#21846](https://github.com/NousResearch/hermes-agent/pull/21846)) +- **Support attaching to an existing gateway** (@OutThisLife) ([#21978](https://github.com/NousResearch/hermes-agent/pull/21978)) +- **Resolve markdown links to readable page titles** (@OutThisLife) ([#24013](https://github.com/NousResearch/hermes-agent/pull/24013)) +- **Width-aware markdown table rendering with vertical fallback** (@alt-glitch) ([#26195](https://github.com/NousResearch/hermes-agent/pull/26195)) +- **Keep Ink displayCursor in sync with fast-echo writes so cursor stops drifting** (@OutThisLife) ([#26717](https://github.com/NousResearch/hermes-agent/pull/26717)) +- **Allow transcript scroll + Esc during approval/clarify/confirm prompts** (@OutThisLife) ([#26414](https://github.com/NousResearch/hermes-agent/pull/26414)) +- **Preserve session when switching personality** (@austinpickett) ([#20942](https://github.com/NousResearch/hermes-agent/pull/20942)) +- **Skip native safety net on OSC52-capable terminals** (@benbarclay) ([#20954](https://github.com/NousResearch/hermes-agent/pull/20954)) + +### Dashboard / GUI +- **Route embedded TUI through 
dashboard gateway** (@OutThisLife) ([#21979](https://github.com/NousResearch/hermes-agent/pull/21979)) +- **Hide token/cost analytics behind config flag (default off)** ([#25438](https://github.com/NousResearch/hermes-agent/pull/25438)) +- **Fix Langfuse observability — trace I/O, tool outputs, placeholder credentials** (closes #22342, #22763) (@kshitijk4poor) ([#26320](https://github.com/NousResearch/hermes-agent/pull/26320)) +- **MiniMax 'Login' button launched Claude OAuth** (salvage #22849) ([#24058](https://github.com/NousResearch/hermes-agent/pull/24058)) +- **Update cron modals** (@austinpickett) ([#25985](https://github.com/NousResearch/hermes-agent/pull/25985)) +- **Analytics: prevent silent token loss and add Claude 4.5–4.7 pricing** (@austinpickett) ([#21455](https://github.com/NousResearch/hermes-agent/pull/21455)) + +--- + +## 🔧 Tools & Capabilities + +### Vision & video +- **`vision_analyze` returns pixels to vision-capable models** ([#22955](https://github.com/NousResearch/hermes-agent/pull/22955)) +- **Unified `video_generate` with pluggable provider backends** ([#25126](https://github.com/NousResearch/hermes-agent/pull/25126)) +- **`image_gen`: actionable setup message when no FAL backend is reachable** ([#26222](https://github.com/NousResearch/hermes-agent/pull/26222)) + +### Computer use +- **`computer_use` cua-driver backend + focus-safe ops + non-Anthropic provider fix** (re-salvage #16936) ([#21967](https://github.com/NousResearch/hermes-agent/pull/21967)) +- **Refresh cua-driver on `hermes update` + add `install --upgrade`** ([#24063](https://github.com/NousResearch/hermes-agent/pull/24063)) + +### LSP & write-time diagnostics +- **Semantic diagnostics from real language servers in `write_file`/`patch`** ([#24168](https://github.com/NousResearch/hermes-agent/pull/24168)) +- **Shift baseline diagnostics into post-edit coordinates** ([#25978](https://github.com/NousResearch/hermes-agent/pull/25978)) + +### Search & web +- **Brave Search (free 
tier) and DDGS search providers** ([#21337](https://github.com/NousResearch/hermes-agent/pull/21337)) +- **Bearer auth header for Tavily `/crawl` endpoint** ([#24658](https://github.com/NousResearch/hermes-agent/pull/24658)) + +### X (Twitter) +- **Gated `x_search` tool with OAuth-or-API-key auth** ([#26763](https://github.com/NousResearch/hermes-agent/pull/26763)) + +### Browser +- **Route `browser_console` eval through supervisor's persistent CDP WS (180x faster)** ([#23226](https://github.com/NousResearch/hermes-agent/pull/23226)) +- **Support externally managed Camofox sessions** ([#24499](https://github.com/NousResearch/hermes-agent/pull/24499)) + +### MCP +- **`supports_parallel_tool_calls` for MCP servers** (salvage of #9944) ([#26825](https://github.com/NousResearch/hermes-agent/pull/26825)) +- **Codex preset for Codex CLI MCP server** (salvage #22663) ([#22679](https://github.com/NousResearch/hermes-agent/pull/22679)) +- **Stop retrying initial MCP auth failures** (#25624) ([#25776](https://github.com/NousResearch/hermes-agent/pull/25776)) + +### Google Workspace +- **Drive write ops + Docs/Sheets create/append** ([#21895](https://github.com/NousResearch/hermes-agent/pull/21895)) + +### Per-turn verifier +- **Per-turn file-mutation verifier footer** ([#24498](https://github.com/NousResearch/hermes-agent/pull/24498)) + +--- + +## 🧩 Kanban (Multi-Agent) + +- **`specify` — auxiliary LLM fleshes out triage tasks** ([#21435](https://github.com/NousResearch/hermes-agent/pull/21435)) +- **Orchestrator board tools — `kanban_list` + `kanban_unblock`** (carve-out of #20568) ([#23012](https://github.com/NousResearch/hermes-agent/pull/23012)) +- **`stranded_in_ready` diagnostic for unclaimed tasks** ([#23578](https://github.com/NousResearch/hermes-agent/pull/23578)) +- **Dashboard batch QOL upgrade** (salvage of #23240) ([#23550](https://github.com/NousResearch/hermes-agent/pull/23550)) +- **Tooltips and docs link across dashboard** 
([#21541](https://github.com/NousResearch/hermes-agent/pull/21541)) +- **Dedupe notifier delivery via atomic claim + rewind on failure** (salvage #22558) ([#23401](https://github.com/NousResearch/hermes-agent/pull/23401)) +- **Keep notifier subscriptions alive across retry cycles** (salvage #21398) ([#23423](https://github.com/NousResearch/hermes-agent/pull/23423)) +- **Drop caller-controlled author override in `kanban_comment`** (salvage of #22109) (@kshitijk4poor) ([#22435](https://github.com/NousResearch/hermes-agent/pull/22435)) +- **Sanitize comment author rendering in `build_worker_context`** ([#22769](https://github.com/NousResearch/hermes-agent/pull/22769)) + +--- + +## 🧠 Plugins & Extension + +### Plugin surface +- **Run any LLM call from inside a plugin via `ctx.llm`** ([#23194](https://github.com/NousResearch/hermes-agent/pull/23194)) +- **`tool_override` flag for replacing built-in tools** (closes #11049) ([#26759](https://github.com/NousResearch/hermes-agent/pull/26759)) +- **`standalone_sender_fn` for out-of-process cron delivery** (@kshitijk4poor) ([#22461](https://github.com/NousResearch/hermes-agent/pull/22461)) +- **`HERMES_PLUGINS_DEBUG=1` surfaces plugin discovery logs** ([#22684](https://github.com/NousResearch/hermes-agent/pull/22684)) +- **Hindsight-client as optional dependency** (@alt-glitch) ([#21818](https://github.com/NousResearch/hermes-agent/pull/21818)) + +### Profile & distribution +- **Shareable profile distributions via git** ([#20831](https://github.com/NousResearch/hermes-agent/pull/20831)) + +--- + +## ⏰ Cron + +- **Routing intent — `deliver=all` fans out to every connected channel** ([#21495](https://github.com/NousResearch/hermes-agent/pull/21495)) +- **Support name-based lookup for job operations** ([#26231](https://github.com/NousResearch/hermes-agent/pull/26231)) +- **Blank Cron dashboard tab + partial-record crashes** (salvage #21042 + #22330) (@kshitijk4poor) 
([#22389](https://github.com/NousResearch/hermes-agent/pull/22389)) +- **Do not seed `HERMES_SESSION_*` contextvars from cron origin** (salvage of #22356) (@kshitijk4poor) ([#22382](https://github.com/NousResearch/hermes-agent/pull/22382)) +- **Scan assembled prompt including skill content for prompt injection** (#3968) + +--- + +## 🧩 Skills Ecosystem + +### Skills Hub +- **`hermes-skills/huggingface` as a trusted default tap** (closes #2549) ([#26219](https://github.com/NousResearch/hermes-agent/pull/26219)) +- **Show per-skill pages in the left sidebar** ([#26646](https://github.com/NousResearch/hermes-agent/pull/26646)) +- **Richer info panels on the Skills Hub** ([#22905](https://github.com/NousResearch/hermes-agent/pull/22905)) +- **Refuse `skill_view` name collisions instead of guessing** (closes #6136 @polkn) + +### Curator +- **Show rename map in user-visible summary** ([#22910](https://github.com/NousResearch/hermes-agent/pull/22910)) +- **Hint at `hermes curator pin` in the rename block** ([#23212](https://github.com/NousResearch/hermes-agent/pull/23212)) + +### New optional skills +- **Hyperliquid** — perp/spot trading via SDK + REST (salvage of #1952) ([#23583](https://github.com/NousResearch/hermes-agent/pull/23583)) +- **Yahoo Finance** market data ([#23590](https://github.com/NousResearch/hermes-agent/pull/23590)) +- **api-testing** (REST/GraphQL debug, salvages #1800) ([#23582](https://github.com/NousResearch/hermes-agent/pull/23582)) +- **Unified EVM multi-chain skill** (salvages #25291 + #2010 + folds in base/) ([#25299](https://github.com/NousResearch/hermes-agent/pull/25299)) +- **darwinian-evolver** ([#26760](https://github.com/NousResearch/hermes-agent/pull/26760)) +- **osint-investigation** (closes #355) ([#26729](https://github.com/NousResearch/hermes-agent/pull/26729)) +- **pinggy-tunnel** ([#26765](https://github.com/NousResearch/hermes-agent/pull/26765)) +- **watchers** — RSS / HTTP JSON / GitHub polling via cron no-agent 
([#21881](https://github.com/NousResearch/hermes-agent/pull/21881)) +- **Notion overhaul for the Developer Platform** (May 2026) ([#26612](https://github.com/NousResearch/hermes-agent/pull/26612)) + +--- + +## 🔒 Security & Reliability + +### Security hardening +- **Sudo brute-force block + sudo-stdin/askpass DANGEROUS** (salvage of #22194 + #21128) (@kshitijk4poor) ([#23736](https://github.com/NousResearch/hermes-agent/pull/23736)) +- **Drop caller-controlled author override in `kanban_comment`** (salvage of #22109) (@kshitijk4poor) ([#22435](https://github.com/NousResearch/hermes-agent/pull/22435)) +- **Cover remaining SSRF fetch paths in skills-hub** (salvage #22804) ([#22843](https://github.com/NousResearch/hermes-agent/pull/22843)) +- **Use credential_pool for custom endpoint model listing probes** (salvage #22810) ([#22842](https://github.com/NousResearch/hermes-agent/pull/22842)) +- **Require dashboard auth for plugin API routes** (salvage #19541) ([#23220](https://github.com/NousResearch/hermes-agent/pull/23220)) +- **Sanitize env and redact output in quick commands + remove write-only `_pending_messages`** ([#23584](https://github.com/NousResearch/hermes-agent/pull/23584)) +- **Reduce unnecessary `shell=True` in subprocess calls** ([#25149](https://github.com/NousResearch/hermes-agent/pull/25149)) +- **Sanitize Google Chat sender_type from relay** (salvage of #22107) (@kshitijk4poor) ([#22432](https://github.com/NousResearch/hermes-agent/pull/22432)) +- **Supply-chain advisory checker** ([#24220](https://github.com/NousResearch/hermes-agent/pull/24220)) +- **Rewrite security policy around OS-level isolation as the boundary** (@jquesnelle) ([#20317](https://github.com/NousResearch/hermes-agent/pull/20317)) +- **Remove public security advisory page** ([#24253](https://github.com/NousResearch/hermes-agent/pull/24253)) + +### Reliability — notable bug closures +- **SQLite: fall back to `journal_mode=DELETE` on NFS/SMB/FUSE** (fixes `/resume` on network mounts) 
(@kshitijk4poor) ([#22043](https://github.com/NousResearch/hermes-agent/pull/22043)) +- **Codex-runtime: retire wedged sessions + post-tool watchdog + OAuth refresh classify** ([#25769](https://github.com/NousResearch/hermes-agent/pull/25769)) +- **Codex-runtime: de-dup `[plugins.X]` tables and stop leaking HERMES_HOME** (#26250) (@kshitijk4poor) ([#26260](https://github.com/NousResearch/hermes-agent/pull/26260)) +- **Daytona: migrate legacy-sandbox lookup to cursor-based `list()`** ([#24587](https://github.com/NousResearch/hermes-agent/pull/24587)) +- **MCP: stop retrying initial MCP auth failures** (#25624) ([#25776](https://github.com/NousResearch/hermes-agent/pull/25776)) +- **Gateway: enable text-intercept for multi-choice clarify fallback** (#25587) ([#25778](https://github.com/NousResearch/hermes-agent/pull/25778)) +- **Gateway: keep running when platforms fail; per-platform circuit breaker + `/platform`** ([#26600](https://github.com/NousResearch/hermes-agent/pull/26600)) +- **Delegate: salvage #21933 JSON-string batch + diagnostic logging** (@kshitijk4poor) ([#22436](https://github.com/NousResearch/hermes-agent/pull/22436)) +- **Profiles+banner: exclude infrastructure from `--clone-all` + fix stale update-check repo resolution** (@kshitijk4poor) ([#22475](https://github.com/NousResearch/hermes-agent/pull/22475)) +- **ACP: inline file attachment resources** (salvage #21400 + image support) ([#21407](https://github.com/NousResearch/hermes-agent/pull/21407)) +- **CI: unblock shared PR checks** (@stephenschoettler) ([#21012](https://github.com/NousResearch/hermes-agent/pull/21012), [#25957](https://github.com/NousResearch/hermes-agent/pull/25957)) + +### Notable reverts in window +- **`/goal` checklist + /subgoal feature stack** — rolled back ([#23813](https://github.com/NousResearch/hermes-agent/pull/23813)); `/subgoal` returned in simpler form via [#25449](https://github.com/NousResearch/hermes-agent/pull/25449) +- **Scrollback box width clamp** (#25975) 
rolled back to restore full-width borders ([#26163](https://github.com/NousResearch/hermes-agent/pull/26163)) +- **`fix(cli): tolerate unreadable dirs when building systemd PATH`** rolled back + +--- + +## 🌍 i18n + +- **Localize all gateway commands + web dashboard, add 8 new locales (16 total)** ([#22914](https://github.com/NousResearch/hermes-agent/pull/22914)) + +--- + +## 📚 Documentation + +- **Repair Voice & TTS provider table** (@nightcityblade, fixes #24101) ([#24138](https://github.com/NousResearch/hermes-agent/pull/24138)) +- **Show per-skill pages in the left sidebar** ([#26646](https://github.com/NousResearch/hermes-agent/pull/26646)) +- **Mention Weixin in gateway help and docstrings** (salvage of #21063 by @wuwuzhijing) +- **Richer info panels on the Skills Hub** ([#22905](https://github.com/NousResearch/hermes-agent/pull/22905)) +- Many more doc updates across providers, platforms, skills, Windows install paths, and dashboard. + +--- + +## 🧪 Testing & CI + +- **Unblock shared PR checks** (@stephenschoettler) ([#21012](https://github.com/NousResearch/hermes-agent/pull/21012)) +- **Stabilize shared test state after #21012** (@stephenschoettler) ([#25957](https://github.com/NousResearch/hermes-agent/pull/25957)) +- A long tail of test additions for platforms, providers, plugins, and edge cases: 8 explicit `test:` PRs plus ~250 fix PRs that also added regression coverage. 
+ +--- + +## 👥 Contributors + +### Core +- @teknium1 — release lead, architecture, ~406 PRs merged in window + +### Top community contributors +- **@kshitijk4poor** — 38 PRs · Telegram cadence/streaming/topic routing, security hardening (sudo, SSRF, kanban_comment, dashboard auth), codex-runtime hygiene, NovitaAI provider, profile/banner fixes, Feishu update cards, gateway QOL across the board +- **@alt-glitch** — 13 PRs · Markdown-table TUI rendering, `HERMES_SESSION_ID` env var, hindsight-client optional dep, Nix `extraDependencyGroups` +- **@OutThisLife** (Brooklyn Nicholson) — 12 PRs · TUI turn segmentation, attach-to-gateway, markdown link titles, embedded TUI via dashboard gateway, Ink cursor sync, scroll/Esc during prompts +- **@austinpickett** — 8 PRs · `/sessions` slash command, personality switching preserves session, cron modals, dashboard analytics +- **@helix4u** — 5 PRs · Google Chat setup, browser install skip on system chromium, Windows Ctrl+C preservation +- **@rob-maron** — 4 PRs · Nous Portal as model metadata authority, provider polish +- **@stephenschoettler** — 3 PRs · CI stabilization +- **@ethernet8023** — 3 PRs · platform/gateway work + +### All contributors (alphabetical) + +@02356abc, @0xbyt4, @0xharryriddle, @1000Delta, @1RB, @29206394, @A-kamal, @aashizpoudel, @Abd0r, +@adybag14-cyber, @AgentArcLab, @ahmedbadr3, @AhmetArif0, @alblez, @Alex-yang00, @ALIYILD, @AllynSheep, +@alt-glitch, @am423, @amathxbt, @amethystani, @ArecaNon, @Arkmusn, @askclaw-vesper, @AsoTora, @austinpickett, +@aydnOktay, @ayushere, @baocin, @Bartok9, @benbarclay, @BennetYrWang, @Bihruze, @binhnt92, @briandevans, +@brooklynnicholson, @btorresgil, @buntingszn, @CalmProton, @chrisworksai, @CoinTheHat, @dandacompany, @Dangooy, +@DanielLSM, @David-0x221Eight, @ddupont808, @dhruv-saxena, @diablozzc, @dlkakbs, @dmahan93, @dmnkhorvath, +@domtriola, @donrhmexe, @Dusk1e, @eloklam, @emozilla, @ephron-ren, @erenkarakus, @EthanGuo-coder, +@ethernet8023, @evgyur, @explainanalyze, 
@fahdad, @fr33d3m0n, @Freeman-Consulting, @freqyfreqy, @Frowtek, +@fu576, @github-actions[bot], @gnanirahulnutakki, @GodsBoy, @guglielmofonda, @Gutslabs, @hanzckernel, +@heathley, @hekaru-agent, @helix4u, @HenkDz, @HiddenPuppy, @hllqkb, @hrygo, @HuangYuChuh, @Hugo-SEQUIER, @HxT9, +@iacker, @InB4DevOps, @isaachuangGMICLOUD, @iuyup, @Jaaneek, @jackey8616, @jackjin1997, @Jaggia, @jak983464779, +@jelrod27, @jethac, @JithendraNara, @johnisag, @Julientalbot, @Jwd-gity, @kallidean, @keyuyuan, @kfa-ai, +@kidonng, @KiraKatana, @kjames2001, @konsisumer, @Korkyzer, @kshitijk4poor, @KvnGz, @lars-hagen, @leehack, +@leepoweii, @LeonSGP43, @li0near, @libo1106, @liquidchen, @littlewwwhite, @liuhao1024, @liyoungc, @luandiasrj, +@luoyuctl, @luyao618, @magic524, @mbac, @McClean, @memosr, @Mibayy, @ming1523, @mizgyo, @mrshu, @ms-alan, +@MustafaKara7, @nederev, @nicoechaniz, @nidhi-singh02, @nightcityblade, @nik1t7n, @Ninso112, @NivOO5, +@novax635, @nv-kasikritc, @oferlaor, @oswaldb22, @outdoorsea, @oxngon, @PaTTeeL, @pearjelly, @pefontana, +@perng, @PhilipAD, @phuongvm, @polkn, @Prasanna28Devadiga, @princepal9120, @pty819, @purzbeats, @Quarkex, +@quocanh261997, @qWaitCrypto, @Qwinty, @rahimsais, @raymaylee, @ReqX, @rewbs, @RhombusMaximus, @rob-maron, +@Ruzzgar, @ryptotalent, @Sanjays2402, @shannonsands, @shaun0927, @SiliconID, @silv-mt-holdings, @simpolism, +@smwbev, @soichiyo, @sprmn24, @steezkelly, @stephenschoettler, @Sylw3ster, @szymonclawd, @teyrebaz33, +@Tianyu199509, @Tranquil-Flow, @TreyDong, @TurgutKural, @tw2818, @tymrtn, @uzunkuyruk, @v1b3coder, +@vanthinh6886, @VinceZcrikl, @vKongv, @vominh1919, @voteblake, @VTRiot, @wali-reheman, @wesleysimplicio, +@wilsen0, @WorldWriter, @worlldz, @wuli666, @wuwuzhijing, @Wysie, @XiaoXiao0221, @xieNniu, @xxxigm, @yehuosi, +@ygd58, @yifengingit, @yuga-hashimoto, @zccyman, @ZeterMordio, @Zhekinmaksim, @zhengyn0001 + +Also: @Nagatha (Claude Opus 4.7). 
+ +--- + +**Full Changelog**: [v2026.5.7...v2026.5.16](https://github.com/NousResearch/hermes-agent/compare/v2026.5.7...v2026.5.16) diff --git a/acp_adapter/server.py b/acp_adapter/server.py index 71fce1890d1..3031de161fd 100644 --- a/acp_adapter/server.py +++ b/acp_adapter/server.py @@ -18,6 +18,7 @@ import acp from acp.schema import ( AgentCapabilities, AgentMessageChunk, + AgentThoughtChunk, AuthenticateResponse, AvailableCommand, AvailableCommandsUpdate, @@ -788,14 +789,20 @@ class HermesACPAgent(acp.Agent): # ---- Session management ------------------------------------------------- @staticmethod - def _history_message_text(message: dict[str, Any]) -> str: - """Extract displayable text from a persisted OpenAI-style message.""" - content = message.get("content") - if isinstance(content, str): - return content.strip() - if isinstance(content, list): + def _flatten_history_text(value: Any) -> str: + """Normalize a persisted text-or-text-parts value into a single string. + + OpenAI-style assistant content (and provider reasoning fields) can arrive + as either a scalar string or a list of ``{"text": ...}`` / + ``{"type": "text", "content": ...}`` parts. Whitespace-only inputs + collapse to an empty string so callers can treat ``""`` as "nothing to + emit". 
+ """ + if isinstance(value, str): + return value.strip() + if isinstance(value, list): parts: list[str] = [] - for item in content: + for item in value: if isinstance(item, dict): text = item.get("text") if isinstance(text, str): @@ -807,6 +814,29 @@ class HermesACPAgent(acp.Agent): return "\n".join(part.strip() for part in parts if part and part.strip()).strip() return "" + @classmethod + def _history_message_text(cls, message: dict[str, Any]) -> str: + """Extract displayable text from a persisted OpenAI-style message.""" + return cls._flatten_history_text(message.get("content")) + + @classmethod + def _history_reasoning_text(cls, message: dict[str, Any]) -> str: + """Extract displayable reasoning/thought text from a persisted assistant message. + + Returns the first non-empty value among ``reasoning_content`` (the + canonical field used by DeepSeek / Moonshot and the post-#16892 + chat-completions normalizer) and ``reasoning`` (used by the codex + event projector and several other transports). Both keys are + actively written by live code paths, so neither branch is + deprecated — they cover different transports rather than old vs. + new sessions. 
+ """ + for key in ("reasoning_content", "reasoning"): + text = cls._flatten_history_text(message.get(key)) + if text: + return text + return "" + @staticmethod def _history_message_update( *, @@ -827,6 +857,11 @@ class HermesACPAgent(acp.Agent): ) return None + @staticmethod + def _history_thought_update(text: str) -> AgentThoughtChunk: + """Build an ACP history replay update for an assistant thought.""" + return acp.update_agent_thought_text(text) + @staticmethod def _history_tool_call_name_args(tool_call: dict[str, Any]) -> tuple[str, dict[str, Any]]: """Extract function name/arguments from an OpenAI-style tool_call.""" @@ -854,13 +889,17 @@ class HermesACPAgent(acp.Agent): ).strip() async def _replay_session_history(self, state: SessionState) -> None: - """Send persisted user/assistant history to clients during session/load. + """Replay persisted user/assistant history during session/load or session/resume. - Zed's ACP history UI calls ``session/load`` after the user picks an item - from the Agents sidebar. The agent must then replay the full conversation - as user/assistant chunks plus reconstructed tool-call start/completion - notifications; merely restoring server-side state makes Hermes remember - context, but leaves the editor looking like a clean thread. + Invoked inline (``await``) from both ``load_session`` and + ``resume_session`` so that spec-compliant ACP clients receive the + full transcript within the request's lifetime — see the comment at + the call sites for the rationale and prior-art citations. + + Replays the conversation as user/assistant chunks, thinking-mode + thought chunks, plus reconstructed tool-call start/completion + notifications. Merely restoring server-side state makes Hermes + remember context, but leaves the editor looking like a clean thread. 
""" if not self._conn or not state.history: return @@ -882,24 +921,37 @@ class HermesACPAgent(acp.Agent): for message in state.history: role = str(message.get("role") or "") - if role in {"user", "assistant"}: + if role == "user": + text = self._history_message_text(message) + if text: + update = self._history_message_update(role=role, text=text) + if update is not None and not await _send(update): + return + continue + + if role == "assistant": + thought = self._history_reasoning_text(message) + if thought and not await _send(self._history_thought_update(thought)): + return + text = self._history_message_text(message) if text: update = self._history_message_update(role=role, text=text) if update is not None and not await _send(update): return - if role == "assistant" and isinstance(message.get("tool_calls"), list): - for tool_call in message["tool_calls"]: - if not isinstance(tool_call, dict): - continue - tool_call_id = self._history_tool_call_id(tool_call) - if not tool_call_id: - continue - tool_name, args = self._history_tool_call_name_args(tool_call) - active_tool_calls[tool_call_id] = (tool_name, args) - if not await _send(build_tool_start(tool_call_id, tool_name, args)): - return + tool_calls = message.get("tool_calls") + if isinstance(tool_calls, list): + for tool_call in tool_calls: + if not isinstance(tool_call, dict): + continue + tool_call_id = self._history_tool_call_id(tool_call) + if not tool_call_id: + continue + tool_name, args = self._history_tool_call_name_args(tool_call) + active_tool_calls[tool_call_id] = (tool_name, args) + if not await _send(build_tool_start(tool_call_id, tool_name, args)): + return continue if role == "tool": @@ -942,18 +994,6 @@ class HermesACPAgent(acp.Agent): models=self._build_model_state(state), ) - def _schedule_history_replay(self, state: SessionState) -> None: - """Replay persisted history after session/load or session/resume returns. 
- - Zed only attaches streamed transcript/tool updates once the load/resume - response has completed. Sending replay notifications while the request is - still in-flight can make the server look correct in logs while the editor - drops or fails to attach the tool-call history. - """ - loop = asyncio.get_running_loop() - replay_coro = self._replay_session_history(state) - loop.call_soon(asyncio.create_task, replay_coro) - async def load_session( self, cwd: str, @@ -967,7 +1007,30 @@ class HermesACPAgent(acp.Agent): return None await self._register_session_mcp_servers(state, mcp_servers) logger.info("Loaded session %s", session_id) - self._schedule_history_replay(state) + # Per ACP spec, `session/load` must stream the prior conversation back + # to the client via `session/update` notifications BEFORE responding, + # so the client receives the full transcript within the load request's + # lifetime. Awaiting the replay here matches Codex / Claude Code / + # OpenCode / Pi and the Zed client (which registers the session-update + # routing entry before awaiting the loadSession RPC specifically so + # in-call history replay updates can find the thread). Deferring this + # via `loop.call_soon` (as we did briefly in May 2026) broke every + # spec-compliant ACP client that measures notifications synchronously + # against the load response — see #12285 follow-up. + try: + await self._replay_session_history(state) + except Exception: + # Replay is best-effort — a corrupted or unexpected message shape + # must not turn a successful session/load into a JSON-RPC error + # response. Per-notification failures are already caught inside + # ``_replay_session_history``; this outer guard covers anything + # raised by the helpers themselves before reaching ``_send``. 
+ logger.warning( + "ACP history replay raised during session/load for %s — " + "load will still succeed, partial transcript may be missing", + session_id, + exc_info=True, + ) self._schedule_available_commands_update(session_id) self._schedule_usage_update(state) return LoadSessionResponse(models=self._build_model_state(state)) @@ -985,7 +1048,18 @@ class HermesACPAgent(acp.Agent): state = self.session_manager.create_session(cwd=cwd) await self._register_session_mcp_servers(state, mcp_servers) logger.info("Resumed session %s", state.session_id) - self._schedule_history_replay(state) + # See `load_session` above for the spec rationale — replay must + # complete before the response so clients receive the full transcript + # within the request's lifetime. + try: + await self._replay_session_history(state) + except Exception: + logger.warning( + "ACP history replay raised during session/resume for %s — " + "resume will still succeed, partial transcript may be missing", + state.session_id, + exc_info=True, + ) self._schedule_available_commands_update(state.session_id) self._schedule_usage_update(state) return ResumeSessionResponse(models=self._build_model_state(state)) diff --git a/agent/anthropic_adapter.py b/agent/anthropic_adapter.py index 4b1134a4c0b..e7e1a8acb6d 100644 --- a/agent/anthropic_adapter.py +++ b/agent/anthropic_adapter.py @@ -1060,10 +1060,12 @@ def _generate_pkce() -> tuple: def run_hermes_oauth_login_pure() -> Optional[Dict[str, Any]]: """Run Hermes-native OAuth PKCE flow and return credential state.""" + import secrets import time import webbrowser verifier, challenge = _generate_pkce() + oauth_state = secrets.token_urlsafe(32) params = { "code": "true", @@ -1073,7 +1075,7 @@ def run_hermes_oauth_login_pure() -> Optional[Dict[str, Any]]: "scope": _OAUTH_SCOPES, "code_challenge": challenge, "code_challenge_method": "S256", - "state": verifier, + "state": oauth_state, } from urllib.parse import urlencode @@ -1110,7 +1112,12 @@ def 
run_hermes_oauth_login_pure() -> Optional[Dict[str, Any]]: splits = auth_code.split("#") code = splits[0] - state = splits[1] if len(splits) > 1 else "" + received_state = splits[1] if len(splits) > 1 else "" + + # Validate state to prevent CSRF (RFC 6749 §10.12) + if received_state != oauth_state: + logger.warning("OAuth state mismatch — possible CSRF, aborting") + return None try: import urllib.request @@ -1119,7 +1126,7 @@ def run_hermes_oauth_login_pure() -> Optional[Dict[str, Any]]: "grant_type": "authorization_code", "client_id": _OAUTH_CLIENT_ID, "code": code, - "state": state, + "state": received_state, "redirect_uri": _OAUTH_REDIRECT_URI, "code_verifier": verifier, }).encode() diff --git a/agent/copilot_acp_client.py b/agent/copilot_acp_client.py index 3643837bf5b..f1bff1a7190 100644 --- a/agent/copilot_acp_client.py +++ b/agent/copilot_acp_client.py @@ -30,6 +30,28 @@ _DEFAULT_TIMEOUT_SECONDS = 900.0 _TOOL_CALL_BLOCK_RE = re.compile(r"\s*(\{.*?\})\s*", re.DOTALL) _TOOL_CALL_JSON_RE = re.compile(r"\{\s*\"id\"\s*:\s*\"[^\"]+\"\s*,\s*\"type\"\s*:\s*\"function\"\s*,\s*\"function\"\s*:\s*\{.*?\}\s*\}", re.DOTALL) +# Stderr fingerprint of the deprecated `gh copilot` CLI extension +# (https://github.blog/changelog/2025-09-25-upcoming-deprecation-of-gh-copilot-cli-extension). +# We require BOTH the literal product name ("gh-copilot") AND a deprecation +# marker, so generic stderr from the NEW `@github/copilot` CLI — whose repo +# is github.com/github/copilot-cli and which legitimately mentions "copilot-cli" +# in its own banners and error messages — doesn't get misclassified as the +# deprecated extension. 
+_DEPRECATION_REQUIRED = ("gh-copilot",) +_DEPRECATION_MARKERS = ( + "has been deprecated", + "no commands will be executed", +) + + +def _is_gh_copilot_deprecation_message(stderr_text: str) -> bool: + """True iff stderr looks like the deprecated gh-copilot extension's banner.""" + + lower = stderr_text.lower() + if not any(req in lower for req in _DEPRECATION_REQUIRED): + return False + return any(marker in lower for marker in _DEPRECATION_MARKERS) + def _resolve_command() -> str: return ( @@ -506,6 +528,21 @@ class CopilotACPClient: stderr_text = "\n".join(stderr_tail).strip() if proc.poll() is not None and stderr_text: + if _is_gh_copilot_deprecation_message(stderr_text): + raise RuntimeError( + "Hermes ACP mode requires the NEW GitHub Copilot CLI " + "(github.com/github/copilot-cli), but the binary it just " + "spawned is the deprecated `gh copilot` extension.\n\n" + "Install the new CLI:\n" + " npm install -g @github/copilot\n" + " # then verify with: copilot --help\n\n" + "If `copilot` already resolves to the new CLI but you still see this,\n" + "point Hermes at it explicitly:\n" + " export HERMES_COPILOT_ACP_COMMAND=/path/to/new/copilot\n\n" + "Alternative: use the `copilot` provider (no ACP, hits the Copilot API\n" + "directly with a Copilot subscription token) via `hermes setup`.\n\n" + f"Original error:\n{stderr_text}" + ) raise RuntimeError(f"Copilot ACP process exited early: {stderr_text}") raise TimeoutError(f"Timed out waiting for Copilot ACP response to {method}.") diff --git a/agent/model_metadata.py b/agent/model_metadata.py index 41e229416c9..26a844ccb92 100644 --- a/agent/model_metadata.py +++ b/agent/model_metadata.py @@ -358,6 +358,12 @@ _URL_TO_PROVIDER: Dict[str, str] = { "api.deepseek.com": "deepseek", "api.githubcopilot.com": "copilot", "models.github.ai": "copilot", + # GitHub Models free tier (Azure-hosted prototyping endpoint) — same + # canonical provider as the Copilot API. 
Hard per-request token cap + # (often 8K) makes it unusable for Hermes' system prompt, but mapping + # it here lets us recognize the endpoint and emit a targeted hint + # instead of falling through the unknown-custom-endpoint path. + "models.inference.ai.azure.com": "copilot", "api.fireworks.ai": "fireworks", "opencode.ai": "opencode-go", "api.x.ai": "xai", diff --git a/apps/dashboard/src/i18n/af.ts b/apps/dashboard/src/i18n/af.ts index 4f49eb12227..e588a63596d 100644 --- a/apps/dashboard/src/i18n/af.ts +++ b/apps/dashboard/src/i18n/af.ts @@ -663,7 +663,7 @@ export const af: Translations = { columnHelp: { triage: "Rou idees — 'n spesifiseerder sal die spesifikasie uitwerk", todo: "Wag op afhanklikhede of nie toegewys nie", - ready: "Toegewys en wag vir 'n versender-tik", + ready: "Afhanklikhede is bevredig; wys 'n profiel toe om te versend", running: "Deur 'n werker geëis — in vlug", blocked: "Werker het mensinvoer aangevra", done: "Voltooi", diff --git a/apps/dashboard/src/i18n/de.ts b/apps/dashboard/src/i18n/de.ts index c70ccfe8701..28a9b59deff 100644 --- a/apps/dashboard/src/i18n/de.ts +++ b/apps/dashboard/src/i18n/de.ts @@ -662,7 +662,7 @@ export const de: Translations = { columnHelp: { triage: "Rohe Ideen — ein Specifier wird die Spezifikation ausarbeiten", todo: "Wartet auf Abhängigkeiten oder ist nicht zugewiesen", - ready: "Zugewiesen und wartet auf einen Dispatcher-Tick", + ready: "Abhängigkeiten erfüllt; Profil zum Dispatch zuweisen", running: "Von einem Worker übernommen — in Bearbeitung", blocked: "Worker hat um menschliche Eingabe gebeten", done: "Abgeschlossen", diff --git a/apps/dashboard/src/i18n/en.ts b/apps/dashboard/src/i18n/en.ts index e93fdac7ec4..5eae3f9a14a 100644 --- a/apps/dashboard/src/i18n/en.ts +++ b/apps/dashboard/src/i18n/en.ts @@ -574,6 +574,9 @@ export const en: Translations = { createTask: "Create task in this column", noTasks: "— no tasks —", unassigned: "unassigned", + needsAssignee: "Needs assignee", + needsAssigneeHint: + 
"Dependencies are satisfied, but the dispatcher skips this task until you assign a profile.", untitled: "(untitled)", loadingDetail: "Loading…", addComment: "Add a comment… (Enter to submit)", @@ -664,7 +667,7 @@ export const en: Translations = { columnHelp: { triage: "Raw ideas — a specifier will flesh out the spec", todo: "Waiting on dependencies or unassigned", - ready: "Assigned and waiting for a dispatcher tick", + ready: "Dependencies satisfied; assign a profile to dispatch", running: "Claimed by a worker — in-flight", blocked: "Worker asked for human input", done: "Completed", diff --git a/apps/dashboard/src/i18n/es.ts b/apps/dashboard/src/i18n/es.ts index 19088de12c8..139a8175d44 100644 --- a/apps/dashboard/src/i18n/es.ts +++ b/apps/dashboard/src/i18n/es.ts @@ -662,7 +662,7 @@ export const es: Translations = { columnHelp: { triage: "Ideas en bruto — un specifier desarrollará la especificación", todo: "Esperando dependencias o sin asignar", - ready: "Asignado y esperando un tick del dispatcher", + ready: "Dependencias satisfechas; asigna un perfil para despachar", running: "Reclamado por un worker — en ejecución", blocked: "El worker pidió intervención humana", done: "Completado", diff --git a/apps/dashboard/src/i18n/fr.ts b/apps/dashboard/src/i18n/fr.ts index 4532cab3ee0..51b5ba54f12 100644 --- a/apps/dashboard/src/i18n/fr.ts +++ b/apps/dashboard/src/i18n/fr.ts @@ -662,7 +662,7 @@ export const fr: Translations = { columnHelp: { triage: "Idées brutes — un specifier rédigera la spécification", todo: "En attente de dépendances ou non assigné", - ready: "Assigné et en attente d'un tick du dispatcher", + ready: "Dépendances satisfaites ; assignez un profil pour dispatch", running: "Réclamé par un worker — en cours d'exécution", blocked: "Le worker a demandé une intervention humaine", done: "Terminé", diff --git a/apps/dashboard/src/i18n/ga.ts b/apps/dashboard/src/i18n/ga.ts index d75ec061b8b..4dc4e823430 100644 --- a/apps/dashboard/src/i18n/ga.ts +++ 
b/apps/dashboard/src/i18n/ga.ts @@ -663,7 +663,7 @@ export const ga: Translations = { columnHelp: { triage: "Smaointe amha — déanfaidh specifier an spec a chur i bhfeidhm", todo: "Ag fanacht ar spleáchais nó gan sannadh", - ready: "Sannta agus ag fanacht ar thic an dispatcher", + ready: "Tá na spleáchais sásaithe; sann próifíl le dispatch a dhéanamh", running: "Éilithe ag worker — ar siúl", blocked: "D'iarr an worker ionchur duine", done: "Críochnaithe", diff --git a/apps/dashboard/src/i18n/hu.ts b/apps/dashboard/src/i18n/hu.ts index f563c1dacc4..8b492f3bb16 100644 --- a/apps/dashboard/src/i18n/hu.ts +++ b/apps/dashboard/src/i18n/hu.ts @@ -663,7 +663,7 @@ export const hu: Translations = { columnHelp: { triage: "Nyers ötletek — egy specifier kidolgozza a specifikációt", todo: "Függőségekre vár vagy nincs felelőse", - ready: "Kiosztva, dispatcher tickre vár", + ready: "A függőségek teljesültek; rendelj hozzá profilt az indításhoz", running: "Worker felvette — folyamatban", blocked: "A worker emberi beavatkozást kért", done: "Befejezve", diff --git a/apps/dashboard/src/i18n/it.ts b/apps/dashboard/src/i18n/it.ts index 5e79d3115c3..86fce86589e 100644 --- a/apps/dashboard/src/i18n/it.ts +++ b/apps/dashboard/src/i18n/it.ts @@ -662,7 +662,7 @@ export const it: Translations = { columnHelp: { triage: "Idee grezze — un specifier elaborerà la specifica", todo: "In attesa di dipendenze o non assegnato", - ready: "Assegnato e in attesa di un tick del dispatcher", + ready: "Dipendenze soddisfatte; assegna un profilo per il dispatch", running: "Preso in carico da un worker — in esecuzione", blocked: "Il worker ha richiesto input umano", done: "Completato", diff --git a/apps/dashboard/src/i18n/ja.ts b/apps/dashboard/src/i18n/ja.ts index 175468e4d8b..154e11f5dbb 100644 --- a/apps/dashboard/src/i18n/ja.ts +++ b/apps/dashboard/src/i18n/ja.ts @@ -663,7 +663,7 @@ export const ja: Translations = { columnHelp: { triage: "未整理のアイデア — スペシファイアが仕様を肉付けします", todo: "依存関係の待機中、または未割り当て", - ready: 
"割り当て済み、ディスパッチャーのティック待ち", + ready: "依存関係は満たされています。ディスパッチするにはプロファイルを割り当ててください", running: "ワーカーが取得中 — 実行中", blocked: "ワーカーが人間の入力を求めています", done: "完了", diff --git a/apps/dashboard/src/i18n/ko.ts b/apps/dashboard/src/i18n/ko.ts index cfc40d63df7..4dafaeb9cde 100644 --- a/apps/dashboard/src/i18n/ko.ts +++ b/apps/dashboard/src/i18n/ko.ts @@ -663,7 +663,7 @@ export const ko: Translations = { columnHelp: { triage: "원시 아이디어 — 스페시파이어가 사양을 구체화합니다", todo: "종속성 대기 중 또는 미지정", - ready: "지정되었으며 디스패처 틱 대기 중", + ready: "종속성이 충족됨; 디스패치하려면 프로필을 지정하세요", running: "워커가 점유 중 — 실행 중", blocked: "워커가 사람의 입력을 요청함", done: "완료됨", diff --git a/apps/dashboard/src/i18n/pt.ts b/apps/dashboard/src/i18n/pt.ts index 6cdd40b8fe5..d32402dc92a 100644 --- a/apps/dashboard/src/i18n/pt.ts +++ b/apps/dashboard/src/i18n/pt.ts @@ -663,7 +663,7 @@ export const pt: Translations = { columnHelp: { triage: "Ideias em bruto — um specifier vai detalhar a especificação", todo: "À espera de dependências ou sem atribuição", - ready: "Atribuído e à espera de um tick do dispatcher", + ready: "Dependências satisfeitas; atribua um perfil para despachar", running: "Reivindicado por um worker — em execução", blocked: "O worker pediu intervenção humana", done: "Concluído", diff --git a/apps/dashboard/src/i18n/ru.ts b/apps/dashboard/src/i18n/ru.ts index c5b9a5b5038..79a6961b251 100644 --- a/apps/dashboard/src/i18n/ru.ts +++ b/apps/dashboard/src/i18n/ru.ts @@ -663,7 +663,7 @@ export const ru: Translations = { columnHelp: { triage: "Сырые идеи — specifier подготовит спецификацию", todo: "Ожидает зависимостей или без исполнителя", - ready: "Назначено и ждёт тика диспетчера", + ready: "Зависимости выполнены; назначьте профиль для диспетчеризации", running: "Взято воркером — выполняется", blocked: "Воркер запросил вмешательство человека", done: "Завершено", diff --git a/apps/dashboard/src/i18n/tr.ts b/apps/dashboard/src/i18n/tr.ts index 7de6ea1df7d..56670424abb 100644 --- a/apps/dashboard/src/i18n/tr.ts +++ 
b/apps/dashboard/src/i18n/tr.ts @@ -663,7 +663,7 @@ export const tr: Translations = { columnHelp: { triage: "Ham fikirler — bir specifier şartnameyi detaylandıracak", todo: "Bağımlılıklar bekleniyor veya atanmamış", - ready: "Atanmış ve dispatcher tick'i bekleniyor", + ready: "Bağımlılıklar karşılandı; dispatch için bir profil atayın", running: "Bir worker tarafından alındı — yürütülüyor", blocked: "Worker insan girdisi istedi", done: "Tamamlandı", diff --git a/apps/dashboard/src/i18n/types.ts b/apps/dashboard/src/i18n/types.ts index ca40b4a381f..55669a4b679 100644 --- a/apps/dashboard/src/i18n/types.ts +++ b/apps/dashboard/src/i18n/types.ts @@ -586,6 +586,8 @@ export interface Translations { createTask: string; noTasks: string; unassigned: string; + needsAssignee?: string; + needsAssigneeHint?: string; untitled: string; loadingDetail: string; addComment: string; diff --git a/apps/dashboard/src/i18n/uk.ts b/apps/dashboard/src/i18n/uk.ts index 72726aabe5f..3c3df8dae68 100644 --- a/apps/dashboard/src/i18n/uk.ts +++ b/apps/dashboard/src/i18n/uk.ts @@ -663,7 +663,7 @@ export const uk: Translations = { columnHelp: { triage: "Сирі ідеї — специфікатор деталізує специфікацію", todo: "Очікує на залежності або не призначено", - ready: "Призначено, очікує тіку диспетчера", + ready: "Залежності задоволені; призначте профіль для диспетчеризації", running: "Захоплено воркером — у роботі", blocked: "Воркер запитав втручання людини", done: "Завершено", diff --git a/apps/dashboard/src/i18n/zh-hant.ts b/apps/dashboard/src/i18n/zh-hant.ts index c79222cfe91..27f3a41b95f 100644 --- a/apps/dashboard/src/i18n/zh-hant.ts +++ b/apps/dashboard/src/i18n/zh-hant.ts @@ -663,7 +663,7 @@ export const zhHant: Translations = { columnHelp: { triage: "原始想法 — 規格制定者將完善規格", todo: "等待相依項目或尚未指派", - ready: "已指派,等待排程器輪詢", + ready: "相依項目已滿足;指派設定檔以便排程", running: "已被工作者領取 — 執行中", blocked: "工作者請求人工輸入", done: "已完成", diff --git a/apps/dashboard/src/i18n/zh.ts b/apps/dashboard/src/i18n/zh.ts index 
0a8ceb7962a..6290c473b82 100644 --- a/apps/dashboard/src/i18n/zh.ts +++ b/apps/dashboard/src/i18n/zh.ts @@ -659,7 +659,7 @@ export const zh: Translations = { columnHelp: { triage: "原始想法 — 规范制定者将完善规格", todo: "等待依赖项或未分配", - ready: "已分配,等待调度器轮询", + ready: "依赖项已满足;分配一个配置文件以便调度", running: "已被工作者认领 — 执行中", blocked: "工作者请求人工输入", done: "已完成", diff --git a/gateway/platforms/base.py b/gateway/platforms/base.py index d03bc282ed3..c6bdc38c3b9 100644 --- a/gateway/platforms/base.py +++ b/gateway/platforms/base.py @@ -2961,9 +2961,25 @@ class BasePlatformAdapter(ABC): merge_pending_message_event(self._pending_messages, session_key, event) return # Don't interrupt now - will run after current task completes - # Default behavior for non-photo follow-ups: interrupt the running agent + # Default behavior for non-photo follow-ups: interrupt the running agent. + # + # Use merge_text=True so rapid TEXT follow-ups (#4469) accumulate + # into the single pending slot instead of clobbering each other. + # Without merging, three rapid messages "A", "B", "C" land like: + # _pending_messages[k] = A (interrupts) + # _pending_messages[k] = B (replaces A before consumer reads) + # _pending_messages[k] = C (replaces B) + # ...and only "C" reaches the next turn. merge_pending_message_event + # already does the right thing for photo/media bursts; the + # ``merge_text=True`` flag extends that to plain TEXT events. + # Same shape as the Telegram bursty-grace path in gateway/run.py. 
logger.debug("[%s] New message while session %s is active — triggering interrupt", self.name, session_key) - self._pending_messages[session_key] = event + merge_pending_message_event( + self._pending_messages, + session_key, + event, + merge_text=True, + ) # Signal the interrupt (the processing task checks this) self._active_sessions[session_key].set() return # Don't process now - will be handled after current task finishes diff --git a/hermes_cli/__init__.py b/hermes_cli/__init__.py index 0f247ddcc1f..9781c8bc689 100644 --- a/hermes_cli/__init__.py +++ b/hermes_cli/__init__.py @@ -14,8 +14,8 @@ Provides subcommands for: import os import sys -__version__ = "0.13.0" -__release_date__ = "2026.5.7" +__version__ = "0.14.0" +__release_date__ = "2026.5.16" def _ensure_utf8(): diff --git a/hermes_cli/config.py b/hermes_cli/config.py index 8ad6bc083f9..e1b1fbfb52a 100644 --- a/hermes_cli/config.py +++ b/hermes_cli/config.py @@ -1152,6 +1152,10 @@ DEFAULT_CONFIG = { "provider": "", # e.g. "openrouter" (empty = inherit parent provider + credentials) "base_url": "", # direct OpenAI-compatible endpoint for subagents "api_key": "", # API key for delegation.base_url (falls back to OPENAI_API_KEY) + "api_mode": "", # wire protocol for delegation.base_url: "chat_completions", + # "codex_responses", or "anthropic_messages". Empty = auto-detect + # from URL (e.g. /anthropic suffix → anthropic_messages). Set this + # explicitly for non-standard endpoints the heuristic can't detect. # When delegate_task narrows child toolsets explicitly, preserve any # MCP toolsets the parent already has enabled. On by default so # narrowing (e.g. toolsets=["web","browser"]) expresses "I want these @@ -1609,6 +1613,23 @@ DEFAULT_CONFIG = { "servers": {}, }, + # X (Twitter) Search via xAI's built-in x_search Responses tool. + # The tool registers when xAI credentials are available (SuperGrok + # OAuth or XAI_API_KEY) AND the x_search toolset is enabled in + # `hermes tools`. 
These settings tune the backing Responses API call. + "x_search": { + # xAI model used for the Responses call. grok-4.20-reasoning is + # the recommended default; any Grok model with x_search tool + # access works. + "model": "grok-4.20-reasoning", + # Request timeout in seconds (minimum 30). x_search can take + # 60-120s for complex queries — the default is generous. + "timeout_seconds": 180, + # Number of automatic retries on 5xx / ReadTimeout / ConnectionError. + # Each retry backs off (1.5x attempt seconds, capped at 5s). + "retries": 2, + }, + # Config schema version - bump this when adding new required fields "_config_version": 23, } diff --git a/hermes_cli/doctor.py b/hermes_cli/doctor.py index bf5a8865909..9d3b6e3c01a 100644 --- a/hermes_cli/doctor.py +++ b/hermes_cli/doctor.py @@ -152,6 +152,30 @@ def _apply_doctor_tool_availability_overrides(available: list[str], unavailable: return updated_available, updated_unavailable +def _has_healthy_oauth_fallback_for_apikey_provider(provider_label: str) -> bool: + """Return True when a direct API-key probe failure is non-blocking. + + Some provider families support both a direct API-key path and a separate + OAuth runtime path. When the OAuth path is already healthy, doctor should + still show a failed API-key connectivity row, but it should not promote + that direct-key problem into the final blocking summary. 
+ """ + try: + from hermes_cli.auth import ( + get_gemini_oauth_auth_status, + get_minimax_oauth_auth_status, + ) + except Exception: + return False + + normalized = (provider_label or "").strip().lower() + if normalized in {"google / gemini", "gemini"}: + return bool((get_gemini_oauth_auth_status() or {}).get("logged_in")) + if normalized == "minimax": + return bool((get_minimax_oauth_auth_status() or {}).get("logged_in")) + return False + + def check_ok(text: str, detail: str = ""): print(f" {color('✓', Colors.GREEN)} {text}" + (f" {color(detail, Colors.DIM)}" if detail else "")) @@ -1594,7 +1618,10 @@ def run_doctor(args): print(f" {_glyph} {_label} {_detail}") else: print(f" {_glyph} {_label}") - for _issue in _r.issues: + _issues_to_add = list(_r.issues) + if _issues_to_add and _has_healthy_oauth_fallback_for_apikey_provider(_r.label): + _issues_to_add = [] + for _issue in _issues_to_add: issues.append(_issue) # ========================================================================= diff --git a/hermes_cli/models.py b/hermes_cli/models.py index ded3f448f87..336e220814e 100644 --- a/hermes_cli/models.py +++ b/hermes_cli/models.py @@ -2525,6 +2525,7 @@ def _is_github_models_base_url(base_url: Optional[str]) -> bool: return ( normalized.startswith(COPILOT_BASE_URL) or normalized.startswith("https://models.github.ai/inference") + or normalized.startswith("https://models.inference.ai.azure.com") ) diff --git a/hermes_cli/plugins.py b/hermes_cli/plugins.py index 9e9af0e0644..d0bbee6ce63 100644 --- a/hermes_cli/plugins.py +++ b/hermes_cli/plugins.py @@ -325,8 +325,15 @@ class PluginContext: is_async: bool = False, description: str = "", emoji: str = "", + override: bool = False, ) -> None: - """Register a tool in the global registry **and** track it as plugin-provided.""" + """Register a tool in the global registry **and** track it as plugin-provided. + + Pass ``override=True`` to replace an existing built-in tool with the + same name (e.g. 
swap the default ``browser_navigate`` for a custom + CDP-backed implementation). Without it, attempting to register a name + already claimed by a different toolset is rejected. + """ from tools.registry import registry registry.register( @@ -339,9 +346,13 @@ class PluginContext: is_async=is_async, description=description, emoji=emoji, + override=override, ) self._manager._plugin_tool_names.add(name) - logger.debug("Plugin %s registered tool: %s", self.manifest.name, name) + logger.debug( + "Plugin %s registered tool: %s%s", + self.manifest.name, name, " (override)" if override else "", + ) # -- message injection -------------------------------------------------- diff --git a/hermes_cli/tools_config.py b/hermes_cli/tools_config.py index 377194589ea..074bd04aa64 100644 --- a/hermes_cli/tools_config.py +++ b/hermes_cli/tools_config.py @@ -61,6 +61,7 @@ CONFIGURABLE_TOOLSETS = [ ("video", "🎬 Video Analysis", "video_analyze (requires video-capable model)"), ("image_gen", "🎨 Image Generation", "image_generate"), ("video_gen", "🎬 Video Generation", "video_generate (text-to-video + image-to-video)"), + ("x_search", "🐦 X (Twitter) Search", "x_search (requires xAI OAuth or XAI_API_KEY)"), ("moa", "🧠 Mixture of Agents", "mixture_of_agents"), ("tts", "🔊 Text-to-Speech", "text_to_speech"), ("skills", "📚 Skills", "list, view, manage"), @@ -86,7 +87,12 @@ CONFIGURABLE_TOOLSETS = [ # Video gen is off by default — it's a niche, paid, slow feature. Users # who want it opt in via `hermes tools` → Video Generation, which walks # them through provider + model selection. -_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "spotify", "discord", "discord_admin", "video", "video_gen"} +# +# X search is off by default — gated on xAI credentials (SuperGrok OAuth +# or XAI_API_KEY). Users opt in via `hermes tools` → X (Twitter) Search, +# which walks them through credential setup. The tool's check_fn means +# the schema won't appear to the model even if enabled without credentials. 
+_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "spotify", "discord", "discord_admin", "video", "video_gen", "x_search"} # Platform-scoped toolsets: only appear in the `hermes tools` checklist for # these platforms, and only resolve/save for these platforms. A toolset @@ -308,6 +314,39 @@ TOOL_CATEGORIES = { # converge image_gen toward. "providers": [], }, + "x_search": { + "name": "X (Twitter) Search", + "setup_title": "Select xAI Credential Source", + "setup_note": ( + "Hermes routes X searches through xAI's built-in x_search " + "Responses tool. Both credential sources hit the same " + "https://api.x.ai/v1/responses endpoint — pick whichever you " + "already have. SuperGrok OAuth is preferred when both are set " + "(uses your subscription quota instead of API spend)." + ), + "icon": "🐦", + "providers": [ + { + "name": "xAI Grok OAuth (SuperGrok Subscription)", + "badge": "subscription", + "tag": "Browser login at accounts.x.ai — no API key required", + "env_vars": [], + "post_setup": "xai_grok", + }, + { + "name": "xAI API key", + "badge": "paid", + "tag": "Direct xAI API billing via XAI_API_KEY", + "env_vars": [ + { + "key": "XAI_API_KEY", + "prompt": "xAI API key", + "url": "https://console.x.ai/", + }, + ], + }, + ], + }, "browser": { "name": "Browser Automation", "icon": "🌐", diff --git a/model_tools.py b/model_tools.py index db19bb67e53..1cbc83096ac 100644 --- a/model_tools.py +++ b/model_tools.py @@ -21,6 +21,7 @@ Public API (signatures preserved from the original 2,400-line version): """ import json +import re import asyncio import logging import threading @@ -485,6 +486,48 @@ _AGENT_LOOP_TOOLS = {"todo", "memory", "session_search", "delegate_task"} _READ_SEARCH_TOOLS = {"read_file", "search_files"} +# ========================================================================= +# Tool error sanitization +# ========================================================================= +# +# Tool exceptions can carry arbitrary text into the model's context as 
the +# `tool` message content. json.dumps() handles quote/backslash escaping so a +# raw injection of `` won't break message framing, but the model +# still *reads* those tokens and they can confuse downstream tool-call +# parsing or, in adversarial cases, nudge it toward role-confusion framing. +# +# This helper strips structural framing tokens (XML role tags, CDATA, +# markdown code fences) and caps the message at a sane upper bound before it +# becomes part of the conversation. It's defense-in-depth — the json layer +# already prevents framing escape — but cheap and worth having. +# +# Ported from ironclaw#1639. +_TOOL_ERROR_ROLE_TAG_RE = re.compile( + r'', + re.IGNORECASE, +) +_TOOL_ERROR_FENCE_OPEN_RE = re.compile(r'^\s*```(?:json|xml|html|markdown)?\s*', re.MULTILINE) +_TOOL_ERROR_FENCE_CLOSE_RE = re.compile(r'\s*```\s*$', re.MULTILINE) +_TOOL_ERROR_CDATA_RE = re.compile(r'', re.DOTALL) +_TOOL_ERROR_MAX_LEN = 2000 + + +def _sanitize_tool_error(error_msg: str) -> str: + """Strip structural framing tokens from a tool error before showing it to the model. + + See _TOOL_ERROR_ROLE_TAG_RE docstring above for rationale. + """ + if not error_msg: + return "[TOOL_ERROR] " + sanitized = _TOOL_ERROR_ROLE_TAG_RE.sub("", error_msg) + sanitized = _TOOL_ERROR_FENCE_OPEN_RE.sub("", sanitized) + sanitized = _TOOL_ERROR_FENCE_CLOSE_RE.sub("", sanitized) + sanitized = _TOOL_ERROR_CDATA_RE.sub("", sanitized) + if len(sanitized) > _TOOL_ERROR_MAX_LEN: + sanitized = sanitized[:_TOOL_ERROR_MAX_LEN - 3] + "..." 
+ return f"[TOOL_ERROR] {sanitized}" + + # ========================================================================= # Tool argument type coercion # ========================================================================= @@ -824,7 +867,7 @@ def handle_function_call( except Exception as e: error_msg = f"Error executing {function_name}: {str(e)}" logger.exception(error_msg) - return json.dumps({"error": error_msg}, ensure_ascii=False) + return json.dumps({"error": _sanitize_tool_error(error_msg)}, ensure_ascii=False) # ============================================================================= diff --git a/optional-skills/devops/pinggy-tunnel/SKILL.md b/optional-skills/devops/pinggy-tunnel/SKILL.md new file mode 100644 index 00000000000..fa9f1d5b67b --- /dev/null +++ b/optional-skills/devops/pinggy-tunnel/SKILL.md @@ -0,0 +1,309 @@ +--- +name: pinggy-tunnel +description: Zero-install localhost tunnels over SSH via Pinggy. +version: 0.1.0 +author: Teknium (teknium1), Hermes Agent +license: MIT +platforms: [linux, macos, windows] +metadata: + hermes: + tags: [Pinggy, Tunnel, Networking, SSH, Webhook, Localhost] + related_skills: [cloudflared-quick-tunnel, webhook-subscriptions] +--- + +# Pinggy Tunnel Skill + +Expose a local service (dev server, webhook receiver, MCP endpoint, demo) to the public internet using a Pinggy SSH reverse tunnel. No daemon to install — the user's stock SSH client connects to `a.pinggy.io:443` and Pinggy hands back a public HTTP/HTTPS URL. + +Free tier: 60-minute tunnels, random subdomain, no signup. Pro tier ($3/mo) is an opt-in with a token. 
+ +## When to Use + +- User asks to "expose this locally", "share my dev server", "make this URL public", "tunnel port N", "get a public URL for a webhook" +- Need to receive a webhook callback during a local task (Stripe, GitHub, Discord, AgentMail) +- Sharing a one-off HTTP demo (MCP server, Ollama/vLLM endpoint, dashboard) with a remote party +- The host has SSH but no `cloudflared` / `ngrok` binary, and installing one would be overkill + +If the host already has `cloudflared` configured, prefer the `cloudflared-quick-tunnel` skill — Cloudflare quick tunnels don't expire after 60 minutes. + +## Prerequisites + +- `ssh` on PATH (`ssh -V`). Default on Linux, macOS, and Windows 10+. No other install. +- A local service listening on `127.0.0.1:` before the tunnel starts. Pinggy will return URLs but they'll 502 until the local origin is up. + +Optional: + +- `PINGGY_TOKEN` env var for paid Pro features (persistent subdomain, custom domain, multiple tunnels, no 60-minute cap). Free tier needs no credentials. + +## Quick Reference + +```bash +# Plain HTTP/HTTPS tunnel for port 8000 (free tier) +ssh -p 443 -o StrictHostKeyChecking=no -o ServerAliveInterval=30 \ + -R0:localhost:8000 free@a.pinggy.io + +# TCP tunnel (databases, raw SSH, etc.) 
+ssh -p 443 -o StrictHostKeyChecking=no -R0:localhost:5432 tcp@a.pinggy.io + +# TLS tunnel (Pinggy can't decrypt — bring your own certs at origin) +ssh -p 443 -o StrictHostKeyChecking=no -R0:localhost:443 tls@a.pinggy.io + +# Basic auth gate (b:user:pass) +ssh -p 443 -o StrictHostKeyChecking=no -R0:localhost:8000 \ + "b:admin:secret+free@a.pinggy.io" + +# Bearer token gate (k:token) +ssh -p 443 -o StrictHostKeyChecking=no -R0:localhost:8000 \ + "k:mysecrettoken+free@a.pinggy.io" + +# IP whitelist (w:CIDR) +ssh -p 443 -o StrictHostKeyChecking=no -R0:localhost:8000 \ + "w:203.0.113.0/24+free@a.pinggy.io" + +# Enable CORS + force HTTPS redirect +ssh -p 443 -o StrictHostKeyChecking=no -R0:localhost:8000 \ + "co+x:https+free@a.pinggy.io" + +# Pro tier (persistent URL, no 60-min cap) +ssh -p 443 -o StrictHostKeyChecking=no -R0:localhost:8000 "$PINGGY_TOKEN+a.pinggy.io" +``` + +## Procedure — Start a Tunnel and Get the URL + +The model SHOULD use the `terminal` tool. The tunnel must stay alive for the duration of the share, so run it as a background process and parse the public URL from stdout. + +### 1. Confirm a local origin is up + +```bash +curl -sI http://127.0.0.1:8000/ | head -1 +# expect HTTP/1.x 200 (or any non-connection-refused response) +``` + +If nothing is listening yet, start it first (e.g. `python3 -m http.server 8000 --bind 127.0.0.1`). Pinggy will happily return a URL pointed at nothing — the user will see 502 until the origin comes up. + +### 2. Launch the tunnel as a background process + +Use `terminal(background=True)` and capture output to a logfile (Pinggy prints the URLs on stdout, then keeps the connection open): + +```bash +LOG=/tmp/pinggy-8000.log +nohup ssh -p 443 \ + -o StrictHostKeyChecking=no \ + -o UserKnownHostsFile=/dev/null \ + -o ServerAliveInterval=30 \ + -o ServerAliveCountMax=3 \ + -R0:localhost:8000 free@a.pinggy.io \ + > "$LOG" 2>&1 & +echo $! 
> /tmp/pinggy-8000.pid +``` + +`StrictHostKeyChecking=no` + `UserKnownHostsFile=/dev/null` skips the first-run host-key prompt. `ServerAliveInterval=30` keeps the SSH session from getting torn down by an idle NAT. + +### 3. Parse the URL out of the log + +```bash +sleep 4 +grep -oE 'https://[a-z0-9.-]+\.pinggy\.link' /tmp/pinggy-8000.log | head -1 +``` + +The log contents look like: + +``` +You are not authenticated. +Your tunnel will expire in 60 minutes. +http://yqycl-98-162-69-48.a.free.pinggy.link +https://yqycl-98-162-69-48.a.free.pinggy.link +``` + +Hand the `https://...pinggy.link` URL to the user. + +### 4. Verify + +```bash +curl -sI https:/// | head -3 +# expect 200/302/whatever the local origin actually returns +``` + +If you get `502 Bad Gateway`, the SSH session is up but the local origin isn't listening — fix step 1 first. + +### 5. Teardown + +```bash +kill "$(cat /tmp/pinggy-8000.pid)" +# or, if the pid file got lost: +pkill -f 'ssh -p 443 .* free@a\.pinggy\.io' +``` + +If you have a session_id from `terminal(background=True)`, prefer `process(action='kill', session_id=...)`. + +## Access Control via Username Keywords + +Pinggy stacks control flags into the SSH username separated by `+`. Always quote the whole `user@host` argument when it contains a `+`: + +| Keyword | Effect | +|---------|--------| +| `b:user:pass` | HTTP Basic auth gate | +| `k:token` | Bearer-token header gate (`Authorization: Bearer `) | +| `w:CIDR` | IP whitelist (single IP or CIDR, repeatable) | +| `co` | Add `Access-Control-Allow-Origin: *` (CORS) | +| `x:https` | Force HTTPS — auto-redirect HTTP to HTTPS | +| `a:Name:Value` | Add request header | +| `u:Name:Value` | Update request header | +| `r:Name` | Remove request header | +| `qr` | Print a QR code of the URL to stdout (handy for mobile sharing) | + +Combine freely: `"b:admin:secret+co+x:https+free@a.pinggy.io"`. 
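When the command is assembled programmatically, the keyword stacking above can be sketched in Python. This is a minimal sketch, assuming the free-tier `free@a.pinggy.io` target and the SSH options used elsewhere in this skill; the helper name `pinggy_ssh_argv` is hypothetical and not part of Pinggy or Hermes.

```python
import shlex


def pinggy_ssh_argv(local_port, keywords=(), user="free", host="a.pinggy.io"):
    """Build an ssh argv for a Pinggy tunnel (hypothetical helper)."""
    # Keywords such as "b:admin:secret", "co", "x:https" are joined with "+"
    # and placed in front of the user, as described in the keyword table.
    username = "+".join([*keywords, user])
    return [
        "ssh", "-p", "443",
        "-o", "StrictHostKeyChecking=no",
        "-o", "UserKnownHostsFile=/dev/null",
        "-o", "ServerAliveInterval=30",
        f"-R0:localhost:{local_port}",
        f"{username}@{host}",
    ]


argv = pinggy_ssh_argv(8000, ["b:admin:secret", "co", "x:https"])
# shlex.join quotes each argument only where a POSIX shell needs it,
# which gives a copy-pasteable command line for logging.
print(shlex.join(argv))
```

Passing `argv` as a list to `subprocess.Popen` (without `shell=True`) means the `+` characters never reach a shell parser, so the quoting concern does not arise on that path.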
+ +## Web Debugger (optional) + +Pinggy can mirror the inbound traffic to `localhost:4300` for inspection. Add a local forward to the SSH command: + +```bash +ssh -p 443 -L4300:localhost:4300 -R0:localhost:8000 free@a.pinggy.io +``` + +Then open `http://localhost:4300` in a browser to see live request/response pairs. + +## Pitfalls + +- **60-minute hard cap on the free tier.** The SSH session terminates at the 60-minute mark; the URL goes dead. For longer shares, either use `PINGGY_TOKEN` (Pro) or auto-restart with a shell loop (note that the URL changes on every restart for free-tier). +- **Free-tier URL is random and changes on restart.** Don't bookmark it, don't paste it into a config file. Re-parse from the log each time. +- **Concurrent free tunnels are limited to one per source IP.** Starting a second tunnel from the same machine usually kills the first. Pro tier lifts this. +- **`+` in usernames must be quoted.** Bare `ssh ... b:admin:secret+free@a.pinggy.io` works in bash but breaks under shells that treat `+` specially or when assembled programmatically. Always wrap in double quotes. +- **Don't tunnel anything sensitive without an access-control flag.** A bare HTTP tunnel is reachable by anyone with the URL. Use `b:`, `k:`, or `w:` for non-public services. +- **`process(action='log')` may miss SSH banner output.** Pinggy prints the URLs and then the SSH session goes interactive. Always redirect to a logfile and `grep` the file directly — same pattern as `cloudflared-quick-tunnel`. +- **Host-key prompt on first run.** Default OpenSSH config asks the user to accept Pinggy's host key. Always pass `-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null` for unattended runs. +- **TCP and TLS tunnels return a `.a.pinggy.online:` pair, not an https URL.** Parse with a different regex (`tcp://` and a port). Don't assume every Pinggy tunnel is HTTP. +- **Pro mode requires the token as the username, not a flag.** Use `"$PINGGY_TOKEN+a.pinggy.io"` (no `free@`). 
With a token you can also add `:persistent` for a stable subdomain — see `pinggy.io/docs/`. + +## Recipes + +Composite patterns combining a local origin with a Pinggy tunnel. Each recipe is self-contained — start the origin, start the tunnel, parse the URL, hand it back to the user. + +### Recipe 1 — Receive a webhook callback + +Use this when an external service (Stripe, GitHub, Discord, AgentMail, etc.) needs to POST to a publicly reachable URL during a local task. + +```bash +# 1. Tiny capturing server: every request gets appended to /tmp/webhook-hits.log +cat >/tmp/webhook-server.py <<'PY' +import http.server, json, datetime, pathlib +LOG = pathlib.Path("/tmp/webhook-hits.log") +class H(http.server.BaseHTTPRequestHandler): + def _capture(self): + n = int(self.headers.get("content-length") or 0) + body = self.rfile.read(n).decode("utf-8", "replace") if n else "" + rec = {"t": datetime.datetime.utcnow().isoformat(), "path": self.path, + "method": self.command, "headers": dict(self.headers), "body": body} + with LOG.open("a") as f: f.write(json.dumps(rec) + "\n") + self.send_response(200); self.send_header("content-type","application/json") + self.end_headers(); self.wfile.write(b'{"ok":true}\n') + def do_GET(self): self._capture() + def do_POST(self): self._capture() + def log_message(self,*a,**k): pass +http.server.HTTPServer(("127.0.0.1", 18080), H).serve_forever() +PY +nohup python3 /tmp/webhook-server.py >/tmp/webhook-server.log 2>&1 & +echo $! >/tmp/webhook-server.pid + +# 2. Tunnel — bearer-token-gate so randos can't pollute the capture log +# Keep the token in a variable so it can be handed to the caller afterwards +TOKEN=$(openssl rand -hex 12) +nohup ssh -p 443 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \ + -o ServerAliveInterval=30 \ + -R0:localhost:18080 "k:$TOKEN+free@a.pinggy.io" \ + >/tmp/webhook-pinggy.log 2>&1 & +echo $! >/tmp/webhook-pinggy.pid +sleep 5 +URL=$(grep -oE 'https://[a-z0-9.-]+\.pinggy\.link' /tmp/webhook-pinggy.log | head -1) +echo "Webhook URL: $URL" +echo "Bearer token: $TOKEN" + +# 3. 
While the agent works, watch hits land +tail -f /tmp/webhook-hits.log +``` + +Hand `$URL` to the service that needs to call you. Teardown: `kill $(cat /tmp/webhook-server.pid) $(cat /tmp/webhook-pinggy.pid)`. + +### Recipe 2 — Expose an MCP server over HTTP/SSE + +Use when a remote MCP client (Claude Desktop on another machine, a teammate's editor, etc.) needs to reach an MCP server running on the local box. Only works for MCP servers that speak HTTP transport — stdio-mode servers can't be tunneled. + +```bash +# 1. Start the MCP server in HTTP mode (example: a FastMCP server on port 8765) +nohup python3 my_mcp_server.py --transport http --port 8765 \ + >/tmp/mcp-server.log 2>&1 & +echo $! >/tmp/mcp-server.pid + +# 2. Tunnel with a bearer token — MCP traffic should not be open to the internet +TOKEN=$(openssl rand -hex 16) +nohup ssh -p 443 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \ + -o ServerAliveInterval=30 \ + -R0:localhost:8765 "k:$TOKEN+free@a.pinggy.io" \ + >/tmp/mcp-pinggy.log 2>&1 & +echo $! >/tmp/mcp-pinggy.pid +sleep 5 +URL=$(grep -oE 'https://[a-z0-9.-]+\.pinggy\.link' /tmp/mcp-pinggy.log | head -1) +echo "MCP URL: $URL" +echo "Bearer token: $TOKEN" +``` + +The remote client connects to `$URL` with `Authorization: Bearer $TOKEN`. Hermes' own native MCP client config: `{"transport": "http", "url": "", "headers": {"Authorization": "Bearer "}}`. + +### Recipe 3 — Expose a local LLM endpoint (Ollama / vLLM / llama.cpp) + +Share a local model with a remote caller (another agent, a phone, a teammate). Ollama listens on `:11434`, vLLM typically on `:8000`, and llama.cpp's server typically on `:8080`. + +```bash +# Pre-req: the model server is already running on 127.0.0.1:11434 (Ollama default) +TOKEN=$(openssl rand -hex 16) +nohup ssh -p 443 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \ + -o ServerAliveInterval=30 \ + -R0:localhost:11434 "k:$TOKEN+co+free@a.pinggy.io" \ + >/tmp/llm-pinggy.log 2>&1 & +echo $! 
>/tmp/llm-pinggy.pid +sleep 5 +URL=$(grep -oE 'https://[a-z0-9.-]+\.pinggy\.link' /tmp/llm-pinggy.log | head -1) +echo "Endpoint: $URL" +echo "Token: $TOKEN" + +# Verify +curl -s "$URL/api/tags" -H "Authorization: Bearer $TOKEN" | head +``` + +`co` enables CORS so a browser caller can hit the endpoint. Drop `co` for backend-only callers. For an OpenAI-compatible vLLM/llama.cpp endpoint, callers use base URL `$URL/v1` with `Authorization: Bearer $TOKEN` — but note that Pinggy forwards the request as-is, so the model server itself sees Pinggy's token; the local server should be configured to ignore auth (it's already on `127.0.0.1`) and let Pinggy do the gating. + +### Recipe 4 — Share a dev server with a one-shot password + +The fastest "let a teammate poke at my running app" pattern. Random password, prints once, dies when you Ctrl-C. + +```bash +PASS=$(openssl rand -base64 12 | tr -d '+/=' | head -c 12) +echo "Dev server password: $PASS" +ssh -p 443 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \ + -o ServerAliveInterval=30 \ + -R0:localhost:3000 "b:dev:$PASS+co+x:https+free@a.pinggy.io" +# URL prints to the terminal. Share URL + password. Ctrl-C to tear down. +``` + +`b:dev:$PASS` gates the URL with HTTP Basic auth. `x:https` forces TLS. `co` adds CORS for SPA frontends. + +## Verification + +```bash +# End-to-end: spin up a trivial origin, tunnel it, hit it, tear down +python3 -m http.server 18000 --bind 127.0.0.1 >/tmp/origin.log 2>&1 & +ORIGIN_PID=$! + +nohup ssh -p 443 \ + -o StrictHostKeyChecking=no \ + -o UserKnownHostsFile=/dev/null \ + -R0:localhost:18000 free@a.pinggy.io >/tmp/pinggy-verify.log 2>&1 & +SSH_PID=$! + +sleep 5 +URL=$(grep -oE 'https://[a-z0-9.-]+\.pinggy\.link' /tmp/pinggy-verify.log | head -1) +echo "URL: $URL" +curl -sI "$URL/" | head -1 + +kill "$SSH_PID" "$ORIGIN_PID" +``` + +Expected: a `pinggy.link` URL and `HTTP/2 200` on the curl head. 
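The log-parsing step that every recipe relies on can also be sketched in Python. This is a minimal sketch, assuming a pattern that accepts any number of lowercase subdomain labels before `pinggy.link`, since the sample banner in this skill shows a host of the form `….a.free.pinggy.link`. The helper name `first_pinggy_url` is hypothetical.

```python
import re

# Assumed pattern: one or more labels made of lowercase letters, digits,
# hyphens, and dots, followed by ".pinggy.link". Only https URLs are matched.
PINGGY_URL_RE = re.compile(r"https://[a-z0-9.-]+\.pinggy\.link")


def first_pinggy_url(log_text):
    """Return the first https Pinggy URL found in the log text, or None."""
    match = PINGGY_URL_RE.search(log_text)
    return match.group(0) if match else None


# Sample banner copied from the procedure section of this skill.
sample = """You are not authenticated.
Your tunnel will expire in 60 minutes.
http://yqycl-98-162-69-48.a.free.pinggy.link
https://yqycl-98-162-69-48.a.free.pinggy.link
"""
print(first_pinggy_url(sample))
# prints: https://yqycl-98-162-69-48.a.free.pinggy.link
```

Returning `None` when nothing matches lets a caller distinguish "tunnel not up yet" from an empty string and retry after a short sleep.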
diff --git a/optional-skills/research/darwinian-evolver/SKILL.md b/optional-skills/research/darwinian-evolver/SKILL.md new file mode 100644 index 00000000000..272f6702481 --- /dev/null +++ b/optional-skills/research/darwinian-evolver/SKILL.md @@ -0,0 +1,199 @@ +--- +name: darwinian-evolver +description: Evolve prompts/regex/SQL/code with Imbue's evolution loop. +version: 0.1.0 +author: Bihruze (Asahi0x), Hermes Agent +license: MIT +platforms: [linux, macos] +metadata: + hermes: + tags: [evolution, optimization, prompt-engineering, research] + related_skills: [arxiv, jupyter-live-kernel] +--- + +# Darwinian Evolver + +Run Imbue's [darwinian_evolver](https://github.com/imbue-ai/darwinian_evolver) — an +LLM-driven evolutionary search loop — to optimize a **prompt, regex, SQL query, +or small code snippet** against a fitness function. + +Status: thin wrapper around the upstream tool. The skill installs it, walks the +agent through writing a `Problem` definition (organism + evaluator + mutator), +and drives the loop via the upstream CLI or a small custom Python driver. + +**License:** the upstream tool is **AGPL-3.0**. The skill ONLY ever invokes it +via the upstream CLI or a `subprocess`/`uv run` call (mere aggregation). Do NOT +import upstream classes into Hermes itself. + +## When to Use + +- User says "optimize this prompt", "evolve a regex for X", "auto-improve this + code/SQL", "search for a better instruction". +- You have a scorer (exact match, regex pass-rate, unit test, LLM-judge, runtime + metric) AND a starting candidate (organism). If you don't have a scorer, stop + and define one first — that's the hard part. +- Cost is OK: a typical run is 50–500 LLM calls. On gpt-4o-mini that's pennies; + on Claude Sonnet it can be a few dollars. + +Do **not** use this when: +- The optimization target is differentiable (use gradient descent / DSPy). +- You only need to try 2–3 variants — just write them by hand. 
+- The fitness signal is purely subjective with no measurable criterion. + +## Prerequisites + +- Python ≥3.11 +- `git`, `uv` (or `pip`) +- One of: `OPENROUTER_API_KEY`, `ANTHROPIC_API_KEY`, or `OPENAI_API_KEY` + +The skill ships a small `parrot_openrouter.py` driver that uses `OPENROUTER_API_KEY` +via the OpenAI SDK, so any model on OpenRouter works. The upstream CLI itself +hardcodes Anthropic and needs `ANTHROPIC_API_KEY`. + +## Install (One-Time) + +Run via the `terminal` tool: + +```bash +mkdir -p ~/.hermes/cache/darwinian-evolver && cd ~/.hermes/cache/darwinian-evolver +[ -d darwinian_evolver ] || git clone --depth 1 https://github.com/imbue-ai/darwinian_evolver.git +cd darwinian_evolver && uv sync +``` + +Verify: + +```bash +cd ~/.hermes/cache/darwinian-evolver/darwinian_evolver \ + && uv run darwinian_evolver --help | head -5 +``` + +## Quick Start — The Built-In Parrot Example + +Tiny smoke test (requires `ANTHROPIC_API_KEY`): + +```bash +cd ~/.hermes/cache/darwinian-evolver/darwinian_evolver +uv run darwinian_evolver parrot \ + --num_iterations 2 \ + --num_parents_per_iteration 2 \ + --mutator_concurrency 2 --evaluator_concurrency 2 \ + --output_dir /tmp/parrot_demo +``` + +Outputs: +- `/tmp/parrot_demo/snapshots/iteration_N.pkl` — pickled population per iteration +- `/tmp/parrot_demo/` — per-iteration JSON log (path printed at end) + +Open `~/.hermes/cache/darwinian-evolver/darwinian_evolver/darwinian_evolver/lineage_visualizer.html` +in a browser and load the JSON log to see the evolutionary tree. + +## Quick Start — OpenRouter Driver (No Anthropic Key) + +The skill ships `scripts/parrot_openrouter.py` — same parrot problem, but the +LLM call goes through OpenRouter so any provider works. 
+ +```bash +# From wherever the skill is installed: +SKILL_DIR=~/.hermes/skills/research/darwinian-evolver +DE_DIR=~/.hermes/cache/darwinian-evolver/darwinian_evolver + +cd "$DE_DIR" && \ + EVOLVER_MODEL='openai/gpt-4o-mini' \ + uv run --with openai python "$SKILL_DIR/scripts/parrot_openrouter.py" \ + --num_iterations 3 --num_parents_per_iteration 2 \ + --output_dir /tmp/parrot_or +``` + +Inspect the result with `scripts/show_snapshot.py`: + +```bash +uv run --with openai python "$SKILL_DIR/scripts/show_snapshot.py" \ + /tmp/parrot_or/snapshots/iteration_3.pkl +``` + +Expected output: 7 evolved prompt templates ranked by score, with the best +landing around 0.6–0.8 (the seed `Say {{ phrase }}` scored 0.000). + +## Defining a Custom Problem + +The skill ships `templates/custom_problem_template.py` — copy, edit, run. +Three things you must define: + +1. **`Organism`** — a Pydantic `BaseModel` subclass holding the artifact being + evolved (`prompt_template: str`, `regex_pattern: str`, `sql_query: str`, + `code_block: str`, etc.). Add a `run(*args)` method that exercises it. + +2. **`Evaluator`** — `.evaluate(organism) -> EvaluationResult(score=..., trainable_failure_cases=[...], holdout_failure_cases=[...], is_viable=True)`. + - **`score`** is in `[0, 1]`. Higher is better. + - **`trainable_failure_cases`** — what the mutator sees. Include enough + context (input, expected, actual) for the LLM to diagnose. + - **`holdout_failure_cases`** — kept out of the mutator's view. Use these + to detect overfitting. + - **`is_viable=True`** unless the organism is completely broken (raises, + returns None, etc.). A 0-score viable organism is fine — it just gets + down-weighted in parent selection. + +3. **`Mutator`** — `.mutate(organism, failure_cases, learning_log_entries) -> list[Organism]`. + Typically: build an LLM prompt that includes the current organism + a + failure case + an ask to propose a fix; parse the LLM's response; return + a new `Organism`. 
Return `[]` on parse failure — the loop handles it. + +Then write a driver script that wires `Problem(initial_organism, evaluator, [mutators])` +into `EvolveProblemLoop` and iterates over `loop.run(num_iterations=N)` — the +shipped `scripts/parrot_openrouter.py` is the reference. + +## Hyperparameters That Actually Matter + +| flag | default | when to change | +|---|---|---| +| `--num_iterations` | 5 | bump to 10–20 once you trust the evaluator | +| `--num_parents_per_iteration` | 4 | drop to 2 for cheap exploration | +| `--mutator_concurrency` | 10 | drop to 2–4 to avoid rate limits | +| `--evaluator_concurrency` | 10 | same; evaluator hits the LLM too | +| `--batch_size` | 1 | raise to 3–5 once your mutator handles multiple failures | +| `--verify_mutations` | off | turn on once mutator is wasteful (>10× cost saving on later runs per Imbue) | +| `--midpoint_score` | `p75` | leave alone unless scores cluster | +| `--sharpness` | 10 | leave alone | + +## Pitfalls + +1. **`Initial organism must be viable`** — set `is_viable=True` in your + `EvaluationResult` even on a 0-score seed. The loop refuses non-viable + organisms because they imply the loop has nothing to evolve from. +2. **Provider content filters kill runs.** Azure-backed OpenRouter models + reject phrases like "ignore previous instructions" with HTTP 400. Wrap + the LLM call in `try/except` and return `f""` — the + evolver will just score that organism 0 and move on. +3. **`loop.run()` is a generator** — calling it doesn't run anything until + you iterate. Use `for snap in loop.run(num_iterations=N):`. +4. **Snapshots are nested pickles.** `iteration_N.pkl` contains a dict with + `population_snapshot` (more pickled bytes). To unpickle you must have the + `Organism` class importable under the same dotted path it was pickled at. +5. **Concurrency defaults are aggressive.** 10/10 will hit rate limits on + most providers. Start with 2/2. +6. 
**CLI is hardcoded to Anthropic.** `uv run darwinian_evolver ` + reaches for `ANTHROPIC_API_KEY` and uses Claude Sonnet. To use any other + provider, write a driver like `parrot_openrouter.py`. +7. **AGPL.** Never `from darwinian_evolver import ...` inside Hermes core. + Custom driver scripts under `~/.hermes/skills/...` are user-side and fine. +8. **No PyPI package.** `pip install darwinian-evolver` will pull the wrong + thing. Always install from the GitHub repo. + +## Verification + +After install + a parrot run, exit code 0 from this is sufficient: + +```bash +DE_DIR=~/.hermes/cache/darwinian-evolver/darwinian_evolver +ls "$DE_DIR/darwinian_evolver/lineage_visualizer.html" >/dev/null && \ +cd "$DE_DIR" && uv run darwinian_evolver --help >/dev/null && \ +echo "darwinian-evolver: OK" +``` + +## References + +- [Imbue research post](https://imbue.com/research/2026-02-27-darwinian-evolver/) +- [ARC-AGI-2 results](https://imbue.com/research/2026-02-27-arc-agi-2-evolution/) +- [imbue-ai/darwinian_evolver](https://github.com/imbue-ai/darwinian_evolver) (AGPL-3.0) +- [Darwin Gödel Machines](https://arxiv.org/abs/2505.22954) +- [PromptBreeder](https://arxiv.org/abs/2309.16797) diff --git a/optional-skills/research/darwinian-evolver/scripts/parrot_openrouter.py b/optional-skills/research/darwinian-evolver/scripts/parrot_openrouter.py new file mode 100644 index 00000000000..545f8f1feb3 --- /dev/null +++ b/optional-skills/research/darwinian-evolver/scripts/parrot_openrouter.py @@ -0,0 +1,218 @@ +""" +parrot_openrouter: same as the upstream `parrot` example but the LLM call goes +through OpenRouter (OpenAI SDK) instead of Anthropic native. Lets us run an +end-to-end evolution with whatever model the user already has paid access to. + +Run with: + uv --project darwinian_evolver run python parrot_openrouter.py \ + --num_iterations 3 --output_dir /tmp/parrot_out + +Reads `OPENROUTER_API_KEY` from the environment. 
+""" +from __future__ import annotations + +import argparse +import os +import sys +from pathlib import Path + +import jinja2 +from openai import OpenAI + +# Vendored problem types from upstream (AGPL — only run via subprocess in production) +from darwinian_evolver.cli_common import build_hyperparameter_config_from_args +from darwinian_evolver.cli_common import register_hyperparameter_args +from darwinian_evolver.cli_common import parse_learning_log_view_type +from darwinian_evolver.evolve_problem_loop import EvolveProblemLoop +from darwinian_evolver.learning_log import LearningLogEntry +from darwinian_evolver.problem import EvaluationFailureCase +from darwinian_evolver.problem import EvaluationResult +from darwinian_evolver.problem import Evaluator +from darwinian_evolver.problem import Mutator +from darwinian_evolver.problem import Organism +from darwinian_evolver.problem import Problem + +DEFAULT_MODEL = os.environ.get("EVOLVER_MODEL", "openai/gpt-4o-mini") + + +def _client() -> OpenAI: + key = os.environ.get("OPENROUTER_API_KEY") + if not key: + sys.exit("OPENROUTER_API_KEY is not set") + return OpenAI(api_key=key, base_url="https://openrouter.ai/api/v1") + + +def _prompt_llm(prompt: str) -> str: + try: + r = _client().chat.completions.create( + model=DEFAULT_MODEL, + max_tokens=1024, + messages=[{"role": "user", "content": prompt}], + ) + return r.choices[0].message.content or "" + except Exception as e: + # Treat any provider error (rate limit, content filter, schema reject) + # as a failed response. The evolver will simply see this as a low score + # on this organism and move on — much friendlier than killing the run. 
+ return f"" + + +class ParrotOrganism(Organism): + prompt_template: str + + def run(self, phrase: str) -> str: + try: + prompt = jinja2.Template(self.prompt_template).render(phrase=phrase) + except jinja2.exceptions.TemplateError as e: + return f"Error rendering prompt: {e}" + if not prompt: + return "" + return _prompt_llm(prompt) + + +class ParrotEvaluationFailureCase(EvaluationFailureCase): + phrase: str + response: str + + +class ImproveParrotMutator(Mutator[ParrotOrganism, ParrotEvaluationFailureCase]): + IMPROVEMENT_PROMPT_TEMPLATE = """ +We want to build a prompt that causes an LLM to repeat back a given phrase verbatim. + +The current prompt template is: +``` +{{ organism.prompt_template }} +``` + +Unfortunately, on this phrase: +``` +{{ failure_case.phrase }} +``` +the LLM responded with: +``` +{{ failure_case.response }} +``` + +Diagnose what went wrong, then propose an improved prompt template. Put the new +template in the LAST triple-backtick block of your response. +""".strip() + + def mutate( + self, + organism: ParrotOrganism, + failure_cases: list[ParrotEvaluationFailureCase], + learning_log_entries: list[LearningLogEntry], + ) -> list[ParrotOrganism]: + fc = failure_cases[0] + prompt = jinja2.Template(self.IMPROVEMENT_PROMPT_TEMPLATE).render( + organism=organism, failure_case=fc + ) + try: + resp = _prompt_llm(prompt) + parts = resp.split("```") + if len(parts) < 3: + return [] + new_tpl = parts[-2].strip() + return [ParrotOrganism(prompt_template=new_tpl)] + except Exception as e: + print(f"mutate error: {e}", file=sys.stderr) + return [] + + +class ParrotEvaluator(Evaluator[ParrotOrganism, EvaluationResult, ParrotEvaluationFailureCase]): + TRAINABLE_PHRASES = [ + "Hello world.", + "bla", + "Bla", + "bla.", + '"bla bla".', + "Just say 'foo' once with no extra words.", + ] + HOLDOUT_PHRASES = [ + "bla, but only once.", + "'bla'", + ] + + def evaluate(self, organism: ParrotOrganism) -> EvaluationResult: + train_fails: 
list[ParrotEvaluationFailureCase] = [] + hold_fails: list[ParrotEvaluationFailureCase] = [] + for i, p in enumerate(self.TRAINABLE_PHRASES): + r = organism.run(p) + if r != p: + train_fails.append(ParrotEvaluationFailureCase( + phrase=p, response=r, data_point_id=f"trainable_{i}")) + for i, p in enumerate(self.HOLDOUT_PHRASES): + r = organism.run(p) + if r != p: + hold_fails.append(ParrotEvaluationFailureCase( + phrase=p, response=r, data_point_id=f"holdout_{i}")) + n_total = len(self.TRAINABLE_PHRASES) + len(self.HOLDOUT_PHRASES) + n_ok = n_total - len(train_fails) - len(hold_fails) + return EvaluationResult( + score=n_ok / n_total, + trainable_failure_cases=train_fails, + holdout_failure_cases=hold_fails, + # Always viable. Even a 0-score seed is a valid starting point; the + # mutator should still get a chance to fix it. + is_viable=True, + ) + + +def make_problem() -> Problem: + return Problem[ParrotOrganism, EvaluationResult, ParrotEvaluationFailureCase]( + evaluator=ParrotEvaluator(), + mutators=[ImproveParrotMutator()], + initial_organism=ParrotOrganism(prompt_template="Say {{ phrase }}"), + ) + + +def main() -> int: + ap = argparse.ArgumentParser() + register_hyperparameter_args(ap.add_argument_group("hyperparameters")) + ap.add_argument("--num_iterations", type=int, default=3) + ap.add_argument("--mutator_concurrency", type=int, default=4) + ap.add_argument("--evaluator_concurrency", type=int, default=4) + ap.add_argument("--output_dir", type=str, required=True) + args = ap.parse_args() + + out = Path(args.output_dir) + out.mkdir(parents=True, exist_ok=True) + + hp = build_hyperparameter_config_from_args(args) + loop = EvolveProblemLoop( + problem=make_problem(), + learning_log_view_type=parse_learning_log_view_type(hp.learning_log_view_type), + num_parents_per_iteration=hp.num_parents_per_iteration, + mutator_concurrency=args.mutator_concurrency, + evaluator_concurrency=args.evaluator_concurrency, + fixed_midpoint_score=hp.fixed_midpoint_score, + 
+ midpoint_score_percentile=hp.midpoint_score_percentile, + sharpness=hp.sharpness, + novelty_weight=hp.novelty_weight, + batch_size=hp.batch_size, + should_verify_mutations=hp.verify_mutations, + ) + + import json + log_path = out / "results.jsonl" + snap_dir = out / "snapshots" + snap_dir.mkdir(exist_ok=True) + print("Evaluating initial organism...") + for snap in loop.run(num_iterations=args.num_iterations): + (snap_dir / f"iteration_{snap.iteration}.pkl").write_bytes(snap.snapshot) + _, best_eval = snap.best_organism_result + print(f"iter={snap.iteration} pop={snap.population_size} " + f"best_score={best_eval.score:.3f}") + with log_path.open("a") as f: + f.write(json.dumps({ + "iteration": snap.iteration, + "best_score": best_eval.score, + "pop_size": snap.population_size, + "score_percentiles": {str(k): v for k, v in snap.score_percentiles.items()}, + }) + "\n") + print(f"\nDone. Results in: {out}") + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/optional-skills/research/darwinian-evolver/scripts/show_snapshot.py b/optional-skills/research/darwinian-evolver/scripts/show_snapshot.py new file mode 100644 index 00000000000..10e3a03dca9 --- /dev/null +++ b/optional-skills/research/darwinian-evolver/scripts/show_snapshot.py @@ -0,0 +1,69 @@ +""" +show_snapshot.py — Dump the population from a darwinian-evolver snapshot pickle. + +Usage: + python show_snapshot.py PATH/TO/iteration_N.pkl [--field prompt_template] + +The script is intentionally Organism-agnostic: unless --field is given, it walks +`vars(org)` and shows the first str field whose name is not `id` and does not start +with `_`. Pass --field to target a specific attribute (e.g. `regex_pattern`, `sql_query`, `code_block`). 
+""" +from __future__ import annotations + +import argparse +import pickle +import sys +from pathlib import Path + + +def main() -> int: + ap = argparse.ArgumentParser() + ap.add_argument("snapshot", type=Path) + ap.add_argument( + "--field", + default=None, + help="Organism attribute to display. Defaults to the first str field found.", + ) + ap.add_argument("--top", type=int, default=None, help="Show only top N by score.") + args = ap.parse_args() + + if not args.snapshot.exists(): + sys.exit(f"snapshot not found: {args.snapshot}") + + # The outer pickle wraps a dict; the inner pickle contains the actual organism + # objects, which must be importable under their original dotted path. If you + # ran a custom driver, make sure its module is on sys.path before calling this. + outer = pickle.loads(args.snapshot.read_bytes()) + if not isinstance(outer, dict) or "population_snapshot" not in outer: + sys.exit("not a darwinian-evolver snapshot (no population_snapshot key)") + inner = pickle.loads(outer["population_snapshot"]) + pairs = inner["organisms"] # list of (Organism, EvaluationResult) + + print(f"# organisms: {len(pairs)}\n") + ranked = sorted(pairs, key=lambda p: getattr(p[1], "score", 0) or 0, reverse=True) + if args.top: + ranked = ranked[: args.top] + + for i, (org, res) in enumerate(ranked): + score = getattr(res, "score", float("nan")) + print(f"=== rank {i} score={score:.3f} ===") + # pick field + field = args.field + if field is None: + for k, v in vars(org).items(): + if isinstance(v, str) and not k.startswith("_") and k not in ("id",): + field = k + break + val = getattr(org, field, None) if field else None + if val is None: + print(f" (no string field; org fields: {list(vars(org).keys())})") + else: + print(f" {field} ({len(val)} chars):") + for ln in val.splitlines()[:30]: + print(f" {ln}") + print() + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/optional-skills/research/darwinian-evolver/templates/custom_problem_template.py 
b/optional-skills/research/darwinian-evolver/templates/custom_problem_template.py new file mode 100644 index 00000000000..c6daac14ede --- /dev/null +++ b/optional-skills/research/darwinian-evolver/templates/custom_problem_template.py @@ -0,0 +1,240 @@ +""" +Template: a custom darwinian-evolver problem. + +Copy this file, fill in the THREE marked spots (Organism, Evaluator, Mutator), +then run it as a driver script. The skeleton handles all the wiring so you only +write the domain-specific logic. + +To run: + cd ~/.hermes/cache/darwinian-evolver/darwinian_evolver + OPENROUTER_API_KEY=... uv run --with openai python /path/to/this_file.py \ + --num_iterations 3 --num_parents_per_iteration 2 \ + --output_dir /tmp/my_problem + +The pattern mirrors `scripts/parrot_openrouter.py` (the working reference). +""" +from __future__ import annotations + +import argparse +import os +import sys +from pathlib import Path + +from openai import OpenAI + +# Upstream types (AGPL — invoked via subprocess in production; importing here +# is fine for skill-side driver scripts the user owns). 
+from darwinian_evolver.cli_common import ( + build_hyperparameter_config_from_args, + parse_learning_log_view_type, + register_hyperparameter_args, +) +from darwinian_evolver.evolve_problem_loop import EvolveProblemLoop +from darwinian_evolver.learning_log import LearningLogEntry +from darwinian_evolver.problem import ( + EvaluationFailureCase, + EvaluationResult, + Evaluator, + Mutator, + Organism, + Problem, +) + +DEFAULT_MODEL = os.environ.get("EVOLVER_MODEL", "openai/gpt-4o-mini") + + +def _client() -> OpenAI: + key = os.environ.get("OPENROUTER_API_KEY") + if not key: + sys.exit("OPENROUTER_API_KEY is not set") + return OpenAI(api_key=key, base_url="https://openrouter.ai/api/v1") + + +def _prompt_llm(prompt: str, max_tokens: int = 1024) -> str: + try: + r = _client().chat.completions.create( + model=DEFAULT_MODEL, + max_tokens=max_tokens, + messages=[{"role": "user", "content": prompt}], + ) + return r.choices[0].message.content or "" + except Exception as e: + # Never let one bad LLM response kill the run. + return f"" + + +# --------------------------------------------------------------------------- +# 1. ORGANISM — what you are evolving. +# --------------------------------------------------------------------------- +class MyOrganism(Organism): + # TODO: replace with your artifact field. Common shapes: + # prompt_template: str + # regex_pattern: str + # sql_query: str + # code_block: str + artifact: str + + def run(self, *inputs) -> str: + """Exercise the organism on a test input. Return whatever your + evaluator wants to score.""" + # TODO: implement. For prompt evolution this typically calls _prompt_llm + # with the artifact rendered against the input. For regex/SQL it would + # call `re.findall(self.artifact, input)` / execute SQL / etc. + raise NotImplementedError + + +# --------------------------------------------------------------------------- +# 2. EVALUATOR — score organisms and surface failures the mutator can learn from. 
+# --------------------------------------------------------------------------- +class MyFailureCase(EvaluationFailureCase): + # TODO: include enough context for the LLM to diagnose the failure. + input: str + expected: str + actual: str + + +class MyEvaluator(Evaluator[MyOrganism, EvaluationResult, MyFailureCase]): + # Split your dataset. Mutator only sees trainable; holdout detects overfitting. + TRAINABLE = [ + # TODO: list of (input, expected) tuples + # ("input1", "expected1"), + ] + HOLDOUT = [ + # TODO: separate set the mutator never sees + ] + + def evaluate(self, organism: MyOrganism) -> EvaluationResult: + train_fails: list[MyFailureCase] = [] + hold_fails: list[MyFailureCase] = [] + for i, (inp, expected) in enumerate(self.TRAINABLE): + actual = organism.run(inp) + if actual != expected: + train_fails.append(MyFailureCase( + input=inp, expected=expected, actual=actual, + data_point_id=f"trainable_{i}", + )) + for i, (inp, expected) in enumerate(self.HOLDOUT): + actual = organism.run(inp) + if actual != expected: + hold_fails.append(MyFailureCase( + input=inp, expected=expected, actual=actual, + data_point_id=f"holdout_{i}", + )) + n_total = len(self.TRAINABLE) + len(self.HOLDOUT) + n_ok = n_total - len(train_fails) - len(hold_fails) + return EvaluationResult( + score=n_ok / n_total if n_total else 0.0, + trainable_failure_cases=train_fails, + holdout_failure_cases=hold_fails, + # Always-viable. The evolver only blocks completely-broken organisms; + # a 0-score organism is fine and will simply be sampled less often. + is_viable=True, + ) + + +# --------------------------------------------------------------------------- +# 3. MUTATOR — LLM proposes an improved organism from a failure case. 
+# --------------------------------------------------------------------------- +class MyMutator(Mutator[MyOrganism, MyFailureCase]): + PROMPT = """ +The current artifact is: +``` +{artifact} +``` + +On this input: +``` +{input} +``` +it produced: +``` +{actual} +``` +but we wanted: +``` +{expected} +``` + +Diagnose what went wrong, then propose an improved version of the artifact. +Put the new version in the LAST triple-backtick block of your response. +""".strip() + + def mutate( + self, + organism: MyOrganism, + failure_cases: list[MyFailureCase], + learning_log_entries: list[LearningLogEntry], + ) -> list[MyOrganism]: + fc = failure_cases[0] + prompt = self.PROMPT.format( + artifact=organism.artifact, + input=fc.input, + actual=fc.actual, + expected=fc.expected, + ) + resp = _prompt_llm(prompt) + parts = resp.split("```") + if len(parts) < 3: + return [] + new_artifact = parts[-2].strip() + # Strip an opening language tag like "python\n" or "sql\n" + if "\n" in new_artifact: + first_line, rest = new_artifact.split("\n", 1) + if first_line and not first_line.startswith(" ") and len(first_line) < 20: + new_artifact = rest + return [MyOrganism(artifact=new_artifact)] + + +# --------------------------------------------------------------------------- +# Driver — fills in the EvolveProblemLoop boilerplate. You shouldn't need to +# touch anything below this line for a typical run. 
+# --------------------------------------------------------------------------- +def make_problem() -> Problem: + initial = MyOrganism(artifact="TODO: starting artifact here") # TODO + return Problem[MyOrganism, EvaluationResult, MyFailureCase]( + evaluator=MyEvaluator(), + mutators=[MyMutator()], + initial_organism=initial, + ) + + +def main() -> int: + ap = argparse.ArgumentParser() + register_hyperparameter_args(ap.add_argument_group("hyperparameters")) + ap.add_argument("--num_iterations", type=int, default=3) + ap.add_argument("--mutator_concurrency", type=int, default=2) + ap.add_argument("--evaluator_concurrency", type=int, default=2) + ap.add_argument("--output_dir", type=str, required=True) + args = ap.parse_args() + + out = Path(args.output_dir) + out.mkdir(parents=True, exist_ok=True) + (out / "snapshots").mkdir(exist_ok=True) + + hp = build_hyperparameter_config_from_args(args) + loop = EvolveProblemLoop( + problem=make_problem(), + learning_log_view_type=parse_learning_log_view_type(hp.learning_log_view_type), + num_parents_per_iteration=hp.num_parents_per_iteration, + mutator_concurrency=args.mutator_concurrency, + evaluator_concurrency=args.evaluator_concurrency, + fixed_midpoint_score=hp.fixed_midpoint_score, + midpoint_score_percentile=hp.midpoint_score_percentile, + sharpness=hp.sharpness, + novelty_weight=hp.novelty_weight, + batch_size=hp.batch_size, + should_verify_mutations=hp.verify_mutations, + ) + + print("Evaluating initial organism...") + for snap in loop.run(num_iterations=args.num_iterations): + (out / "snapshots" / f"iteration_{snap.iteration}.pkl").write_bytes(snap.snapshot) + _, best = snap.best_organism_result + print(f"iter={snap.iteration} pop={snap.population_size} best_score={best.score:.3f}") + + print(f"\nDone. 
Results in: {out}") + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/optional-skills/research/osint-investigation/SKILL.md b/optional-skills/research/osint-investigation/SKILL.md new file mode 100644 index 00000000000..b2da82fbd00 --- /dev/null +++ b/optional-skills/research/osint-investigation/SKILL.md @@ -0,0 +1,277 @@ +--- +name: osint-investigation +description: Public-records OSINT investigation framework — SEC EDGAR filings, USAspending contracts, Senate lobbying, OFAC sanctions, ICIJ offshore leaks, NYC property records (ACRIS), OpenCorporates registries, CourtListener court records, Wayback Machine archives, Wikipedia + Wikidata, GDELT news monitoring. Entity resolution across sources, cross-link analysis, timing correlation, evidence chains. Python stdlib only. +version: 0.1.0 +platforms: [linux, macos, windows] +author: Hermes Agent (adapted from ShinMegamiBoson/OpenPlanter, MIT) +metadata: + hermes: + tags: [osint, investigation, public-records, sec, sanctions, corporate-registry, property, courts, due-diligence, journalism] + category: research + related_skills: [domain-intel, arxiv] +--- + +# OSINT Investigation — Public Records Cross-Reference + +Investigative framework for public-records OSINT: government contracts, +corporate filings, lobbying, sanctions, offshore leaks, property records, +court records, web archives, knowledge bases, and global news. Resolve +entities across heterogeneous sources, build cross-links with explicit +confidence, run statistical timing tests, and produce structured evidence +chains. + +**Python stdlib only.** Zero install. Works on Linux, macOS, Windows. Most +sources work with no API key (OpenCorporates has an optional free token +that raises rate limits). + +Adapted from the MIT-licensed ShinMegamiBoson/OpenPlanter project; expanded +to cover identity / property / litigation / archives / news sources that +the original didn't address. 
+ +## When to use this skill + +Use when the user asks for: + +- "follow the money" — government contracts, lobbying → legislation, sanctions +- corporate due diligence — who controls company X, where are they + incorporated, who serves on their boards, what filings have they made +- sanctions screening — is entity X on OFAC SDN, ICIJ offshore leaks +- pay-to-play investigation — contractors with offshore ties, lobbying + clients winning awards +- property ownership — find recorded deeds/mortgages by name or address + (NYC; for other counties point users at the relevant recorder) +- litigation history — find federal + state court opinions and PACER dockets +- multi-source entity resolution where naming varies (LLC suffixes, abbreviations) +- evidence-chain construction with explicit confidence levels +- "what's been said about X" — international news (GDELT) + Wikipedia + narrative + Wayback Machine to recover dead URLs + +Do NOT use this skill for: + +- general web research → `web_search` / `web_extract` +- domain/infrastructure OSINT → `domain-intel` skill +- academic literature → `arxiv` skill +- social-media profile discovery → `sherlock` skill (optional) +- US **federal** campaign finance — FEC is intentionally NOT covered here + (the API is unreliable for ad-hoc contributor-name queries on the free + DEMO_KEY tier). For federal donations, point users at + https://www.fec.gov/data/ directly. + +## Workflow + +The agent runs scripts via the `terminal` tool. `SKILL_DIR` is the directory +holding this SKILL.md. + +### 1. 
Identify which sources apply + +Read the data-source wiki entries to plan the investigation: + +``` +ls SKILL_DIR/references/sources/ + +# Federal financial / regulatory +cat SKILL_DIR/references/sources/sec-edgar.md # corporate filings +cat SKILL_DIR/references/sources/usaspending.md # federal contracts +cat SKILL_DIR/references/sources/senate-ld.md # lobbying +cat SKILL_DIR/references/sources/ofac-sdn.md # sanctions +cat SKILL_DIR/references/sources/icij-offshore.md # offshore leaks + +# Identity / property / litigation / archives / news +cat SKILL_DIR/references/sources/nyc-acris.md # NYC property records +cat SKILL_DIR/references/sources/opencorporates.md # global corporate registry +cat SKILL_DIR/references/sources/courtlistener.md # court records (federal + state) +cat SKILL_DIR/references/sources/wayback.md # Wayback Machine archives +cat SKILL_DIR/references/sources/wikipedia.md # Wikipedia + Wikidata +cat SKILL_DIR/references/sources/gdelt.md # global news monitoring +``` + +Each entry follows a 9-section template: summary, access, schema, coverage, +cross-reference keys, data quality, acquisition, legal, references. + +The **cross-reference potential** section maps join keys between sources — read +those first to pick the right pair. + +### 2. 
Acquire data + +Each source has a stdlib-only fetch script in `SKILL_DIR/scripts/`: + +**Federal financial / regulatory** + +```bash +# SEC EDGAR filings (corporate disclosures) +python3 SKILL_DIR/scripts/fetch_sec_edgar.py --cik 0000320193 \ + --types 10-K,10-Q --out data/edgar_filings.csv + +# USAspending federal contracts +python3 SKILL_DIR/scripts/fetch_usaspending.py --recipient "EXAMPLE CORP" \ + --fy 2024 --out data/contracts.csv + +# Senate LD-1 / LD-2 lobbying disclosures +python3 SKILL_DIR/scripts/fetch_senate_ld.py --client "EXAMPLE CORP" \ + --year 2024 --out data/lobbying.csv + +# OFAC SDN sanctions list (full snapshot) +python3 SKILL_DIR/scripts/fetch_ofac_sdn.py --out data/ofac_sdn.csv + +# ICIJ Offshore Leaks — downloads ~70 MB bulk CSV on first use, +# then searches it locally. Cached for 30 days under +# $HERMES_OSINT_CACHE/icij/ (default: ~/.cache/hermes-osint/icij/). +python3 SKILL_DIR/scripts/fetch_icij_offshore.py --entity "EXAMPLE CORP" \ + --out data/icij.csv +``` + +**Identity / property / litigation / archives / news** + +```bash +# NYC property records (deeds, mortgages, liens) — ACRIS via Socrata +python3 SKILL_DIR/scripts/fetch_nyc_acris.py --name "SMITH, JOHN" \ + --out data/acris.csv +python3 SKILL_DIR/scripts/fetch_nyc_acris.py --address "571 HUDSON" \ + --out data/acris_addr.csv + +# OpenCorporates — 130+ jurisdiction corporate registry +# (free token required; set OPENCORPORATES_API_TOKEN or pass --token) +python3 SKILL_DIR/scripts/fetch_opencorporates.py --query "Example Corp" \ + --jurisdiction us_ny --out data/opencorporates.csv + +# CourtListener — federal + state court opinions, PACER dockets +python3 SKILL_DIR/scripts/fetch_courtlistener.py --query "Smith v. 
Example Corp" \ + --type opinions --out data/courts.csv + +# Wayback Machine — historical web captures +python3 SKILL_DIR/scripts/fetch_wayback.py --url "example.com" \ + --match host --collapse digest --out data/wayback.csv + +# Wikipedia + Wikidata — narrative bio + structured facts +# Set HERMES_OSINT_UA=your-app/1.0 (your@email) to identify yourself +python3 SKILL_DIR/scripts/fetch_wikipedia.py --query "Bill Gates" \ + --out data/wp.csv + +# GDELT — global news in 100+ languages, ~2015→present +python3 SKILL_DIR/scripts/fetch_gdelt.py --query '"Example Corp"' \ + --timespan 1y --out data/gdelt.csv +``` + +All outputs are normalized CSV with a header row. Re-run scripts idempotently. + +When a private individual won't be in a source (e.g. SEC EDGAR for a non-public- +company person, USAspending for someone who isn't a federal contractor, Senate +LDA for someone who isn't a lobbying client), the script returns 0 rows with a +clear warning rather than silently writing an empty CSV. EDGAR specifically +flags when the company-name resolver matched an individual Form 3/4/5 filer +rather than a corporate registrant. + +Rate-limit notes are in each source's wiki entry. Default fetchers sleep +politely between paginated requests. **API keys raise rate limits** for +sources that support them (`SEC_USER_AGENT`, `SENATE_LDA_TOKEN`, +`OPENCORPORATES_API_TOKEN`, `COURTLISTENER_TOKEN`). All scripts surface +429 responses immediately with the upstream's quota message so the user +knows to slow down or supply a key. + +### 3. 
Resolve entities across sources + +Normalize names and find matches between two CSV files: + +```bash +# Match lobbying clients (Senate LDA) against contract recipients (USAspending) +python3 SKILL_DIR/scripts/entity_resolution.py \ + --left data/lobbying.csv --left-name-col client_name \ + --right data/contracts.csv --right-name-col recipient_name \ + --out data/cross_links.csv +``` + +Three matching tiers with explicit confidence: + +| Tier | Method | Confidence | +|------|--------|------------| +| `exact` | Normalized strings equal after suffix/punctuation strip | high | +| `fuzzy` | Sorted-token equality (word-bag match) | medium | +| `token_overlap` | ≥60% token overlap, ≥2 shared tokens, tokens ≥4 chars | low | + +Output `cross_links.csv` columns: `match_type, confidence, left_name, +right_name, left_normalized, right_normalized, left_row, right_row`. + +### 4. Statistical timing correlation (optional) + +Test whether two time series cluster suspiciously close together — e.g. +lobbying filings near contract awards — using a permutation test: + +```bash +python3 SKILL_DIR/scripts/timing_analysis.py \ + --donations data/lobbying.csv --donation-date-col filing_date \ + --donation-amount-col income --donation-donor-col client_name \ + --donation-recipient-col registrant_name \ + --contracts data/contracts.csv --contract-date-col award_date \ + --contract-vendor-col recipient_name \ + --cross-links data/cross_links.csv \ + --permutations 1000 \ + --out data/timing.json +``` + +The script's column flags are intentionally generic — the original tool was +written for donations vs awards, but it works for any (event, payee) time +series joined through cross-links. Null hypothesis: event timing is +independent of award dates. One-tailed p-value = fraction of permutations +with mean nearest-award distance ≤ observed. Minimum 3 events per (payer, +vendor) pair to run the test. + +### 5. 
Build the findings JSON (evidence chain) + +```bash +python3 SKILL_DIR/scripts/build_findings.py \ + --cross-links data/cross_links.csv \ + --timing data/timing.json \ + --out data/findings.json +``` + +Every finding has `id, title, severity, confidence, summary, evidence[], sources[]`. +Each evidence item points back to a specific row in a source CSV. The user (or a +follow-up agent) can verify every claim against its source. + +## Confidence and evidence discipline + +This is the load-bearing rule of the skill. Tell the user: + +- Every claim must trace to a record. No naked assertions. +- Confidence tier travels with the claim. `match_type=fuzzy` is "probable", + not "confirmed." +- Entity resolution produces candidates, NOT conclusions. A `fuzzy` match + between "ACME LLC" and "Acme Holdings Group" is a lead, not a fact. +- Statistical significance ≠ wrongdoing. p < 0.05 means the timing pattern + is unlikely under the null. It does not establish corruption. +- All data sources here are public records. They may still contain + inaccuracies, stale info, or redactions (GDPR, sealed records). + +## Adding a new data source + +Use the template: + +```bash +cp SKILL_DIR/templates/source-template.md \ + SKILL_DIR/references/sources/.md +``` + +Fill in all 9 sections. Write a `fetch_.py` script in `scripts/` that +uses stdlib only and writes a normalized CSV. Update the source list in the +"When to use" section above. + +## Tools and their limits + +- `entity_resolution.py` does NOT use external fuzzy libraries (no rapidfuzz, + no jellyfish). Token-bag matching is the upper bound here. If you need + Levenshtein, transliteration, or phonetic matching, pip-install separately. +- `timing_analysis.py` uses Python's `random` for permutations. For + reproducibility, pass `--seed N`. +- `fetch_*.py` scripts use `urllib.request` and respect `Retry-After`. Heavy + bulk usage may still violate ToS — read each source's legal section first. 
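The token-bag ceiling mentioned in the first bullet can be sketched as follows. This is a minimal illustration of the three matching tiers, not the code in `entity_resolution.py`: the suffix list, the helper names `normalize` and `match_tier`, and the choice to measure overlap against the smaller token set are all assumptions made for the example.

```python
import re

# Hypothetical suffix list for illustration; the shipped script's list may differ.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "lp", "llp", "plc"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, drop legal suffixes."""
    tokens = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def match_tier(left, right):
    """Return (match_type, confidence), or None when no tier applies."""
    a, b = normalize(left), normalize(right)
    if not a or not b:
        return None
    if a == b:
        return ("exact", "high")
    # Word-bag match: same tokens in any order.
    if sorted(a.split()) == sorted(b.split()):
        return ("fuzzy", "medium")
    # Only tokens of 4+ characters count toward the overlap tier.
    long_a = {t for t in a.split() if len(t) >= 4}
    long_b = {t for t in b.split() if len(t) >= 4}
    shared = long_a & long_b
    smaller = min(len(long_a), len(long_b))
    # Assumption: overlap ratio is measured against the smaller token set.
    if len(shared) >= 2 and len(shared) / smaller >= 0.6:
        return ("token_overlap", "low")
    return None


if __name__ == "__main__":
    print(match_tier("ACME, LLC", "Acme"))
    print(match_tier("John Smith", "Smith John"))
    print(match_tier("Mohammed Al Rashid", "Muhammad Al-Rashid"))
```

In this sketch a transliteration variant such as "Mohammed" versus "Muhammad" falls through all three tiers, which is the kind of case where a separately installed edit-distance or phonetic library would be needed.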
+ +## Legal note + +All Phase-1 sources are public records. Bulk acquisition is permitted under +their respective access terms (FOIA, public records law, ICIJ explicit +publication, OFAC public data). However: + +- Some sources rate-limit aggressively. Respect their headers. +- Some redact registrant info (GDPR on WHOIS, sealed filings). +- Cross-referencing public records to identify private individuals can have + ethical implications. The skill produces evidence chains, not accusations. diff --git a/optional-skills/research/osint-investigation/references/sources/courtlistener.md b/optional-skills/research/osint-investigation/references/sources/courtlistener.md new file mode 100644 index 00000000000..0365b2ba0b1 --- /dev/null +++ b/optional-skills/research/osint-investigation/references/sources/courtlistener.md @@ -0,0 +1,98 @@ +# CourtListener — Free Law Project + +## 1. Summary + +CourtListener (Free Law Project) aggregates court opinions, dockets, oral +arguments, and judge data. Covers ~10M federal and state court opinions +back to colonial America, plus PACER docket data from RECAP submissions. + +## 2. Access Methods + +- **REST API v4:** `https://www.courtlistener.com/api/rest/v4/` +- **Auth:** Anonymous reads allowed on most endpoints; token raises rate + limits and unlocks bulk export +- **Rate limit:** ~5,000 req/hour unauthenticated for search; higher with token + +Set `COURTLISTENER_TOKEN` env var. Get a free token at +https://www.courtlistener.com/sign-in/ then create an API key. + +## 3. Data Schema + +Key fields emitted by `fetch_courtlistener.py`: + +| Column | Type | Description | +|--------|------|-------------| +| `case_name` | str | Case name | +| `court` | str | Court name | +| `court_id` | str | Court ID (e.g. 
`nysd`, `scotus`, `ca9`) | +| `date_filed` | str | YYYY-MM-DD | +| `docket_number` | str | Court docket number | +| `judge` | str | Judge name(s) | +| `citation` | str | Reporter citation(s) | +| `result_type` | str | opinions / dockets / oral / people | +| `snippet` | str | Search-match snippet (up to 500 chars) | +| `absolute_url` | str | Direct CourtListener URL | + +## 4. Coverage + +- Federal: all circuit and district courts, SCOTUS +- State: all 50 state supreme/appellate courts, many trial courts +- Opinions: ~10M back to 1600s (colonial), full coverage 1950 → present +- Dockets via RECAP: ~3M+ from user-submitted PACER PDFs +- Updated continuously + +## 5. Cross-Reference Potential + +- **OpenCorporates** ↔ `case_name` (corporate litigation) +- **SEC EDGAR** ↔ `case_name` (securities class actions) +- **OFAC SDN** ↔ `case_name` (sanctions-related civil/criminal cases) + +Join key: party name from `case_name`. Note: `case_name` often abbreviates +("Smith v. Jones" rather than full party names) — use the full case URL +to get all parties. + +## 6. Data Quality + +- Older opinions (pre-1990) often lack docket numbers and judges +- State coverage is more uneven than federal +- PACER docket coverage depends on RECAP user submissions — not exhaustive +- Sealed documents are excluded +- Party names in case captions don't always match filing names exactly + +## 7. 
Acquisition Script + +Path: `scripts/fetch_courtlistener.py` + +```bash +# Search opinions for a party / keyword +python3 SKILL_DIR/scripts/fetch_courtlistener.py --query "Example Corp" \ + --out data/cl.csv + +# PACER dockets (best for recent litigation) +python3 SKILL_DIR/scripts/fetch_courtlistener.py --query "Example Corp" \ + --type dockets --out data/cl_dockets.csv + +# Restrict to a court +python3 SKILL_DIR/scripts/fetch_courtlistener.py --query "Microsoft" \ + --court ca9 --out data/cl_9th.csv + +# Date range +python3 SKILL_DIR/scripts/fetch_courtlistener.py --query "Example Corp" \ + --date-from 2020-01-01 --date-to 2024-12-31 --out data/cl.csv +``` + +Pass `--token` or set `COURTLISTENER_TOKEN`. + +## 8. Legal & Licensing + +- Court opinions are public domain +- Free Law Project provides the data under CC0 / public domain dedication +- No commercial use restrictions on opinion text or metadata +- Some PACER PDFs have copyright on layout (not text) — fair use applies + +## 9. References + +- API docs: https://www.courtlistener.com/help/api/rest/ +- Court IDs: https://www.courtlistener.com/api/jurisdictions/ +- RECAP archive: https://www.courtlistener.com/recap/ +- Bulk data: https://www.courtlistener.com/help/api/bulk-data/ diff --git a/optional-skills/research/osint-investigation/references/sources/gdelt.md b/optional-skills/research/osint-investigation/references/sources/gdelt.md new file mode 100644 index 00000000000..785c171a0c9 --- /dev/null +++ b/optional-skills/research/osint-investigation/references/sources/gdelt.md @@ -0,0 +1,104 @@ +# GDELT — Global News Monitoring + +## 1. Summary + +GDELT (Global Database of Events, Language, and Tone) monitors world news +in 100+ languages with full-text indexing. Updated every 15 minutes. +~2015 → present, ~1B+ articles indexed. Free anonymous access. 
+ +GDELT is wider than Google News (more international, more long-tail +sources) and indexed by tone/sentiment, themes (CAMEO codes), people, and +organizations. + +## 2. Access Methods + +- **DOC 2.0 API:** `https://api.gdeltproject.org/api/v2/doc/doc` +- **Events / GKG 2.0:** `https://api.gdeltproject.org/api/v2/events/events` +- **Auth:** None +- **Rate limit:** **1 request per 5 seconds** for the DOC API — strict + +The fetch script automatically retries after a 6-second sleep when a +429 is received. + +## 3. Data Schema + +Key fields emitted by `fetch_gdelt.py`: + +| Column | Type | Description | +|--------|------|-------------| +| `title` | str | Article title | +| `url` | str | Article URL | +| `seen_date` | str | When GDELT first saw the article (UTC) | +| `domain` | str | Publisher domain | +| `language` | str | Source language | +| `source_country` | str | 2-letter country code | +| `tone` | str | GDELT-computed tone score (negative = negative coverage) | +| `social_image` | str | Open Graph image URL when available | + +## 4. Coverage + +- Worldwide news in 100+ languages +- ~2015 → present (Events back to 1979 via a separate stream) +- Update frequency: 15 minutes +- Bias: heavily Anglophone in volume but very wide source list overall + +## 5. Cross-Reference Potential + +- **All sources** ↔ `title` / `url` (news context for any subject) +- **Wikipedia** ↔ event timeline for notable entities +- **Wayback Machine** ↔ recover articles whose URLs have died +- **OFAC SDN** ↔ news context for sanctions designations +- **SEC EDGAR** ↔ news context for 8-K material events + +Join key: entity name appearing in article title or full-text. GDELT also +extracts named entities into a separate stream (GKG) not exposed by this +fetcher — query GDELT directly for entity-level filtering. + +## 6. 
Data Quality + +- Title extraction is automated and can be wrong (sometimes captures the + site name + delimiter + article title; sometimes a generic page title) +- Sentiment / tone is computed by GDELT, not source-supplied +- Some domains are oversampled (newswires, aggregators) +- Source country is inferred from domain registration / TLD — can be + wrong for international news sites with country-neutral domains +- Article URLs can rot — pair with Wayback Machine to preserve content + +## 7. Acquisition Script + +Path: `scripts/fetch_gdelt.py` + +```bash +# Recent news mentioning an entity +python3 SKILL_DIR/scripts/fetch_gdelt.py --query "Nous Research" \ + --timespan 6m --out data/gdelt.csv + +# Phrase-exact (use double quotes inside single quotes for the shell) +python3 SKILL_DIR/scripts/fetch_gdelt.py --query '"Dillon Rolnick"' \ + --timespan 1y --out data/gdelt.csv + +# Filter to a country / language +python3 SKILL_DIR/scripts/fetch_gdelt.py --query "Microsoft" \ + --source-country US --source-lang English --out data/gdelt.csv + +# Date range +python3 SKILL_DIR/scripts/fetch_gdelt.py --query "Microsoft" \ + --start 2024-01-01 --end 2024-12-31 --out data/gdelt.csv +``` + +GDELT supports its own query operators: phrase quoting, AND/OR/NOT, +`sourcecountry:US`, `theme:ECON_BANKRUPTCY`, `tone<-5`, etc. +See https://blog.gdeltproject.org/gdelt-doc-2-0-api-debuts/ for syntax. + +## 8. Legal & Licensing + +- GDELT data is provided free for academic and journalistic use +- Article URLs link out to original publishers — copyright remains with + the publisher +- GDELT is NOT a content archive; it's a metadata index + +## 9. 
References + +- DOC 2.0 API: https://blog.gdeltproject.org/gdelt-doc-2-0-api-debuts/ +- Themes & query syntax: https://blog.gdeltproject.org/gkg-2-0-our-global-knowledge-graph-2-0-amazing-data-at-your-fingertips/ +- Project home: https://www.gdeltproject.org/ diff --git a/optional-skills/research/osint-investigation/references/sources/icij-offshore.md b/optional-skills/research/osint-investigation/references/sources/icij-offshore.md new file mode 100644 index 00000000000..99e2abcb24b --- /dev/null +++ b/optional-skills/research/osint-investigation/references/sources/icij-offshore.md @@ -0,0 +1,104 @@ +# ICIJ Offshore Leaks Database + +## 1. Summary + +The International Consortium of Investigative Journalists (ICIJ) publishes a +combined database of offshore entities from the Panama Papers, Paradise Papers, +Pandora Papers, Bahamas Leaks, and Offshore Leaks. ~800,000+ offshore entities +with their officers, intermediaries, and addresses. + +## 2. Access Methods + +- **Bulk download (primary):** `https://offshoreleaks-data.icij.org/offshoreleaks/csv/full-oldb.LATEST.zip` (~70 MB ZIP, refreshed periodically) +- **Search UI (human):** `https://offshoreleaks.icij.org/` +- **Auth:** None +- **Note:** The previous Open Refine reconciliation endpoint at + `/reconcile` now returns 404. ICIJ has removed it. The bulk ZIP is the + remaining stable access path. The skill's `fetch_icij_offshore.py` caches + the ZIP locally (default `~/.cache/hermes-osint/icij/`, refreshes after + 30 days) and searches it offline. + +## 3. 
Data Schema + +Key fields emitted by `fetch_icij_offshore.py`: + +| Column | Type | Description | +|--------|------|-------------| +| `node_id` | int | ICIJ canonical node ID | +| `name` | str | Entity / officer / intermediary name | +| `node_type` | str | entity / officer / intermediary / address | +| `country_codes` | str | Semicolon-separated ISO codes | +| `countries` | str | Country names | +| `jurisdiction` | str | Offshore jurisdiction (BVI, Panama, etc.) | +| `incorporation_date` | str | YYYY-MM-DD | +| `inactivation_date` | str | YYYY-MM-DD (if struck) | +| `source` | str | Panama Papers / Paradise Papers / Pandora Papers / etc. | +| `entity_url` | str | Link to ICIJ page | +| `connections` | str | Semicolon-separated node IDs of related entities | + +## 4. Coverage + +- Worldwide offshore entity records +- Earliest records: 1970s (Bahamas Leaks). Most data 1990–2018. +- NOT updated in real-time — new leaks added when ICIJ publishes them +- ~810,000 offshore entities + ~750,000 officers + ~150,000 intermediaries + +## 5. Cross-Reference Potential + +- **SEC EDGAR** ↔ `name` (public companies with offshore arms) +- **USAspending** ↔ `name` (federal contractors with offshore structure) +- **OFAC SDN** ↔ `name` (sanctioned entities using offshore vehicles) + +Join key: normalized entity/officer name. `node_id` is canonical for cross- +referencing within ICIJ. Connections graph traversal is in-script (BFS over +`connections`). + +## 6. Data Quality + +- Offshore entity names sometimes appear in multiple leaks with slight variations +- Officers may be nominees (front persons), not beneficial owners +- Some entries have minimal info (just a name + jurisdiction) +- The connections graph is incomplete — some relationships are documented in + source materials but not in the structured database +- Inactive/struck-off entities are still included with `inactivation_date` + +## 7. 
Acquisition Script + +Path: `scripts/fetch_icij_offshore.py` + +```bash +# Search by entity name (case-insensitive substring across the bulk DB) +python3 SKILL_DIR/scripts/fetch_icij_offshore.py --entity "EXAMPLE CORP" \ + --out data/icij.csv + +# Search by officer (individual person) +python3 SKILL_DIR/scripts/fetch_icij_offshore.py --officer "SMITH JOHN" \ + --out data/icij.csv + +# Narrow an officer search to one jurisdiction (filters the cached results) +python3 SKILL_DIR/scripts/fetch_icij_offshore.py --officer "SMITH" \ + --jurisdiction "BRITISH VIRGIN ISLANDS" --out data/icij_bvi.csv + +# Force a fresh download (default refresh window is 30 days) +python3 SKILL_DIR/scripts/fetch_icij_offshore.py --entity "EXAMPLE CORP" \ + --force-refresh --out data/icij.csv +``` + +The first call downloads the ~70 MB ZIP to `~/.cache/hermes-osint/icij/` +(or `$HERMES_OSINT_CACHE/icij/`). Subsequent calls reuse the cache for 30 days. + +## 8. Legal & Licensing + +- Public record as published by ICIJ under explicit publication +- No copyright on the underlying facts (entity names, jurisdictions) +- ICIJ asks for attribution if used in derivative reporting +- **Ethical note**: Presence in this database does NOT imply wrongdoing. Many + offshore structures are legal. The database is a research tool, not a list of + criminals. + +## 9. References + +- Database: https://offshoreleaks.icij.org/ +- About the data: https://offshoreleaks.icij.org/pages/about +- Methodology: https://www.icij.org/investigations/panama-papers/ +- Retired API: the Open Refine reconciliation endpoint at `https://offshoreleaks.icij.org/reconcile` now returns 404 (see section 2); use the bulk ZIP diff --git a/optional-skills/research/osint-investigation/references/sources/nyc-acris.md b/optional-skills/research/osint-investigation/references/sources/nyc-acris.md new file mode 100644 index 00000000000..4b20169bf3e --- /dev/null +++ b/optional-skills/research/osint-investigation/references/sources/nyc-acris.md @@ -0,0 +1,90 @@ +# NYC ACRIS — NYC Real Property Records + +## 1.
Summary + +The Automated City Register Information System (ACRIS) is NYC's index of +recorded property documents: deeds, mortgages, satisfactions, liens, UCC +filings. Covers Manhattan, Bronx, Brooklyn, Queens, Staten Island. +Published as 4 linked Socrata datasets on the NYC Open Data portal. + +## 2. Access Methods + +- **Socrata API:** `https://data.cityofnewyork.us/resource/636b-3b5g.json` (Parties) +- **Other datasets:** `bnx9-e6tj` (Master), `8h5j-fqxa` (Legal), `uqqa-hym2` (References) +- **Auth:** None for read access (Socrata `$app_token` raises rate limits if needed) +- **Rate limit:** Generous (~1000 req/hour unauthenticated) + +## 3. Data Schema + +Key fields emitted by `fetch_nyc_acris.py` (Parties joined to Master): + +| Column | Type | Description | +|--------|------|-------------| +| `document_id` | str | ACRIS document ID | +| `name` | str | Party name as recorded (often "LAST, FIRST" but varies) | +| `party_type` | str | 1=grantor, 2=grantee, 3=other | +| `party_role` | str | Human-readable role label | +| `address_1` | str | Property or party address line 1 | +| `city`, `state`, `zip`, `country` | str | Address parts | +| `doc_type` | str | DEED, MTGE (mortgage), SAT (satisfaction), AGMT, etc. | +| `doc_date`, `recorded_date` | str | YYYY-MM-DD | +| `borough` | str | Manhattan / Bronx / Brooklyn / Queens / Staten Island | +| `amount` | str | Document amount (USD, when applicable) | +| `filing_url` | str | Direct ACRIS DocumentImageView link | + +## 4. Coverage + +- NYC 5 boroughs only — other counties have their own recorders +- 1966 → present (older filings exist on microfilm at the County Clerk) +- Updated nightly +- ~70M+ party records cumulative + +## 5. 
Cross-Reference Potential + +- **SEC EDGAR** ↔ `name` (insider filers with NYC property) +- **USAspending** ↔ `name` (federal contractors with NYC property) +- **Senate LDA** ↔ `name` (lobbyists / clients with NYC property) +- **ICIJ Offshore** ↔ `name` (NYC properties owned via offshore vehicles) + +Join key: normalized party name. NYC property records typically store names +as "LAST, FIRST" or full LLC names — use `entity_resolution.py`. + +## 6. Data Quality + +- Same person appears with multiple name formats over time +- LLC and trust ownership obscures beneficial owners +- Recording lag can be 2-4 weeks after closing +- Older documents have spottier address data +- Sealed records (e.g. domestic violence shelters) are excluded by law + +## 7. Acquisition Script + +Path: `scripts/fetch_nyc_acris.py` + +```bash +# By party name +python3 SKILL_DIR/scripts/fetch_nyc_acris.py --name "ROLNICK" --out data/acris.csv + +# By address (useful when you know the property but not the names) +python3 SKILL_DIR/scripts/fetch_nyc_acris.py --address "571 HUDSON" --out data/acris.csv + +# Restrict to grantees (buyers / mortgagees) +python3 SKILL_DIR/scripts/fetch_nyc_acris.py --name "ROLNICK" --party-type 2 \ + --out data/acris_buyers.csv +``` + +The script joins Parties → Master to populate doc_type, dates, borough, and +amount. Pass `--no-enrich` to skip the join (faster, fewer columns). + +## 8. Legal & Licensing + +- Public record under NYS Real Property Law and NYC Charter +- No commercial use restrictions on the data +- All ACRIS data is public information by statute + +## 9. 
References + +- ACRIS portal: https://a836-acris.nyc.gov/CP/ +- NYC Open Data: https://data.cityofnewyork.us/ +- Parties dataset: https://data.cityofnewyork.us/City-Government/ACRIS-Real-Property-Parties/636b-3b5g +- Document type codes: https://www1.nyc.gov/site/finance/taxes/acris.page diff --git a/optional-skills/research/osint-investigation/references/sources/ofac-sdn.md b/optional-skills/research/osint-investigation/references/sources/ofac-sdn.md new file mode 100644 index 00000000000..ab3602031f1 --- /dev/null +++ b/optional-skills/research/osint-investigation/references/sources/ofac-sdn.md @@ -0,0 +1,92 @@ +# OFAC SDN — Specially Designated Nationals List + +## 1. Summary + +The Office of Foreign Assets Control (OFAC) publishes the Specially Designated +Nationals and Blocked Persons List (SDN). US persons are generally prohibited +from dealing with individuals and entities on this list. Also published: +non-SDN consolidated lists (BIS Denied Persons, FSE, etc.). + +## 2. Access Methods + +- **Full XML:** `https://www.treasury.gov/ofac/downloads/sdn.xml` +- **Delimited:** `https://www.treasury.gov/ofac/downloads/sdn.csv` +- **Consolidated:** `https://www.treasury.gov/ofac/downloads/consolidated/consolidated.xml` +- **Auth:** None +- **Rate limit:** None (static file downloads). Updated continuously. + +## 3. Data Schema + +Key fields emitted by `fetch_ofac_sdn.py`: + +| Column | Type | Description | +|--------|------|-------------| +| `entity_id` | int | OFAC unique ID | +| `name` | str | Primary name | +| `entity_type` | str | individual / entity / vessel / aircraft | +| `program_list` | str | Semicolon-separated sanctions programs (e.g. 
SDGT;IRAN) | +| `title` | str | For individuals: title/role | +| `nationalities` | str | Semicolon-separated country codes | +| `aka_list` | str | Semicolon-separated "also known as" names | +| `addresses` | str | Semicolon-separated known addresses | +| `dob` | str | Date of birth (individuals) | +| `pob` | str | Place of birth (individuals) | +| `remarks` | str | OFAC's free-text remarks | +| `last_updated` | str | YYYY-MM-DD (publication date) | + +## 4. Coverage + +- Worldwide — all entities sanctioned by US Treasury +- ~10,000 entries on SDN, ~15,000 on consolidated lists +- Updated continuously (sometimes daily during active enforcement) +- Includes AKAs (very common, can be 10+ per entity) + +## 5. Cross-Reference Potential + +- **SEC EDGAR** ↔ `name` (public companies sanctioned) +- **USAspending** ↔ `name` (sanctioned entity as federal contractor — should + be impossible but verify) +- **ICIJ Offshore** ↔ `name` (offshore entities also sanctioned) + +Join key: normalized name. **CRITICAL**: must match against `aka_list` too. +Many sanctioned entities are caught only via aliases. + +## 6. Data Quality + +- Names are transliterated from many scripts — multiple romanizations possible +- AKAs often differ wildly from primary name +- Some entries have minimal info (no DOB, no address) for individuals +- Free-text `remarks` contain critical context — read them +- "Specially Designated Global Terrorists" (SDGT) and "Cyber-related" (CYBER2) + programs add and remove entries frequently + +## 7. Acquisition Script + +Path: `scripts/fetch_ofac_sdn.py` + +```bash +# Full snapshot +python3 SKILL_DIR/scripts/fetch_ofac_sdn.py --out data/ofac_sdn.csv + +# Filter to specific program +python3 SKILL_DIR/scripts/fetch_ofac_sdn.py --program SDGT --out data/sdn_sdgt.csv + +# Entities only (skip individuals, vessels, aircraft) +python3 SKILL_DIR/scripts/fetch_ofac_sdn.py --entity-type entity --out data/sdn_entities.csv +``` + +## 8. 
Legal & Licensing + +- Public record under Executive Order authority and statutory sanctions programs +- US persons MUST screen against this list — it is enforced +- No restrictions on the data itself; restrictions are on transactions with + the listed entities +- ZERO penalty for "over-matching" — false positives must be cleared but are not + prohibited + +## 9. References + +- OFAC home: https://ofac.treasury.gov/ +- SDN list: https://ofac.treasury.gov/specially-designated-nationals-and-blocked-persons-list-sdn-human-readable-lists +- Data formats: https://ofac.treasury.gov/sdn-list/sanctions-list-search-tool +- Compliance guidance: https://ofac.treasury.gov/recent-actions diff --git a/optional-skills/research/osint-investigation/references/sources/opencorporates.md b/optional-skills/research/osint-investigation/references/sources/opencorporates.md new file mode 100644 index 00000000000..0bd190a2f49 --- /dev/null +++ b/optional-skills/research/osint-investigation/references/sources/opencorporates.md @@ -0,0 +1,103 @@ +# OpenCorporates — Global Corporate Registry + +## 1. Summary + +OpenCorporates aggregates corporate registry data from 130+ jurisdictions +worldwide (~200M companies). Covers US state-level filings (NY DOS, Delaware +DOC, California SOS, etc.), UK Companies House, EU registries, and most +common-law jurisdictions. + +## 2. Access Methods + +- **REST API:** `https://api.opencorporates.com/v0.4/` +- **HTML fallback:** `https://opencorporates.com/companies?q=...` +- **Auth:** API token required (free tier 500 calls/month, paid plans available) +- **Rate limit:** Token-bound; un-tokened requests return 401 + +Set `OPENCORPORATES_API_TOKEN` env var. Get a free token at +https://opencorporates.com/api_accounts/new. + +## 3. 
Data Schema + +Key fields emitted by `fetch_opencorporates.py`: + +| Column | Type | Description | +|--------|------|-------------| +| `name` | str | Company legal name | +| `company_number` | str | Registry-assigned number | +| `jurisdiction_code` | str | e.g. `us_ny`, `us_de`, `gb` | +| `jurisdiction_name` | str | Human-readable jurisdiction | +| `incorporation_date` | str | YYYY-MM-DD | +| `dissolution_date` | str | YYYY-MM-DD (empty if active) | +| `company_type` | str | Domestic LLC / Foreign Corp / etc. | +| `status` | str | Active / Inactive / Dissolved | +| `registered_address` | str | Registered office address | +| `opencorporates_url` | str | Link to OpenCorporates entity page | +| `officers_count` | str | Total officers on record | +| `source` | str | `api`, `html`, or `html-fallback` | + +## 4. Coverage + +- US: all 50 states + DC at state level (LLCs, corps, LPs) +- International: UK, EU, Canada, Australia, NZ, many APAC + LATAM jurisdictions +- ~200M company records cumulative +- Update frequency varies by jurisdiction (UK CH is near-realtime; some + state registries lag months) + +## 5. Cross-Reference Potential + +- **NYC ACRIS** ↔ `name` (LLC/corp owners of NYC property) +- **USAspending** ↔ `name` (corporate federal contractors) +- **SEC EDGAR** ↔ `name` (public companies + their subsidiaries) +- **ICIJ Offshore** ↔ `name` (international corporate structures) + +Join key: normalized company name. Some entries have `previous_names` arrays +which are not currently exported by the fetch script — query OC directly +for that. + +## 6. Data Quality + +- Company-name spellings vary across re-incorporations and renames +- Officer records are spottier than company records (many jurisdictions + don't require officer disclosure) +- Beneficial-ownership data is generally NOT here — most jurisdictions + don't require it. UK Companies House has PSC (people with significant + control) but that's not universal. 
+- Cross-jurisdictional links (parent / subsidiary) are based on registry + filings only; corporate trees are often incomplete + +## 7. Acquisition Script + +Path: `scripts/fetch_opencorporates.py` + +```bash +# Search globally by name +python3 SKILL_DIR/scripts/fetch_opencorporates.py --query "Example Corp" \ + --out data/oc.csv + +# Restrict to a jurisdiction +python3 SKILL_DIR/scripts/fetch_opencorporates.py --query "Example Corp" \ + --jurisdiction us_ny --out data/oc_ny.csv + +# Set token via env or flag +OPENCORPORATES_API_TOKEN=xxx python3 SKILL_DIR/scripts/fetch_opencorporates.py \ + --query "Microsoft" --out data/oc.csv +``` + +Without a token the script falls back to scraping the HTML search page. +The fallback is brittle and only fills in `name`, `jurisdiction_code`, and +`opencorporates_url` — set the token for serious work. + +## 8. Legal & Licensing + +- OpenCorporates aggregates public records — the underlying facts are + public domain +- OpenCorporates' own database is licensed CC-BY-SA-4.0; attribution required +- API ToS prohibits redistributing the full dataset; per-record reference + is fine + +## 9. References + +- API docs: https://api.opencorporates.com/documentation/API-Reference +- Jurisdiction codes: https://api.opencorporates.com/v0.4/jurisdictions.json +- Schema: https://opencorporates.com/info/our_data diff --git a/optional-skills/research/osint-investigation/references/sources/sec-edgar.md b/optional-skills/research/osint-investigation/references/sources/sec-edgar.md new file mode 100644 index 00000000000..55a33d70258 --- /dev/null +++ b/optional-skills/research/osint-investigation/references/sources/sec-edgar.md @@ -0,0 +1,83 @@ +# SEC EDGAR — Corporate Filings + +## 1. Summary + +EDGAR (Electronic Data Gathering, Analysis, and Retrieval) is the SEC's system +for corporate disclosure filings: 10-K (annual), 10-Q (quarterly), 8-K (current +events), DEF 14A (proxy), Form 4 (insider trading), 13F (institutional holdings). + +## 2.
Access Methods + +- **API:** `https://data.sec.gov/submissions/CIK<10-digit-padded>.json` (no auth) +- **Filing index:** `https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=...` +- **Full-text search:** `https://efts.sec.gov/LATEST/search-index?q=...` +- **Auth:** None — requires `User-Agent` header with contact info per SEC policy +- **Rate limit:** 10 requests/second per IP (enforced) + +## 3. Data Schema + +Key fields emitted by `fetch_sec_edgar.py` (filings index): + +| Column | Type | Description | +|--------|------|-------------| +| `cik` | str | Central Index Key (10-digit padded) | +| `company_name` | str | Registrant name | +| `form_type` | str | 10-K, 10-Q, 8-K, etc. | +| `filing_date` | str | YYYY-MM-DD | +| `accession_number` | str | Filing accession (e.g. 0000320193-24-000123) | +| `primary_document` | str | Filename of main document | +| `filing_url` | str | Direct URL to filing index | +| `reporting_period` | str | Period of report (where applicable) | + +## 4. Coverage + +- All public US registrants from 1993 → present +- 1993-2000 has spotty coverage of older filings (paper-to-electronic migration) +- ~12M filings cumulative +- Updated within minutes of filing acceptance + +## 5. Cross-Reference Potential + +- **USAspending** ↔ `company_name` (public companies as federal contractors) +- **Senate LD** ↔ `company_name` (public companies hire lobbyists) +- **OFAC SDN** ↔ `company_name` (sanctions screening of public registrants) + +Join key: company name OR CIK if you have it. CIK is canonical and stable. + +## 6. Data Quality + +- Subsidiaries often filed under parent CIK — be careful with name matches +- Name changes over time (rebrands, acquisitions) — CIK remains constant +- 10-K Item 1A Risk Factors are free-form text — useful for `web_extract`-style + parsing, not structured queries +- Foreign private issuers file 20-F instead of 10-K + +## 7. 
Acquisition Script + +Path: `scripts/fetch_sec_edgar.py` + +```bash +# By CIK +python3 SKILL_DIR/scripts/fetch_sec_edgar.py --cik 0000320193 \ + --types 10-K,10-Q --out data/edgar_filings.csv + +# By company name (resolves to CIK first via name search) +python3 SKILL_DIR/scripts/fetch_sec_edgar.py --company "APPLE INC" \ + --types 8-K --since 2024-01-01 --out data/edgar_filings.csv +``` + +Set `SEC_USER_AGENT` env var with your contact email (SEC requirement). +Example: `SEC_USER_AGENT="Research example@example.com"`. + +## 8. Legal & Licensing + +- Public record under SEC Rule 24b-2 / 17 CFR § 230.401 +- No commercial use restrictions on filing content +- SEC asks all bulk users to include a `User-Agent` with contact info and to + respect 10 req/s — failure to do so can result in IP blocking + +## 9. References + +- Developer docs: https://www.sec.gov/edgar/sec-api-documentation +- EDGAR full-text search: https://efts.sec.gov/LATEST/search-index +- Fair access policy: https://www.sec.gov/os/accessing-edgar-data diff --git a/optional-skills/research/osint-investigation/references/sources/senate-ld.md b/optional-skills/research/osint-investigation/references/sources/senate-ld.md new file mode 100644 index 00000000000..5142dc6ea41 --- /dev/null +++ b/optional-skills/research/osint-investigation/references/sources/senate-ld.md @@ -0,0 +1,89 @@ +# Senate LD — Lobbying Disclosure (LD-1 / LD-2) + +## 1. Summary + +The Senate Office of Public Records publishes lobbying disclosures under the +Lobbying Disclosure Act of 1995 (LDA, as amended by HLOGA 2007). LD-1 is +registration of a new client-lobbyist relationship; LD-2 is the quarterly +activity report. + +## 2. 
Access Methods + +- **API:** `https://lda.senate.gov/api/v1/` (no auth required for read-only) +- **Bulk download:** `https://lda.senate.gov/api/v1/filings/?format=csv` (paginated) +- **Auth:** Token required for >120 req/hour — register at https://lda.senate.gov/api/auth/register/ +- **Rate limit:** 120 req/hour unauthenticated, 1,200 req/hour authenticated + +## 3. Data Schema + +Key fields emitted by `fetch_senate_ld.py`: + +| Column | Type | Description | +|--------|------|-------------| +| `filing_uuid` | str | Unique filing ID | +| `filing_type` | str | LD-1, LD-2, LD-203, etc. | +| `filing_year` | int | Year | +| `filing_period` | str | Q1/Q2/Q3/Q4 or annual | +| `registrant_name` | str | Lobbying firm or organization | +| `registrant_id` | str | Senate-assigned registrant ID | +| `client_name` | str | Client being represented | +| `client_id` | str | Senate-assigned client ID | +| `client_general_description` | str | Client industry / business | +| `income` | float | LD-2 income from client this quarter (USD) | +| `expenses` | float | LD-2 expenses (in-house lobbying) | +| `lobbyists` | str | Semicolon-separated lobbyist names | +| `issues` | str | Semicolon-separated issue areas | +| `government_entities` | str | Agencies/chambers contacted | +| `filing_date` | str | YYYY-MM-DD | + +## 4. Coverage + +- US federal lobbying only (state lobbying handled by individual state ethics offices) +- 1999 → present (full electronic coverage from 2008) +- Quarterly reporting cycle (LD-2) +- ~1M+ filings cumulative + +## 5. Cross-Reference Potential + +- **USAspending** ↔ `client_name` (clients lobbying for contracts) +- **SEC EDGAR** ↔ `client_name` (public companies as lobbying clients) +- **OFAC SDN** ↔ `client_name` (sanctions screening of lobbying clients) + +Join key: normalized client_name. registrant_id and client_id are canonical +when joining Senate-internal records. + +## 6. 
Data Quality + +- Many lobbyist names appear under multiple registrants over time (job changes) +- `issues` and `government_entities` are free-text; capitalization is inconsistent +- Foreign agents register under FARA (Department of Justice), NOT here +- Income/expenses are reported in $10,000 brackets in some older filings + +## 7. Acquisition Script + +Path: `scripts/fetch_senate_ld.py` + +```bash +# By client +python3 SKILL_DIR/scripts/fetch_senate_ld.py --client "EXAMPLE CORP" \ + --year 2024 --out data/lobbying.csv + +# By registrant (lobbying firm) +python3 SKILL_DIR/scripts/fetch_senate_ld.py --registrant "BIG K STREET LLP" \ + --year 2024 --out data/lobbying.csv +``` + +Set `SENATE_LDA_TOKEN` env var if you have one (or pass `--token`). +Defaults to anonymous (120 req/hour). + +## 8. Legal & Licensing + +- Public record under 2 U.S.C. § 1604 (LDA) +- No commercial use restrictions +- Reuse is unconditional — see Senate Public Records Office disclaimer + +## 9. References + +- API docs: https://lda.senate.gov/api/redoc/v1/ +- LDA guidance: https://lobbyingdisclosure.house.gov/ld_guidance.pdf +- Senate Public Records: https://lda.senate.gov/ diff --git a/optional-skills/research/osint-investigation/references/sources/usaspending.md b/optional-skills/research/osint-investigation/references/sources/usaspending.md new file mode 100644 index 00000000000..6477272293b --- /dev/null +++ b/optional-skills/research/osint-investigation/references/sources/usaspending.md @@ -0,0 +1,97 @@ +# USAspending — Federal Government Contracts and Grants + +## 1. Summary + +USAspending.gov is the official source of federal spending data. Coverage: +contracts, grants, loans, direct payments, sub-awards. Required by the DATA Act +of 2014 — all federal agencies must report to a single schema. + +## 2.
Access Methods + +- **API v2:** `https://api.usaspending.gov/api/v2/` (no auth, no key) +- **Bulk:** `https://files.usaspending.gov/` (CSV / Parquet by award type) +- **Auth:** None +- **Rate limit:** Not strictly enforced, but be polite — keep to <10 req/s + +## 3. Data Schema + +Key fields emitted by `fetch_usaspending.py` (prime awards): + +| Column | Type | Description | +|--------|------|-------------| +| `award_id` | str | Federal award ID (PIID for contracts, FAIN for grants) | +| `recipient_name` | str | Awardee legal name | +| `recipient_uei` | str | Unique Entity Identifier (replaced DUNS in 2022) | +| `recipient_duns` | str | Legacy DUNS number (historical only) | +| `recipient_parent_name` | str | Ultimate parent organization | +| `recipient_state` | str | Recipient state | +| `awarding_agency` | str | Department / agency name | +| `awarding_sub_agency` | str | Sub-tier (e.g. DoD → Army) | +| `award_type` | str | Contract / Grant / Loan / Direct Payment | +| `award_amount` | float | Current total obligation in USD | +| `award_date` | str | Action / signed date YYYY-MM-DD | +| `period_of_performance_start` | str | YYYY-MM-DD | +| `period_of_performance_end` | str | YYYY-MM-DD | +| `naics_code` | str | Industry classification | +| `psc_code` | str | Product / Service Code | +| `competition_extent` | str | Full / limited / sole-source | +| `description` | str | Award description (free-text) | + +## 4. Coverage + +- US federal awards only (state/local not included) +- FY 2008 → present (full coverage from FY 2017) +- Updated bi-weekly from agency reporting +- ~100M+ transaction records cumulative + +## 5. 
Cross-Reference Potential + +- **SEC EDGAR** ↔ `recipient_name` (public companies as contractors) +- **Senate LD** ↔ `recipient_name` (lobbying clients winning contracts) +- **OFAC SDN** ↔ `recipient_name` (sanctions screening of contractors — must be + filtered out by SAM.gov but verify) +- **ICIJ Offshore** ↔ `recipient_name` (offshore-linked contractors) + +Join key: normalized recipient name. UEI is canonical when present. + +## 6. Data Quality + +- DUNS → UEI transition (April 2022) — old records have DUNS, new records have UEI +- Some sub-awards aren't reported (FFATA threshold is $30k) +- Award amount changes over time (mod actions) — fetch script reports current total +- `competition_extent` field is free-text in older records — `fetch_usaspending.py` + normalizes to canonical values +- Recipient name variations are extensive — "ACME LLC", "Acme L.L.C.", "ACME, INC" + all appear. Use `entity_resolution.py`. + +## 7. Acquisition Script + +Path: `scripts/fetch_usaspending.py` + +```bash +# By recipient name +python3 SKILL_DIR/scripts/fetch_usaspending.py --recipient "EXAMPLE CORP" \ + --fy 2024 --out data/contracts.csv + +# By awarding agency +python3 SKILL_DIR/scripts/fetch_usaspending.py --agency "Department of Defense" \ + --fy 2024 --out data/contracts.csv + +# Filter to sole-source only +python3 SKILL_DIR/scripts/fetch_usaspending.py --recipient "EXAMPLE CORP" \ + --fy 2024 --sole-source-only --out data/contracts.csv +``` + +## 8. Legal & Licensing + +- Public record under the Federal Funding Accountability and Transparency Act + (FFATA, 2006) and DATA Act (2014) +- No commercial use restrictions on the data +- Personal information of award recipients (e.g. small business owners' addresses + in some grants) should be handled per the source agency's privacy notice + +## 9. 
References + +- API docs: https://api.usaspending.gov/ +- Data dictionary: https://www.usaspending.gov/data-dictionary +- Award schema: https://files.usaspending.gov/docs/Data_Dictionary_Crosswalk.xlsx diff --git a/optional-skills/research/osint-investigation/references/sources/wayback.md b/optional-skills/research/osint-investigation/references/sources/wayback.md new file mode 100644 index 00000000000..f397c093a23 --- /dev/null +++ b/optional-skills/research/osint-investigation/references/sources/wayback.md @@ -0,0 +1,93 @@ +# Wayback Machine — Internet Archive CDX + +## 1. Summary + +The Internet Archive's Wayback Machine has captured ~900B+ web pages since +1996. The CDX server API indexes those captures by URL, timestamp, and +content hash. Free, anonymous, no auth. + +## 2. Access Methods + +- **CDX server:** `https://web.archive.org/cdx/search/cdx` +- **Wayback URL:** `https://web.archive.org/web//` +- **Save Page Now (write):** `https://web.archive.org/save/` (different API) +- **Auth:** None +- **Rate limit:** Generous; be polite (~1 req/s) + +## 3. Data Schema + +Key fields emitted by `fetch_wayback.py`: + +| Column | Type | Description | +|--------|------|-------------| +| `url` | str | Original URL captured | +| `timestamp` | str | YYYYMMDDHHMMSS (CDX format) | +| `wayback_url` | str | Direct replay URL | +| `mimetype` | str | Content-type at capture | +| `status` | str | HTTP status (typically 200) | +| `digest` | str | SHA1 of capture content (collapse-friendly) | +| `length` | str | Byte length of capture | + +## 4. Coverage + +- 1996 → present +- ~900B+ captures across ~700M domains +- Updated continuously by automated crawls + manual saves +- Some domains have aggressive coverage (news), others sparse (private) + +## 5. 
Cross-Reference Potential + +- **Wikipedia** ↔ Reverse-lookup pages cited as references that have since + disappeared +- **News URLs** ↔ Original article content when present-day URLs 404 +- **Corporate websites** ↔ Historical "About" pages, executive bios that + have been scrubbed + +The Wayback CDX is most useful as a **content-recovery** layer when other +sources point to URLs that no longer exist. + +## 6. Data Quality + +- robots.txt-blocked domains may have spotty or no coverage +- Captures vary in completeness (HTML may be saved without CSS/JS) +- Some content is excluded by domain owner request (DMCA, etc.) +- Coverage of "deep links" (URLs with query strings) is uneven +- Time resolution is per-capture, not continuous — gaps are common + +## 7. Acquisition Script + +Path: `scripts/fetch_wayback.py` + +```bash +# All captures of a specific URL +python3 SKILL_DIR/scripts/fetch_wayback.py --url "https://example.com/page" \ + --out data/wb.csv + +# All captures of a host +python3 SKILL_DIR/scripts/fetch_wayback.py --url "example.com" \ + --match host --out data/wb.csv + +# All captures of a domain + subdomains +python3 SKILL_DIR/scripts/fetch_wayback.py --url "example.com" \ + --match domain --out data/wb.csv + +# Only unique-content captures within a date window +python3 SKILL_DIR/scripts/fetch_wayback.py --url "example.com" \ + --match host --collapse digest \ + --from-date 2020-01-01 --to-date 2023-12-31 \ + --out data/wb.csv +``` + +## 8. Legal & Licensing + +- Internet Archive captures are made under fair-use research provisions +- Replay URLs are stable references — citing them is encouraged +- Internet Archive non-profit terms of use govern content +- Some content is rights-restricted; replay may be blocked even if the + CDX entry shows it as captured + +## 9. 
References + +- CDX server docs: https://github.com/internetarchive/wayback/blob/master/wayback-cdx-server/README.md +- Wayback API: https://archive.org/help/wayback_api.php +- Internet Archive: https://archive.org/ diff --git a/optional-skills/research/osint-investigation/references/sources/wikipedia.md b/optional-skills/research/osint-investigation/references/sources/wikipedia.md new file mode 100644 index 00000000000..1a004bf2e8d --- /dev/null +++ b/optional-skills/research/osint-investigation/references/sources/wikipedia.md @@ -0,0 +1,107 @@ +# Wikipedia + Wikidata + +## 1. Summary + +Wikipedia is the canonical narrative-bio source for notable people, places, +and organizations. Wikidata is its structured-data counterpart: ~110M +items, each with claims, dates, identifiers, and cross-references to +external authorities (VIAF, ISNI, ORCID, GRID, etc.). + +Together they're a high-precision entity-resolution layer — the bar for +inclusion is real, but anything past that bar is well-cross-referenced. + +## 2. Access Methods + +- **Wikipedia OpenSearch:** `https://en.wikipedia.org/w/api.php?action=opensearch` +- **Wikipedia REST summary:** `https://en.wikipedia.org/api/rest_v1/page/summary/` +- **Wikidata Action API:** `https://www.wikidata.org/w/api.php?action=wbgetentities` +- **Wikidata SPARQL:** `https://query.wikidata.org/sparql` (more powerful but aggressively rate-limited) +- **Auth:** None, but **a meaningful User-Agent is required** + +Set `HERMES_OSINT_UA` to something identifying (e.g. `your-app/1.0 (you@example.com)`). +Wikimedia returns HTTP 429 to generic UAs. + +## 3. Data Schema + +Key fields emitted by `fetch_wikipedia.py`: + +| Column | Type | Description | +|--------|------|-------------| +| `source` | str | `wikipedia` or `wikipedia+wikidata` | +| `label` | str | Wikipedia article title | +| `description` | str | Short Wikidata description | +| `qid` | str | Wikidata QID (e.g. 
Q2283 for Microsoft) | +| `wikipedia_title`, `wikipedia_url` | str | Article identifier + URL | +| `wikidata_url` | str | Wikidata entity URL | +| `instance_of` | str | What kind of thing it is (P31) | +| `country` | str | Country (P17 for orgs/places, P27 for people) | +| `occupation` | str | P106 | +| `employer` | str | P108 | +| `date_of_birth` | str | P569, YYYY-MM-DD | +| `place_of_birth` | str | P19 | +| `summary` | str | Wikipedia REST extract (~1000 chars) | + +The fetch script uses Wikidata's Action API (NOT SPARQL) for structured +facts — far more lenient on rate limits. + +## 4. Coverage + +- Wikipedia EN: ~7M articles +- Wikidata: ~110M items, ~1.5B statements +- Updated continuously; abuse filters and bots run constantly +- High notability bar — most private individuals are not in Wikipedia + +## 5. Cross-Reference Potential + +- **All sources** ↔ `label` (entity identity resolution) +- **SEC EDGAR** ↔ `label` (public companies) +- **CourtListener** ↔ `label` (parties to notable litigation) +- **Wikidata external identifiers** (not currently in this fetcher's output) + link to VIAF, ISNI, ORCID, GRID, GitHub, Twitter, IMDb, ... + +Join key: Wikidata QID is canonical. Wikipedia titles are stable for +most articles but can be renamed. + +## 6. Data Quality + +- Notability filter — only notable entities (criteria vary by topic) +- Recency lag — current events take days to weeks to be reflected +- POV / vandalism — moderated, but edits between sweeps can be bad +- Living-persons biographies have stricter sourcing requirements +- Wikidata claims have qualifiers and references — the fetch script + doesn't currently export them + +## 7. 
Acquisition Script + +Path: `scripts/fetch_wikipedia.py` + +```bash +# Look up a notable entity +python3 SKILL_DIR/scripts/fetch_wikipedia.py --query "Microsoft" --out data/wp.csv + +# A specific person +python3 SKILL_DIR/scripts/fetch_wikipedia.py --query "Bill Gates" --out data/wp_bg.csv + +# Skip the Wikidata enrichment for speed +python3 SKILL_DIR/scripts/fetch_wikipedia.py --query "Microsoft" --no-wikidata \ + --limit 5 --out data/wp.csv +``` + +The OpenSearch is fuzzy — `--limit 5` returns the top 5 Wikipedia article +matches. Each is enriched with the QID + structured facts unless +`--no-wikidata` is passed. + +## 8. Legal & Licensing + +- Wikipedia text: CC-BY-SA-3.0 / GFDL +- Wikidata claims: CC0 (public domain) +- API ToS: respect rate limits, identify your agent +- Commercial use allowed with attribution + +## 9. References + +- Wikipedia OpenSearch: https://www.mediawiki.org/wiki/API:Opensearch +- Wikipedia REST: https://en.wikipedia.org/api/rest_v1/ +- Wikidata Action API: https://www.wikidata.org/wiki/Wikidata:Data_access +- Wikidata SPARQL: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service +- User-Agent policy: https://meta.wikimedia.org/wiki/User-Agent_policy diff --git a/optional-skills/research/osint-investigation/scripts/_http.py b/optional-skills/research/osint-investigation/scripts/_http.py new file mode 100644 index 00000000000..5da62310b9f --- /dev/null +++ b/optional-skills/research/osint-investigation/scripts/_http.py @@ -0,0 +1,82 @@ +"""Tiny stdlib HTTP helper used by fetch_*.py scripts. + +Provides polite retry + JSON convenience + User-Agent enforcement. 
+""" +from __future__ import annotations + +import json +import os +import time +import urllib.error +import urllib.parse +import urllib.request + +DEFAULT_UA = ( + "hermes-osint-investigation/0.2 " + "(+https://github.com/NousResearch/hermes-agent; " + "set HERMES_OSINT_UA env var to identify yourself per " + "Wikimedia / SEC fair-use guidance)" +) + + +def get( + url: str, + *, + params: dict | None = None, + headers: dict | None = None, + user_agent: str | None = None, + max_retries: int = 3, + backoff: float = 1.5, + timeout: float = 30.0, +) -> bytes: + """GET with retry on 5xx and Retry-After honoring. + + 429 (rate-limit) is raised IMMEDIATELY with a clear message — retrying + when the upstream says "you're over quota" just wastes time. The caller + should slow down or supply real credentials. + """ + if params: + sep = "&" if "?" in url else "?" + url = f"{url}{sep}{urllib.parse.urlencode(params)}" + h = {"User-Agent": user_agent or os.environ.get("HERMES_OSINT_UA", DEFAULT_UA)} + if headers: + h.update(headers) + + last_err: Exception | None = None + for attempt in range(max_retries + 1): + req = urllib.request.Request(url, headers=h) + try: + with urllib.request.urlopen(req, timeout=timeout) as resp: + return resp.read() + except urllib.error.HTTPError as e: + if e.code == 429: + # Surface immediately. Read the body so the caller sees the + # provider's actual message ("OVER_RATE_LIMIT" etc.). + try: + body = e.read(2048).decode("utf-8", errors="replace") + except Exception: # noqa: BLE001 + body = "" + raise RuntimeError( + f"HTTP 429 rate-limited by {urllib.parse.urlsplit(url).netloc}. " + f"Slow down or supply a real API key. 
Body: {body[:300]}" + ) from e + if e.code in (500, 502, 503, 504) and attempt < max_retries: + retry_after = e.headers.get("Retry-After") if e.headers else None + wait = float(retry_after) if (retry_after and retry_after.isdigit()) else backoff ** (attempt + 1) + time.sleep(wait) + last_err = e + continue + raise + except urllib.error.URLError as e: + if attempt < max_retries: + time.sleep(backoff ** (attempt + 1)) + last_err = e + continue + raise + if last_err: + raise last_err + raise RuntimeError("unreachable") + + +def get_json(url: str, **kwargs) -> dict | list: + return json.loads(get(url, **kwargs).decode("utf-8")) diff --git a/optional-skills/research/osint-investigation/scripts/_normalize.py b/optional-skills/research/osint-investigation/scripts/_normalize.py new file mode 100644 index 00000000000..3c9a197af8b --- /dev/null +++ b/optional-skills/research/osint-investigation/scripts/_normalize.py @@ -0,0 +1,67 @@ +"""Shared entity-name normalization helpers (stdlib-only). + +Used by entity_resolution.py and timing_analysis.py. +""" +from __future__ import annotations + +import re + +# Legal suffixes / corporate boilerplate to strip during normalization. 
+_SUFFIX_TOKENS = { + "INC", "INCORPORATED", "LLC", "LLP", "LP", "LTD", "LIMITED", + "CORP", "CORPORATION", "CO", "COMPANY", + "GROUP", "GRP", "HOLDINGS", "HOLDING", + "PARTNERS", "ASSOCIATES", + "INTERNATIONAL", "INTL", + "ENTERPRISES", "ENTERPRISE", + "SERVICES", "SERVICE", "SVCS", + "SOLUTIONS", "MANAGEMENT", "MGMT", "CONSULTING", + "TECHNOLOGY", "TECHNOLOGIES", "TECH", + "INDUSTRIES", "INDUSTRY", + "AMERICA", "AMERICAN", + "USA", "US", + "PLLC", "PC", + "TRUST", "FOUNDATION", +} + +_PUNCT_RE = re.compile(r"[^\w\s]") +_WS_RE = re.compile(r"\s+") + + +def normalize_name(name: str | None) -> str: + """Standard normalization: uppercase, strip suffixes, drop punctuation.""" + if not name: + return "" + s = _PUNCT_RE.sub(" ", name.upper()) + s = _WS_RE.sub(" ", s).strip() + tokens = [t for t in s.split() if t and t not in _SUFFIX_TOKENS] + return " ".join(tokens) + + +def normalize_aggressive(name: str | None) -> str: + """Aggressive normalization: sorted unique tokens (word-bag).""" + base = normalize_name(name) + if not base: + return "" + return " ".join(sorted(set(base.split()))) + + +def name_tokens(name: str | None, min_len: int = 4) -> set[str]: + """Token set used for overlap matching.""" + base = normalize_name(name) + if not base: + return set() + return {t for t in base.split() if len(t) >= min_len} + + +def token_overlap_ratio(left: str | None, right: str | None) -> tuple[float, int]: + """Return (jaccard-like ratio, shared token count) over min-len tokens.""" + a = name_tokens(left) + b = name_tokens(right) + if not a or not b: + return 0.0, 0 + shared = a & b + if not shared: + return 0.0, 0 + union = a | b + return len(shared) / len(union), len(shared) diff --git a/optional-skills/research/osint-investigation/scripts/build_findings.py b/optional-skills/research/osint-investigation/scripts/build_findings.py new file mode 100644 index 00000000000..15021eb0878 --- /dev/null +++ b/optional-skills/research/osint-investigation/scripts/build_findings.py @@ 
-0,0 +1,221 @@ +#!/usr/bin/env python3 +"""Build a structured findings.json with evidence chains (stdlib-only). + +Aggregates cross_links.csv (entity_resolution output) and an optional +timing.json (timing_analysis output) into a single evidence-chain document. + +Output structure: + { + "metadata": {...}, + "findings": [ + { + "id": "F0001", + "title": "...", + "severity": "HIGH|MEDIUM|LOW", + "confidence": "high|medium|low", + "summary": "...", + "evidence": [ + {"source": "cross_links.csv", "row": 12, "fields": {...}}, + ... + ], + "sources": ["cross_links.csv", "timing.json"] + } + ] + } + +Every finding traces to specific source rows. No naked claims. +""" +from __future__ import annotations + +import argparse +import csv +import json +from collections import defaultdict +from pathlib import Path + +CONFIDENCE_ORDER = {"high": 0, "medium": 1, "low": 2} +SEVERITY_ORDER = {"HIGH": 0, "MEDIUM": 1, "LOW": 2} + + +def _read_cross_links(path: str) -> list[dict[str, str]]: + with open(path, newline="", encoding="utf-8") as fh: + return list(csv.DictReader(fh)) + + +def build_findings( + cross_links_path: str, + timing_path: str | None = None, + out_path: str = "findings.json", + bundled_threshold: int = 3, +) -> dict: + findings: list[dict] = [] + next_id = 1 + + # 1. Match-based findings, grouped by (left_normalized, right_normalized). + matches = _read_cross_links(cross_links_path) + grouped: dict[tuple[str, str], list[dict[str, str]]] = defaultdict(list) + for i, row in enumerate(matches): + row["__row__"] = str(i) + grouped[(row.get("left_normalized", ""), row.get("right_normalized", ""))].append(row) + + for (left_norm, right_norm), rows in grouped.items(): + if not left_norm or not right_norm: + continue + # Use the highest-confidence match for the finding's overall confidence. 
+ best = min(rows, key=lambda r: CONFIDENCE_ORDER.get(r.get("confidence", "low"), 2)) + finding_id = f"F{next_id:04d}" + next_id += 1 + evidence = [ + { + "source": "cross_links.csv", + "row": int(r["__row__"]), + "fields": { + "match_type": r.get("match_type", ""), + "confidence": r.get("confidence", ""), + "left_name": r.get("left_name", ""), + "right_name": r.get("right_name", ""), + "overlap_ratio": r.get("overlap_ratio", ""), + "shared_tokens": r.get("shared_tokens", ""), + }, + } + for r in rows + ] + findings.append( + { + "id": finding_id, + "title": f"Entity match: {best.get('left_name', '')} ↔ {best.get('right_name', '')}", + "severity": "MEDIUM" if best.get("confidence") == "high" else "LOW", + "confidence": best.get("confidence", "low"), + "summary": ( + f"{len(rows)} cross-link record(s) tie " + f"'{best.get('left_name', '')}' to " + f"'{best.get('right_name', '')}' " + f"(best tier: {best.get('match_type', '')})." + ), + "evidence": evidence, + "sources": ["cross_links.csv"], + } + ) + + # 2. Bundled-donations findings (if cross_links carries donor↔candidate pattern). + # Heuristic: many distinct left names sharing the same right name. + by_right: dict[str, set[str]] = defaultdict(set) + by_right_rows: dict[str, list[dict[str, str]]] = defaultdict(list) + for r in matches: + right = r.get("right_normalized", "") + left_raw = r.get("left_name", "").strip() + if right and left_raw: + by_right[right].add(left_raw) + by_right_rows[right].append(r) + for right_norm, lefts in by_right.items(): + if len(lefts) < bundled_threshold: + continue + rows = by_right_rows[right_norm] + right_raw = rows[0].get("right_name", "") + findings.append( + { + "id": f"F{next_id:04d}", + "title": f"Bundled cross-links: {len(lefts)} distinct left entities ↔ '{right_raw}'", + "severity": "HIGH", + "confidence": "medium", + "summary": ( + f"{len(lefts)} distinct left-side entities link to " + f"'{right_raw}'. Pattern suggests coordinated relationship " + f"(e.g. 
bundled donations, multi-vendor employer)." + ), + "evidence": [ + { + "source": "cross_links.csv", + "row": int(r.get("__row__", "0")), + "fields": { + "left_name": r.get("left_name", ""), + "match_type": r.get("match_type", ""), + }, + } + for r in rows + ], + "sources": ["cross_links.csv"], + } + ) + next_id += 1 + + # 3. Timing-based findings. + if timing_path and Path(timing_path).exists(): + timing = json.loads(Path(timing_path).read_text()) + for r in timing.get("results", []): + if not r.get("significant"): + continue + findings.append( + { + "id": f"F{next_id:04d}", + "title": ( + f"Donation timing significantly clusters near awards: " + f"{r['donor']} ↔ {r['recipient']}" + ), + "severity": "HIGH" if r["p_value"] < 0.01 else "MEDIUM", + "confidence": "medium", + "summary": ( + f"Mean nearest-award distance {r['observed_mean_days']} days " + f"(null {r['null_mean_days']} days). p={r['p_value']}, " + f"effect size {r['effect_size_sd']} SD. " + f"{r['n_donations']} donations, {r['n_award_dates']} awards." + ), + "evidence": [ + { + "source": "timing.json", + "row": None, + "fields": r, + } + ], + "sources": ["timing.json"], + } + ) + next_id += 1 + + # Sort: severity → confidence → id. 
+ findings.sort( + key=lambda f: ( + SEVERITY_ORDER.get(f["severity"], 3), + CONFIDENCE_ORDER.get(f["confidence"], 3), + f["id"], + ) + ) + + payload = { + "metadata": { + "n_findings": len(findings), + "cross_links_path": cross_links_path, + "timing_path": timing_path, + "bundled_threshold": bundled_threshold, + }, + "findings": findings, + } + Path(out_path).write_text(json.dumps(payload, indent=2)) + return payload + + +def main() -> int: + p = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter) + p.add_argument("--cross-links", required=True) + p.add_argument("--timing", help="Optional timing.json from timing_analysis.py") + p.add_argument("--out", default="findings.json") + p.add_argument( + "--bundled-threshold", + type=int, + default=3, + help="Minimum distinct left entities to flag as bundled (default 3)", + ) + a = p.parse_args() + + payload = build_findings( + cross_links_path=a.cross_links, + timing_path=a.timing, + out_path=a.out, + bundled_threshold=a.bundled_threshold, + ) + print(f"Wrote {payload['metadata']['n_findings']} findings to {a.out}") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/optional-skills/research/osint-investigation/scripts/entity_resolution.py b/optional-skills/research/osint-investigation/scripts/entity_resolution.py new file mode 100644 index 00000000000..26d60d433d4 --- /dev/null +++ b/optional-skills/research/osint-investigation/scripts/entity_resolution.py @@ -0,0 +1,228 @@ +#!/usr/bin/env python3 +"""Cross-source entity resolution (stdlib-only). + +Given two CSV files with name columns, find candidate matches using three +tiers of normalization: + + 1. exact — normalized strings equal + 2. fuzzy — sorted-token (word-bag) match + 3. 
token_overlap — >=60% Jaccard overlap on >=4-char tokens, >=2 shared + +Adapted from ShinMegamiBoson/OpenPlanter (MIT) but generalized: no Boston- +specific record types, no contribution-code filters, no fixed schemas. + +Output CSV columns: + match_type, confidence, left_name, right_name, + left_normalized, right_normalized, left_row, right_row, + overlap_ratio, shared_tokens +""" +from __future__ import annotations + +import argparse +import csv +import sys +from pathlib import Path + +# Allow running directly or as a module. +sys.path.insert(0, str(Path(__file__).parent)) +from _normalize import ( # noqa: E402 + normalize_name, + normalize_aggressive, + token_overlap_ratio, +) + +CONFIDENCE = { + "exact": "high", + "fuzzy": "medium", + "token_overlap": "low", +} + + +def _read_csv(path: str, name_col: str) -> list[dict[str, str]]: + rows = [] + with open(path, newline="", encoding="utf-8") as fh: + reader = csv.DictReader(fh) + if name_col not in (reader.fieldnames or []): + raise SystemExit( + f"Column {name_col!r} not in {path}. 
" + f"Available: {reader.fieldnames}" + ) + for i, row in enumerate(reader): + row["__row__"] = str(i) + rows.append(row) + return rows + + +def _build_index(rows: list[dict[str, str]], name_col: str): + """Index by exact-normalized and aggressive (sorted-token) form.""" + exact: dict[str, list[dict[str, str]]] = {} + aggressive: dict[str, list[dict[str, str]]] = {} + for row in rows: + raw = row.get(name_col, "") + n = normalize_name(raw) + if n: + exact.setdefault(n, []).append(row) + a = normalize_aggressive(raw) + if a: + aggressive.setdefault(a, []).append(row) + return exact, aggressive + + +def _emit( + out_rows: list[dict[str, str]], + seen: set[tuple], + match_type: str, + left_row: dict[str, str], + right_row: dict[str, str], + left_col: str, + right_col: str, + ratio: float = 0.0, + shared: int = 0, +): + left_raw = left_row.get(left_col, "") + right_raw = right_row.get(right_col, "") + key = ( + left_row["__row__"], + right_row["__row__"], + match_type, + ) + if key in seen: + return + seen.add(key) + out_rows.append( + { + "match_type": match_type, + "confidence": CONFIDENCE[match_type], + "left_name": left_raw, + "right_name": right_raw, + "left_normalized": normalize_name(left_raw), + "right_normalized": normalize_name(right_raw), + "left_row": left_row["__row__"], + "right_row": right_row["__row__"], + "overlap_ratio": f"{ratio:.3f}" if ratio else "", + "shared_tokens": str(shared) if shared else "", + } + ) + + +def resolve( + left_path: str, + left_col: str, + right_path: str, + right_col: str, + out_path: str, + overlap_threshold: float = 0.60, + min_shared: int = 2, + skip_overlap: bool = False, +) -> int: + left_rows = _read_csv(left_path, left_col) + right_rows = _read_csv(right_path, right_col) + + right_exact, right_aggressive = _build_index(right_rows, right_col) + + out_rows: list[dict[str, str]] = [] + seen: set[tuple] = set() + + # Pass 1+2: exact / fuzzy via index lookup. 
+ for lrow in left_rows: + raw = lrow.get(left_col, "") + n = normalize_name(raw) + if not n: + continue + for rrow in right_exact.get(n, []): + _emit(out_rows, seen, "exact", lrow, rrow, left_col, right_col) + a = normalize_aggressive(raw) + if a: + for rrow in right_aggressive.get(a, []): + _emit(out_rows, seen, "fuzzy", lrow, rrow, left_col, right_col) + + if not skip_overlap: + # Pass 3: token overlap (O(N*M) — expensive; allow opt-out). + for lrow in left_rows: + l_raw = lrow.get(left_col, "") + if not normalize_name(l_raw): + continue + for rrow in right_rows: + ratio, shared = token_overlap_ratio( + l_raw, rrow.get(right_col, "") + ) + if ratio >= overlap_threshold and shared >= min_shared: + _emit( + out_rows, + seen, + "token_overlap", + lrow, + rrow, + left_col, + right_col, + ratio=ratio, + shared=shared, + ) + + fieldnames = [ + "match_type", + "confidence", + "left_name", + "right_name", + "left_normalized", + "right_normalized", + "left_row", + "right_row", + "overlap_ratio", + "shared_tokens", + ] + with open(out_path, "w", newline="", encoding="utf-8") as fh: + writer = csv.DictWriter(fh, fieldnames=fieldnames) + writer.writeheader() + writer.writerows(out_rows) + return len(out_rows) + + +def main() -> int: + p = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter) + p.add_argument("--left", required=True, help="Left CSV path") + p.add_argument( + "--left-name-col", required=True, help="Name column in left CSV" + ) + p.add_argument("--right", required=True, help="Right CSV path") + p.add_argument( + "--right-name-col", + required=True, + help="Name column in right CSV", + ) + p.add_argument("--out", required=True, help="Output CSV path") + p.add_argument( + "--overlap-threshold", + type=float, + default=0.60, + help="Jaccard overlap threshold for token_overlap tier (default 0.60)", + ) + p.add_argument( + "--min-shared", + type=int, + default=2, + help="Minimum shared tokens for token_overlap tier 
(default 2)", + ) + p.add_argument( + "--skip-overlap", + action="store_true", + help="Skip the O(N*M) token_overlap pass (much faster on large CSVs)", + ) + args = p.parse_args() + + count = resolve( + left_path=args.left, + left_col=args.left_name_col, + right_path=args.right, + right_col=args.right_name_col, + out_path=args.out, + overlap_threshold=args.overlap_threshold, + min_shared=args.min_shared, + skip_overlap=args.skip_overlap, + ) + print(f"Wrote {count} match rows to {args.out}") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/optional-skills/research/osint-investigation/scripts/fetch_courtlistener.py b/optional-skills/research/osint-investigation/scripts/fetch_courtlistener.py new file mode 100644 index 00000000000..db5e715bf57 --- /dev/null +++ b/optional-skills/research/osint-investigation/scripts/fetch_courtlistener.py @@ -0,0 +1,149 @@ +#!/usr/bin/env python3 +"""Search court records via CourtListener (Free Law Project). + +Covers ~10M federal and state court opinions, plus PACER docket data +where available. Public REST API v4 supports anonymous read access for +search; some endpoints require a token (free at courtlistener.com). + +Set COURTLISTENER_TOKEN to authenticate (raises rate limits). 
+""" +from __future__ import annotations + +import argparse +import csv +import os +import sys +import urllib.parse +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).parent)) +from _http import get_json # noqa: E402 + +BASE = "https://www.courtlistener.com/api/rest/v4/search/" + +COLUMNS = [ + "case_name", + "court", + "court_id", + "date_filed", + "docket_number", + "judge", + "citation", + "result_type", + "snippet", + "absolute_url", +] + +SEARCH_TYPES = { + "opinions": "o", # Court opinions + "dockets": "r", # PACER dockets (may require auth depending on coverage) + "oral": "oa", # Oral arguments + "people": "p", # Judges / people + "recap": "r", # Same as dockets in v4 +} + + +def fetch( + query: str, + search_type: str, + court: str | None, + date_from: str | None, + date_to: str | None, + token: str | None, + limit: int, + out_path: str, +) -> int: + type_code = SEARCH_TYPES.get(search_type, search_type) + params = { + "q": query, + "type": type_code, + } + if court: + params["court"] = court + if date_from: + params["filed_after"] = date_from + if date_to: + params["filed_before"] = date_to + headers = {"Authorization": f"Token {token}"} if token else None + + rows: list[dict[str, str]] = [] + next_url: str | None = f"{BASE}?{urllib.parse.urlencode(params)}" + while next_url and len(rows) < limit: + try: + payload = get_json(next_url, headers=headers) + except Exception as e: # noqa: BLE001 + print(f"CourtListener error: {e}", file=sys.stderr) + break + if not isinstance(payload, dict): + break + results = payload.get("results", []) + for r in results: + if len(rows) >= limit: + break + rows.append( + { + "case_name": r.get("caseName", "") or r.get("case_name", "") or "", + "court": r.get("court", "") or "", + "court_id": r.get("court_id", "") or "", + "date_filed": (r.get("dateFiled", "") or r.get("date_filed", "") or "")[:10], + "docket_number": r.get("docketNumber", "") or r.get("docket_number", "") or "", + "judge": r.get("judge", "") 
or "", + "citation": "; ".join(r.get("citation", []) or []) if isinstance(r.get("citation"), list) else (r.get("citation") or ""), + "result_type": search_type, + "snippet": (r.get("snippet", "") or "").replace("\n", " ")[:500], + "absolute_url": ( + f"https://www.courtlistener.com{r.get('absolute_url', '')}" + if (r.get("absolute_url") or "").startswith("/") + else (r.get("absolute_url") or "") + ), + } + ) + next_url = payload.get("next") + + Path(out_path).parent.mkdir(parents=True, exist_ok=True) + with open(out_path, "w", newline="", encoding="utf-8") as fh: + w = csv.DictWriter(fh, fieldnames=COLUMNS) + w.writeheader() + w.writerows(rows) + if not rows: + print( + f"CourtListener: 0 results for type={search_type!r} q={query!r}. " + "Most private individuals don't appear in published court records " + "unless they were party to a federal or state appellate case.", + file=sys.stderr, + ) + return len(rows) + + +def main() -> int: + p = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter) + p.add_argument("--query", required=True, help="Search query (party name, case name, keyword)") + p.add_argument( + "--type", + default="opinions", + choices=list(SEARCH_TYPES.keys()), + help="Search type (default: opinions)", + ) + p.add_argument("--court", help="Court ID filter (e.g. 
'nysd' = SDNY, 'scotus' = Supreme Court)") + p.add_argument("--date-from", help="Filed-after date YYYY-MM-DD") + p.add_argument("--date-to", help="Filed-before date YYYY-MM-DD") + p.add_argument("--token", default=os.environ.get("COURTLISTENER_TOKEN")) + p.add_argument("--limit", type=int, default=100) + p.add_argument("--out", required=True) + a = p.parse_args() + n = fetch( + query=a.query, + search_type=a.type, + court=a.court, + date_from=a.date_from, + date_to=a.date_to, + token=a.token, + limit=a.limit, + out_path=a.out, + ) + print(f"Wrote {n} CourtListener rows to {a.out}") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/optional-skills/research/osint-investigation/scripts/fetch_gdelt.py b/optional-skills/research/osint-investigation/scripts/fetch_gdelt.py new file mode 100644 index 00000000000..fa98dabc9bb --- /dev/null +++ b/optional-skills/research/osint-investigation/scripts/fetch_gdelt.py @@ -0,0 +1,162 @@ +#!/usr/bin/env python3 +"""Search the GDELT 2.0 DOC API for news mentions. + +GDELT monitors world news in 100+ languages and indexes the full text. +Free, anonymous, ~15-minute update frequency. Covers ~2015→present. + +Useful for surfacing news mentions of a person, company, or topic across +international media — much wider net than Google News. 
+""" +from __future__ import annotations + +import argparse +import csv +import json +import sys +import time +import urllib.parse +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).parent)) +from _http import get_json # noqa: E402 + +BASE = "https://api.gdeltproject.org/api/v2/doc/doc" + +COLUMNS = [ + "title", + "url", + "seen_date", + "domain", + "language", + "source_country", + "tone", + "social_image", +] + + +def fetch( + query: str, + mode: str, + timespan: str | None, + start_datetime: str | None, + end_datetime: str | None, + source_country: str | None, + source_lang: str | None, + limit: int, + out_path: str, +) -> int: + params: dict[str, str] = { + "query": query, + "mode": mode, + "format": "json", + "maxrecords": str(min(limit, 250)), + "sort": "datedesc", + } + if timespan: + params["timespan"] = timespan + if start_datetime: + params["startdatetime"] = start_datetime.replace("-", "").replace(":", "").replace(" ", "") + if end_datetime: + params["enddatetime"] = end_datetime.replace("-", "").replace(":", "").replace(" ", "") + if source_country: + params["sourcecountry"] = source_country + if source_lang: + params["sourcelang"] = source_lang + + url = f"{BASE}?{urllib.parse.urlencode(params)}" + payload: dict | list = {} + for attempt in range(3): + try: + payload = get_json(url) + break + except RuntimeError as e: + # GDELT requires 1 request per 5 seconds; back off and retry. 
+ if "429" in str(e) and attempt < 2: + print( + f"GDELT throttle hit; sleeping 6s before retry " + f"(attempt {attempt + 1}/3)", + file=sys.stderr, + ) + time.sleep(6) + continue + print(f"GDELT error: {e}", file=sys.stderr) + payload = {} + break + except Exception as e: # noqa: BLE001 + print(f"GDELT error: {e}", file=sys.stderr) + payload = {} + break + + rows: list[dict[str, str]] = [] + if isinstance(payload, dict): + articles = payload.get("articles", []) or [] + for a in articles[:limit]: + seen = (a.get("seendate") or "") + # GDELT format: 20260319T083000Z → 2026-03-19 08:30:00Z + if len(seen) == 16 and "T" in seen: + seen = f"{seen[0:4]}-{seen[4:6]}-{seen[6:8]} {seen[9:11]}:{seen[11:13]}:{seen[13:15]}Z" + rows.append( + { + "title": (a.get("title") or "").replace("\n", " ").strip(), + "url": a.get("url") or "", + "seen_date": seen, + "domain": a.get("domain") or "", + "language": a.get("language") or "", + "source_country": a.get("sourcecountry") or "", + "tone": str(a.get("tone") or ""), + "social_image": a.get("socialimage") or "", + } + ) + + Path(out_path).parent.mkdir(parents=True, exist_ok=True) + with open(out_path, "w", newline="", encoding="utf-8") as fh: + w = csv.DictWriter(fh, fieldnames=COLUMNS) + w.writeheader() + w.writerows(rows) + if not rows: + print( + f"GDELT: 0 articles for query={query!r}. " + "GDELT indexes ~2015→present. 
Try widening the timespan or " + "checking the query syntax (https://blog.gdeltproject.org/gdelt-doc-2-0-api-debuts/).", + file=sys.stderr, + ) + return len(rows) + + +def main() -> int: + p = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter) + p.add_argument("--query", required=True, help='Search query (supports GDELT operators: quoted phrases, AND/OR/NOT, sourcecountry:, theme:)') + p.add_argument( + "--mode", + default="ArtList", + choices=["ArtList", "ImageCollage", "TimelineVol", "TimelineTone", "ToneChart"], + help="GDELT mode (default ArtList for article list)", + ) + p.add_argument( + "--timespan", + help="Relative window: e.g. '1d', '1w', '1m', '3m', '1y' (overrides start/end)", + ) + p.add_argument("--start", help="Absolute start YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS") + p.add_argument("--end", help="Absolute end YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS") + p.add_argument("--source-country", help="2-letter source country (e.g. US, UK)") + p.add_argument("--source-lang", help="Source language (e.g. English, Spanish)") + p.add_argument("--limit", type=int, default=100) + p.add_argument("--out", required=True) + a = p.parse_args() + n = fetch( + query=a.query, + mode=a.mode, + timespan=a.timespan, + start_datetime=a.start, + end_datetime=a.end, + source_country=a.source_country, + source_lang=a.source_lang, + limit=a.limit, + out_path=a.out, + ) + print(f"Wrote {n} GDELT article rows to {a.out}") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/optional-skills/research/osint-investigation/scripts/fetch_icij_offshore.py b/optional-skills/research/osint-investigation/scripts/fetch_icij_offshore.py new file mode 100644 index 00000000000..8d050b62bf1 --- /dev/null +++ b/optional-skills/research/osint-investigation/scripts/fetch_icij_offshore.py @@ -0,0 +1,234 @@ +#!/usr/bin/env python3 +"""Search ICIJ Offshore Leaks via the bulk CSV database. 
+ +The old reconcile endpoint (https://offshoreleaks.icij.org/reconcile) returns +404 — ICIJ has removed it. The remaining stable access path is the public +bulk download: + + https://offshoreleaks-data.icij.org/offshoreleaks/csv/full-oldb.LATEST.zip + +~70 MB, ~6 CSVs inside (nodes-entities, nodes-officers, nodes-intermediaries, +nodes-addresses, relationships, ...). We cache it under +$HERMES_OSINT_CACHE/icij/ (default: ~/.cache/hermes-osint/icij/) and search +locally so the agent doesn't re-download for every query. + +Output CSV columns match the original `fetch_icij_offshore.py` contract. +""" +from __future__ import annotations + +import argparse +import csv +import io +import os +import re +import sys +import time +import urllib.request +import zipfile +from pathlib import Path + +BULK_URL = "https://offshoreleaks-data.icij.org/offshoreleaks/csv/full-oldb.LATEST.zip" + +COLUMNS = [ + "node_id", + "name", + "node_type", + "country_codes", + "countries", + "jurisdiction", + "incorporation_date", + "inactivation_date", + "source", + "entity_url", + "connections", +] + + +def _cache_dir() -> Path: + base = os.environ.get("HERMES_OSINT_CACHE") + if base: + return Path(base) / "icij" + return Path.home() / ".cache" / "hermes-osint" / "icij" + + +def _download(dest: Path, force: bool = False) -> Path: + """Download (or reuse cached) ICIJ bulk ZIP.""" + dest.mkdir(parents=True, exist_ok=True) + zip_path = dest / "full-oldb.zip" + if zip_path.exists() and not force: + # Re-check age: refetch if older than 30 days. 
+ age_days = (time.time() - zip_path.stat().st_mtime) / 86400 + if age_days < 30: + return zip_path + print(f"Downloading ICIJ bulk database (~70 MB) to {zip_path}", file=sys.stderr) + req = urllib.request.Request( + BULK_URL, + headers={"User-Agent": "hermes-agent osint-investigation skill"}, + ) + with urllib.request.urlopen(req, timeout=120) as resp: # noqa: S310 + tmp = zip_path.with_suffix(".zip.tmp") + with open(tmp, "wb") as fh: + while True: + chunk = resp.read(1 << 16) + if not chunk: + break + fh.write(chunk) + tmp.replace(zip_path) + return zip_path + + +def _open_csv(zf: zipfile.ZipFile, name_pattern: str): + """Open the first CSV matching name_pattern (case-insensitive substring).""" + for info in zf.infolist(): + if name_pattern.lower() in info.filename.lower() and info.filename.lower().endswith(".csv"): + return zf.open(info), info.filename + return None, None + + +def _match(needle_norm: str, hay: str) -> bool: + return needle_norm in (hay or "").upper() + + +def _normalize_query(s: str) -> str: + s = s.upper() + s = re.sub(r"[^\w\s]", " ", s) + s = re.sub(r"\s+", " ", s).strip() + return s + + +def fetch( + entity: str | None, + officer: str | None, + jurisdiction: str | None, + out_path: str, + cache_dir: Path, + force_refresh: bool = False, + limit: int = 500, +) -> int: + zip_path = _download(cache_dir, force=force_refresh) + rows: list[dict[str, str]] = [] + needles: list[tuple[str, str]] = [] # (kind, normalized needle) + if entity: + needles.append(("Entity", _normalize_query(entity))) + if officer: + needles.append(("Officer", _normalize_query(officer))) + jur_norm = _normalize_query(jurisdiction) if jurisdiction else None + + targets = [ + ("Entity", "nodes-entities"), + ("Officer", "nodes-officers"), + ("Intermediary", "nodes-intermediaries"), + ] + + with zipfile.ZipFile(zip_path) as zf: + for node_type, csv_substring in targets: + relevant_needles = [n for (k, n) in needles if k in (node_type, "Entity", "Officer")] or [] + # Only scan a 
CSV if we have a needle that could plausibly match it, + # or if we have ONLY a jurisdiction filter. + applicable_needles = [n for (k, n) in needles if k == node_type] + if needles and not applicable_needles and not jur_norm: + continue + stream, fname = _open_csv(zf, csv_substring) + if not stream: + continue + with stream: + text = io.TextIOWrapper(stream, encoding="utf-8", errors="replace") + reader = csv.DictReader(text) + for row in reader: + name = (row.get("name") or "").strip() + if not name: + continue + name_u = name.upper() + matched = False + for n in applicable_needles or relevant_needles: + if _match(n, name_u): + matched = True + break + if not needles: + matched = True # jurisdiction-only sweep + if not matched: + continue + jur = (row.get("jurisdiction_description") or row.get("country_codes") or "").strip() + if jur_norm and jur_norm not in jur.upper() and jur_norm not in (row.get("countries") or "").upper(): + continue + node_id = (row.get("node_id") or "").strip() + rows.append( + { + "node_id": node_id, + "name": name, + "node_type": node_type, + "country_codes": row.get("country_codes", "") or "", + "countries": row.get("countries", "") or "", + "jurisdiction": jur, + "incorporation_date": row.get("incorporation_date", "") or "", + "inactivation_date": row.get("inactivation_date", "") or "", + "source": row.get("sourceID", "") or row.get("source", "") or "", + "entity_url": ( + f"https://offshoreleaks.icij.org/nodes/{node_id}" if node_id else "" + ), + "connections": "", + } + ) + if len(rows) >= limit: + break + if len(rows) >= limit: + break + + Path(out_path).parent.mkdir(parents=True, exist_ok=True) + with open(out_path, "w", newline="", encoding="utf-8") as fh: + w = csv.DictWriter(fh, fieldnames=COLUMNS) + w.writeheader() + w.writerows(rows) + if not rows: + bits = [] + if entity: + bits.append(f"entity={entity!r}") + if officer: + bits.append(f"officer={officer!r}") + if jurisdiction: + bits.append(f"jurisdiction={jurisdiction!r}") + 
print( + f"ICIJ: 0 matches for {', '.join(bits)}. " + "The bulk database covers offshore leaks (Panama, Paradise, Pandora, " + "Bahamas, Offshore Leaks). Most private US individuals are NOT in it.", + file=sys.stderr, + ) + return len(rows) + + +def main() -> int: + p = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter) + p.add_argument("--entity", help="Search by entity name (substring, case-insensitive)") + p.add_argument("--officer", help="Search by officer / individual name (substring, case-insensitive)") + p.add_argument("--jurisdiction", help="Filter results by jurisdiction substring") + p.add_argument("--limit", type=int, default=500) + p.add_argument("--out", required=True) + p.add_argument( + "--cache-dir", + type=Path, + default=None, + help="Override cache directory (default: $HERMES_OSINT_CACHE/icij or ~/.cache/hermes-osint/icij)", + ) + p.add_argument( + "--force-refresh", + action="store_true", + help="Re-download the bulk ZIP even if a recent cached copy exists.", + ) + a = p.parse_args() + if not (a.entity or a.officer or a.jurisdiction): + p.error("must supply at least one of --entity / --officer / --jurisdiction") + n = fetch( + entity=a.entity, + officer=a.officer, + jurisdiction=a.jurisdiction, + out_path=a.out, + cache_dir=a.cache_dir or _cache_dir(), + force_refresh=a.force_refresh, + limit=a.limit, + ) + print(f"Wrote {n} ICIJ Offshore Leaks rows to {a.out}") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/optional-skills/research/osint-investigation/scripts/fetch_nyc_acris.py b/optional-skills/research/osint-investigation/scripts/fetch_nyc_acris.py new file mode 100644 index 00000000000..6ec448f0f53 --- /dev/null +++ b/optional-skills/research/osint-investigation/scripts/fetch_nyc_acris.py @@ -0,0 +1,203 @@ +#!/usr/bin/env python3 +"""Search NYC property records via ACRIS (Automated City Register Information System). 
+ +Uses the city's Socrata-backed open data API. No auth required for read access. + +Datasets: + bnx9-e6tj — Real Property Master (one row per recorded document) + 636b-3b5g — Real Property Parties (names — grantor, grantee, etc.) + 8h5j-fqxa — Real Property Legal (lot / property identifiers) + uqqa-hym2 — Real Property References + +The Parties dataset has the names. We search by name and optionally join to +Master to get the doc type and date. +""" +from __future__ import annotations + +import argparse +import csv +import sys +import urllib.parse +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).parent)) +from _http import get_json # noqa: E402 + +PARTIES_URL = "https://data.cityofnewyork.us/resource/636b-3b5g.json" +MASTER_URL = "https://data.cityofnewyork.us/resource/bnx9-e6tj.json" + +PARTY_TYPE = { + "1": "grantor (seller / mortgagor / debtor)", + "2": "grantee (buyer / mortgagee / creditor)", + "3": "other party", +} + +BOROUGH = { + "1": "Manhattan", + "2": "Bronx", + "3": "Brooklyn", + "4": "Queens", + "5": "Staten Island", +} + +COLUMNS = [ + "document_id", + "name", + "party_type", + "party_role", + "address_1", + "address_2", + "city", + "state", + "zip", + "country", + "doc_type", + "doc_date", + "recorded_date", + "borough", + "amount", + "filing_url", +] + + +def _filing_url(document_id: str) -> str: + if not document_id: + return "" + return ( + f"https://a836-acris.nyc.gov/DS/DocumentSearch/DocumentImageView?doc_id={document_id}" + ) + + +def fetch( + name: str | None, + address: str | None, + party_type: str | None, + limit: int, + out_path: str, + enrich: bool = True, +) -> int: + if not (name or address): + raise SystemExit("must supply --name or --address") + + where_clauses: list[str] = [] + if name: + safe = name.upper().replace("'", "''") + where_clauses.append(f"upper(name) like '%{safe}%'") + if address: + safe_addr = address.upper().replace("'", "''") + where_clauses.append(f"upper(address_1) like '%{safe_addr}%'") + if 
party_type and party_type in {"1", "2", "3"}: + where_clauses.append(f"party_type='{party_type}'") + + params = { + "$where": " AND ".join(where_clauses), + "$limit": str(limit), + } + url = f"{PARTIES_URL}?{urllib.parse.urlencode(params)}" + parties = get_json(url) + if not isinstance(parties, list): + raise SystemExit(f"Unexpected ACRIS response: {parties!r}") + + # Enrich with master record (doc_type, dates, borough, amount). + doc_ids: list[str] = sorted({ + d for d in (p.get("document_id") for p in parties) if d + }) + masters: dict[str, dict] = {} + if enrich and doc_ids: + # Batch up to 100 doc_ids per request (Socrata IN-list is fine for this). + for i in range(0, len(doc_ids), 100): + chunk = doc_ids[i : i + 100] + id_list = ",".join(f"'{d}'" for d in chunk) + master_params = { + "$where": f"document_id in ({id_list})", + "$limit": "100", + } + url = f"{MASTER_URL}?{urllib.parse.urlencode(master_params)}" + try: + rows = get_json(url) + except Exception as e: # noqa: BLE001 + print(f"ACRIS master lookup failed for chunk: {e}", file=sys.stderr) + continue + if isinstance(rows, list): + for r in rows: + did = r.get("document_id", "") + if did: + masters[did] = r + + out_rows: list[dict[str, str]] = [] + for p in parties: + did = p.get("document_id", "") or "" + m = masters.get(did, {}) + out_rows.append( + { + "document_id": did, + "name": p.get("name", "") or "", + "party_type": p.get("party_type", "") or "", + "party_role": PARTY_TYPE.get(p.get("party_type", ""), ""), + "address_1": p.get("address_1", "") or "", + "address_2": p.get("address_2", "") or "", + "city": p.get("city", "") or "", + "state": p.get("state", "") or "", + "zip": p.get("zip", "") or "", + "country": p.get("country", "") or "", + "doc_type": m.get("doc_type", "") or "", + "doc_date": (m.get("document_date", "") or "")[:10], + "recorded_date": (m.get("recorded_datetime", "") or "")[:10], + "borough": BOROUGH.get(m.get("recorded_borough", ""), m.get("recorded_borough", "")), + "amount": 
m.get("document_amt", "") or "", + "filing_url": _filing_url(did), + } + ) + + Path(out_path).parent.mkdir(parents=True, exist_ok=True) + with open(out_path, "w", newline="", encoding="utf-8") as fh: + w = csv.DictWriter(fh, fieldnames=COLUMNS) + w.writeheader() + w.writerows(out_rows) + + if not out_rows: + filters = [] + if name: + filters.append(f"name={name!r}") + if address: + filters.append(f"address={address!r}") + print( + f"NYC ACRIS: 0 records for {', '.join(filters)}. " + "ACRIS covers ONLY NYC (5 boroughs). For property records elsewhere, " + "search the relevant county recorder directly.", + file=sys.stderr, + ) + return len(out_rows) + + +def main() -> int: + p = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter) + p.add_argument("--name", help="Party name substring (case-insensitive)") + p.add_argument("--address", help="Address line 1 substring") + p.add_argument( + "--party-type", + choices=["1", "2", "3"], + help="Filter party type: 1=grantor (seller/mortgagor), 2=grantee (buyer/mortgagee), 3=other", + ) + p.add_argument("--limit", type=int, default=200) + p.add_argument( + "--no-enrich", + action="store_true", + help="Skip the master-document lookup that adds doc_type/date/amount", + ) + p.add_argument("--out", required=True) + a = p.parse_args() + n = fetch( + name=a.name, + address=a.address, + party_type=a.party_type, + limit=a.limit, + out_path=a.out, + enrich=not a.no_enrich, + ) + print(f"Wrote {n} NYC ACRIS rows to {a.out}") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/optional-skills/research/osint-investigation/scripts/fetch_ofac_sdn.py b/optional-skills/research/osint-investigation/scripts/fetch_ofac_sdn.py new file mode 100644 index 00000000000..5233fa09ab8 --- /dev/null +++ b/optional-skills/research/osint-investigation/scripts/fetch_ofac_sdn.py @@ -0,0 +1,175 @@ +#!/usr/bin/env python3 +"""Fetch OFAC SDN list (CSV format) and normalize. 
+ +Public endpoint: https://www.treasury.gov/ofac/downloads/sdn.csv +Format reference: https://ofac.treasury.gov/specially-designated-nationals-and-blocked-persons-list-sdn-human-readable-lists + +The SDN CSV uses a specific 12-column format with no header row: + ent_num, sdn_name, sdn_type, program, title, call_sign, vess_type, + tonnage, grt, vess_flag, vess_owner, remarks +Address and AKA records live in separate files. We fetch all three and join. +""" +from __future__ import annotations + +import argparse +import csv +import io +import sys +from collections import defaultdict +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).parent)) +from _http import get # noqa: E402 + +SDN_URL = "https://www.treasury.gov/ofac/downloads/sdn.csv" +ADD_URL = "https://www.treasury.gov/ofac/downloads/add.csv" +ALT_URL = "https://www.treasury.gov/ofac/downloads/alt.csv" + +SDN_COLS = [ + "ent_num", "sdn_name", "sdn_type", "program", "title", + "call_sign", "vess_type", "tonnage", "grt", "vess_flag", + "vess_owner", "remarks", +] +ADD_COLS = [ + "ent_num", "add_num", "address", "city_state_zip", "country", "add_remarks", +] +ALT_COLS = [ + "ent_num", "alt_num", "alt_type", "alt_name", "alt_remarks", +] + +COLUMNS = [ + "entity_id", + "name", + "entity_type", + "program_list", + "title", + "nationalities", + "aka_list", + "addresses", + "dob", + "pob", + "remarks", + "last_updated", +] + +_TYPE_MAP = { + "individual": "individual", + "entity": "entity", + "vessel": "vessel", + "aircraft": "aircraft", +} + + +def _read_csv(url: str, columns: list[str]) -> list[dict[str, str]]: + body = get(url, timeout=60).decode("latin-1", errors="replace") + reader = csv.reader(io.StringIO(body)) + out = [] + for row in reader: + if not row: + continue + # Pad/truncate to expected width. 
+ row = row[: len(columns)] + [""] * (len(columns) - len(row)) + out.append(dict(zip(columns, row))) + return out + + +def _strip_quotes(s: str) -> str: + s = s.strip() + if s.startswith('"') and s.endswith('"'): + s = s[1:-1] + if s == "-0-": + return "" + return s + + +def fetch( + program: str | None, + entity_type: str | None, + out_path: str, +) -> int: + sdn = _read_csv(SDN_URL, SDN_COLS) + addresses = _read_csv(ADD_URL, ADD_COLS) + akas = _read_csv(ALT_URL, ALT_COLS) + + addr_by_ent: dict[str, list[str]] = defaultdict(list) + for a in addresses: + ent = _strip_quotes(a["ent_num"]) + parts = [ + _strip_quotes(a[c]) + for c in ("address", "city_state_zip", "country") + if _strip_quotes(a[c]) + ] + if parts: + addr_by_ent[ent].append(", ".join(parts)) + + aka_by_ent: dict[str, list[str]] = defaultdict(list) + for k in akas: + ent = _strip_quotes(k["ent_num"]) + name = _strip_quotes(k["alt_name"]) + if name: + aka_by_ent[ent].append(name) + + rows: list[dict[str, str]] = [] + for r in sdn: + ent_num = _strip_quotes(r["ent_num"]) + if not ent_num: + continue + sdn_type = _TYPE_MAP.get(_strip_quotes(r["sdn_type"]).lower(), _strip_quotes(r["sdn_type"])) or "entity"  # assumption: a blank ("-0-") type denotes an entity + if entity_type and sdn_type != entity_type: + continue + progs = _strip_quotes(r["program"]) + if program and program.upper() not in progs.upper().split(";"): + continue + remarks = _strip_quotes(r["remarks"]) + # DOB / POB are commonly embedded in remarks for individuals. 
+ dob = "" + pob = "" + if sdn_type == "individual" and remarks: + for chunk in remarks.split(";"): + ch = chunk.strip() + if ch.upper().startswith("DOB"): + dob = ch.split(maxsplit=1)[1] if " " in ch else "" + elif ch.upper().startswith("POB"): + pob = ch.split(maxsplit=1)[1] if " " in ch else "" + rows.append( + { + "entity_id": ent_num, + "name": _strip_quotes(r["sdn_name"]), + "entity_type": sdn_type, + "program_list": "; ".join(p.strip() for p in progs.split(";") if p.strip()), + "title": _strip_quotes(r["title"]), + "nationalities": "", # not in this CSV; available in XML format + "aka_list": "; ".join(aka_by_ent.get(ent_num, [])), + "addresses": "; ".join(addr_by_ent.get(ent_num, [])), + "dob": dob, + "pob": pob, + "remarks": remarks, + "last_updated": "", + } + ) + + Path(out_path).parent.mkdir(parents=True, exist_ok=True) + with open(out_path, "w", newline="", encoding="utf-8") as fh: + w = csv.DictWriter(fh, fieldnames=COLUMNS) + w.writeheader() + w.writerows(rows) + return len(rows) + + +def main() -> int: + p = argparse.ArgumentParser(description=__doc__) + p.add_argument("--program", help="Filter to specific sanctions program (e.g. SDGT, IRAN)") + p.add_argument( + "--entity-type", + choices=["individual", "entity", "vessel", "aircraft"], + help="Filter to a specific entity type", + ) + p.add_argument("--out", required=True) + a = p.parse_args() + n = fetch(program=a.program, entity_type=a.entity_type, out_path=a.out) + print(f"Wrote {n} OFAC SDN rows to {a.out}") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/optional-skills/research/osint-investigation/scripts/fetch_opencorporates.py b/optional-skills/research/osint-investigation/scripts/fetch_opencorporates.py new file mode 100644 index 00000000000..6924a8056a6 --- /dev/null +++ b/optional-skills/research/osint-investigation/scripts/fetch_opencorporates.py @@ -0,0 +1,192 @@ +#!/usr/bin/env python3 +"""Search OpenCorporates company registry data. 
+ +OpenCorporates aggregates ~200M companies from 130+ jurisdictions. The +public API requires an API token (free tier: 500 calls/month). Set +OPENCORPORATES_API_TOKEN in env or pass --token. + +Without a token, this script falls back to scraping the public HTML +search page (limited fields, more brittle, no jurisdiction filter). +""" +from __future__ import annotations + +import argparse +import csv +import json +import os +import re +import sys +import urllib.parse +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).parent)) +from _http import get, get_json # noqa: E402 + +API_URL = "https://api.opencorporates.com/v0.4/companies/search" +HTML_URL = "https://opencorporates.com/companies" + +COLUMNS = [ + "name", + "company_number", + "jurisdiction_code", + "jurisdiction_name", + "incorporation_date", + "dissolution_date", + "company_type", + "status", + "registered_address", + "opencorporates_url", + "officers_count", + "source", +] + + +def _via_api(query: str, jurisdiction: str | None, token: str, limit: int) -> list[dict]: + params = { + "q": query, + "api_token": token, + "per_page": str(min(limit, 100)), + } + if jurisdiction: + params["jurisdiction_code"] = jurisdiction + url = f"{API_URL}?{urllib.parse.urlencode(params)}" + payload = get_json(url) + if not isinstance(payload, dict): + return [] + results = payload.get("results", {}).get("companies", []) or [] + return [r.get("company", {}) for r in results if isinstance(r, dict)] + + +def _via_html(query: str, limit: int) -> list[dict]: + """Best-effort HTML fallback when no API token is available.""" + params = {"q": query, "utf8": "✓"} + url = f"{HTML_URL}?{urllib.parse.urlencode(params)}" + body = get(url, user_agent="Mozilla/5.0 hermes-osint").decode("utf-8", errors="replace") + # Each result is in <li class="company"> ... </li> with name, url, status + pattern = re.compile( + r'<li[^>]*class="[^"]*company[^"]*"[^>]*>.*?' 
+ r'<a[^>]+href="(?P<url>/companies/[^"]+)"[^>]*>(?P<name>[^<]+)</a>' + r'(?:.*?<span[^>]*class="[^"]*jurisdiction[^"]*"[^>]*>(?P<jur>[^<]+)</span>)?' + r"(?:.*?<dt[^>]*>(?:Company\s+Number|Number)</dt>\s*<dd[^>]*>(?P<num>[^<]+)</dd>)?", + re.DOTALL | re.IGNORECASE, + ) + out = [] + for m in pattern.finditer(body): + if len(out) >= limit: + break + url_path = m.group("url").strip() + out.append( + { + "name": (m.group("name") or "").strip(), + "opencorporates_url": f"https://opencorporates.com{url_path}", + "jurisdiction_code": (m.group("jur") or "").strip(), + "company_number": (m.group("num") or "").strip(), + "_via": "html", + } + ) + return out + + +def fetch( + query: str, + jurisdiction: str | None, + token: str | None, + limit: int, + out_path: str, +) -> int: + if token: + try: + companies = _via_api(query, jurisdiction, token, limit) + source_tag = "api" + except Exception as e: # noqa: BLE001 + print( + f"OpenCorporates API call failed ({e}); falling back to HTML.", + file=sys.stderr, + ) + companies = _via_html(query, limit) + source_tag = "html-fallback" + else: + print( + "OPENCORPORATES_API_TOKEN not set — using HTML fallback (limited fields). 
" + "Get a free token at https://opencorporates.com/api_accounts/new", + file=sys.stderr, + ) + companies = _via_html(query, limit) + source_tag = "html" + + rows: list[dict[str, str]] = [] + for c in companies[:limit]: + if c.get("_via") == "html": + rows.append( + { + "name": c.get("name", ""), + "company_number": c.get("company_number", ""), + "jurisdiction_code": c.get("jurisdiction_code", ""), + "jurisdiction_name": "", + "incorporation_date": "", + "dissolution_date": "", + "company_type": "", + "status": "", + "registered_address": "", + "opencorporates_url": c.get("opencorporates_url", ""), + "officers_count": "", + "source": source_tag, + } + ) + continue + addr = c.get("registered_address_in_full") or "" + rows.append( + { + "name": c.get("name", "") or "", + "company_number": c.get("company_number", "") or "", + "jurisdiction_code": c.get("jurisdiction_code", "") or "", + "jurisdiction_name": "", + "incorporation_date": c.get("incorporation_date", "") or "", + "dissolution_date": c.get("dissolution_date", "") or "", + "company_type": c.get("company_type", "") or "", + "status": c.get("current_status", "") or c.get("inactive", "") or "", + "registered_address": addr, + "opencorporates_url": c.get("opencorporates_url", "") or "", + "officers_count": str(c.get("officers", {}).get("total_count", "") if c.get("officers") else ""), + "source": source_tag, + } + ) + + Path(out_path).parent.mkdir(parents=True, exist_ok=True) + with open(out_path, "w", newline="", encoding="utf-8") as fh: + w = csv.DictWriter(fh, fieldnames=COLUMNS) + w.writeheader() + w.writerows(rows) + if not rows: + print( + f"OpenCorporates: 0 matches for query={query!r}" + f"{f' jurisdiction={jurisdiction!r}' if jurisdiction else ''}.", + file=sys.stderr, + ) + return len(rows) + + +def main() -> int: + p = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter) + p.add_argument("--query", required=True, help="Company name search") + 
p.add_argument( + "--jurisdiction", + help="Jurisdiction code, e.g. 'us_ny', 'us_de', 'gb', 'sg' (lowercased OpenCorporates style)", + ) + p.add_argument("--limit", type=int, default=50) + p.add_argument("--token", default=os.environ.get("OPENCORPORATES_API_TOKEN")) + p.add_argument("--out", required=True) + a = p.parse_args() + n = fetch( + query=a.query, + jurisdiction=a.jurisdiction, + token=a.token, + limit=a.limit, + out_path=a.out, + ) + print(f"Wrote {n} OpenCorporates rows to {a.out}") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/optional-skills/research/osint-investigation/scripts/fetch_sec_edgar.py b/optional-skills/research/osint-investigation/scripts/fetch_sec_edgar.py new file mode 100644 index 00000000000..bd2fda8feb9 --- /dev/null +++ b/optional-skills/research/osint-investigation/scripts/fetch_sec_edgar.py @@ -0,0 +1,184 @@ +#!/usr/bin/env python3 +"""Fetch SEC EDGAR filings index for a given CIK or company name. + +SEC requires a User-Agent header with contact info. Set SEC_USER_AGENT, +e.g. SEC_USER_AGENT="Research example@example.com". + +Filings JSON is published at: + https://data.sec.gov/submissions/CIK<10-digit-padded>.json + +Company lookup uses: + https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&company=<name>&output=atom +""" +from __future__ import annotations + +import argparse +import csv +import os +import re +import sys +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).parent)) +from _http import get, get_json # noqa: E402 + +SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik}.json" +COLUMNS = [ + "cik", + "company_name", + "form_type", + "filing_date", + "accession_number", + "primary_document", + "filing_url", + "reporting_period", +] + + +def _ua() -> str: + ua = os.environ.get("SEC_USER_AGENT", "").strip() + if not ua: + raise SystemExit( + "SEC requires a User-Agent with contact info. " + "Set SEC_USER_AGENT='Your Name your@email'." 
+ ) + return ua + + +def _resolve_cik(company: str) -> tuple[str, str]: + """Resolve a company name to a CIK via EDGAR's atom feed. + + Returns (cik, resolved_company_name). The match may be an individual + filer (Form 3/4/5 only) rather than a company; fetch() warns about + that case when the type filter returns no rows. + """ + url = "https://www.sec.gov/cgi-bin/browse-edgar" + params = {"action": "getcompany", "company": company, "output": "atom", "owner": "include"} + body = get(url, params=params, user_agent=_ua()).decode("utf-8", errors="replace") + m = re.search(r"CIK=(\d{10})", body) + if not m: + raise SystemExit(f"Could not resolve CIK for company={company!r}") + cik = m.group(1) + name_m = re.search(r"<title>([^<]+)\s*\((\d{10})\)", body) + resolved = name_m.group(1).strip() if name_m else "" + return cik, resolved + + +def fetch( + cik: str | None, + company: str | None, + types: list[str], + since: str | None, + out_path: str, +) -> int: + resolved_name = "" + if not cik and company: + try: + cik, resolved_name = _resolve_cik(company) # type: ignore[assignment] + except SystemExit as e: + # Write empty CSV with header so downstream tools still work, + # and tell the user clearly. 
+ print(f"SEC EDGAR: {e}", file=sys.stderr) + Path(out_path).parent.mkdir(parents=True, exist_ok=True) + with open(out_path, "w", newline="", encoding="utf-8") as fh: + csv.DictWriter(fh, fieldnames=COLUMNS).writeheader() + return 0 + if resolved_name: + print( + f"Resolved company={company!r} → CIK {cik} ({resolved_name})", + file=sys.stderr, + ) + if not cik: + raise SystemExit("must supply --cik or --company") + cik = cik.zfill(10) + url = SUBMISSIONS_URL.format(cik=cik) + payload = get_json(url, user_agent=_ua()) + if not isinstance(payload, dict): + raise SystemExit(f"Unexpected EDGAR response shape for CIK {cik}") + name = payload.get("name", "") + recent = (payload.get("filings", {}) or {}).get("recent", {}) or {} + form = recent.get("form", []) + date = recent.get("filingDate", []) + accession = recent.get("accessionNumber", []) + primary_doc = recent.get("primaryDocument", []) + period = recent.get("reportDate", []) + + # Histogram of available filing types — useful for surfacing why a filter + # returned 0 (e.g. user asked for 10-K on an individual Form 4 filer). 
+ type_hist: dict[str, int] = {} + for ftype in form: + type_hist[ftype] = type_hist.get(ftype, 0) + 1 + + type_set = {t.strip().upper() for t in types} if types else None + rows: list[dict[str, str]] = [] + for i, ftype in enumerate(form): + if type_set and ftype.upper() not in type_set: + continue + fdate = date[i] if i < len(date) else "" + if since and fdate and fdate < since: + continue + acc = accession[i] if i < len(accession) else "" + pdoc = primary_doc[i] if i < len(primary_doc) else "" + acc_nodash = acc.replace("-", "") + filing_url = ( + f"https://www.sec.gov/Archives/edgar/data/{int(cik)}/{acc_nodash}/{pdoc}" + if acc and pdoc + else "" + ) + rows.append( + { + "cik": cik, + "company_name": name, + "form_type": ftype, + "filing_date": fdate, + "accession_number": acc, + "primary_document": pdoc, + "filing_url": filing_url, + "reporting_period": period[i] if i < len(period) else "", + } + ) + + Path(out_path).parent.mkdir(parents=True, exist_ok=True) + with open(out_path, "w", newline="", encoding="utf-8") as fh: + w = csv.DictWriter(fh, fieldnames=COLUMNS) + w.writeheader() + w.writerows(rows) + + if not rows and type_hist: + top = sorted(type_hist.items(), key=lambda kv: -kv[1])[:8] + hist_str = ", ".join(f"{t}={n}" for t, n in top) + print( + f"Warning: SEC EDGAR CIK {cik} ({name}) has {sum(type_hist.values())} " + f"recent filings but NONE match types={types}. " + f"Available form types: {hist_str}.", + file=sys.stderr, + ) + # Insider-filer heuristic: only Form 3/4/5 → individual person, not a company. + company_types = {"10-K", "10-Q", "8-K", "20-F", "DEF 14A", "S-1"} + if not (set(type_hist.keys()) & company_types): + print( + f"Note: CIK {cik} appears to be an INDIVIDUAL filer " + f"(insider Form 3/4/5 only), not a corporate registrant. 
" + f"The resolver may have matched an officer/director named " + f"{company!r} rather than a company.", + file=sys.stderr, + ) + return len(rows) + + +def main() -> int: + p = argparse.ArgumentParser(description=__doc__) + p.add_argument("--cik", help="Central Index Key (will be 10-digit zero-padded)") + p.add_argument("--company", help="Resolve to CIK by company name") + p.add_argument("--types", default="", help="Comma-separated form types (e.g. 10-K,10-Q,8-K)") + p.add_argument("--since", help="Skip filings before YYYY-MM-DD") + p.add_argument("--out", required=True) + a = p.parse_args() + types = [t for t in (a.types or "").split(",") if t.strip()] + n = fetch(cik=a.cik, company=a.company, types=types, since=a.since, out_path=a.out) + print(f"Wrote {n} EDGAR filing rows to {a.out}") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/optional-skills/research/osint-investigation/scripts/fetch_senate_ld.py b/optional-skills/research/osint-investigation/scripts/fetch_senate_ld.py new file mode 100644 index 00000000000..3119ff8a9a5 --- /dev/null +++ b/optional-skills/research/osint-investigation/scripts/fetch_senate_ld.py @@ -0,0 +1,146 @@ +#!/usr/bin/env python3 +"""Fetch Senate Lobbying Disclosure (LD-1 / LD-2) filings. + +Anonymous: 120 req/hour. Token (SENATE_LDA_TOKEN): 1200 req/hour. 
+""" +from __future__ import annotations + +import argparse +import csv +import os +import sys +import time +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).parent)) +from _http import get_json # noqa: E402 + +ENDPOINT = "https://lda.senate.gov/api/v1/filings/" +COLUMNS = [ + "filing_uuid", + "filing_type", + "filing_year", + "filing_period", + "registrant_name", + "registrant_id", + "client_name", + "client_id", + "client_general_description", + "income", + "expenses", + "lobbyists", + "issues", + "government_entities", + "filing_date", +] + + +def fetch( + client: str | None, + registrant: str | None, + year: int, + token: str | None, + out_path: str, + page_size: int = 100, + max_pages: int = 25, +) -> int: + params: dict = {"filing_year": year, "page_size": page_size} + if client: + params["client_name"] = client + if registrant: + params["registrant_name"] = registrant + + headers = {"Authorization": f"Token {token}"} if token else None + rows: list[dict[str, str]] = [] + url = ENDPOINT + page = 0 + while page < max_pages: + try: + payload = get_json(url, params=params if page == 0 else None, headers=headers) + except Exception as e: # noqa: BLE001 + print(f"Senate LDA error on page {page + 1}: {e}", file=sys.stderr) + break + if not isinstance(payload, dict): + break + results = payload.get("results", []) + for r in results: + client_obj = r.get("client") or {} + registrant_obj = r.get("registrant") or {} + lobbying_activities = r.get("lobbying_activities") or [] + lobbyists = [] + issues = [] + entities = [] + for la in lobbying_activities: + for lob in la.get("lobbyists") or []: + lob_obj = lob.get("lobbyist") or {} + name = " ".join( + x for x in (lob_obj.get("first_name", ""), lob_obj.get("last_name", "")) if x + ) + if name: + lobbyists.append(name) + desc = la.get("description") or "" + if desc: + issues.append(desc) + for ge in la.get("government_entities") or []: + nm = ge.get("name") or "" + if nm: + entities.append(nm) + 
rows.append( + { + "filing_uuid": r.get("filing_uuid", "") or "", + "filing_type": r.get("filing_type", "") or "", + "filing_year": str(r.get("filing_year", "") or year), + "filing_period": r.get("filing_period", "") or "", + "registrant_name": registrant_obj.get("name", "") or "", + "registrant_id": str(registrant_obj.get("id", "") or ""), + "client_name": client_obj.get("name", "") or "", + "client_id": str(client_obj.get("id", "") or ""), + "client_general_description": client_obj.get("general_description", "") or "", + "income": str(r.get("income", "") or ""), + "expenses": str(r.get("expenses", "") or ""), + "lobbyists": "; ".join(sorted(set(lobbyists))), + "issues": "; ".join(issues), + "government_entities": "; ".join(sorted(set(entities))), + "filing_date": (r.get("dt_posted") or "")[:10], + } + ) + next_url = payload.get("next") + if not next_url: + break + url = next_url + page += 1 + time.sleep(1.0 if not token else 0.3) + + Path(out_path).parent.mkdir(parents=True, exist_ok=True) + with open(out_path, "w", newline="", encoding="utf-8") as fh: + w = csv.DictWriter(fh, fieldnames=COLUMNS) + w.writeheader() + w.writerows(rows) + return len(rows) + + +def main() -> int: + p = argparse.ArgumentParser(description=__doc__) + p.add_argument("--client", help="Client name filter") + p.add_argument("--registrant", help="Registrant (lobbying firm) name filter") + p.add_argument("--year", type=int, default=2024) + p.add_argument("--token", default=os.environ.get("SENATE_LDA_TOKEN")) + p.add_argument("--max-pages", type=int, default=25) + p.add_argument("--out", required=True) + a = p.parse_args() + if not (a.client or a.registrant): + p.error("must supply at least one of --client / --registrant") + n = fetch( + client=a.client, + registrant=a.registrant, + year=a.year, + token=a.token, + out_path=a.out, + max_pages=a.max_pages, + ) + print(f"Wrote {n} Senate LDA rows to {a.out}") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git 
a/optional-skills/research/osint-investigation/scripts/fetch_usaspending.py b/optional-skills/research/osint-investigation/scripts/fetch_usaspending.py new file mode 100644 index 00000000000..a59c5f17276 --- /dev/null +++ b/optional-skills/research/osint-investigation/scripts/fetch_usaspending.py @@ -0,0 +1,170 @@ +#!/usr/bin/env python3 +"""Fetch federal contracts/awards from USAspending.gov API v2. + +No auth required. POST to /api/v2/search/spending_by_award/ with filters. +""" +from __future__ import annotations + +import argparse +import csv +import json +import sys +import time +import urllib.request +from pathlib import Path + +ENDPOINT = "https://api.usaspending.gov/api/v2/search/spending_by_award/" +COLUMNS = [ + "award_id", + "recipient_name", + "recipient_uei", + "recipient_duns", + "recipient_parent_name", + "recipient_state", + "awarding_agency", + "awarding_sub_agency", + "award_type", + "award_amount", + "award_date", + "period_of_performance_start", + "period_of_performance_end", + "naics_code", + "psc_code", + "competition_extent", + "description", +] + +# USAspending field labels requested in each search; fetch() maps them onto COLUMNS. 
+_FIELDS = [ + "Award ID", + "Recipient Name", + "Recipient UEI", + "Recipient DUNS Number", + "Recipient Parent Name", + "Recipient State Code", + "Awarding Agency", + "Awarding Sub Agency", + "Award Type", + "Award Amount", + "Start Date", + "End Date", + "NAICS Code", + "PSC Code", + "Type of Set Aside", + "Description", +] + + +def _post(body: dict) -> dict: + req = urllib.request.Request( + ENDPOINT, + data=json.dumps(body).encode("utf-8"), + headers={"Content-Type": "application/json", "User-Agent": "hermes-agent osint-investigation"}, + method="POST", + ) + with urllib.request.urlopen(req, timeout=60) as resp: + return json.loads(resp.read().decode("utf-8")) + + +def fetch( + recipient: str | None, + agency: str | None, + fy: int, + sole_source_only: bool, + out_path: str, + page_size: int = 100, + max_pages: int = 20, +) -> int: + filters: dict = { + "time_period": [{"start_date": f"{fy - 1}-10-01", "end_date": f"{fy}-09-30"}], + # Contracts only by default; adjust award_type_codes for grants/loans. 
+ "award_type_codes": ["A", "B", "C", "D"], + } + if recipient: + filters["recipient_search_text"] = [recipient] + if agency: + filters["agencies"] = [{"type": "awarding", "tier": "toptier", "name": agency}] + + rows: list[dict[str, str]] = [] + page = 1 + while page <= max_pages: + body = { + "filters": filters, + "fields": _FIELDS, + "page": page, + "limit": page_size, + "sort": "Award Amount", + "order": "desc", + } + try: + payload = _post(body) + except Exception as e: # noqa: BLE001 + print(f"USAspending error on page {page}: {e}", file=sys.stderr) + break + results = payload.get("results", []) + if not results: + break + for r in results: + set_aside = r.get("Type of Set Aside", "") or "" + if sole_source_only and "sole" not in set_aside.lower(): + continue + rows.append( + { + "award_id": r.get("Award ID", "") or "", + "recipient_name": r.get("Recipient Name", "") or "", + "recipient_uei": r.get("Recipient UEI", "") or "", + "recipient_duns": r.get("Recipient DUNS Number", "") or "", + "recipient_parent_name": r.get("Recipient Parent Name", "") or "", + "recipient_state": r.get("Recipient State Code", "") or "", + "awarding_agency": r.get("Awarding Agency", "") or "", + "awarding_sub_agency": r.get("Awarding Sub Agency", "") or "", + "award_type": r.get("Award Type", "") or "", + "award_amount": str(r.get("Award Amount", "") or ""), + "award_date": r.get("Start Date", "") or "", + "period_of_performance_start": r.get("Start Date", "") or "", + "period_of_performance_end": r.get("End Date", "") or "", + "naics_code": str(r.get("NAICS Code", "") or ""), + "psc_code": str(r.get("PSC Code", "") or ""), + "competition_extent": set_aside, + "description": r.get("Description", "") or "", + } + ) + meta = payload.get("page_metadata", {}) + if not meta.get("hasNext"): + break + page += 1 + time.sleep(0.5) + + Path(out_path).parent.mkdir(parents=True, exist_ok=True) + with open(out_path, "w", newline="", encoding="utf-8") as fh: + w = csv.DictWriter(fh, 
fieldnames=COLUMNS) + w.writeheader() + w.writerows(rows) + return len(rows) + + +def main() -> int: + p = argparse.ArgumentParser(description=__doc__) + p.add_argument("--recipient", help="Recipient name search") + p.add_argument("--agency", help="Awarding agency (top-tier)") + p.add_argument("--fy", type=int, default=2024, help="Federal fiscal year") + p.add_argument("--sole-source-only", action="store_true") + p.add_argument("--max-pages", type=int, default=20) + p.add_argument("--out", required=True) + a = p.parse_args() + if not (a.recipient or a.agency): + p.error("must supply at least one of --recipient / --agency") + n = fetch( + recipient=a.recipient, + agency=a.agency, + fy=a.fy, + sole_source_only=a.sole_source_only, + out_path=a.out, + max_pages=a.max_pages, + ) + print(f"Wrote {n} USAspending rows to {a.out}") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/optional-skills/research/osint-investigation/scripts/fetch_wayback.py b/optional-skills/research/osint-investigation/scripts/fetch_wayback.py new file mode 100644 index 00000000000..fb9147f22c2 --- /dev/null +++ b/optional-skills/research/osint-investigation/scripts/fetch_wayback.py @@ -0,0 +1,142 @@ +#!/usr/bin/env python3 +"""Search the Internet Archive Wayback Machine via the CDX server. + +The CDX API indexes ~900B+ archived web pages. Anonymous read access, +no auth required. Useful for finding deleted / changed pages by URL, +domain, or substring match. 
+""" +from __future__ import annotations + +import argparse +import csv +import sys +import urllib.parse +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).parent)) +from _http import get_json # noqa: E402 + +BASE = "https://web.archive.org/cdx/search/cdx" + +COLUMNS = [ + "url", + "timestamp", + "wayback_url", + "mimetype", + "status", + "digest", + "length", +] + + +def fetch( + url_or_host: str, + match_type: str, + from_date: str | None, + to_date: str | None, + status: str | None, + mime: str | None, + collapse: str | None, + limit: int, + out_path: str, +) -> int: + params: dict[str, str | list[str]] = { + "url": url_or_host, + "matchType": match_type, + "output": "json", + "limit": str(limit), + } + if from_date: + params["from"] = from_date.replace("-", "") + if to_date: + params["to"] = to_date.replace("-", "") + filters = [f"statuscode:{status}"] if status else [] + if mime: + filters.append(f"mimetype:{mime}") + # Multiple filters: CDX accepts repeated filter params (urlencode with doseq=True) + if filters: + params["filter"] = filters + if collapse: + params["collapse"] = collapse + + url = f"{BASE}?{urllib.parse.urlencode(params, doseq=True)}" + try: + payload = get_json(url) + except Exception as e: # noqa: BLE001 + print(f"Wayback CDX error: {e}", file=sys.stderr) + payload = [] + + rows: list[dict[str, str]] = [] + if isinstance(payload, list) and len(payload) > 1: + header = payload[0] + idx = {h: i for i, h in enumerate(header)} + for entry in payload[1:]: + ts = entry[idx["timestamp"]] if "timestamp" in idx else "" + orig = entry[idx["original"]] if "original" in idx else "" + rows.append( + { + "url": orig, + "timestamp": ts, + "wayback_url": f"https://web.archive.org/web/{ts}/{orig}" if ts and orig else "", + "mimetype": entry[idx["mimetype"]] if "mimetype" in idx else "", + "status": entry[idx["statuscode"]] if "statuscode" in idx else "", + "digest": entry[idx["digest"]] if "digest" in idx else "", + "length": 
entry[idx["length"]] if "length" in idx else "", + } 
+ ) + + Path(out_path).parent.mkdir(parents=True, exist_ok=True) + with open(out_path, "w", newline="", encoding="utf-8") as fh: + w = csv.DictWriter(fh, fieldnames=COLUMNS) + w.writeheader() + w.writerows(rows) + if not rows: + print( + f"Wayback Machine: 0 captures for {url_or_host!r} matchType={match_type}.", + file=sys.stderr, + ) + return len(rows) + + +def main() -> int: + p = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter) + p.add_argument("--url", required=True, help="URL or host to look up in the archive") + p.add_argument( + "--match", + default="exact", + choices=["exact", "prefix", "host", "domain"], + help=( + "exact: this URL only. " + "prefix: this URL's path-prefix. " + "host: any URL on this host. " + "domain: any URL on this domain or subdomains." + ), + ) + p.add_argument("--from-date", help="Earliest capture YYYY-MM-DD") + p.add_argument("--to-date", help="Latest capture YYYY-MM-DD") + p.add_argument("--status", help="HTTP status filter (e.g. 200)") + p.add_argument("--mime", help="MIME type filter (e.g. text/html)") + p.add_argument( + "--collapse", + help="Collapse adjacent identical entries (e.g. 
'digest' for unique-content captures)", + ) + p.add_argument("--limit", type=int, default=200) + p.add_argument("--out", required=True) + a = p.parse_args() + n = fetch( + url_or_host=a.url, + match_type=a.match, + from_date=a.from_date, + to_date=a.to_date, + status=a.status, + mime=a.mime, + collapse=a.collapse, + limit=a.limit, + out_path=a.out, + ) + print(f"Wrote {n} Wayback capture rows to {a.out}") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/optional-skills/research/osint-investigation/scripts/fetch_wikipedia.py b/optional-skills/research/osint-investigation/scripts/fetch_wikipedia.py new file mode 100644 index 00000000000..4ce5c93813c --- /dev/null +++ b/optional-skills/research/osint-investigation/scripts/fetch_wikipedia.py @@ -0,0 +1,267 @@ +#!/usr/bin/env python3 +"""Search Wikipedia + Wikidata for an entity (person, company, place, concept). + +Two free APIs: + - Wikipedia OpenSearch + REST summary endpoint for narrative bio + - Wikidata Action API (wbgetentities) for structured facts (birth, employer, occupation, etc.) + +Both are anonymous-access. Useful for resolving who-is-this-entity questions +and surfacing cross-references that other sources can join against. 
+""" +from __future__ import annotations + +import argparse +import csv +import json +import re +import sys +import urllib.parse +from pathlib import Path + +sys.path.insert(0, str(Path(__file__).parent)) +from _http import get_json # noqa: E402 + +WP_OPENSEARCH = "https://en.wikipedia.org/w/api.php" +WP_SUMMARY = "https://en.wikipedia.org/api/rest_v1/page/summary/" +WD_ACTION = "https://www.wikidata.org/w/api.php" + +COLUMNS = [ + "source", + "label", + "description", + "qid", + "wikipedia_title", + "wikipedia_url", + "wikidata_url", + "instance_of", + "country", + "occupation", + "employer", + "date_of_birth", + "place_of_birth", + "summary", +] + + +def _wp_search(query: str, limit: int) -> list[dict]: + params = { + "action": "opensearch", + "search": query, + "limit": str(min(limit, 20)), + "format": "json", + } + url = f"{WP_OPENSEARCH}?{urllib.parse.urlencode(params)}" + data = get_json(url) + if not isinstance(data, list) or len(data) < 4: + return [] + titles, descs, urls = data[1], data[2], data[3] + out = [] + for i, title in enumerate(titles): + out.append( + { + "title": title, + "description": descs[i] if i < len(descs) else "", + "url": urls[i] if i < len(urls) else "", + } + ) + return out + + +def _wp_summary(title: str) -> dict: + """Pull the REST summary for a title — short bio, image, type.""" + url = f"{WP_SUMMARY}{urllib.parse.quote(title.replace(' ', '_'))}" + try: + return get_json(url) # type: ignore[return-value] + except Exception as e: # noqa: BLE001 + print(f"Wikipedia summary lookup for {title!r} failed: {e}", file=sys.stderr) + return {} + + +def _wd_lookup_by_qid(qid: str) -> dict: + """Pull common facts for a QID via Wikidata's Action API (no SPARQL). + + The Action API is far more lenient on rate-limits than the SPARQL Query + Service. We get claims as QIDs and then resolve labels in one batch call. + """ + # Properties of interest. The Action API returns claims as QIDs or + # typed literals, so the slot mapping is local-only. 
+ interesting = { + "P31": "instance_of", + "P17": "country", # for orgs / places + "P27": "country", # for individuals (country of citizenship) + "P106": "occupation", + "P108": "employer", + "P569": "date_of_birth", + "P19": "place_of_birth", + } + params = { + "action": "wbgetentities", + "ids": qid, + "props": "claims", + "format": "json", + } + url = f"{WD_ACTION}?{urllib.parse.urlencode(params)}" + try: + data = get_json(url) + except Exception as e: # noqa: BLE001 + print(f"Wikidata wbgetentities for {qid} failed: {e}", file=sys.stderr) + return {} + if not isinstance(data, dict): + return {} + claims = (data.get("entities", {}).get(qid, {}) or {}).get("claims", {}) or {} + + # Collect raw values (QIDs or literals) and remember which slot each + # came from. Date literals come back as ISO strings; QIDs need a label + # resolution pass. + qid_to_slots: dict[str, list[str]] = {} + facts: dict[str, list[str]] = {} + for prop_id, slot in interesting.items(): + for claim in claims.get(prop_id, []) or []: + v = (claim.get("mainsnak", {}) or {}).get("datavalue", {}) or {} + vtype = v.get("type") + value = v.get("value") + if vtype == "wikibase-entityid" and isinstance(value, dict): + vqid = value.get("id", "") + if vqid: + qid_to_slots.setdefault(vqid, []) + if slot not in qid_to_slots[vqid]: + qid_to_slots[vqid].append(slot) + elif vtype == "time" and isinstance(value, dict): + raw = value.get("time", "") or "" + # +1955-10-28T00:00:00Z → 1955-10-28 + m = re.search(r"[+-]?(\d{4})-(\d{2})-(\d{2})", raw) + if m: + facts.setdefault(slot, []).append( + f"{m.group(1)}-{m.group(2)}-{m.group(3)}" + ) + elif vtype == "string": + facts.setdefault(slot, []).append(str(value)) + + # Resolve labels for all referenced QIDs in one batch (up to 50 at a time). 
+ qids = list(qid_to_slots) + for i in range(0, len(qids), 50): + batch = qids[i : i + 50] + params = { + "action": "wbgetentities", + "ids": "|".join(batch), + "props": "labels", + "languages": "en", + "format": "json", + } + url = f"{WD_ACTION}?{urllib.parse.urlencode(params)}" + try: + data = get_json(url) + except Exception as e: # noqa: BLE001 + print(f"Wikidata label batch failed: {e}", file=sys.stderr) + continue + if not isinstance(data, dict): + continue + ents = data.get("entities", {}) or {} + for vqid, ent in ents.items(): + label = (ent.get("labels", {}).get("en", {}) or {}).get("value", "") or vqid + for slot in qid_to_slots.get(vqid, []): + facts.setdefault(slot, []).append(label) + + # Deduplicate per slot, preserving order. + deduped: dict[str, list[str]] = {} + for slot, vals in facts.items(): + seen = set() + out = [] + for v in vals: + if v in seen: + continue + seen.add(v) + out.append(v) + deduped[slot] = out + return deduped + + +def _wd_qid_for_title(title: str) -> str: + """Get the Wikidata QID associated with a Wikipedia article title.""" + params = { + "action": "query", + "format": "json", + "prop": "pageprops", + "ppprop": "wikibase_item", + "titles": title, + "redirects": 1, + } + url = f"{WP_OPENSEARCH}?{urllib.parse.urlencode(params)}" + try: + data = get_json(url) + except Exception: # noqa: BLE001 + return "" + if not isinstance(data, dict): + return "" + pages = data.get("query", {}).get("pages", {}) or {} + for page in pages.values(): + qid = (page.get("pageprops") or {}).get("wikibase_item", "") + if qid: + return qid + return "" + + +def fetch(query: str, limit: int, no_wikidata: bool, out_path: str) -> int: + hits = _wp_search(query, limit) + rows: list[dict[str, str]] = [] + for hit in hits[:limit]: + title = hit.get("title", "") + if not title: + continue + summary = _wp_summary(title) + qid = _wd_qid_for_title(title) if not no_wikidata else "" + facts: dict = {} + if qid: + facts = _wd_lookup_by_qid(qid) + rows.append( + { 
+ "source": "wikipedia+wikidata" if qid else "wikipedia", + "label": title, + "description": (summary.get("description") or hit.get("description") or "").strip(), + "qid": qid, + "wikipedia_title": title, + "wikipedia_url": hit.get("url", ""), + "wikidata_url": f"https://www.wikidata.org/wiki/{qid}" if qid else "", + "instance_of": "; ".join(facts.get("instance_of", [])), + "country": "; ".join(facts.get("country", [])), + "occupation": "; ".join(facts.get("occupation", [])), + "employer": "; ".join(facts.get("employer", [])), + "date_of_birth": "; ".join(facts.get("date_of_birth", []))[:10] if facts.get("date_of_birth") else "", + "place_of_birth": "; ".join(facts.get("place_of_birth", [])), + "summary": (summary.get("extract") or "").replace("\n", " ")[:1000], + } + ) + + Path(out_path).parent.mkdir(parents=True, exist_ok=True) + with open(out_path, "w", newline="", encoding="utf-8") as fh: + w = csv.DictWriter(fh, fieldnames=COLUMNS) + w.writeheader() + w.writerows(rows) + if not rows: + print( + f"Wikipedia: 0 articles for query={query!r}. 
" + "Private individuals not notable enough for a Wikipedia article " + "won't appear here (the bar is real).", + file=sys.stderr, + ) + return len(rows) + + +def main() -> int: + p = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter) + p.add_argument("--query", required=True, help="Entity name (person, company, place, concept)") + p.add_argument("--limit", type=int, default=5) + p.add_argument( + "--no-wikidata", + action="store_true", + help="Skip the Wikidata SPARQL enrichment (faster, less detail)", + ) + p.add_argument("--out", required=True) + a = p.parse_args() + n = fetch(query=a.query, limit=a.limit, no_wikidata=a.no_wikidata, out_path=a.out) + print(f"Wrote {n} Wikipedia/Wikidata rows to {a.out}") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/optional-skills/research/osint-investigation/scripts/timing_analysis.py b/optional-skills/research/osint-investigation/scripts/timing_analysis.py new file mode 100644 index 00000000000..4e0ece227b4 --- /dev/null +++ b/optional-skills/research/osint-investigation/scripts/timing_analysis.py @@ -0,0 +1,253 @@ +#!/usr/bin/env python3 +"""Permutation test for donation/contract timing correlation (stdlib-only). + +For each (donor, vendor) pair, compute the mean number of days between each +donation and the nearest contract award. Then shuffle contract award dates +N times within the observation window and compute the same statistic. The +one-tailed p-value is the fraction of permutations whose mean is <= the +observed mean (smaller distance = tighter clustering). + +Adapted from ShinMegamiBoson/OpenPlanter (MIT). 
Differences: + - Pure stdlib (no pandas / numpy) + - Domain-agnostic (no snow-vendor / CRITICAL-politician filter) + - Configurable column names via flags + - Optional --seed for reproducibility +""" +from __future__ import annotations + +import argparse +import csv +import datetime as dt +import json +import math +import random +import statistics +from collections import defaultdict +from pathlib import Path + +_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y/%m/%d", "%m-%d-%Y", "%Y%m%d") + + +def parse_date(raw: str) -> dt.date | None: + if not raw: + return None + raw = raw.strip() + for fmt in _DATE_FORMATS: + try: + return dt.datetime.strptime(raw, fmt).date() + except ValueError: + continue + return None + + +def _read(path: str) -> list[dict[str, str]]: + with open(path, newline="", encoding="utf-8") as fh: + return list(csv.DictReader(fh)) + + +def _nearest_distance(donation_date: dt.date, awards: list[dt.date]) -> int: + """Absolute days to nearest award date.""" + return min(abs((donation_date - a).days) for a in awards) + + +def _permute( + awards_count: int, + donations: list[dt.date], + date_min: dt.date, + date_max: dt.date, + rng: random.Random, +) -> float: + """One permutation: draw uniform random award dates, compute mean nearest-distance.""" + span_days = (date_max - date_min).days or 1 + rand_awards = [ + date_min + dt.timedelta(days=rng.randint(0, span_days)) + for _ in range(awards_count) + ] + distances = [_nearest_distance(d, rand_awards) for d in donations] + return statistics.mean(distances) + + +def analyze( + donations_path: str, + donation_date_col: str, + donation_amount_col: str, + donation_donor_col: str, + donation_recipient_col: str, + contracts_path: str, + contract_date_col: str, + contract_vendor_col: str, + cross_links_path: str | None, + n_permutations: int = 1000, + min_donations: int = 3, + p_threshold: float = 0.05, + seed: int | None = None, + out_path: str = "timing.json", +) -> dict: + rng = random.Random(seed) + + donations 
= _read(donations_path) + contracts = _read(contracts_path) + + # Allow optional join through cross_links — donor (left) ↔ vendor (right). + # When present, donor strings get mapped to matched vendor names so the + # vendor-date index lookup actually finds the contracts. + matched_pairs: set[tuple[str, str]] | None = None + donor_to_vendors: dict[str, set[str]] = defaultdict(set) + if cross_links_path: + matched_pairs = set() + for row in _read(cross_links_path): + left = row.get("left_name", "") + right = row.get("right_name", "") + matched_pairs.add((left, right)) + donor_to_vendors[left].add(right) + + # Index contract dates by vendor name. + vendor_to_award_dates: dict[str, list[dt.date]] = defaultdict(list) + all_award_dates: list[dt.date] = [] + for row in contracts: + d = parse_date(row.get(contract_date_col, "")) + if not d: + continue + vendor_to_award_dates[row.get(contract_vendor_col, "").strip()].append(d) + all_award_dates.append(d) + + if not all_award_dates: + raise SystemExit(f"No parseable dates in {contracts_path}/{contract_date_col}") + global_min = min(all_award_dates) + global_max = max(all_award_dates) + + # Group donations by (donor, recipient). + grouped: dict[tuple[str, str], list[tuple[dt.date, float]]] = defaultdict(list) + for row in donations: + donor = row.get(donation_donor_col, "").strip() + recip = row.get(donation_recipient_col, "").strip() + d = parse_date(row.get(donation_date_col, "")) + try: + amt = float(row.get(donation_amount_col, "0") or 0) + except ValueError: + amt = 0.0 + if not (donor and recip and d): + continue + grouped[(donor, recip)].append((d, amt)) + + results = [] + skipped = 0 + for (donor, recip), records in grouped.items(): + if len(records) < min_donations: + skipped += 1 + continue + # Only test if donor appears in cross-links (when provided). The + # (donor, candidate) tuple itself is NOT what's in matched_pairs — + # cross_links pairs are (donor, vendor). 
We use the cross-link to + # map donor → vendor name(s) so the vendor-date index resolves. + if matched_pairs is not None and donor not in donor_to_vendors: + skipped += 1 + continue + # Try direct donor→awards first, then go through cross-link vendor names. + award_dates = list(vendor_to_award_dates.get(donor, [])) + if not award_dates: + award_dates = list(vendor_to_award_dates.get(recip, [])) + if not award_dates and donor_to_vendors.get(donor): + for vendor_name in donor_to_vendors[donor]: + award_dates.extend(vendor_to_award_dates.get(vendor_name, [])) + if not award_dates: + skipped += 1 + continue + + donation_dates = [d for (d, _) in records] + observed = statistics.mean( + _nearest_distance(d, award_dates) for d in donation_dates + ) + + permuted_means = [ + _permute(len(award_dates), donation_dates, global_min, global_max, rng) + for _ in range(n_permutations) + ] + p_value = sum(1 for m in permuted_means if m <= observed) / n_permutations + null_mean = statistics.mean(permuted_means) + null_std = statistics.pstdev(permuted_means) or 1.0 + effect_size = (null_mean - observed) / null_std + + results.append( + { + "donor": donor, + "recipient": recip, + "n_donations": len(records), + "n_award_dates": len(award_dates), + "observed_mean_days": round(observed, 2), + "null_mean_days": round(null_mean, 2), + "p_value": round(p_value, 4), + "effect_size_sd": round(effect_size, 2), + "significant": p_value < p_threshold, + "total_donation_amount": round(sum(a for (_, a) in records), 2), + } + ) + + results.sort(key=lambda r: r["p_value"]) + + payload = { + "metadata": { + "n_permutations": n_permutations, + "min_donations": min_donations, + "p_threshold": p_threshold, + "seed": seed, + "n_pairs_tested": len(results), + "n_pairs_skipped": skipped, + "n_significant": sum(1 for r in results if r["significant"]), + "observation_window": [global_min.isoformat(), global_max.isoformat()], + }, + "results": results, + } + + Path(out_path).write_text(json.dumps(payload, 
indent=2)) + return payload + + +def main() -> int: + p = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter) + p.add_argument("--donations", required=True) + p.add_argument("--donation-date-col", required=True) + p.add_argument("--donation-amount-col", required=True) + p.add_argument("--donation-donor-col", required=True) + p.add_argument("--donation-recipient-col", required=True) + p.add_argument("--contracts", required=True) + p.add_argument("--contract-date-col", required=True) + p.add_argument("--contract-vendor-col", required=True) + p.add_argument( + "--cross-links", + help="Optional cross_links.csv to restrict (donor, vendor) pairs", + ) + p.add_argument("--permutations", type=int, default=1000) + p.add_argument("--min-donations", type=int, default=3) + p.add_argument("--p-threshold", type=float, default=0.05) + p.add_argument("--seed", type=int) + p.add_argument("--out", default="timing.json") + a = p.parse_args() + + payload = analyze( + donations_path=a.donations, + donation_date_col=a.donation_date_col, + donation_amount_col=a.donation_amount_col, + donation_donor_col=a.donation_donor_col, + donation_recipient_col=a.donation_recipient_col, + contracts_path=a.contracts, + contract_date_col=a.contract_date_col, + contract_vendor_col=a.contract_vendor_col, + cross_links_path=a.cross_links, + n_permutations=a.permutations, + min_donations=a.min_donations, + p_threshold=a.p_threshold, + seed=a.seed, + out_path=a.out, + ) + meta = payload["metadata"] + print( + f"Tested {meta['n_pairs_tested']} pairs ({meta['n_pairs_skipped']} skipped). " + f"Significant (p<{meta['p_threshold']}): {meta['n_significant']}. 
" + f"Wrote {a.out}" + ) + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/optional-skills/research/osint-investigation/templates/source-template.md b/optional-skills/research/osint-investigation/templates/source-template.md new file mode 100644 index 00000000000..b023cc26888 --- /dev/null +++ b/optional-skills/research/osint-investigation/templates/source-template.md @@ -0,0 +1,59 @@ +# + +## 1. Summary + +What this data source is, who publishes it, why it matters for investigations. + +## 2. Access Methods + +- API endpoint(s) +- Bulk download URLs +- Auth requirements (none / API key / OAuth) +- Rate limits + +## 3. Data Schema + +Key fields, record types, table relationships. List the columns the fetch +script emits. + +## 4. Coverage + +- Jurisdiction +- Time range +- Update frequency +- Data volume (rows / GB) + +## 5. Cross-Reference Potential + +Which other sources can be joined and on what keys. Be explicit: + +- `` ↔ `` (join key: ) + +## 6. Data Quality + +Known issues — formatting inconsistencies, missing fields, duplicates, +historical gaps, redaction. + +## 7. Acquisition Script + +Path: `scripts/fetch_.py` + +Example: + +```bash +python3 SKILL_DIR/scripts/fetch_.py -- --out data/.csv +``` + +Output CSV columns: `, , ...` + +## 8. Legal & Licensing + +- Public records law / FOIA basis +- Terms of use / acceptable use +- Attribution requirements (if any) + +## 9. 
References + +- Official docs: +- Data dictionary: +- Related coverage / journalism: diff --git a/plugins/kanban/dashboard/dist/index.js b/plugins/kanban/dashboard/dist/index.js index 720cdb9e1e2..6f05df72bf6 100644 --- a/plugins/kanban/dashboard/dist/index.js +++ b/plugins/kanban/dashboard/dist/index.js @@ -68,7 +68,7 @@ const FALLBACK_COLUMN_HELP = { triage: "Raw ideas — a specifier will flesh out the spec", todo: "Waiting on dependencies or unassigned", - ready: "Assigned and waiting for a dispatcher tick", + ready: "Dependencies satisfied; assign a profile to dispatch", running: "Claimed by a worker — in-flight", blocked: "Worker asked for human input", done: "Completed", @@ -2048,6 +2048,7 @@ }; const progress = t.progress; + const needsAssignee = t.status === "ready" && !t.assignee; return h("div", { ref: cardRef, @@ -2118,6 +2119,13 @@ title: `${progress.done} of ${progress.total} child tasks done`, }, `${progress.done}/${progress.total}`) : null, + needsAssignee + ? h(Badge, { + variant: "outline", + className: "hermes-kanban-needs-assignee", + title: tx(i18n, "needsAssigneeHint", "Dependencies are satisfied, but the dispatcher skips this task until you assign a profile."), + }, tx(i18n, "needsAssignee", "Needs assignee")) + : null, ), h("div", { className: "hermes-kanban-card-title" }, t.title || tx(i18n, "untitled", "(untitled)")), @@ -2126,7 +2134,9 @@ ? h("span", { className: "hermes-kanban-assignee", title: `Assigned to Hermes profile @${t.assignee}` }, "@", t.assignee) : h("span", { className: "hermes-kanban-unassigned", - title: "No profile assigned. The dispatcher will pick one from available profiles when the task is Ready." }, + title: needsAssignee + ? tx(i18n, "needsAssigneeHint", "Dependencies are satisfied, but the dispatcher skips this task until you assign a profile.") + : "No profile assigned." }, tx(i18n, "unassigned", "unassigned")), t.comment_count > 0 ? 
h("span", { className: "hermes-kanban-count", diff --git a/plugins/kanban/dashboard/dist/style.css b/plugins/kanban/dashboard/dist/style.css index 3bcfccb289b..f3d66a88597 100644 --- a/plugins/kanban/dashboard/dist/style.css +++ b/plugins/kanban/dashboard/dist/style.css @@ -280,6 +280,14 @@ padding: 0.05rem 0.3rem !important; } +.hermes-kanban-needs-assignee { + font-size: 0.6rem !important; + padding: 0.05rem 0.3rem !important; + background: color-mix(in srgb, var(--color-warning, #d4b348) 16%, transparent); + border-color: color-mix(in srgb, var(--color-warning, #d4b348) 45%, var(--color-border)); + color: var(--color-foreground); +} + .hermes-kanban-assignee { font-weight: 500; color: color-mix(in srgb, var(--color-foreground) 80%, var(--color-muted-foreground)); diff --git a/pyproject.toml b/pyproject.toml index abc8940f586..e5553c2f3ff 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta" [project] name = "hermes-agent" -version = "0.13.0" +version = "0.14.0" description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere" readme = "README.md" requires-python = ">=3.11" @@ -216,12 +216,11 @@ hermes-acp = "acp_adapter.entry:main" py-modules = ["run_agent", "model_tools", "toolsets", "batch_runner", "trajectory_compressor", "toolset_distributions", "cli", "hermes_bootstrap", "hermes_constants", "hermes_state", "hermes_time", "hermes_logging", "utils"] [tool.setuptools.package-data] -hermes_cli = ["web_dist/**/*", "tui_dist/**/*", "scripts/install.sh"] +hermes_cli = ["web_dist/**/*"] gateway = ["assets/**/*"] -acp_adapter = ["bootstrap/*.sh", "bootstrap/*.ps1"] [tool.setuptools.packages.find] -include = ["agent", "agent.*", "tools", "tools.*", "hermes_cli", "gateway", "gateway.*", "tui_gateway", "tui_gateway.*", "cron", "acp_adapter", "acp_adapter.*", "plugins", "plugins.*", "providers", "providers.*"] +include = ["agent", "agent.*", "tools", "tools.*", 
"hermes_cli", "gateway", "gateway.*", "tui_gateway", "tui_gateway.*", "cron", "acp_adapter", "plugins", "plugins.*", "providers", "providers.*"] [tool.pytest.ini_options] testpaths = ["tests"] diff --git a/run_agent.py b/run_agent.py index b3cde9eb1ea..1dd4219b22e 100644 --- a/run_agent.py +++ b/run_agent.py @@ -393,6 +393,19 @@ def _is_destructive_command(cmd: str) -> bool: return False +def _is_mcp_tool_parallel_safe(tool_name: str) -> bool: + """Check if an MCP tool comes from a server with parallel tool calls enabled. + + Lazy-imports from ``tools.mcp_tool`` to avoid circular dependencies. + Returns False if the MCP module is not available. + """ + try: + from tools.mcp_tool import is_mcp_tool_parallel_safe + return is_mcp_tool_parallel_safe(tool_name) + except Exception: + return False + + def _should_parallelize_tool_batch(tool_calls) -> bool: """Return True when a tool-call batch is safe to run concurrently.""" if len(tool_calls) <= 1: @@ -432,7 +445,9 @@ def _should_parallelize_tool_batch(tool_calls) -> bool: continue if tool_name not in _PARALLEL_SAFE_TOOLS: - return False + # Check if it's an MCP tool from a server that opted into parallel calls. + if not _is_mcp_tool_parallel_safe(tool_name): + return False return True @@ -3027,6 +3042,24 @@ class AIAgent: parts.append(f"{type(e).__name__}({msg})" if msg else type(e).__name__) return " <- ".join(parts) if parts else type(error).__name__ + def _is_provider_stream_parse_error(self, error: BaseException) -> bool: + """Return True for malformed provider streaming data from SDK parsers. + + Some Anthropic-compatible streaming providers can send a malformed + event-stream frame. The Anthropic SDK surfaces that as a plain + ``ValueError`` such as ``expected ident at line 1 column 149``. That + is provider wire-format trouble, not local request validation, so it + should follow the same retry path as a truncated JSON body. 
+ """ + if getattr(self, "api_mode", None) != "anthropic_messages": + return False + if not isinstance(error, ValueError): + return False + if isinstance(error, (UnicodeEncodeError, json.JSONDecodeError)): + return False + message = str(error).strip().lower() + return "expected ident at line" in message + def _log_stream_retry( self, *, @@ -5080,6 +5113,12 @@ class AIAgent: """ raw = str(error) + if ( + isinstance(error, ValueError) + and "expected ident at line" in raw.lower() + ): + return f"Malformed provider streaming response: {raw[:300]}" + # Cloudflare / proxy HTML pages: grab the <title> for a clean summary if "<!DOCTYPE" in raw or "<html" in raw: m = re.search(r"<title[^>]*>([^<]+)", raw, re.IGNORECASE) @@ -8528,6 +8567,7 @@ class AIAgent: _is_conn_err = isinstance( e, (_httpx.ConnectError, _httpx.RemoteProtocolError, ConnectionError) ) + _is_stream_parse_err = self._is_provider_stream_parse_error(e) # If the stream died AFTER some tokens were delivered: # normally we don't retry (the user already saw text, @@ -8567,7 +8607,10 @@ class AIAgent: for phrase in _SSE_PREVIEW_PHRASES ) _is_transient = ( - _is_timeout or _is_conn_err or _is_sse_conn_err_preview + _is_timeout + or _is_conn_err + or _is_sse_conn_err_preview + or _is_stream_parse_err ) _can_silent_retry = ( _partial_tool_in_flight @@ -8665,7 +8708,7 @@ class AIAgent: for phrase in _SSE_CONN_PHRASES ) - if _is_timeout or _is_conn_err or _is_sse_conn_err: + if _is_timeout or _is_conn_err or _is_sse_conn_err or _is_stream_parse_err: # Transient network / timeout error. Retry the # streaming request with a fresh connection first. if _stream_attempt < _max_stream_retries: @@ -8706,12 +8749,20 @@ class AIAgent: mid_tool_call=False, diag=request_client_holder.get("diag"), ) - self._emit_status( - "❌ Connection to provider failed after " - f"{_max_stream_retries + 1} attempts. " - "The provider may be experiencing issues — " - "try again in a moment." 
- ) + if _is_stream_parse_err: + self._emit_status( + "❌ Provider returned malformed streaming data after " + f"{_max_stream_retries + 1} attempts. " + "The provider may be experiencing issues — " + "try again in a moment." + ) + else: + self._emit_status( + "❌ Connection to provider failed after " + f"{_max_stream_retries + 1} attempts. " + "The provider may be experiencing issues — " + "try again in a moment." + ) else: _err_lower = str(e).lower() _is_stream_unsupported = ( @@ -14133,6 +14184,39 @@ class AIAgent: "interrupted": True, } + # Actionable hint for GitHub Models (Azure) 413 errors. + # The free tier enforces a hard 8K token cap per request, + # which Hermes' system prompt + tool schemas alone exceed. + # Compression can't help — the floor is the system prompt + # itself, not the conversation — so surface a clear "not + # compatible" message instead of looping into three futile + # compression attempts. + if ( + status_code == 413 + and isinstance(_base, str) + and "models.inference.ai.azure.com" in _base + ): + self._vprint( + f"{self.log_prefix} 💡 GitHub Models free tier (models.inference.ai.azure.com) caps every", + force=True, + ) + self._vprint( + f"{self.log_prefix} request at ~8K tokens. Hermes' system prompt + tool schemas baseline", + force=True, + ) + self._vprint( + f"{self.log_prefix} exceeds that floor, so this endpoint cannot run an agentic loop.", + force=True, + ) + self._vprint( + f"{self.log_prefix} Use the `copilot` provider with a Copilot subscription token (`hermes", + force=True, + ) + self._vprint( + f"{self.log_prefix} setup` → GitHub Copilot), or pick any other provider.", + force=True, + ) + # Check for 413 payload-too-large BEFORE generic 4xx handler. # A 413 is a payload-size error — the correct response is to # compress history and retry, not abort immediately. 
@@ -14509,11 +14593,16 @@ class AIAgent: # provider/network failure (malformed response body, # truncated stream, routing layer corruption), not a # local programming bug, and should be retried (#14782). + # Exclude Anthropic stream parser ValueErrors for the + # same reason: third-party Anthropic-compatible providers + # can emit malformed event-stream frames that SDK parsers + # raise as plain ValueError. is_local_validation_error = ( isinstance(api_error, (ValueError, TypeError)) and not isinstance( api_error, (UnicodeEncodeError, json.JSONDecodeError) ) + and not self._is_provider_stream_parse_error(api_error) # ssl.SSLError (and its subclass SSLCertVerificationError) # inherits from OSError *and* ValueError via Python MRO, # so the isinstance(ValueError) check above would diff --git a/scripts/release.py b/scripts/release.py index 6084e0754c0..5d4cb3eb82f 100755 --- a/scripts/release.py +++ b/scripts/release.py @@ -59,6 +59,8 @@ AUTHOR_MAP = { "m@mobrienv.dev": "mikeyobrien", "qiyin.zuo@pcitc.com": "qiyin-code", "mr.aashiz@gmail.com": "aashizpoudel", + "70629228+shaun0927@users.noreply.github.com": "shaun0927", + "98262967+Bihruze@users.noreply.github.com": "Bihruze", "nidhi2894@gmail.com": "nidhi-singh02", "30312689+aashizpoudel@users.noreply.github.com": "aashizpoudel", "oleksii.lisikh@gmail.com": "olisikh", @@ -91,6 +93,7 @@ AUTHOR_MAP = { "30397170+1000Delta@users.noreply.github.com": "1000Delta", "szymonclawd@mac.home": "szymonclawd", "257759490+szymonclawd@users.noreply.github.com": "szymonclawd", + "101180447+worlldz@users.noreply.github.com": "worlldz", "zhanganzhe@tenclass.com": "luoyuctl", "51604064+luoyuctl@users.noreply.github.com": "luoyuctl", "127238744+teknium1@users.noreply.github.com": "teknium1", @@ -1078,6 +1081,11 @@ AUTHOR_MAP = { "nidhi2894@gmail.com": "nidhi-singh02", # PR #2752 salvage (slack whitespace-only IndexError guard) "38173192+nidhi-singh02@users.noreply.github.com": "nidhi-singh02", "Jaaneek@users.noreply.github.com": 
"Jaaneek", # PR #26457 (xAI Grok OAuth provider) + # v0.14.0 additions + "chuang.guo@hopechart.com": "wuwuzhijing", # PR #21063 salvage (gateway docs mention Weixin) + "nightcityblade@gmail.com": "nightcityblade", # PR #24138 (docs voice/tts table) + "pol.kuijken@gmail.com": "polkn", # PR #6136 salvage (skill_view collision refusal) + "robin@soal.org": "rewbs", } diff --git a/tests/acp/test_server.py b/tests/acp/test_server.py index 511d6e00934..65dd6fd6b72 100644 --- a/tests/acp/test_server.py +++ b/tests/acp/test_server.py @@ -13,6 +13,7 @@ from acp.schema import ( AgentCapabilities, AgentMessageChunk, AgentPlanUpdate, + AgentThoughtChunk, AuthenticateResponse, AvailableCommandsUpdate, Implementation, @@ -467,25 +468,296 @@ class TestSessionOps: ) @pytest.mark.asyncio - async def test_load_session_schedules_history_replay_after_response(self, agent): - """Zed only attaches replayed updates after session/load has completed.""" + async def test_load_session_replays_reasoning_thought_before_message(self, agent): + """Thinking-model thoughts must be replayed via ``agent_thought_chunk``. + + Regression for #12285 — when a session is loaded, persisted assistant + ``reasoning_content`` / ``reasoning`` fields must surface as ACP + ``AgentThoughtChunk`` notifications in the same relative position they + had live (thought streams before the assistant message text), so Zed's + collapsed Thinking pane rebuilds instead of vanishing on reconnect. 
+ """ + mock_conn = MagicMock(spec=acp.Client) + mock_conn.session_update = AsyncMock() + agent._conn = mock_conn + + new_resp = await agent.new_session(cwd="/tmp") + state = agent.session_manager.get_session(new_resp.session_id) + state.history = [ + {"role": "user", "content": "Walk me through it."}, + { + "role": "assistant", + "reasoning_content": "Let me think step by step about the request.", + "content": "Here is the plan.", + }, + {"role": "user", "content": "And the legacy case?"}, + { + "role": "assistant", + # No reasoning_content — exercise the legacy "reasoning" fallback + # path so sessions persisted before #16892 still replay thoughts. + "reasoning": "Older sessions stored the trace under the internal key.", + "content": "Same idea, older field name.", + }, + ] + + mock_conn.session_update.reset_mock() + resp = await agent.load_session(cwd="/tmp", session_id=new_resp.session_id) + await asyncio.sleep(0) + await asyncio.sleep(0) + + assert isinstance(resp, LoadSessionResponse) + + replay_kinds = [ + getattr(call.kwargs.get("update"), "session_update", None) + for call in mock_conn.session_update.await_args_list + if getattr(call.kwargs.get("update"), "session_update", None) + in {"user_message_chunk", "agent_message_chunk", "agent_thought_chunk"} + ] + assert replay_kinds == [ + "user_message_chunk", + "agent_thought_chunk", + "agent_message_chunk", + "user_message_chunk", + "agent_thought_chunk", + "agent_message_chunk", + ] + + thought_updates = [ + call.kwargs["update"] + for call in mock_conn.session_update.await_args_list + if isinstance(call.kwargs.get("update"), AgentThoughtChunk) + ] + assert len(thought_updates) == 2 + assert thought_updates[0].content.text == "Let me think step by step about the request." + assert thought_updates[1].content.text == "Older sessions stored the trace under the internal key." 
+ + @pytest.mark.asyncio + async def test_load_session_replays_reasoning_only_turn(self, agent): + """Assistant turns with reasoning but no content should still emit a thought. + + Pure reasoning-only assistant entries (e.g. a thinking step before a + tool-call turn) commonly carry ``reasoning_content`` with empty + ``content``. The replay must still surface the thought so the editor's + Thinking pane rebuilds, even when there is no message text to follow. + """ + mock_conn = MagicMock(spec=acp.Client) + mock_conn.session_update = AsyncMock() + agent._conn = mock_conn + + new_resp = await agent.new_session(cwd="/tmp") + state = agent.session_manager.get_session(new_resp.session_id) + state.history = [ + { + "role": "assistant", + "reasoning_content": "I should call the search tool next.", + "content": "", + }, + ] + + mock_conn.session_update.reset_mock() + await agent.load_session(cwd="/tmp", session_id=new_resp.session_id) + await asyncio.sleep(0) + await asyncio.sleep(0) + + thought_updates = [ + call.kwargs["update"] + for call in mock_conn.session_update.await_args_list + if isinstance(call.kwargs.get("update"), AgentThoughtChunk) + ] + message_updates = [ + call.kwargs["update"] + for call in mock_conn.session_update.await_args_list + if isinstance(call.kwargs.get("update"), AgentMessageChunk) + ] + assert len(thought_updates) == 1 + assert thought_updates[0].content.text == "I should call the search tool next." 
+ assert message_updates == [] + + @pytest.mark.asyncio + async def test_load_session_skips_empty_reasoning_fields(self, agent): + """Empty/whitespace reasoning fields must not produce notifications.""" + mock_conn = MagicMock(spec=acp.Client) + mock_conn.session_update = AsyncMock() + agent._conn = mock_conn + + new_resp = await agent.new_session(cwd="/tmp") + state = agent.session_manager.get_session(new_resp.session_id) + state.history = [ + { + "role": "assistant", + "reasoning_content": "", + "reasoning": " \n\t", + "content": "Just a regular answer.", + }, + ] + + mock_conn.session_update.reset_mock() + await agent.load_session(cwd="/tmp", session_id=new_resp.session_id) + await asyncio.sleep(0) + await asyncio.sleep(0) + + thought_updates = [ + call.kwargs["update"] + for call in mock_conn.session_update.await_args_list + if isinstance(call.kwargs.get("update"), AgentThoughtChunk) + ] + assert thought_updates == [] + + @pytest.mark.asyncio + async def test_load_session_replays_thought_then_tool_call_without_message(self, agent): + """Canonical thinking-model shape: reasoning + tool_call + no body text. + + Thinking models commonly emit a pre-tool thought followed by a + tool_calls turn with empty ``content``. Replay must emit: + ``agent_thought_chunk`` then ``tool_call`` then ``tool_call_update`` + for the matching tool result — and crucially, NO ``agent_message_chunk`` + for the empty-text assistant body. Regression for the canonical + thinking-then-tool flow on #12285. 
+ """ + mock_conn = MagicMock(spec=acp.Client) + mock_conn.session_update = AsyncMock() + agent._conn = mock_conn + + new_resp = await agent.new_session(cwd="/tmp") + state = agent.session_manager.get_session(new_resp.session_id) + state.history = [ + {"role": "user", "content": "Find the bug."}, + { + "role": "assistant", + "reasoning_content": "I should grep for the function name first.", + "content": "", + "tool_calls": [ + { + "id": "call_grep_1", + "type": "function", + "function": { + "name": "search_files", + "arguments": '{"pattern":"foo","path":"."}', + }, + } + ], + }, + { + "role": "tool", + "tool_call_id": "call_grep_1", + "content": '{"total_count":1,"matches":[{"path":"x.py","line":1,"content":"foo"}]}', + }, + ] + + mock_conn.session_update.reset_mock() + await agent.load_session(cwd="/tmp", session_id=new_resp.session_id) + await asyncio.sleep(0) + await asyncio.sleep(0) + + kinds = [ + getattr(call.kwargs.get("update"), "session_update", None) + for call in mock_conn.session_update.await_args_list + if getattr(call.kwargs.get("update"), "session_update", None) + in { + "user_message_chunk", + "agent_thought_chunk", + "agent_message_chunk", + "tool_call", + "tool_call_update", + } + ] + # No agent_message_chunk for the empty-content assistant turn. + assert "agent_message_chunk" not in kinds + # Thought must precede the tool_call_start within the assistant turn, + # and the tool result follows. + assert kinds == [ + "user_message_chunk", + "agent_thought_chunk", + "tool_call", + "tool_call_update", + ] + + @pytest.mark.asyncio + async def test_load_session_replays_history_before_returning_response(self, agent): + """Per ACP spec, replay must complete BEFORE load_session returns. + + Spec-compliant ACP clients (Codex, Claude Code, OpenCode, Pi, Zed) + attach their ``session/update`` listeners before awaiting the + ``loadSession`` RPC and rely on receiving the full transcript within + the request's lifetime. 
Deferring replay via ``loop.call_soon`` (the + prior behavior in May 2026) broke clients that read notification + counts synchronously against the load response — see #12285 follow-up. + """ new_resp = await agent.new_session(cwd="/tmp") state = agent.session_manager.get_session(new_resp.session_id) state.history = [{"role": "user", "content": "hello from history"}] - events = [] + events: list[str] = [] - async def replay_after_response(_state): + async def replay_records(_state): events.append("replay") - with patch.object(agent, "_replay_session_history", side_effect=replay_after_response): + with patch.object(agent, "_replay_session_history", side_effect=replay_records): resp = await agent.load_session(cwd="/tmp", session_id=new_resp.session_id) events.append("returned") assert isinstance(resp, LoadSessionResponse) - assert events == ["returned"] - await asyncio.sleep(0) - await asyncio.sleep(0) - assert events == ["returned", "replay"] + # Replay must have happened BEFORE the response was constructed — + # i.e. before the `events.append("returned")` after the await resolves. 
+ assert events == ["replay", "returned"] + + @pytest.mark.asyncio + async def test_resume_session_replays_history_before_returning_response(self, agent): + """Same spec rationale as ``load_session`` — replay before responding.""" + new_resp = await agent.new_session(cwd="/tmp") + state = agent.session_manager.get_session(new_resp.session_id) + state.history = [{"role": "user", "content": "hello from history"}] + events: list[str] = [] + + async def replay_records(_state): + events.append("replay") + + with patch.object(agent, "_replay_session_history", side_effect=replay_records): + resp = await agent.resume_session(cwd="/tmp", session_id=new_resp.session_id) + events.append("returned") + + assert isinstance(resp, ResumeSessionResponse) + assert events == ["replay", "returned"] + + @pytest.mark.asyncio + async def test_load_session_survives_replay_helper_exception(self, agent, caplog): + """A replay helper raising must not turn load_session into an error. + + With awaited replay, an exception in ``_replay_session_history`` now + propagates into the ``load_session`` handler. The defensive try/except + guard at the call site must catch and log it so the JSON-RPC client + still receives a ``LoadSessionResponse`` — partial transcripts are + acceptable, total load failure is not. 
+ """ + new_resp = await agent.new_session(cwd="/tmp") + state = agent.session_manager.get_session(new_resp.session_id) + state.history = [{"role": "user", "content": "hi"}] + + async def boom(_state): + raise RuntimeError("simulated replay helper crash") + + with caplog.at_level("WARNING", logger="acp_adapter.server"): + with patch.object(agent, "_replay_session_history", side_effect=boom): + resp = await agent.load_session(cwd="/tmp", session_id=new_resp.session_id) + + assert isinstance(resp, LoadSessionResponse) + assert "history replay raised during session/load" in caplog.text + + @pytest.mark.asyncio + async def test_resume_session_survives_replay_helper_exception(self, agent, caplog): + """Same guarantee as ``load_session`` for the resume path.""" + new_resp = await agent.new_session(cwd="/tmp") + state = agent.session_manager.get_session(new_resp.session_id) + state.history = [{"role": "user", "content": "hi"}] + + async def boom(_state): + raise RuntimeError("simulated replay helper crash") + + with caplog.at_level("WARNING", logger="acp_adapter.server"): + with patch.object(agent, "_replay_session_history", side_effect=boom): + resp = await agent.resume_session(cwd="/tmp", session_id=new_resp.session_id) + + assert isinstance(resp, ResumeSessionResponse) + assert "history replay raised during session/resume" in caplog.text @pytest.mark.asyncio async def test_resume_session_creates_new_if_missing(self, agent): diff --git a/tests/agent/test_anthropic_oauth_pkce.py b/tests/agent/test_anthropic_oauth_pkce.py new file mode 100644 index 00000000000..5cf74d7a6a5 --- /dev/null +++ b/tests/agent/test_anthropic_oauth_pkce.py @@ -0,0 +1,170 @@ +"""Regression tests for the Anthropic OAuth PKCE flow. 
+ +Guards against re-introducing the bug where the PKCE ``code_verifier`` was +reused as the OAuth ``state`` parameter, leaking the verifier via the +authorization URL (browser history, Referer headers, auth-server logs) and +removing CSRF protection on the callback path. + +History: + - PR #1775 first fixed this on ``run_hermes_oauth_login()``. + - PR #2647 (b17e5c10) added ``run_hermes_oauth_login_pure()`` and silently + copy-pasted the pre-#1775 vulnerable pattern. + - PR #3107 removed the old function, leaving only the regressed copy. + - PR #10699 (issue #10693) fixed the regression on the surviving function. +""" + +from __future__ import annotations + +import io +import json +from typing import Any, Dict +from urllib.parse import parse_qs, urlparse + + +def _patch_oauth_flow( + monkeypatch, + *, + callback_code: str, + token_response: Dict[str, Any] | None = None, + capture_token_request: Dict[str, Any] | None = None, + capture_auth_url: Dict[str, str] | None = None, +) -> None: + """Wire up monkeypatches that let ``run_hermes_oauth_login_pure()`` run + end-to-end without touching a real browser, stdin, or HTTP endpoint. + + ``callback_code`` is the literal string the user would paste back into the + terminal (``"<code>#<state>"`` format). + ``capture_auth_url`` and ``capture_token_request`` are out-dict captures + so the test can introspect what was sent to the auth URL and the token + endpoint, respectively. 
+ """ + import urllib.request + + if token_response is None: + token_response = { + "access_token": "sk-ant-test-access", + "refresh_token": "sk-ant-test-refresh", + "expires_in": 3600, + } + + def fake_open(url): + if capture_auth_url is not None: + capture_auth_url["url"] = url + return True + + monkeypatch.setattr("webbrowser.open", fake_open) + monkeypatch.setattr("builtins.input", lambda *_a, **_kw: callback_code) + + class _FakeResponse: + def __init__(self, body: bytes) -> None: + self._body = body + + def __enter__(self): + return self + + def __exit__(self, *_exc): + return False + + def read(self): + return self._body + + def fake_urlopen(req, *_a, **_kw): + if capture_token_request is not None: + capture_token_request["url"] = req.full_url + capture_token_request["data"] = json.loads(req.data.decode()) + capture_token_request["headers"] = dict(req.headers) + return _FakeResponse(json.dumps(token_response).encode()) + + monkeypatch.setattr(urllib.request, "urlopen", fake_urlopen) + + +def test_authorization_url_state_is_not_pkce_verifier(monkeypatch, tmp_path): + """The ``state`` parameter in the authorization URL must NOT equal the + PKCE ``code_verifier``. + + Reusing the verifier as state leaks the verifier into browser history, + Referer headers, and auth-server access logs — defeating RFC 7636. + """ + monkeypatch.setenv("HERMES_HOME", str(tmp_path)) + + captured_url: Dict[str, str] = {} + captured_token: Dict[str, Any] = {} + _patch_oauth_flow( + monkeypatch, + # state echoed back unchanged so the CSRF guard passes + callback_code="auth-code-from-anthropic#PLACEHOLDER", + capture_auth_url=captured_url, + capture_token_request=captured_token, + ) + + # Stub the callback parse: we need the state echoed back to match. To do + # that without hardcoding the state value, override input() AFTER seeing + # the auth URL. 
+ import builtins + + real_input_calls = {"count": 0} + + def fake_input(*_a, **_kw): + real_input_calls["count"] += 1 + # First (and only) call is the "Authorization code:" prompt. + url = captured_url.get("url", "") + qs = parse_qs(urlparse(url).query) + state = qs.get("state", [""])[0] + return f"auth-code-from-anthropic#{state}" + + monkeypatch.setattr(builtins, "input", fake_input) + + from agent.anthropic_adapter import run_hermes_oauth_login_pure + + result = run_hermes_oauth_login_pure() + assert result is not None, "OAuth flow should succeed with matching state" + + url = captured_url["url"] + qs = parse_qs(urlparse(url).query) + + assert "state" in qs and qs["state"][0], "authorization URL must include state" + assert "code_challenge" in qs, "authorization URL must include code_challenge" + + state_in_url = qs["state"][0] + verifier_sent = captured_token["data"]["code_verifier"] + + # The whole point: state and verifier must be independent values. + assert state_in_url != verifier_sent, ( + "PKCE code_verifier was reused as OAuth state — regression of #10693 / " + "#1775. The verifier is supposed to be a secret known only to the " + "client; placing it in the authorization URL leaks it via browser " + "history, Referer headers, and auth-server logs." + ) + + # And the verifier MUST NOT appear anywhere in the URL. + assert verifier_sent not in url, ( + "PKCE verifier leaked into authorization URL — regression of #10693" + ) + + +def test_callback_state_mismatch_aborts(monkeypatch, tmp_path, caplog): + """If the state returned in the callback does not match the one we sent + in the authorization URL, the flow must abort before exchanging the code. + + Without this check, an attacker who tricks the user into pasting a + crafted ``<code>#<state>`` string can complete the token exchange — the + CSRF protection that ``state`` is supposed to provide (RFC 6749 §10.12) + would be absent. 
+ """ + monkeypatch.setenv("HERMES_HOME", str(tmp_path)) + + captured_token: Dict[str, Any] = {} + _patch_oauth_flow( + monkeypatch, + callback_code="attacker-code#attacker-state-does-not-match", + capture_token_request=captured_token, + ) + + from agent.anthropic_adapter import run_hermes_oauth_login_pure + + result = run_hermes_oauth_login_pure() + + assert result is None, "mismatched state must abort the flow" + assert "url" not in captured_token, ( + "token exchange must NOT happen when state mismatches" + ) diff --git a/tests/agent/test_copilot_acp_deprecation.py b/tests/agent/test_copilot_acp_deprecation.py new file mode 100644 index 00000000000..a0da7736732 --- /dev/null +++ b/tests/agent/test_copilot_acp_deprecation.py @@ -0,0 +1,77 @@ +"""Tests for gh-copilot CLI deprecation detection and GitHub Models Azure URL mapping.""" + +import pytest + +from agent.copilot_acp_client import _is_gh_copilot_deprecation_message + + +class TestDeprecationPatternDetection: + """Verify that stderr from the deprecated `gh copilot` extension is caught + without false-positiving on the new `@github/copilot` CLI.""" + + _REAL_DEPRECATION_STDERR = ( + "The gh-copilot extension has been deprecated in favor of the newer " + "GitHub Copilot CLI.\nFor more information, visit:\n" + "- Copilot CLI: https://github.com/github/copilot-cli\n" + "- Deprecation announcement: https://github.blog/changelog/" + "2025-09-25-upcoming-deprecation-of-gh-copilot-cli-extension\n" + "No commands will be executed." + ) + + def test_real_deprecation_message_matches(self): + assert _is_gh_copilot_deprecation_message(self._REAL_DEPRECATION_STDERR) + + @pytest.mark.parametrize( + "stderr_text", + [ + # The deprecation banner uses both halves of the fingerprint. + "The gh-copilot extension has been deprecated.", + "gh-copilot: no commands will be executed.", + # Mixed casing — match is case-insensitive. 
+ "The GH-Copilot Extension HAS BEEN DEPRECATED.", + ], + ) + def test_genuine_deprecation_variants_match(self, stderr_text: str): + assert _is_gh_copilot_deprecation_message(stderr_text) + + @pytest.mark.parametrize( + "stderr_text", + [ + # Generic errors — no fingerprint at all. + "Error: connection refused", + "", + # The NEW @github/copilot CLI's repo is github.com/github/copilot-cli. + # Its stderr can legitimately mention "copilot-cli" or "deprecation" + # in unrelated contexts; neither alone should trip the detector. + "copilot-cli: failed to authenticate with the API", + "warning: the --foo flag is scheduled for deprecation in v3", + "See https://github.com/github/copilot-cli/issues for support", + # Half the fingerprint without the other half. + "gh-copilot: command not found", + "extension has been deprecated (some other extension)", + ], + ) + def test_does_not_false_positive(self, stderr_text: str): + assert not _is_gh_copilot_deprecation_message(stderr_text) + + +class TestGitHubModelsAzureUrl: + """Verify that the Azure GitHub Models URL is recognised.""" + + def test_url_to_provider_contains_azure_models(self): + from agent.model_metadata import _URL_TO_PROVIDER + + # Maps to the canonical "copilot" provider (same convention as the + # other GitHub-family entries) — not the "github-models" alias. 
+ assert _URL_TO_PROVIDER.get("models.inference.ai.azure.com") == "copilot" + + def test_is_github_models_base_url_recognises_azure(self): + from hermes_cli.models import _is_github_models_base_url + + assert _is_github_models_base_url("https://models.inference.ai.azure.com") + assert _is_github_models_base_url("https://models.inference.ai.azure.com/v1/chat") + + def test_is_github_models_base_url_still_recognises_github_ai(self): + from hermes_cli.models import _is_github_models_base_url + + assert _is_github_models_base_url("https://models.github.ai/inference") diff --git a/tests/gateway/test_active_session_text_merge.py b/tests/gateway/test_active_session_text_merge.py new file mode 100644 index 00000000000..087f8dbabd0 --- /dev/null +++ b/tests/gateway/test_active_session_text_merge.py @@ -0,0 +1,152 @@ +"""Regression test for #4469. + +When the agent is actively running (session present in +``adapter._active_sessions``) and the user fires off multiple TEXT +follow-ups in rapid succession, the previous behaviour was a single-slot +replacement at ``gateway/platforms/base.py``: + + self._pending_messages[session_key] = event + +So three rapid messages ``A``, ``B``, ``C`` arriving while the agent was +still working on the initial turn produced a pending slot containing only +``C``; ``A`` and ``B`` were silently dropped. + +The fix routes the follow-up through ``merge_pending_message_event(..., +merge_text=True)`` so TEXT events accumulate into the existing pending +event's text instead of clobbering it. Photo / media bursts continue to +merge through the same helper (they always did). +""" + +from __future__ import annotations + +import asyncio +import sys +import types +from unittest.mock import AsyncMock, MagicMock + +import pytest + +# Minimal telegram stub so importing gateway.platforms.base does not pull +# in the real python-telegram-bot dependency. 
+_tg = sys.modules.get("telegram") or types.ModuleType("telegram") +_tg.constants = sys.modules.get("telegram.constants") or types.ModuleType("telegram.constants") +_ct = MagicMock() +_ct.PRIVATE = "private" +_ct.GROUP = "group" +_ct.SUPERGROUP = "supergroup" +_tg.constants.ChatType = _ct +sys.modules.setdefault("telegram", _tg) +sys.modules.setdefault("telegram.constants", _tg.constants) +sys.modules.setdefault("telegram.ext", types.ModuleType("telegram.ext")) + +from gateway.config import Platform, PlatformConfig +from gateway.platforms.base import ( + BasePlatformAdapter, + MessageEvent, + MessageType, +) +from gateway.session import SessionSource, build_session_key + + +def _make_event(text: str, chat_id: str = "12345") -> MessageEvent: + source = SessionSource( + platform=Platform.TELEGRAM, + chat_id=chat_id, + chat_type="dm", + user_id="u1", + ) + return MessageEvent( + text=text, + message_type=MessageType.TEXT, + source=source, + message_id=f"msg-{text[:8]}", + ) + + +def _make_adapter() -> BasePlatformAdapter: + """Build a BasePlatformAdapter without running its heavy __init__. + + We only need the bits ``handle_message`` touches on the active-session + path: ``_active_sessions``, ``_pending_messages``, + ``_message_handler``, ``_busy_session_handler``, ``config``, ``platform``. 
+ """ + + class _DummyAdapter(BasePlatformAdapter): # type: ignore[misc] + async def connect(self): + pass + + async def disconnect(self): + pass + + async def get_chat_info(self, chat_id): + return None + + async def send(self, *args, **kwargs): + return MagicMock(success=True, message_id="x", retryable=False) + + adapter = object.__new__(_DummyAdapter) + adapter.config = PlatformConfig(enabled=True, token="***") + adapter.platform = Platform.TELEGRAM + adapter._message_handler = AsyncMock(return_value=None) + adapter._busy_session_handler = None + adapter._active_sessions = {} + adapter._pending_messages = {} + adapter._session_tasks = {} + adapter._background_tasks = set() + adapter._post_delivery_callbacks = {} + adapter._expected_cancelled_tasks = set() + adapter._fatal_error_code = None + adapter._fatal_error_message = None + adapter._fatal_error_retryable = True + adapter._fatal_error_handler = None + adapter._running = True + adapter._auto_tts_default = False + adapter._auto_tts_enabled_chats = set() + adapter._auto_tts_disabled_chats = set() + adapter._typing_paused = set() + return adapter + + +@pytest.mark.asyncio +async def test_rapid_text_followups_accumulate_instead_of_replacing(): + """Three rapid TEXT follow-ups during an active session must all + survive in ``adapter._pending_messages[session_key].text``.""" + adapter = _make_adapter() + first = _make_event("part one") + session_key = build_session_key(first.source) + + # Mark the session as active so subsequent messages take the + # "already running" branch in handle_message. + adapter._active_sessions[session_key] = asyncio.Event() + + second = _make_event("part two") + third = _make_event("part three") + + await adapter.handle_message(second) + await adapter.handle_message(third) + + # Both rapid follow-ups must be preserved, not just the last one. 
+ pending = adapter._pending_messages[session_key] + assert pending.text == "part two\npart three", ( + f"expected accumulated text, got {pending.text!r}" + ) + # Interrupt event must be signalled exactly like before. + assert adapter._active_sessions[session_key].is_set() + + +@pytest.mark.asyncio +async def test_single_followup_is_stored_as_is(): + """One TEXT follow-up still lands as the event object itself + (no spurious wrapping / mutation) — guards against the merge path + breaking the simple case.""" + adapter = _make_adapter() + first = _make_event("only one") + session_key = build_session_key(first.source) + + adapter._active_sessions[session_key] = asyncio.Event() + await adapter.handle_message(first) + + pending = adapter._pending_messages[session_key] + assert pending is first + assert pending.text == "only one" + assert adapter._active_sessions[session_key].is_set() diff --git a/tests/hermes_cli/test_doctor.py b/tests/hermes_cli/test_doctor.py index 34e75045eff..ee419656a71 100644 --- a/tests/hermes_cli/test_doctor.py +++ b/tests/hermes_cli/test_doctor.py @@ -839,3 +839,108 @@ class TestGitHubTokenCheck: assert "gh auth" in str(call_log) or any(c[0] == "gh" for c in call_log), f"gh not called: {call_log}" assert "GitHub authenticated via gh CLI" in out or "token configured" in out + + +def _run_doctor_with_healthy_oauth_fallback( + monkeypatch, + tmp_path, + *, + env_key: str, + bad_key: str, + failing_host: str, + gemini_oauth_status: dict, + minimax_oauth_status: dict, +) -> str: + home = tmp_path / ".hermes" + home.mkdir(parents=True, exist_ok=True) + (home / "config.yaml").write_text( + "model:\n" + " provider: nous\n" + " default: moonshotai/kimi-k2.6\n", + encoding="utf-8", + ) + project = tmp_path / "project" + project.mkdir(exist_ok=True) + + monkeypatch.setattr(doctor_mod, "HERMES_HOME", home) + monkeypatch.setattr(doctor_mod, "PROJECT_ROOT", project) + monkeypatch.setattr(doctor_mod, "_DHH", str(home)) + monkeypatch.setenv(env_key, bad_key) + 
monkeypatch.delenv("OPENROUTER_API_KEY", raising=False) + monkeypatch.delenv("OPENAI_API_KEY", raising=False) + monkeypatch.delenv("GEMINI_API_KEY", raising=False) + monkeypatch.delenv("GOOGLE_API_KEY", raising=False) + monkeypatch.delenv("MINIMAX_API_KEY", raising=False) + monkeypatch.delenv("MINIMAX_CN_API_KEY", raising=False) + monkeypatch.setenv(env_key, bad_key) + + fake_model_tools = types.SimpleNamespace( + check_tool_availability=lambda *a, **kw: ([], []), + TOOLSET_REQUIREMENTS={}, + ) + monkeypatch.setitem(sys.modules, "model_tools", fake_model_tools) + + from hermes_cli import auth as _auth_mod + + monkeypatch.setattr(_auth_mod, "get_nous_auth_status", lambda: {"logged_in": True}) + monkeypatch.setattr(_auth_mod, "get_codex_auth_status", lambda: {}) + monkeypatch.setattr(_auth_mod, "get_gemini_oauth_auth_status", lambda: gemini_oauth_status) + monkeypatch.setattr(_auth_mod, "get_minimax_oauth_auth_status", lambda: minimax_oauth_status) + + def fake_get(url, headers=None, timeout=None): + status = 401 if failing_host in url else 200 + return types.SimpleNamespace(status_code=status) + + import httpx + + monkeypatch.setattr(httpx, "get", fake_get) + + buf = io.StringIO() + with contextlib.redirect_stdout(buf): + doctor_mod.run_doctor(Namespace(fix=False)) + return buf.getvalue() + + +@pytest.mark.parametrize( + ("env_key", "bad_key", "failing_host", "gemini_oauth_status", "minimax_oauth_status", "unexpected_issue"), + [ + ( + "GOOGLE_API_KEY", + "bad-gemini-key", + "googleapis.com", + {"logged_in": True, "email": "user@example.com"}, + {}, + "Check GOOGLE_API_KEY in .env", + ), + ( + "MINIMAX_API_KEY", + "bad-minimax-key", + "minimax.io", + {}, + {"logged_in": True, "region": "global"}, + "Check MINIMAX_API_KEY in .env", + ), + ], +) +def test_run_doctor_ignores_invalid_direct_keys_when_oauth_fallback_is_healthy( + monkeypatch, + tmp_path, + env_key, + bad_key, + failing_host, + gemini_oauth_status, + minimax_oauth_status, + unexpected_issue, +): + out = 
_run_doctor_with_healthy_oauth_fallback( + monkeypatch, + tmp_path, + env_key=env_key, + bad_key=bad_key, + failing_host=failing_host, + gemini_oauth_status=gemini_oauth_status, + minimax_oauth_status=minimax_oauth_status, + ) + + assert "invalid API key" in out + assert unexpected_issue not in out diff --git a/tests/hermes_cli/test_plugins.py b/tests/hermes_cli/test_plugins.py index 7be43a236f2..0c500297a2b 100644 --- a/tests/hermes_cli/test_plugins.py +++ b/tests/hermes_cli/test_plugins.py @@ -662,6 +662,129 @@ class TestPluginContext: from tools.registry import registry assert "plugin_echo" in registry._tools + def test_register_tool_rejects_shadow_without_override(self, tmp_path, monkeypatch, caplog): + """Without override=True, registering a tool name claimed by a different toolset is rejected.""" + from tools.registry import registry + + # Seed an existing entry from a non-plugin toolset. + registry.register( + name="shadow_target", + toolset="terminal", + schema={"name": "shadow_target", "description": "Built-in", "parameters": {"type": "object", "properties": {}}}, + handler=lambda args, **kw: "built-in", + ) + original_handler = registry._tools["shadow_target"].handler + try: + plugins_dir = tmp_path / "hermes_test" / "plugins" + plugin_dir = plugins_dir / "shadow_plugin" + plugin_dir.mkdir(parents=True) + (plugin_dir / "plugin.yaml").write_text(yaml.dump({"name": "shadow_plugin"})) + (plugin_dir / "__init__.py").write_text( + 'def register(ctx):\n' + ' ctx.register_tool(\n' + ' name="shadow_target",\n' + ' toolset="plugin_shadow_plugin",\n' + ' schema={"name": "shadow_target", "description": "Plugin", "parameters": {"type": "object", "properties": {}}},\n' + ' handler=lambda args, **kw: "plugin",\n' + ' )\n' + ) + hermes_home = tmp_path / "hermes_test" + (hermes_home / "config.yaml").write_text( + yaml.safe_dump({"plugins": {"enabled": ["shadow_plugin"]}}) + ) + monkeypatch.setenv("HERMES_HOME", str(hermes_home)) + + with caplog.at_level(logging.ERROR, 
logger="tools.registry"): + mgr = PluginManager() + mgr.discover_and_load() + + # Original handler must still be in place — registration was rejected. + assert registry._tools["shadow_target"].handler is original_handler + assert registry._tools["shadow_target"].toolset == "terminal" + # And an ERROR was logged explaining why and how to opt in. + assert any("override=True" in r.message for r in caplog.records) + finally: + registry.deregister("shadow_target") + + def test_register_tool_override_replaces_existing(self, tmp_path, monkeypatch, caplog): + """override=True lets a plugin replace an existing built-in tool.""" + from tools.registry import registry + + registry.register( + name="override_target", + toolset="terminal", + schema={"name": "override_target", "description": "Built-in", "parameters": {"type": "object", "properties": {}}}, + handler=lambda args, **kw: "built-in", + ) + try: + plugins_dir = tmp_path / "hermes_test" / "plugins" + plugin_dir = plugins_dir / "override_plugin" + plugin_dir.mkdir(parents=True) + (plugin_dir / "plugin.yaml").write_text(yaml.dump({"name": "override_plugin"})) + (plugin_dir / "__init__.py").write_text( + 'def register(ctx):\n' + ' ctx.register_tool(\n' + ' name="override_target",\n' + ' toolset="plugin_override_plugin",\n' + ' schema={"name": "override_target", "description": "Plugin", "parameters": {"type": "object", "properties": {}}},\n' + ' handler=lambda args, **kw: "plugin",\n' + ' override=True,\n' + ' )\n' + ) + hermes_home = tmp_path / "hermes_test" + (hermes_home / "config.yaml").write_text( + yaml.safe_dump({"plugins": {"enabled": ["override_plugin"]}}) + ) + monkeypatch.setenv("HERMES_HOME", str(hermes_home)) + + with caplog.at_level(logging.INFO, logger="tools.registry"): + mgr = PluginManager() + mgr.discover_and_load() + + # Plugin handler replaced the built-in one. 
+ assert registry._tools["override_target"].toolset == "plugin_override_plugin" + assert registry._tools["override_target"].handler({}, ) == "plugin" + # Override is audit-logged at INFO. + assert any( + "overriding existing" in r.message and "override_target" in r.message + for r in caplog.records + ) + # Plugin tracks it. + assert "override_target" in mgr._plugin_tool_names + finally: + registry.deregister("override_target") + + def test_register_tool_override_on_new_name_is_noop_path(self, tmp_path, monkeypatch): + """override=True on a brand-new name still registers cleanly (no existing entry to replace).""" + from tools.registry import registry + + plugins_dir = tmp_path / "hermes_test" / "plugins" + plugin_dir = plugins_dir / "new_override_plugin" + plugin_dir.mkdir(parents=True) + (plugin_dir / "plugin.yaml").write_text(yaml.dump({"name": "new_override_plugin"})) + (plugin_dir / "__init__.py").write_text( + 'def register(ctx):\n' + ' ctx.register_tool(\n' + ' name="brand_new_override_tool",\n' + ' toolset="plugin_new_override_plugin",\n' + ' schema={"name": "brand_new_override_tool", "description": "New", "parameters": {"type": "object", "properties": {}}},\n' + ' handler=lambda args, **kw: "ok",\n' + ' override=True,\n' + ' )\n' + ) + hermes_home = tmp_path / "hermes_test" + (hermes_home / "config.yaml").write_text( + yaml.safe_dump({"plugins": {"enabled": ["new_override_plugin"]}}) + ) + monkeypatch.setenv("HERMES_HOME", str(hermes_home)) + + try: + mgr = PluginManager() + mgr.discover_and_load() + assert "brand_new_override_tool" in registry._tools + finally: + registry.deregister("brand_new_override_tool") + # ── TestPluginToolVisibility ─────────────────────────────────────────────── diff --git a/tests/run_agent/test_run_agent.py b/tests/run_agent/test_run_agent.py index c493f91509a..cd62cd41ded 100644 --- a/tests/run_agent/test_run_agent.py +++ b/tests/run_agent/test_run_agent.py @@ -2269,6 +2269,60 @@ class TestParallelScopePathNormalization: assert 
not _should_parallelize_tool_batch([tc1, tc2]) +class TestMcpParallelToolBatch: + """Integration test: _should_parallelize_tool_batch respects MCP parallel flag.""" + + def test_mcp_tools_default_sequential(self): + """MCP tools without supports_parallel_tool_calls are sequential.""" + from run_agent import _should_parallelize_tool_batch + tc1 = _mock_tool_call(name="mcp_github_list_repos", arguments='{"org":"openai"}', call_id="c1") + tc2 = _mock_tool_call(name="mcp_github_search_code", arguments='{"q":"test"}', call_id="c2") + assert not _should_parallelize_tool_batch([tc1, tc2]) + + def test_mcp_tools_parallel_when_server_opted_in(self): + """MCP tools from a parallel-safe server can run concurrently.""" + from run_agent import _should_parallelize_tool_batch + from tools.mcp_tool import _parallel_safe_servers, _lock + with _lock: + _parallel_safe_servers.add("github") + try: + tc1 = _mock_tool_call(name="mcp_github_list_repos", arguments='{"org":"openai"}', call_id="c1") + tc2 = _mock_tool_call(name="mcp_github_search_code", arguments='{"q":"test"}', call_id="c2") + assert _should_parallelize_tool_batch([tc1, tc2]) + finally: + with _lock: + _parallel_safe_servers.discard("github") + + def test_mixed_mcp_and_builtin_parallel(self): + """MCP parallel tools mixed with built-in parallel-safe tools.""" + from run_agent import _should_parallelize_tool_batch + from tools.mcp_tool import _parallel_safe_servers, _lock + with _lock: + _parallel_safe_servers.add("docs") + try: + tc1 = _mock_tool_call(name="mcp_docs_search", arguments='{"query":"api"}', call_id="c1") + tc2 = _mock_tool_call(name="web_search", arguments='{"query":"test"}', call_id="c2") + assert _should_parallelize_tool_batch([tc1, tc2]) + finally: + with _lock: + _parallel_safe_servers.discard("docs") + + def test_mixed_parallel_and_serial_mcp_servers(self): + """One parallel MCP server + one non-parallel MCP server = sequential.""" + from run_agent import _should_parallelize_tool_batch + from 
tools.mcp_tool import _parallel_safe_servers, _lock + with _lock: + _parallel_safe_servers.add("docs") + # "github" is NOT in _parallel_safe_servers + try: + tc1 = _mock_tool_call(name="mcp_docs_search", arguments='{"query":"api"}', call_id="c1") + tc2 = _mock_tool_call(name="mcp_github_list_repos", arguments='{"org":"openai"}', call_id="c2") + assert not _should_parallelize_tool_batch([tc1, tc2]) + finally: + with _lock: + _parallel_safe_servers.discard("docs") + + class TestHandleMaxIterations: def test_returns_summary(self, agent): resp = _mock_response(content="Here is a summary of what I did.") diff --git a/tests/run_agent/test_streaming.py b/tests/run_agent/test_streaming.py index e636498c462..1ce140f82bf 100644 --- a/tests/run_agent/test_streaming.py +++ b/tests/run_agent/test_streaming.py @@ -999,6 +999,88 @@ class TestAnthropicStreamCallbacks: assert touch_calls.count("receiving stream response") == len(events) + @patch("run_agent.AIAgent._replace_primary_openai_client") + def test_anthropic_stream_parser_valueerror_retries_before_delivery( + self, mock_replace, monkeypatch, + ): + """Malformed Anthropic event-stream frames retry instead of surfacing HTTP None.""" + from run_agent import AIAgent + + agent = AIAgent( + api_key="test-key", + base_url="https://api.minimax.io/anthropic", + provider="minimax", + model="MiniMax-M2.7", + quiet_mode=True, + skip_context_files=True, + skip_memory=True, + ) + agent.api_mode = "anthropic_messages" + agent._interrupt_requested = False + monkeypatch.setenv("HERMES_STREAM_RETRIES", "1") + + class _BadStream: + response = None + + def __enter__(self): + return self + + def __exit__(self, *_args): + return False + + def __iter__(self): + raise ValueError("expected ident at line 1 column 149") + + final_message = SimpleNamespace(content=[], stop_reason="end_turn") + good_stream = MagicMock() + good_stream.__enter__ = MagicMock(return_value=good_stream) + good_stream.__exit__ = MagicMock(return_value=False) + 
good_stream.__iter__ = MagicMock(return_value=iter([])) + good_stream.get_final_message.return_value = final_message + + agent._anthropic_client = MagicMock() + agent._anthropic_client.messages.stream.side_effect = [ + _BadStream(), + good_stream, + ] + + response = agent._interruptible_streaming_api_call({}) + + assert response is final_message + assert agent._anthropic_client.messages.stream.call_count == 2 + assert mock_replace.call_count == 1 + + @patch("run_agent.AIAgent._replace_primary_openai_client") + def test_generic_anthropic_valueerror_still_propagates_without_stream_retry( + self, mock_replace, monkeypatch, + ): + """Only known provider stream parser ValueErrors are treated as transient.""" + from run_agent import AIAgent + + agent = AIAgent( + api_key="test-key", + base_url="https://api.minimax.io/anthropic", + provider="minimax", + model="MiniMax-M2.7", + quiet_mode=True, + skip_context_files=True, + skip_memory=True, + ) + agent.api_mode = "anthropic_messages" + agent._interrupt_requested = False + monkeypatch.setenv("HERMES_STREAM_RETRIES", "1") + + agent._anthropic_client = MagicMock() + agent._anthropic_client.messages.stream.side_effect = ValueError( + "invalid local request shape" + ) + + with pytest.raises(ValueError, match="invalid local request shape"): + agent._interruptible_streaming_api_call({}) + + assert agent._anthropic_client.messages.stream.call_count == 1 + assert mock_replace.call_count == 0 + class TestPartialToolCallWarning: """Regression: when a stream dies mid tool-call argument generation after @@ -1504,4 +1586,3 @@ class TestCopilotACPStreamingDecision: _use_streaming = False assert _use_streaming is True - diff --git a/tests/skills/test_darwinian_evolver_skill.py b/tests/skills/test_darwinian_evolver_skill.py new file mode 100644 index 00000000000..8b3a14b8da9 --- /dev/null +++ b/tests/skills/test_darwinian_evolver_skill.py @@ -0,0 +1,102 @@ +""" +Smoke tests for the darwinian-evolver optional skill. 
+ +We can't actually run the evolution loop in CI (it needs network + a paid LLM), +so these tests verify: + - SKILL.md frontmatter conforms to the hardline format + - shipped scripts parse as valid Python + - the scripts reference the right env var / module paths +""" +from __future__ import annotations + +import ast +import re +from pathlib import Path + +import pytest +import yaml + +SKILL_DIR = Path(__file__).resolve().parents[2] / "optional-skills" / "research" / "darwinian-evolver" + + +@pytest.fixture(scope="module") +def frontmatter() -> dict: + src = (SKILL_DIR / "SKILL.md").read_text() + m = re.search(r"^---\n(.*?)\n---", src, re.DOTALL) + assert m, "SKILL.md missing YAML frontmatter" + return yaml.safe_load(m.group(1)) + + +def test_skill_dir_exists() -> None: + assert SKILL_DIR.is_dir(), f"missing skill dir: {SKILL_DIR}" + + +def test_skill_md_present() -> None: + assert (SKILL_DIR / "SKILL.md").is_file() + + +def test_description_under_60_chars(frontmatter) -> None: + desc = frontmatter["description"] + assert len(desc) <= 60, f"description is {len(desc)} chars (hardline ≤60): {desc!r}" + + +def test_name_matches_dir(frontmatter) -> None: + assert frontmatter["name"] == "darwinian-evolver" + + +def test_platforms_excludes_windows(frontmatter) -> None: + # Upstream uses func_timeout (POSIX signals) and uv subprocess pipelines; the + # skill is gated [linux, macos]. If we ever port to Windows, update this test + # to assert ["linux", "macos", "windows"]. 
+ assert "windows" not in frontmatter["platforms"] + assert set(frontmatter["platforms"]) >= {"linux", "macos"} + + +def test_author_credits_contributor(frontmatter) -> None: + author = frontmatter["author"] + assert "Bihruze" in author, f"author should credit the original contributor: {author!r}" + + +def test_license_mit(frontmatter) -> None: + assert frontmatter["license"] == "MIT" + + +@pytest.mark.parametrize( + "path", + [ + "scripts/parrot_openrouter.py", + "scripts/show_snapshot.py", + "templates/custom_problem_template.py", + ], +) +def test_shipped_scripts_parse(path: str) -> None: + src = (SKILL_DIR / path).read_text() + ast.parse(src) # raises SyntaxError on broken Python + + +def test_parrot_script_uses_openrouter() -> None: + src = (SKILL_DIR / "scripts" / "parrot_openrouter.py").read_text() + assert "OPENROUTER_API_KEY" in src, "parrot driver should read OPENROUTER_API_KEY" + assert "openrouter.ai/api/v1" in src, "parrot driver should target OpenRouter" + assert "EVOLVER_MODEL" in src, "model should be overridable via EVOLVER_MODEL" + + +def test_parrot_script_has_error_swallowing() -> None: + """Provider content-filter / rate-limit must not kill the run — see Pitfall 2.""" + src = (SKILL_DIR / "scripts" / "parrot_openrouter.py").read_text() + assert "LLM_ERROR" in src, "_prompt_llm should swallow provider errors and tag them" + + +def test_skill_calls_out_agpl(frontmatter) -> None: + """The upstream tool is AGPL-3.0. The skill MUST flag this so users don't + import it into MIT-licensed code by accident.""" + src = (SKILL_DIR / "SKILL.md").read_text() + assert "AGPL" in src, "SKILL.md must mention upstream AGPL license" + + +def test_skill_pitfalls_section_present() -> None: + src = (SKILL_DIR / "SKILL.md").read_text() + assert "## Pitfalls" in src + # Pitfalls we discovered during the spike — keep them in sync with reality. 
+ assert "Initial organism must be viable" in src + assert "generator" in src # loop.run() pitfall diff --git a/tests/test_sanitize_tool_error.py b/tests/test_sanitize_tool_error.py new file mode 100644 index 00000000000..3a0685bf3d7 --- /dev/null +++ b/tests/test_sanitize_tool_error.py @@ -0,0 +1,137 @@ +"""Tests for `_sanitize_tool_error` in model_tools. + +Ported from ironclaw#1639 — defense-in-depth on tool exception strings before +they enter the model's `tool` message content. Note that `json.dumps()` in +`handle_function_call` already handles quote/backslash escaping at the wire +layer; this helper exists to strip structural framing tokens the model +itself might react to (XML role tags, CDATA, markdown code fences) and to +cap pathological lengths. +""" +from __future__ import annotations + +from model_tools import _sanitize_tool_error, _TOOL_ERROR_MAX_LEN + + +class TestRoleTagStripping: + def test_strips_tool_call_tags(self): + out = _sanitize_tool_error("bad injected happened") + assert "" not in out + assert "" not in out + assert "bad injected happened" in out + + def test_strips_function_call_tags(self): + out = _sanitize_tool_error("x") + assert "" not in out + assert "" not in out + + def test_strips_role_tags(self): + # Each of these should be stripped + for tag in ("system", "assistant", "user", "result", "response", "output", "input"): + raw = f"prefix <{tag}>hi suffix" + out = _sanitize_tool_error(raw) + assert f"<{tag}>" not in out, f"failed to strip <{tag}>" + assert f"" not in out, f"failed to strip " + + def test_role_tag_strip_is_case_insensitive(self): + out = _sanitize_tool_error("x") + assert "<" not in out.replace("[TOOL_ERROR]", "") # only the prefix bracket survives + + def test_unrelated_xml_kept(self): + # We intentionally only strip the role-like tag whitelist, not all XML + out = _sanitize_tool_error("Error parsing line 5") + assert "" in out + + +class TestCDATAStripping: + def test_strips_cdata(self): + out = 
_sanitize_tool_error("error: here") + assert "" not in out + + def test_strips_multiline_cdata(self): + out = _sanitize_tool_error("a\n\nb") + assert "CDATA" not in out + assert "a" in out and "b" in out + + +class TestCodeFenceStripping: + def test_strips_leading_fence_with_lang(self): + out = _sanitize_tool_error("```json\n{\"x\": 1}") + assert not out.replace("[TOOL_ERROR] ", "").startswith("```") + + def test_strips_trailing_fence(self): + out = _sanitize_tool_error("payload\n```") + assert not out.rstrip().endswith("```") + + def test_strips_bare_fence(self): + out = _sanitize_tool_error("```\nstuff") + assert "```" not in out.split("\n")[0] + + +class TestTruncation: + def test_caps_long_input(self): + long = "A" * (_TOOL_ERROR_MAX_LEN * 2) + out = _sanitize_tool_error(long) + # Total length is prefix + truncated body + body = out[len("[TOOL_ERROR] "):] + assert len(body) == _TOOL_ERROR_MAX_LEN + assert body.endswith("...") + + def test_does_not_truncate_short_input(self): + msg = "short error" + out = _sanitize_tool_error(msg) + assert "..." not in out + assert msg in out + + +class TestEnvelope: + def test_wraps_with_prefix(self): + out = _sanitize_tool_error("oh no") + assert out.startswith("[TOOL_ERROR] ") + + def test_empty_input(self): + out = _sanitize_tool_error("") + assert out == "[TOOL_ERROR] " + + def test_preserves_normal_error_text(self): + msg = "Error executing read_file: FileNotFoundError: /tmp/missing" + out = _sanitize_tool_error(msg) + assert msg in out + + +class TestHandleFunctionCallIntegration: + """Verify handle_function_call routes exception-path errors through the sanitizer. + + Note: the "Unknown tool: ..." early-return in tools/registry.py is a + *different* code path from `except Exception` in handle_function_call — + that one returns directly without sanitization (and there's nothing to + sanitize in a hardcoded format string anyway). This test exercises the + real exception path by passing args that make a known tool raise. 
+ """ + + def test_exception_path_error_is_sanitized(self): + import json + from model_tools import handle_function_call + from tools.registry import registry as _registry + + # Force a known tool to raise with a payload containing role tags. + def boom(_args, **_kwargs): + raise RuntimeError("injected boom") + + all_tools = _registry.get_all_tool_names() + assert all_tools, "no tools registered — test environment broken" + target = all_tools[0] + original = _registry._tools[target].handler + _registry._tools[target].handler = boom + try: + result_str = handle_function_call(target, {}) + finally: + _registry._tools[target].handler = original + + payload = json.loads(result_str) + assert "error" in payload, payload + assert payload["error"].startswith("[TOOL_ERROR] "), payload["error"] + # Role-tag stripping carried through + assert "" not in payload["error"] + assert "" not in payload["error"] + assert "boom" in payload["error"] diff --git a/tests/tools/test_approval.py b/tests/tools/test_approval.py index 7ec2d5868f1..0694dbcdc91 100644 --- a/tests/tools/test_approval.py +++ b/tests/tools/test_approval.py @@ -1102,3 +1102,206 @@ class TestDetectSudoStdin: "make 2>&1 | tee build.log" ) assert is_dangerous is False + + +class TestMacOSPrivateSystemPaths: + """Inspired by Claude Code 2.1.113 "dangerous path protection". + + On macOS, /etc, /var, /tmp, /home are symlinks to + /private/{etc,var,tmp,home}. A command that writes to + /private/etc/sudoers works identically to /etc/sudoers but bypasses + a plain "/etc/" pattern check. These tests guard the shared + _SYSTEM_CONFIG_PATH fragment used across redirect / tee / cp / mv / + install / sed -i patterns. 
+ """ + + def test_private_etc_redirect(self): + dangerous, _, desc = detect_dangerous_command( + "echo 'root ALL=NOPASSWD: ALL' > /private/etc/sudoers" + ) + assert dangerous is True + assert "system config" in desc.lower() + + def test_private_var_redirect(self): + dangerous, _, _ = detect_dangerous_command( + "echo payload > /private/var/db/dslocal/nodes/x" + ) + assert dangerous is True + + def test_private_etc_via_tee(self): + dangerous, _, desc = detect_dangerous_command( + "echo malicious | tee /private/etc/hosts" + ) + assert dangerous is True + assert "tee" in desc.lower() or "system" in desc.lower() + + def test_private_etc_cp(self): + dangerous, _, desc = detect_dangerous_command( + "cp malicious.conf /private/etc/hosts" + ) + assert dangerous is True + assert "copy" in desc.lower() or "system config" in desc.lower() + + def test_private_etc_mv(self): + dangerous, _, _ = detect_dangerous_command( + "mv evil /private/etc/ssh/sshd_config" + ) + assert dangerous is True + + def test_private_etc_install(self): + dangerous, _, _ = detect_dangerous_command( + "install -m 600 key /private/etc/ssh/keys" + ) + assert dangerous is True + + def test_private_etc_sed_in_place(self): + dangerous, _, desc = detect_dangerous_command( + "sed -i 's/root/pwned/' /private/etc/passwd" + ) + assert dangerous is True + assert "in-place" in desc.lower() or "system config" in desc.lower() + + def test_private_var_sed_long_flag(self): + dangerous, _, _ = detect_dangerous_command( + "sed --in-place 's/x/y/' /private/var/log/wtmp" + ) + assert dangerous is True + + def test_private_tmp_cp(self): + dangerous, _, _ = detect_dangerous_command( + "cp rootkit /private/tmp/payload" + ) + assert dangerous is True + + def test_ls_private_is_safe(self): + """Reading under /private/ must not trigger approval.""" + dangerous, _, _ = detect_dangerous_command("ls /private") + assert dangerous is False + + def test_echo_mentioning_private_path_is_safe(self): + """Literal mention of /private/etc 
in an echo string must not fire.""" + dangerous, _, _ = detect_dangerous_command( + "echo 'the macOS path is /private/etc on disk'" + ) + assert dangerous is False + + +class TestKillallKillSignals: + """Inspired by Claude Code 2.1.113 expanded deny rules. + + The existing pattern caught `pkill -9` but not the equivalent + `killall -9` / `-KILL` / `-s KILL` / `-r ` broad sweeps that + can wipe out unrelated processes. + """ + + def test_killall_dash_9(self): + dangerous, _, desc = detect_dangerous_command("killall -9 firefox") + assert dangerous is True + assert "kill" in desc.lower() + + def test_killall_dash_kill(self): + dangerous, _, _ = detect_dangerous_command("killall -KILL firefox") + assert dangerous is True + + def test_killall_dash_sigkill(self): + dangerous, _, _ = detect_dangerous_command("killall -SIGKILL firefox") + assert dangerous is True + + def test_killall_dash_s_kill(self): + dangerous, _, _ = detect_dangerous_command("killall -s KILL firefox") + assert dangerous is True + + def test_killall_dash_s_signum(self): + dangerous, _, _ = detect_dangerous_command("killall -s 9 firefox") + assert dangerous is True + + def test_killall_regex(self): + """killall -r is a broad sweep; require approval.""" + dangerous, _, desc = detect_dangerous_command("killall -r 'fire.*'") + assert dangerous is True + assert "regex" in desc.lower() or "kill" in desc.lower() + + def test_killall_combined_flags(self): + dangerous, _, _ = detect_dangerous_command("killall -9 -r 'herm.*'") + assert dangerous is True + + def test_killall_list_signals_is_safe(self): + """`killall -l` lists signals and is harmless — must not fire.""" + dangerous, _, _ = detect_dangerous_command("killall -l") + assert dangerous is False + + def test_killall_version_is_safe(self): + dangerous, _, _ = detect_dangerous_command("killall -V") + assert dangerous is False + + +class TestFindExecdir: + """Inspired by Claude Code 2.1.113 tightening of find rules. 
+ + `find -execdir rm` has the same destructive effect as `find -exec rm` + but runs in each match's directory. Previously missed because the + pattern required a literal `-exec ` followed by a space. + """ + + def test_find_execdir_rm(self): + dangerous, _, desc = detect_dangerous_command( + "find . -execdir rm {} \\;" + ) + assert dangerous is True + assert "find" in desc.lower() or "rm" in desc.lower() + + def test_find_execdir_with_absolute_rm(self): + dangerous, _, _ = detect_dangerous_command( + "find /var -execdir /bin/rm -rf {} \\;" + ) + assert dangerous is True + + def test_find_exec_rm_still_caught(self): + """Original -exec pattern must still fire (regression guard).""" + dangerous, _, _ = detect_dangerous_command( + "find . -exec rm {} \\;" + ) + assert dangerous is True + + def test_find_execdir_ls_is_safe(self): + """-execdir with a read-only command is not dangerous.""" + dangerous, _, _ = detect_dangerous_command( + "find . -execdir ls {} \\;" + ) + assert dangerous is False + + +class TestEtcPatternsUnaffectedByRefactor: + """Regression guard: the /etc/ patterns were refactored to share the + _SYSTEM_CONFIG_PATH fragment with the /private/ mirror. Make sure the + existing /etc/ coverage remains identical. 
+ """ + + def test_etc_redirect(self): + dangerous, _, _ = detect_dangerous_command("echo x > /etc/hosts") + assert dangerous is True + + def test_etc_cp(self): + dangerous, _, _ = detect_dangerous_command("cp evil /etc/hosts") + assert dangerous is True + + def test_etc_sed_inline(self): + dangerous, _, _ = detect_dangerous_command( + "sed -i 's/a/b/' /etc/hosts" + ) + assert dangerous is True + + def test_etc_tee(self): + dangerous, _, _ = detect_dangerous_command( + "echo x | tee /etc/hosts" + ) + assert dangerous is True + + def test_cat_etc_hostname_is_safe(self): + """Reading /etc/ files is safe — only writes require approval.""" + dangerous, _, _ = detect_dangerous_command("cat /etc/hostname") + assert dangerous is False + + def test_grep_etc_passwd_is_safe(self): + dangerous, _, _ = detect_dangerous_command("grep root /etc/passwd") + assert dangerous is False diff --git a/tests/tools/test_delegate.py b/tests/tools/test_delegate.py index 468fbdaf942..684f24f5da8 100644 --- a/tests/tools/test_delegate.py +++ b/tests/tools/test_delegate.py @@ -890,6 +890,63 @@ class TestDelegationCredentialResolution(unittest.TestCase): self.assertEqual(creds["api_key"], "local-key") self.assertEqual(creds["api_mode"], "chat_completions") + def test_direct_endpoint_auto_detects_anthropic_messages_suffix(self): + # Issue #10213: Azure AI Foundry exposes Anthropic-compatible models at + # a /anthropic URL suffix. Subagents must pick anthropic_messages + # automatically, matching the main agent's runtime resolver. 
+ parent = _make_mock_parent(depth=0) + cfg = { + "model": "claude-opus-4-6", + "provider": "custom", + "base_url": "https://myfoundry.services.ai.azure.com/anthropic", + "api_key": "foundry-key", + } + creds = _resolve_delegation_credentials(cfg, parent) + self.assertEqual(creds["provider"], "custom") + self.assertEqual(creds["base_url"], "https://myfoundry.services.ai.azure.com/anthropic") + self.assertEqual(creds["api_key"], "foundry-key") + self.assertEqual(creds["api_mode"], "anthropic_messages") + + def test_direct_endpoint_honors_explicit_api_mode(self): + # When delegation.api_mode is set explicitly, it overrides URL-based + # detection so users can force a transport on non-standard endpoints. + parent = _make_mock_parent(depth=0) + cfg = { + "model": "claude-opus-4-6", + "provider": "custom", + "base_url": "https://proxy.example.com/v1", + "api_key": "proxy-key", + "api_mode": "anthropic_messages", + } + creds = _resolve_delegation_credentials(cfg, parent) + self.assertEqual(creds["api_mode"], "anthropic_messages") + + def test_direct_endpoint_explicit_api_mode_overrides_url_detection(self): + # Explicit api_mode in config always wins over auto-detection. + parent = _make_mock_parent(depth=0) + cfg = { + "model": "claude-opus-4-6", + "provider": "custom", + "base_url": "https://myfoundry.services.ai.azure.com/anthropic", + "api_key": "foundry-key", + "api_mode": "chat_completions", + } + creds = _resolve_delegation_credentials(cfg, parent) + self.assertEqual(creds["api_mode"], "chat_completions") + + def test_direct_endpoint_invalid_api_mode_falls_back_to_detection(self): + # An invalid api_mode string must not break detection; fall back to URL heuristic. 
+ parent = _make_mock_parent(depth=0) + cfg = { + "model": "claude-opus-4-6", + "provider": "custom", + "base_url": "https://myfoundry.services.ai.azure.com/anthropic", + "api_key": "foundry-key", + "api_mode": "garbage", + } + creds = _resolve_delegation_credentials(cfg, parent) + self.assertEqual(creds["api_mode"], "anthropic_messages") + def test_direct_endpoint_returns_none_api_key_when_not_configured(self): # When base_url is set without api_key, api_key should be None so # _build_child_agent inherits the parent's key (effective_api_key = override or parent). diff --git a/tests/tools/test_mcp_tool.py b/tests/tools/test_mcp_tool.py index 7f6c3f6704c..0a094eb5467 100644 --- a/tests/tools/test_mcp_tool.py +++ b/tests/tools/test_mcp_tool.py @@ -3762,3 +3762,135 @@ class TestRegisterMcpServers: ) _servers.pop("srv", None) + + +# --------------------------------------------------------------------------- +# Tests for parallel tool call support (port from openai/codex#17667) +# --------------------------------------------------------------------------- + +class TestMcpParallelToolCalls: + """Tests for the supports_parallel_tool_calls config option.""" + + def test_is_mcp_tool_parallel_safe_non_mcp_tool(self): + """Non-MCP tool names always return False.""" + from tools.mcp_tool import is_mcp_tool_parallel_safe + assert is_mcp_tool_parallel_safe("web_search") is False + assert is_mcp_tool_parallel_safe("read_file") is False + assert is_mcp_tool_parallel_safe("terminal") is False + assert is_mcp_tool_parallel_safe("") is False + + def test_is_mcp_tool_parallel_safe_no_servers(self): + """MCP tool from unknown server returns False.""" + from tools.mcp_tool import is_mcp_tool_parallel_safe, _parallel_safe_servers, _lock + with _lock: + _parallel_safe_servers.clear() + assert is_mcp_tool_parallel_safe("mcp_docs_search") is False + + def test_is_mcp_tool_parallel_safe_with_flag(self): + """MCP tool from a parallel-safe server returns True.""" + from tools.mcp_tool import 
is_mcp_tool_parallel_safe, _parallel_safe_servers, _lock + with _lock: + _parallel_safe_servers.add("docs") + try: + assert is_mcp_tool_parallel_safe("mcp_docs_search") is True + assert is_mcp_tool_parallel_safe("mcp_docs_read_file") is True + # Different server should be False + assert is_mcp_tool_parallel_safe("mcp_github_list_repos") is False + finally: + with _lock: + _parallel_safe_servers.discard("docs") + + def test_is_mcp_tool_parallel_safe_server_with_underscores(self): + """Server names containing underscores are correctly matched.""" + from tools.mcp_tool import is_mcp_tool_parallel_safe, _parallel_safe_servers, _lock + with _lock: + _parallel_safe_servers.add("my_server") + try: + assert is_mcp_tool_parallel_safe("mcp_my_server_query") is True + finally: + with _lock: + _parallel_safe_servers.discard("my_server") + + def test_is_mcp_tool_parallel_safe_no_tool_suffix(self): + """Tool name that is just 'mcp_{server}' without a tool part returns False.""" + from tools.mcp_tool import is_mcp_tool_parallel_safe, _parallel_safe_servers, _lock + with _lock: + _parallel_safe_servers.add("docs") + try: + # "mcp_docs" has no tool part after the server name + assert is_mcp_tool_parallel_safe("mcp_docs") is False + # "mcp_docs_" has empty tool part + assert is_mcp_tool_parallel_safe("mcp_docs_") is False + finally: + with _lock: + _parallel_safe_servers.discard("docs") + + def test_register_mcp_servers_tracks_parallel_flag(self): + """register_mcp_servers populates _parallel_safe_servers from config.""" + from tools.mcp_tool import ( + register_mcp_servers, _parallel_safe_servers, _lock, + sanitize_mcp_name_component, + ) + fake_config = { + "parallel_srv": { + "command": "echo", + "supports_parallel_tool_calls": True, + }, + "serial_srv": { + "command": "echo", + "supports_parallel_tool_calls": False, + }, + "default_srv": { + "command": "echo", + # no supports_parallel_tool_calls key + }, + } + with patch("tools.mcp_tool._MCP_AVAILABLE", True), \ + 
patch("tools.mcp_tool._ensure_mcp_loop"), \ + patch("tools.mcp_tool._run_on_mcp_loop"), \ + patch("tools.mcp_tool._existing_tool_names", return_value=[]): + register_mcp_servers(fake_config) + + with _lock: + assert sanitize_mcp_name_component("parallel_srv") in _parallel_safe_servers + assert sanitize_mcp_name_component("serial_srv") not in _parallel_safe_servers + assert sanitize_mcp_name_component("default_srv") not in _parallel_safe_servers + # Cleanup + _parallel_safe_servers.discard(sanitize_mcp_name_component("parallel_srv")) + + def test_register_mcp_servers_removes_parallel_flag_on_toggle(self): + """Toggling supports_parallel_tool_calls to false removes server from the set.""" + from tools.mcp_tool import ( + register_mcp_servers, _parallel_safe_servers, _lock, + sanitize_mcp_name_component, + ) + + # First registration: parallel enabled + config_on = { + "toggle_srv": { + "command": "echo", + "supports_parallel_tool_calls": True, + }, + } + with patch("tools.mcp_tool._MCP_AVAILABLE", True), \ + patch("tools.mcp_tool._ensure_mcp_loop"), \ + patch("tools.mcp_tool._run_on_mcp_loop"), \ + patch("tools.mcp_tool._existing_tool_names", return_value=[]): + register_mcp_servers(config_on) + with _lock: + assert sanitize_mcp_name_component("toggle_srv") in _parallel_safe_servers + + # Second registration: parallel disabled + config_off = { + "toggle_srv": { + "command": "echo", + "supports_parallel_tool_calls": False, + }, + } + with patch("tools.mcp_tool._MCP_AVAILABLE", True), \ + patch("tools.mcp_tool._ensure_mcp_loop"), \ + patch("tools.mcp_tool._run_on_mcp_loop"), \ + patch("tools.mcp_tool._existing_tool_names", return_value=[]): + register_mcp_servers(config_off) + with _lock: + assert sanitize_mcp_name_component("toggle_srv") not in _parallel_safe_servers diff --git a/tests/tools/test_x_search_tool.py b/tests/tools/test_x_search_tool.py new file mode 100644 index 00000000000..7cbc4841a8a --- /dev/null +++ b/tests/tools/test_x_search_tool.py @@ -0,0 +1,438 
@@ +"""Tests for the X (Twitter) Search tool backed by xAI Responses API. + +Covers: +- HTTP request shape (URL, headers, payload, model from config) +- Handle filter validation (allowed vs excluded mutual exclusion) +- Inline url_citation extraction from message annotations +- Structured error handling (4xx with code, 5xx retry, ReadTimeout retry) +- Credential resolution: API key path, OAuth path, both-set preference, none-set +- check_x_search_requirements gating in registry +""" + +import json + +import requests + + +class _FakeResponse: + def __init__(self, payload, *, status_code=200, text=None): + self._payload = payload + self.status_code = status_code + self.text = text if text is not None else json.dumps(payload) + + def raise_for_status(self): + if self.status_code >= 400: + err = requests.HTTPError(f"{self.status_code} Client Error") + err.response = self + raise err + + def json(self): + return self._payload + + +# --------------------------------------------------------------------------- +# Original PR #10786 test coverage (HTTP shape, handle validation, citations, +# retry behavior) — preserved verbatim. Uses XAI_API_KEY env var via the +# default resolver path. 
+# --------------------------------------------------------------------------- + +def test_x_search_posts_responses_request(monkeypatch): + from tools.x_search_tool import x_search_tool + from hermes_cli import __version__ + + captured = {} + + def _fake_post(url, headers=None, json=None, timeout=None): + captured["url"] = url + captured["headers"] = headers + captured["json"] = json + captured["timeout"] = timeout + return _FakeResponse( + { + "output_text": "People on X are discussing xAI's latest launch.", + "citations": [{"url": "https://x.com/example/status/1", "title": "Example post"}], + } + ) + + monkeypatch.setenv("XAI_API_KEY", "xai-test-key") + monkeypatch.setattr("requests.post", _fake_post) + + result = json.loads( + x_search_tool( + query="What are people saying about xAI on X?", + allowed_x_handles=["xai", "@grok"], + from_date="2026-04-01", + to_date="2026-04-10", + enable_image_understanding=True, + ) + ) + + tool_def = captured["json"]["tools"][0] + assert captured["url"] == "https://api.x.ai/v1/responses" + assert captured["headers"]["User-Agent"] == f"Hermes-Agent/{__version__}" + assert captured["json"]["model"] == "grok-4.20-reasoning" + assert captured["json"]["store"] is False + assert tool_def["type"] == "x_search" + assert tool_def["allowed_x_handles"] == ["xai", "grok"] + assert tool_def["from_date"] == "2026-04-01" + assert tool_def["to_date"] == "2026-04-10" + assert tool_def["enable_image_understanding"] is True + assert result["success"] is True + assert result["answer"] == "People on X are discussing xAI's latest launch." 
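The request-shape assertions above imply that handles are normalized before they reach the xAI payload (`"@grok"` is sent as `"grok"`) and that the date range and image flag travel inside the `x_search` tool definition. A minimal sketch of a builder with that behavior is below. The names `normalize_handles` and `build_x_search_tool_def` are hypothetical and chosen for illustration; this is not the code in `tools/x_search_tool.py`, which may normalize and validate differently.

```python
from __future__ import annotations


def normalize_handles(handles: list[str] | None) -> list[str]:
    """Strip whitespace and a leading '@' from each handle, dropping empties."""
    cleaned: list[str] = []
    for handle in handles or []:
        name = handle.strip().lstrip("@")
        if name:
            cleaned.append(name)
    return cleaned


def build_x_search_tool_def(
    allowed_x_handles: list[str] | None = None,
    excluded_x_handles: list[str] | None = None,
    from_date: str | None = None,
    to_date: str | None = None,
    enable_image_understanding: bool = False,
) -> dict:
    """Build a tool-definition dict shaped like the one the test inspects.

    Hypothetical helper: only the keys asserted in the test are modeled here.
    """
    allowed = normalize_handles(allowed_x_handles)
    excluded = normalize_handles(excluded_x_handles)
    # The module docstring lists allowed/excluded as mutually exclusive.
    if allowed and excluded:
        raise ValueError("allowed and excluded handle filters are mutually exclusive")

    tool_def: dict = {"type": "x_search"}
    if allowed:
        tool_def["allowed_x_handles"] = allowed
    if excluded:
        tool_def["excluded_x_handles"] = excluded
    if from_date:
        tool_def["from_date"] = from_date
    if to_date:
        tool_def["to_date"] = to_date
    if enable_image_understanding:
        tool_def["enable_image_understanding"] = True
    return tool_def
```

Optional keys are omitted rather than sent as `None`, which keeps the payload minimal; whether the real tool does the same is an assumption.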
+ + +def test_x_search_rejects_conflicting_handle_filters(monkeypatch): + from tools.x_search_tool import x_search_tool + + monkeypatch.setenv("XAI_API_KEY", "xai-test-key") + + result = json.loads( + x_search_tool( + query="latest xAI discussion", + allowed_x_handles=["xai"], + excluded_x_handles=["grok"], + ) + ) + + assert result["error"] == "allowed_x_handles and excluded_x_handles cannot be used together" + + +def test_x_search_extracts_inline_url_citations(monkeypatch): + from tools.x_search_tool import x_search_tool + + def _fake_post(url, headers=None, json=None, timeout=None): + return _FakeResponse( + { + "output": [ + { + "type": "message", + "content": [ + { + "type": "output_text", + "text": "xAI posted an update on X.", + "annotations": [ + { + "type": "url_citation", + "url": "https://x.com/xai/status/123", + "title": "xAI update", + "start_index": 0, + "end_index": 3, + } + ], + } + ], + } + ] + } + ) + + monkeypatch.setenv("XAI_API_KEY", "xai-test-key") + monkeypatch.setattr("requests.post", _fake_post) + + result = json.loads(x_search_tool(query="latest post from xai")) + + assert result["success"] is True + assert result["answer"] == "xAI posted an update on X." 
+ assert result["inline_citations"] == [ + { + "url": "https://x.com/xai/status/123", + "title": "xAI update", + "start_index": 0, + "end_index": 3, + } + ] + + +def test_x_search_returns_structured_http_error(monkeypatch): + from tools.x_search_tool import x_search_tool + + class _FailingResponse: + status_code = 403 + text = '{"code":"forbidden","error":"x_search is not enabled for this model"}' + + def json(self): + return { + "code": "forbidden", + "error": "x_search is not enabled for this model", + } + + def raise_for_status(self): + err = requests.HTTPError("403 Client Error: Forbidden") + err.response = self + raise err + + monkeypatch.setenv("XAI_API_KEY", "xai-test-key") + monkeypatch.setattr("requests.post", lambda *a, **k: _FailingResponse()) + + result = json.loads(x_search_tool(query="latest xai discussion")) + + assert result["success"] is False + assert result["provider"] == "xai" + assert result["tool"] == "x_search" + assert result["error_type"] == "HTTPError" + assert result["error"] == "forbidden: x_search is not enabled for this model" + + +def test_x_search_retries_read_timeout_then_succeeds(monkeypatch): + from tools.x_search_tool import x_search_tool + + calls = {"count": 0} + + def _fake_post(url, headers=None, json=None, timeout=None): + calls["count"] += 1 + if calls["count"] == 1: + raise requests.ReadTimeout("timed out") + return _FakeResponse( + { + "output_text": "Recovered after retry.", + "citations": [], + } + ) + + monkeypatch.setenv("XAI_API_KEY", "xai-test-key") + monkeypatch.setattr("requests.post", _fake_post) + monkeypatch.setattr("tools.x_search_tool.time.sleep", lambda *_: None) + + result = json.loads(x_search_tool(query="grok xai")) + + assert calls["count"] == 2 + assert result["success"] is True + assert result["answer"] == "Recovered after retry." 
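The retry test above expects exactly two calls when the first attempt times out and the second succeeds. A minimal sketch of a retry loop with that behavior follows, using only the standard library. `call_with_retries`, its parameters, and the exponential backoff schedule are assumptions made for illustration, and the built-in `TimeoutError` stands in for `requests.ReadTimeout`; the actual retry logic in `tools/x_search_tool.py` is not shown in this diff.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    *,
    retries: int = 2,
    base_delay: float = 1.0,
    retryable: tuple[type[BaseException], ...] = (TimeoutError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying up to `retries` times on retryable exceptions.

    The delay doubles after each failed attempt. `sleep` is injectable so
    tests can skip real waiting, much as the test above patches time.sleep.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except retryable:
            if attempt >= retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

Passing `sleep=lambda _: None` gives a test the same effect as monkeypatching the module's `time.sleep`, without touching global state.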
+ + +def test_x_search_retries_5xx_then_succeeds(monkeypatch): + from tools.x_search_tool import x_search_tool + + calls = {"count": 0} + + def _fake_post(url, headers=None, json=None, timeout=None): + calls["count"] += 1 + if calls["count"] == 1: + return _FakeResponse( + {"code": "Internal error", "error": "Service temporarily unavailable."}, + status_code=500, + ) + return _FakeResponse({"output_text": "Recovered after 5xx retry."}) + + monkeypatch.setenv("XAI_API_KEY", "xai-test-key") + monkeypatch.setattr("requests.post", _fake_post) + monkeypatch.setattr("tools.x_search_tool.time.sleep", lambda *_: None) + + result = json.loads(x_search_tool(query="grok xai")) + + assert calls["count"] == 2 + assert result["success"] is True + assert result["answer"] == "Recovered after 5xx retry." + + +# --------------------------------------------------------------------------- +# Credential-resolution coverage — the OAuth-or-API-key gating contract. +# --------------------------------------------------------------------------- + +def _no_xai_env(monkeypatch): + """Strip any XAI_* env vars so the resolver doesn't see a leaked dev key.""" + for var in ("XAI_API_KEY", "XAI_BASE_URL", "HERMES_XAI_BASE_URL"): + monkeypatch.delenv(var, raising=False) + + +def test_x_search_uses_xai_oauth_when_only_oauth_available(monkeypatch): + """OAuth-only user: credential_source should be ``xai-oauth``.""" + from tools.registry import invalidate_check_fn_cache + from tools.x_search_tool import check_x_search_requirements, x_search_tool + + _no_xai_env(monkeypatch) + + def _fake_resolve(): + return { + "provider": "xai-oauth", + "api_key": "oauth-bearer-token", + "base_url": "https://api.x.ai/v1", + } + + monkeypatch.setattr( + "tools.x_search_tool.resolve_xai_http_credentials", _fake_resolve + ) + invalidate_check_fn_cache() + + assert check_x_search_requirements() is True + + captured = {} + + def _fake_post(url, headers=None, json=None, timeout=None): + captured["headers"] = headers + 
return _FakeResponse({"output_text": "Found posts via OAuth."}) + + monkeypatch.setattr("requests.post", _fake_post) + + result = json.loads(x_search_tool(query="anything about xai")) + + assert result["success"] is True + assert result["credential_source"] == "xai-oauth" + assert captured["headers"]["Authorization"] == "Bearer oauth-bearer-token" + + +def test_x_search_uses_api_key_when_only_xai_api_key_set(monkeypatch): + """API-key-only user: credential_source should be ``xai``.""" + from tools.registry import invalidate_check_fn_cache + from tools.x_search_tool import check_x_search_requirements, x_search_tool + + _no_xai_env(monkeypatch) + + def _fake_resolve(): + # Real ``resolve_xai_http_credentials`` returns ``"xai"`` when it + # falls through to the XAI_API_KEY env var path. + return { + "provider": "xai", + "api_key": "raw-api-key", + "base_url": "https://api.x.ai/v1", + } + + monkeypatch.setattr( + "tools.x_search_tool.resolve_xai_http_credentials", _fake_resolve + ) + invalidate_check_fn_cache() + + assert check_x_search_requirements() is True + + captured = {} + + def _fake_post(url, headers=None, json=None, timeout=None): + captured["headers"] = headers + return _FakeResponse({"output_text": "Found posts via API key."}) + + monkeypatch.setattr("requests.post", _fake_post) + + result = json.loads(x_search_tool(query="anything")) + + assert result["success"] is True + assert result["credential_source"] == "xai" + assert captured["headers"]["Authorization"] == "Bearer raw-api-key" + + +def test_x_search_prefers_oauth_when_both_available(monkeypatch): + """Both credentials present: OAuth wins (matches Teknium's billing preference). + + The real ordering is implemented in ``tools.xai_http.resolve_xai_http_credentials`` + — OAuth runtime first, fallback OAuth resolver second, ``XAI_API_KEY`` third. + This test exercises the contract by having the resolver return the OAuth + bearer (the ``xai-oauth`` ``provider`` tag is the marker). 
+ """ + from tools.registry import invalidate_check_fn_cache + from tools.x_search_tool import x_search_tool + + monkeypatch.setenv("XAI_API_KEY", "raw-api-key") + + # Mimic xai_http's preference: OAuth wins, so we return the OAuth tuple + # even though XAI_API_KEY is also set. + def _fake_resolve(): + return { + "provider": "xai-oauth", + "api_key": "oauth-bearer-token", + "base_url": "https://api.x.ai/v1", + } + + monkeypatch.setattr( + "tools.x_search_tool.resolve_xai_http_credentials", _fake_resolve + ) + invalidate_check_fn_cache() + + captured = {} + + def _fake_post(url, headers=None, json=None, timeout=None): + captured["headers"] = headers + return _FakeResponse({"output_text": "OAuth preferred."}) + + monkeypatch.setattr("requests.post", _fake_post) + + result = json.loads(x_search_tool(query="anything")) + + assert result["credential_source"] == "xai-oauth" + assert captured["headers"]["Authorization"] == "Bearer oauth-bearer-token" + + +def test_x_search_returns_tool_error_when_no_credentials(monkeypatch): + """No credentials anywhere: tool returns a clear error, not a 401 from xAI.""" + from tools.registry import invalidate_check_fn_cache + from tools.x_search_tool import check_x_search_requirements, x_search_tool + + _no_xai_env(monkeypatch) + + def _fake_resolve(): + return { + "provider": "xai", + "api_key": "", + "base_url": "https://api.x.ai/v1", + } + + monkeypatch.setattr( + "tools.x_search_tool.resolve_xai_http_credentials", _fake_resolve + ) + invalidate_check_fn_cache() + + assert check_x_search_requirements() is False + + # If a model somehow invokes the tool despite a False check_fn, the call + # surfaces a friendly error rather than an HTTP exception. + result = x_search_tool(query="anything") + assert "No xAI credentials available" in result + assert "hermes auth add xai-oauth" in result + + +def test_x_search_check_fn_false_when_resolver_raises(monkeypatch): + """Resolver exceptions (e.g. 
expired token + failed refresh) gate the tool out.""" + from tools.registry import invalidate_check_fn_cache + from tools.x_search_tool import check_x_search_requirements + + _no_xai_env(monkeypatch) + + def _boom(): + raise RuntimeError("token revoked and refresh failed") + + monkeypatch.setattr( + "tools.x_search_tool.resolve_xai_http_credentials", _boom + ) + invalidate_check_fn_cache() + + assert check_x_search_requirements() is False + + +def test_x_search_honors_config_model_and_timeout(monkeypatch, tmp_path): + """``x_search.model`` and ``x_search.timeout_seconds`` override the defaults.""" + from tools.x_search_tool import x_search_tool + + monkeypatch.setenv("XAI_API_KEY", "xai-test-key") + + # Patch the in-module config loader so tests don't touch ~/.hermes/config.yaml. + monkeypatch.setattr( + "tools.x_search_tool._load_x_search_config", + lambda: {"model": "grok-custom-test", "timeout_seconds": 45, "retries": 0}, + ) + + captured = {} + + def _fake_post(url, headers=None, json=None, timeout=None): + captured["model"] = json["model"] + captured["timeout"] = timeout + return _FakeResponse({"output_text": "Custom model OK."}) + + monkeypatch.setattr("requests.post", _fake_post) + + result = json.loads(x_search_tool(query="anything")) + + assert result["success"] is True + assert captured["model"] == "grok-custom-test" + assert captured["timeout"] == 45 + + +def test_x_search_registered_in_registry_with_check_fn(): + """The tool is registered under the x_search toolset with the gating check_fn.""" + import tools.x_search_tool # noqa: F401 — ensures registration runs + from tools.registry import registry + + entry = registry.get_entry("x_search") + assert entry is not None + assert entry.toolset == "x_search" + assert entry.check_fn is not None + assert entry.check_fn.__name__ == "check_x_search_requirements" + assert "XAI_API_KEY" in entry.requires_env + assert entry.emoji == "🐦" diff --git a/tools/approval.py b/tools/approval.py index 
84d02cc6a98..cf5df644ff8 100644 --- a/tools/approval.py +++ b/tools/approval.py @@ -133,8 +133,19 @@ _CREDENTIAL_FILES = ( r'(?:~|\$home|\$\{home\})/\.' r'(?:netrc|pgpass|npmrc|pypirc)\b' ) +# macOS: /etc, /var, /tmp, /home are symlinks to /private/{etc,var,tmp,home}. +# A command written to target /private/etc/sudoers works identically to +# /etc/sudoers on macOS but bypasses a plain "/etc/" pattern check. Match +# both forms. Inspired by Claude Code 2.1.113's "dangerous path protection". +_MACOS_PRIVATE_SYSTEM_PATH = r'/private/(?:etc|var|tmp|home)/' +# System-config paths that should trigger approval for any write/edit, +# collapsing /etc, its macOS /private/etc mirror, and /etc/sudoers.d/ into +# one shared fragment so new DANGEROUS_PATTERNS stay consistent. +_SYSTEM_CONFIG_PATH = ( + rf'(?:/etc/|{_MACOS_PRIVATE_SYSTEM_PATH})' +) _SENSITIVE_WRITE_TARGET = ( - r'(?:/etc/|/dev/sd|' + rf'(?:{_SYSTEM_CONFIG_PATH}|/dev/sd|' rf'{_SSH_SENSITIVE_PATH}|' rf'{_HERMES_ENV_PATH}|' rf'{_SHELL_RC_FILES}|' @@ -318,10 +329,17 @@ DANGEROUS_PATTERNS = [ # *next* line to satisfy the negative lookahead, silently allowing DELETE without WHERE. (r'\bDELETE\s+FROM\b(?![^\n]*\bWHERE\b)', "SQL DELETE without WHERE"), (r'\bTRUNCATE\s+(TABLE)?\s*\w', "SQL TRUNCATE"), - (r'>\s*/etc/', "overwrite system config"), + (rf'>\s*{_SYSTEM_CONFIG_PATH}', "overwrite system config"), (r'\bsystemctl\s+(-[^\s]+\s+)*(stop|restart|disable|mask)\b', "stop/restart system service"), (r'\bkill\s+-9\s+-1\b', "kill all processes"), (r'\bpkill\s+-9\b', "force kill processes"), + # killall with SIGKILL (parallel to pkill -9). Catches -9 / -KILL / + # -s KILL / -SIGKILL forms, and also `killall -r ` broad sweeps + # that can wipe out unrelated processes by accident. + # Inspired by Claude Code 2.1.113 expanded deny rules. 
+ (r'\bkillall\s+(-[^\s]*\s+)*-(9|KILL|SIGKILL)\b', "force kill processes (killall -KILL)"), + (r'\bkillall\s+(-[^\s]*\s+)*-s\s+(KILL|SIGKILL|9)\b', "force kill processes (killall -s KILL)"), + (r'\bkillall\s+(-[^\s]*\s+)*-r\b', "kill processes by regex (killall -r)"), (r':\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:', "fork bomb"), # Any shell invocation via -c or combined flags like -lc, -ic, etc. (r'\b(bash|sh|zsh|ksh)\s+-[^\s]*c(\s+|$)', "shell command via -c/-lc flag"), @@ -333,7 +351,11 @@ DANGEROUS_PATTERNS = [ (rf'\btee\b.*["\']?{_PROJECT_SENSITIVE_WRITE_TARGET}["\']?{_COMMAND_TAIL}', "overwrite project env/config via tee"), (rf'>>?\s*["\']?{_PROJECT_SENSITIVE_WRITE_TARGET}["\']?{_COMMAND_TAIL}', "overwrite project env/config via redirection"), (r'\bxargs\s+.*\brm\b', "xargs with rm"), - (r'\bfind\b.*-exec\s+(/\S*/)?rm\b', "find -exec rm"), + # find -exec rm / -execdir rm — the -execdir variant (same semantics, + # runs in the directory of each match) was previously missed. Claude + # Code 2.1.113 tightened their equivalent find rule to stop auto- + # approving -exec / -delete flags. + (r'\bfind\b.*-exec(?:dir)?\s+(/\S*/)?rm\b', "find -exec/-execdir rm"), (r'\bfind\b.*-delete\b', "find -delete"), # Gateway lifecycle protection: prevent the agent from killing its own # gateway process. These commands trigger a gateway restart/stop that @@ -351,11 +373,12 @@ DANGEROUS_PATTERNS = [ # to regex at detection time. Catch the structural pattern instead. (r'\bkill\b.*\$\(\s*pgrep\b', "kill process via pgrep expansion (self-termination)"), (r'\bkill\b.*`\s*pgrep\b', "kill process via backtick pgrep expansion (self-termination)"), - # File copy/move/edit into sensitive system paths - (r'\b(cp|mv|install)\b.*\s/etc/', "copy/move file into /etc/"), + # File copy/move/edit into sensitive system paths (/etc/ and macOS + # /private/etc/ mirror). 
+ (rf'\b(cp|mv|install)\b.*\s{_SYSTEM_CONFIG_PATH}', "copy/move file into system config path"), (rf'\b(cp|mv|install)\b.*\s["\']?{_PROJECT_SENSITIVE_WRITE_TARGET}["\']?{_COMMAND_TAIL}', "overwrite project env/config file"), - (r'\bsed\s+-[^\s]*i.*\s/etc/', "in-place edit of system config"), - (r'\bsed\s+--in-place\b.*\s/etc/', "in-place edit of system config (long flag)"), + (rf'\bsed\s+-[^\s]*i.*\s{_SYSTEM_CONFIG_PATH}', "in-place edit of system config"), + (rf'\bsed\s+--in-place\b.*\s{_SYSTEM_CONFIG_PATH}', "in-place edit of system config (long flag)"), # Script execution via heredoc — bypasses the -e/-c flag patterns above. # `python3 << 'EOF'` feeds arbitrary code via stdin without -c/-e flags. (r'\b(python[23]?|perl|ruby|node)\s+<<', "script execution via heredoc"), diff --git a/tools/delegate_tool.py b/tools/delegate_tool.py index f3a037c4341..136ea63ac40 100644 --- a/tools/delegate_tool.py +++ b/tools/delegate_tool.py @@ -2362,6 +2362,7 @@ def _resolve_delegation_credentials(cfg: dict, parent_agent) -> dict: configured_provider = str(cfg.get("provider") or "").strip() or None configured_base_url = str(cfg.get("base_url") or "").strip() or None configured_api_key = str(cfg.get("api_key") or "").strip() or None + configured_api_mode = str(cfg.get("api_mode") or "").strip().lower() or None if configured_base_url: # When delegation.api_key is not set, return None so _build_child_agent @@ -2372,9 +2373,17 @@ def _resolve_delegation_credentials(cfg: dict, parent_agent) -> dict: # callers to duplicate the key under delegation.api_key. api_key = configured_api_key # None → inherited from parent in _build_child_agent + # Use the shared URL-based api_mode detector (same path the main agent's + # runtime resolver uses) so Anthropic-compatible direct endpoints with a + # /anthropic suffix — Azure AI Foundry, MiniMax, Zhipu GLM, LiteLLM + # proxies — pick the right transport automatically. 
Without this, + # subagents would default to chat_completions and hit 404s on endpoints + # that only speak the Anthropic Messages protocol. Fixes #10213. + from hermes_cli.runtime_provider import _detect_api_mode_for_url + base_lower = configured_base_url.lower() provider = "custom" - api_mode = "chat_completions" + api_mode = _detect_api_mode_for_url(configured_base_url) or "chat_completions" if ( base_url_hostname(configured_base_url) == "chatgpt.com" and "/backend-api/codex" in base_lower @@ -2388,6 +2397,11 @@ def _resolve_delegation_credentials(cfg: dict, parent_agent) -> dict: provider = "custom" api_mode = "anthropic_messages" + # Explicit delegation.api_mode in config always wins. Lets users force + # a transport for non-standard endpoints the URL heuristic can't detect. + if configured_api_mode in {"chat_completions", "codex_responses", "anthropic_messages"}: + api_mode = configured_api_mode + return { "model": configured_model, "provider": provider, diff --git a/tools/lazy_deps.py b/tools/lazy_deps.py index 258a09ef667..faaf7ec42bf 100644 --- a/tools/lazy_deps.py +++ b/tools/lazy_deps.py @@ -78,7 +78,7 @@ LAZY_DEPS: dict[str, tuple[str, ...]] = { # ─── Inference providers ─────────────────────────────────────────────── # Native Anthropic SDK — needed when provider=anthropic (not via # OpenRouter / aggregators which use the openai SDK). 
- "provider.anthropic": ("anthropic==0.86.0",), + "provider.anthropic": ("anthropic==0.87.0",), # CVE-2026-34450, CVE-2026-34452 # AWS Bedrock provider "provider.bedrock": ("boto3==1.42.89",), @@ -125,7 +125,7 @@ LAZY_DEPS: dict[str, tuple[str, ...]] = { "platform.slack": ( "slack-bolt==1.27.0", "slack-sdk==3.40.1", - "aiohttp==3.13.3", + "aiohttp==3.13.4", # CVE-2026-34513/34518/34519/34520/34525 ), "platform.matrix": ( "mautrix[encryption]==0.21.0", diff --git a/tools/mcp_tool.py b/tools/mcp_tool.py index ba104cc4273..b24bb9705ad 100644 --- a/tools/mcp_tool.py +++ b/tools/mcp_tool.py @@ -24,6 +24,7 @@ Example config:: args: ["-y", "@modelcontextprotocol/server-github"] env: GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_..." + supports_parallel_tool_calls: true # tools from this server may run concurrently remote_api: url: "https://my-mcp-server.example.com/mcp" headers: @@ -56,6 +57,8 @@ Features: - Thread-safe architecture with dedicated background event loop - Sampling support: MCP servers can request LLM completions via sampling/createMessage (text and tool-use responses) + - Parallel tool call opt-in: per-server ``supports_parallel_tool_calls`` + flag allows concurrent execution of tools from the same server Architecture: A dedicated background event loop (_mcp_loop) runs in a daemon thread. @@ -1976,11 +1979,16 @@ def _handle_session_expired_and_retry( return None +# Sanitized server names whose ``supports_parallel_tool_calls`` config is True. +# Populated during ``register_mcp_servers()`` and queried by +# ``is_mcp_tool_parallel_safe()`` for the parallel-execution check in run_agent. +_parallel_safe_servers: set = set() + # Dedicated event loop running in a background daemon thread. _mcp_loop: Optional[asyncio.AbstractEventLoop] = None _mcp_thread: Optional[threading.Thread] = None -# Protects _mcp_loop, _mcp_thread, _servers, and _stdio_pids. +# Protects _mcp_loop, _mcp_thread, _servers, _parallel_safe_servers, and _stdio_pids. 
_lock = threading.Lock() # PIDs of stdio MCP server subprocesses. Tracked so we can force-kill @@ -3098,6 +3106,12 @@ def register_mcp_servers(servers: Dict[str, dict]) -> List[str]: for k, v in servers.items() if k not in _servers and _parse_boolish(v.get("enabled", True), default=True) } + # Track which servers opt-in to parallel tool calls (idempotent). + for srv_name, srv_cfg in servers.items(): + if _parse_boolish(srv_cfg.get("supports_parallel_tool_calls", False), default=False): + _parallel_safe_servers.add(sanitize_mcp_name_component(srv_name)) + else: + _parallel_safe_servers.discard(sanitize_mcp_name_component(srv_name)) if not new_servers: return _existing_tool_names() @@ -3208,6 +3222,29 @@ def discover_mcp_tools() -> List[str]: return tool_names +def is_mcp_tool_parallel_safe(tool_name: str) -> bool: + """Check if an MCP tool belongs to a server that supports parallel tool calls. + + MCP tool names follow the pattern ``mcp_{server}_{tool}``. This extracts + the server component and checks it against the set of servers whose config + includes ``supports_parallel_tool_calls: true``. + + Returns False for non-MCP tools or tools from servers without the flag. + """ + if not tool_name.startswith("mcp_"): + return False + # Strip the "mcp_" prefix and extract the server name. + # Tool names are: mcp_{sanitized_server}_{sanitized_tool} + # We need to check all possible server prefixes because the server name + # itself may contain underscores after sanitization. + rest = tool_name[4:] # strip "mcp_" + with _lock: + for server_name in _parallel_safe_servers: + if rest.startswith(server_name + "_") and len(rest) > len(server_name) + 1: + return True + return False + + def get_mcp_status() -> List[dict]: """Return status of all configured MCP servers for banner display. 
diff --git a/tools/registry.py b/tools/registry.py index 9cac53084bd..7bb92e85f96 100644 --- a/tools/registry.py +++ b/tools/registry.py @@ -244,8 +244,16 @@ class ToolRegistry: emoji: str = "", max_result_size_chars: int | float | None = None, dynamic_schema_overrides: Callable = None, + override: bool = False, ): - """Register a tool. Called at module-import time by each tool file.""" + """Register a tool. Called at module-import time by each tool file. + + ``override=True`` is an explicit opt-in for plugins that intend to + replace an existing built-in tool implementation (e.g. swap the + default browser tool for a headed-Chrome CDP backend). Without it, + registrations that would shadow an existing tool from a different + toolset are rejected to prevent accidental overwrites. + """ with self._lock: existing = self._tools.get(name) if existing and existing.toolset != toolset: @@ -260,13 +268,22 @@ class ToolRegistry: "Tool '%s': MCP toolset '%s' overwriting MCP toolset '%s'", name, toolset, existing.toolset, ) + elif override: + # Explicit plugin opt-in: replace the existing tool. + # Logged at INFO so the override is auditable in agent.log. + logger.info( + "Tool '%s': toolset '%s' overriding existing toolset '%s' " + "(override=True opt-in)", + name, toolset, existing.toolset, + ) else: # Reject shadowing — prevent plugins/MCP from overwriting # built-in tools or vice versa. logger.error( "Tool registration REJECTED: '%s' (toolset '%s') would " - "shadow existing tool from toolset '%s'. Deregister the " - "existing tool first if this is intentional.", + "shadow existing tool from toolset '%s'. 
Pass " + "override=True to register() if the replacement is " + "intentional, or deregister the existing tool first.", name, toolset, existing.toolset, ) return @@ -387,7 +404,16 @@ class ToolRegistry: return entry.handler(args, **kwargs) except Exception as e: logger.exception("Tool %s dispatch error: %s", name, e) - return json.dumps({"error": f"Tool execution failed: {type(e).__name__}: {e}"}) + # Route through the sanitizer so framing tokens / CDATA / fences + # in exception strings don't reach the model as structural noise. + # See model_tools._sanitize_tool_error for rationale. + raw = f"Tool execution failed: {type(e).__name__}: {e}" + try: + from model_tools import _sanitize_tool_error + sanitized = _sanitize_tool_error(raw) + except Exception: + sanitized = raw # defensive: never let the sanitizer block error propagation + return json.dumps({"error": sanitized}) # ------------------------------------------------------------------ # Query helpers (replace redundant dicts in model_tools.py) diff --git a/tools/x_search_tool.py b/tools/x_search_tool.py new file mode 100644 index 00000000000..8b242ee0ca8 --- /dev/null +++ b/tools/x_search_tool.py @@ -0,0 +1,424 @@ +#!/usr/bin/env python3 +"""X Search tool backed by xAI's built-in ``x_search`` Responses API tool. + +Authentication +-------------- +The tool registers when **either** xAI credential path is available: + +* ``XAI_API_KEY`` is set in ``~/.hermes/.env`` or the process environment + (paid xAI API key), OR +* The user is signed in via xAI Grok OAuth — SuperGrok subscription — + i.e. ``hermes auth add xai-oauth`` has been run and the stored refresh + token still works. + +Credential preference at call time matches +:func:`tools.xai_http.resolve_xai_http_credentials`: SuperGrok OAuth first, +direct OAuth resolver second, ``XAI_API_KEY`` last. 
That helper also +auto-refreshes the OAuth access token when it's within the refresh skew +window, so a ``True`` from :func:`check_x_search_requirements` means the +bearer is fetchable AND non-empty. + +Salvaged from PR #10786 (originally by @Jaaneek); credential resolution +reworked to honor both auth modes per Teknium's design. +""" + +from __future__ import annotations + +import json +import logging +import os +import time +from typing import Any, Dict, List, Optional, Tuple + +import requests + +from tools.registry import registry, tool_error +from tools.xai_http import hermes_xai_user_agent, resolve_xai_http_credentials + +logger = logging.getLogger(__name__) + +DEFAULT_XAI_BASE_URL = "https://api.x.ai/v1" +DEFAULT_X_SEARCH_MODEL = "grok-4.20-reasoning" +DEFAULT_X_SEARCH_TIMEOUT_SECONDS = 180 +DEFAULT_X_SEARCH_RETRIES = 2 +MAX_HANDLES = 10 + + +# --------------------------------------------------------------------------- +# Config +# --------------------------------------------------------------------------- + +def _load_x_search_config() -> Dict[str, Any]: + try: + from hermes_cli.config import load_config + + return load_config().get("x_search", {}) or {} + except Exception: + return {} + + +def _get_x_search_model() -> str: + cfg = _load_x_search_config() + return (str(cfg.get("model") or "").strip() or DEFAULT_X_SEARCH_MODEL) + + +def _get_x_search_timeout_seconds() -> int: + cfg = _load_x_search_config() + raw_value = cfg.get("timeout_seconds", DEFAULT_X_SEARCH_TIMEOUT_SECONDS) + try: + return max(30, int(raw_value)) + except Exception: + return DEFAULT_X_SEARCH_TIMEOUT_SECONDS + + +def _get_x_search_retries() -> int: + cfg = _load_x_search_config() + raw_value = cfg.get("retries", DEFAULT_X_SEARCH_RETRIES) + try: + return max(0, int(raw_value)) + except Exception: + return DEFAULT_X_SEARCH_RETRIES + + +# --------------------------------------------------------------------------- +# Credential resolution +# 
--------------------------------------------------------------------------- + +def _resolve_xai_bearer() -> Tuple[str, str, str]: + """Return ``(api_key, base_url, source)``. + + ``source`` is one of ``"xai-oauth"`` or ``"xai"`` so callers (and tests) + can tell which credential path won. Raises ``RuntimeError`` if no usable + credential is available — the registered :func:`check_x_search_requirements` + gate makes that case unreachable in normal operation, but the runtime + check exists so a credential that expires between registration and + invocation produces a clean tool error instead of a 401. + """ + creds = resolve_xai_http_credentials() + api_key = str(creds.get("api_key") or "").strip() + if not api_key: + raise RuntimeError( + "No xAI credentials available. Run `hermes auth add xai-oauth` " + "to sign in with your SuperGrok subscription, or set XAI_API_KEY." + ) + base_url = str(creds.get("base_url") or DEFAULT_XAI_BASE_URL).strip().rstrip("/") + source = str(creds.get("provider") or "xai") + return api_key, base_url, source + + +def check_x_search_requirements() -> bool: + """Return True when xAI credentials are available AND valid. + + ``resolve_xai_http_credentials`` calls + :func:`hermes_cli.auth.resolve_xai_oauth_runtime_credentials` which + auto-refreshes the OAuth access token if it's expiring; a successful + return therefore implies a usable bearer. 
+ """ + try: + creds = resolve_xai_http_credentials() + return bool(str(creds.get("api_key") or "").strip()) + except Exception: + return False + + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + +def _normalize_handles(handles: Optional[List[str]], field_name: str) -> List[str]: + cleaned: List[str] = [] + for handle in handles or []: + normalized = str(handle or "").strip().lstrip("@") + if normalized: + cleaned.append(normalized) + if len(cleaned) > MAX_HANDLES: + raise ValueError(f"{field_name} supports at most {MAX_HANDLES} handles") + return cleaned + + +def _extract_response_text(payload: Dict[str, Any]) -> str: + output_text = str(payload.get("output_text") or "").strip() + if output_text: + return output_text + + parts: List[str] = [] + for item in payload.get("output", []) or []: + if item.get("type") != "message": + continue + for content in item.get("content", []) or []: + ctype = content.get("type") + if ctype in ("output_text", "text"): + text = str(content.get("text") or "").strip() + if text: + parts.append(text) + return "\n\n".join(parts).strip() + + +def _extract_inline_citations(payload: Dict[str, Any]) -> List[Dict[str, Any]]: + citations: List[Dict[str, Any]] = [] + for item in payload.get("output", []) or []: + if item.get("type") != "message": + continue + for content in item.get("content", []) or []: + for annotation in content.get("annotations", []) or []: + if annotation.get("type") != "url_citation": + continue + citations.append( + { + "url": annotation.get("url", ""), + "title": annotation.get("title", ""), + "start_index": annotation.get("start_index"), + "end_index": annotation.get("end_index"), + } + ) + return citations + + +def _http_error_message(exc: requests.HTTPError) -> str: + response = getattr(exc, "response", None) + if response is None: + return str(exc) + + try: + payload = response.json() + except 
Exception: + payload = None + + if isinstance(payload, dict): + code = str(payload.get("code") or "").strip() + error = str(payload.get("error") or "").strip() + message = error or str(payload) + if code and code not in message: + message = f"{code}: {message}" + return message or str(exc) + + text = str(getattr(response, "text", "") or "").strip() + if text: + return text[:500] + return str(exc) + + +# --------------------------------------------------------------------------- +# Tool implementation +# --------------------------------------------------------------------------- + +def x_search_tool( + query: str, + allowed_x_handles: Optional[List[str]] = None, + excluded_x_handles: Optional[List[str]] = None, + from_date: str = "", + to_date: str = "", + enable_image_understanding: bool = False, + enable_video_understanding: bool = False, +) -> str: + if not query or not query.strip(): + return tool_error("query is required for x_search") + + try: + api_key, base_url, source = _resolve_xai_bearer() + except RuntimeError as exc: + return tool_error(str(exc)) + + try: + allowed = _normalize_handles(allowed_x_handles, "allowed_x_handles") + excluded = _normalize_handles(excluded_x_handles, "excluded_x_handles") + if allowed and excluded: + return tool_error("allowed_x_handles and excluded_x_handles cannot be used together") + + tool_def: Dict[str, Any] = {"type": "x_search"} + if allowed: + tool_def["allowed_x_handles"] = allowed + if excluded: + tool_def["excluded_x_handles"] = excluded + if from_date.strip(): + tool_def["from_date"] = from_date.strip() + if to_date.strip(): + tool_def["to_date"] = to_date.strip() + if enable_image_understanding: + tool_def["enable_image_understanding"] = True + if enable_video_understanding: + tool_def["enable_video_understanding"] = True + + payload = { + "model": _get_x_search_model(), + "input": [ + { + "role": "user", + "content": query.strip(), + } + ], + "tools": [tool_def], + "store": False, + } + + timeout_seconds = 
_get_x_search_timeout_seconds() + max_retries = _get_x_search_retries() + response: Optional[requests.Response] = None + for attempt in range(max_retries + 1): + try: + response = requests.post( + f"{base_url}/responses", + headers={ + "Authorization": f"Bearer {api_key}", + "Content-Type": "application/json", + "User-Agent": hermes_xai_user_agent(), + }, + json=payload, + timeout=timeout_seconds, + ) + response.raise_for_status() + break + except requests.HTTPError as e: + status_code = getattr(getattr(e, "response", None), "status_code", None) + if status_code is None or status_code < 500 or attempt >= max_retries: + raise + logger.warning( + "x_search upstream failure on attempt %s/%s: %s", + attempt + 1, + max_retries + 1, + _http_error_message(e), + ) + time.sleep(min(5.0, 1.5 * (attempt + 1))) + except (requests.ReadTimeout, requests.ConnectionError) as e: + if attempt >= max_retries: + raise + logger.warning( + "x_search transient failure on attempt %s/%s: %s", + attempt + 1, + max_retries + 1, + e, + ) + time.sleep(min(5.0, 1.5 * (attempt + 1))) + + if response is None: + raise RuntimeError("x_search request did not return a response") + + data = response.json() + + answer = _extract_response_text(data) + citations = list(data.get("citations") or []) + inline_citations = _extract_inline_citations(data) + + return json.dumps( + { + "success": True, + "provider": "xai", + "credential_source": source, + "tool": "x_search", + "model": payload["model"], + "query": query.strip(), + "answer": answer, + "citations": citations, + "inline_citations": inline_citations, + }, + ensure_ascii=False, + ) + except requests.HTTPError as e: + logger.error("x_search failed: %s", e, exc_info=True) + return json.dumps( + { + "success": False, + "provider": "xai", + "tool": "x_search", + "error": _http_error_message(e), + "error_type": type(e).__name__, + }, + ensure_ascii=False, + ) + except requests.ReadTimeout as e: + logger.error("x_search timed out: %s", e, exc_info=True) + 
return json.dumps( + { + "success": False, + "provider": "xai", + "tool": "x_search", + "error": f"xAI x_search timed out after {_get_x_search_timeout_seconds()} seconds", + "error_type": type(e).__name__, + }, + ensure_ascii=False, + ) + except Exception as e: + logger.error("x_search failed: %s", e, exc_info=True) + return json.dumps( + { + "success": False, + "provider": "xai", + "tool": "x_search", + "error": str(e), + "error_type": type(e).__name__, + }, + ensure_ascii=False, + ) + + +X_SEARCH_SCHEMA = { + "name": "x_search", + "description": ( + "Search X (Twitter) posts, profiles, and threads using xAI's built-in " + "X Search tool. Use this for current discussion, reactions, or claims " + "on X rather than general web pages. Available when xAI credentials " + "are configured (SuperGrok OAuth or XAI_API_KEY)." + ), + "parameters": { + "type": "object", + "properties": { + "query": { + "type": "string", + "description": "What to look up on X.", + }, + "allowed_x_handles": { + "type": "array", + "items": {"type": "string"}, + "description": "Optional list of X handles to include exclusively (max 10).", + }, + "excluded_x_handles": { + "type": "array", + "items": {"type": "string"}, + "description": "Optional list of X handles to exclude (max 10).", + }, + "from_date": { + "type": "string", + "description": "Optional start date in YYYY-MM-DD format.", + }, + "to_date": { + "type": "string", + "description": "Optional end date in YYYY-MM-DD format.", + }, + "enable_image_understanding": { + "type": "boolean", + "description": "Whether xAI should analyze images attached to matching X posts.", + "default": False, + }, + "enable_video_understanding": { + "type": "boolean", + "description": "Whether xAI should analyze videos attached to matching X posts.", + "default": False, + }, + }, + "required": ["query"], + }, +} + + +def _handle_x_search(args, **kw): + return x_search_tool( + query=args.get("query", ""), + allowed_x_handles=args.get("allowed_x_handles"), + 
excluded_x_handles=args.get("excluded_x_handles"), + from_date=args.get("from_date", ""), + to_date=args.get("to_date", ""), + enable_image_understanding=bool(args.get("enable_image_understanding", False)), + enable_video_understanding=bool(args.get("enable_video_understanding", False)), + ) + + +registry.register( + name="x_search", + toolset="x_search", + schema=X_SEARCH_SCHEMA, + handler=_handle_x_search, + check_fn=check_x_search_requirements, + requires_env=["XAI_API_KEY"], + emoji="🐦", + max_result_size_chars=100_000, +) diff --git a/toolsets.py b/toolsets.py index 8ec45f11a2f..5de07e4c7a1 100644 --- a/toolsets.py +++ b/toolsets.py @@ -88,6 +88,17 @@ TOOLSETS = { "tools": ["web_search"], "includes": [] }, + + "x_search": { + "description": ( + "Search X (Twitter) posts and threads via xAI's built-in " + "x_search Responses tool. Available when xAI credentials are " + "configured (SuperGrok OAuth or XAI_API_KEY). Off by default; " + "enable in `hermes tools` → X (Twitter) Search." + ), + "tools": ["x_search"], + "includes": [] + }, "vision": { "description": "Image analysis and vision tools", diff --git a/ui-tui/packages/hermes-ink/index.d.ts b/ui-tui/packages/hermes-ink/index.d.ts index 637c4bb43b6..5d5ae9387c0 100644 --- a/ui-tui/packages/hermes-ink/index.d.ts +++ b/ui-tui/packages/hermes-ink/index.d.ts @@ -21,6 +21,7 @@ export { default as Text } from './src/ink/components/Text.tsx' export type { Props as TextProps } from './src/ink/components/Text.tsx' export type { Key } from './src/ink/events/input-event.ts' export { default as useApp } from './src/ink/hooks/use-app.ts' +export { useCursorAdvance } from './src/ink/hooks/use-cursor-advance.ts' export { useDeclaredCursor } from './src/ink/hooks/use-declared-cursor.ts' export { default as useInput } from './src/ink/hooks/use-input.ts' export { useHasSelection, useSelection } from './src/ink/hooks/use-selection.ts' diff --git a/ui-tui/packages/hermes-ink/src/entry-exports.ts 
b/ui-tui/packages/hermes-ink/src/entry-exports.ts index 355faa16f97..d173e0c9bb1 100644 --- a/ui-tui/packages/hermes-ink/src/entry-exports.ts +++ b/ui-tui/packages/hermes-ink/src/entry-exports.ts @@ -12,6 +12,7 @@ export { default as ScrollBox } from './ink/components/ScrollBox.js' export { default as Spacer } from './ink/components/Spacer.js' export { default as Text } from './ink/components/Text.js' export { default as useApp } from './ink/hooks/use-app.js' +export { useCursorAdvance } from './ink/hooks/use-cursor-advance.js' export { useDeclaredCursor } from './ink/hooks/use-declared-cursor.js' export { type RunExternalProcess, useExternalProcess, withInkSuspended } from './ink/hooks/use-external-process.js' export { default as useInput } from './ink/hooks/use-input.js' diff --git a/ui-tui/packages/hermes-ink/src/ink/components/App.tsx b/ui-tui/packages/hermes-ink/src/ink/components/App.tsx index 5851c4bef66..54892e3b7b1 100644 --- a/ui-tui/packages/hermes-ink/src/ink/components/App.tsx +++ b/ui-tui/packages/hermes-ink/src/ink/components/App.tsx @@ -33,6 +33,7 @@ import { DBP, DFE, DISABLE_MOUSE_TRACKING, EBP, EFE, SHOW_CURSOR } from '../term import AppContext from './AppContext.js' import { ClockProvider } from './ClockContext.js' +import CursorAdvanceContext, { type CursorAdvanceNotifier } from './CursorAdvanceContext.js' import CursorDeclarationContext, { type CursorDeclarationSetter } from './CursorDeclarationContext.js' import ErrorOverview from './ErrorOverview.js' import StdinContext from './StdinContext.js' @@ -100,6 +101,18 @@ type Props = { // Enables IME composition at the input caret and lets screen readers / // magnifiers track the input. Optional so testing.tsx doesn't stub it. readonly onCursorDeclaration?: CursorDeclarationSetter + // Receives notifications that the physical cursor was advanced out-of-band + // (e.g. TextInput's fast-echo bypass writing directly to stdout). 
The + // handler in ink.tsx updates two pieces of state from a single call: + // - `displayCursor` (the relative-move basis log-update uses on the + // next frame; skipped on alt-screen where CSI H resets it every + // frame anyway), and + // - the active `cursorDeclaration.relativeX/Y` (the target the cursor + // parks at after every frame; bumped on BOTH screens because + // onRender's alt-screen branch emits an absolute CUP from it and + // a stale declaration there is still visibly wrong). + // Optional so testing.tsx doesn't need to stub it. + readonly onCursorAdvance?: CursorAdvanceNotifier // Dispatch a keyboard event through the DOM tree. Called for each // parsed key alongside the legacy EventEmitter path. readonly dispatchKeyboardEvent: (parsedKey: ParsedKey) => void @@ -196,7 +209,9 @@ export default class App extends PureComponent { {})}> - {this.state.error ? : this.props.children} + {})}> + {this.state.error ? : this.props.children} + diff --git a/ui-tui/packages/hermes-ink/src/ink/components/CursorAdvanceContext.ts b/ui-tui/packages/hermes-ink/src/ink/components/CursorAdvanceContext.ts new file mode 100644 index 00000000000..52566c1a917 --- /dev/null +++ b/ui-tui/packages/hermes-ink/src/ink/components/CursorAdvanceContext.ts @@ -0,0 +1,35 @@ +import { createContext } from 'react' + +/** + * Notify Ink that the physical terminal cursor was advanced by an + * out-of-band stdout.write (e.g. the TextInput fast-echo path). + * + * This is a two-part notification — calling it updates both: + * + * 1. Ink's cached `displayCursor` (the basis log-update uses to + * compute relative cursor moves for the next frame's preamble). + * Without this, the next frame's preamble starts from a stale + * parked position and the diff is rendered N cells offset. + * This half is SKIPPED on alt-screen — every alt-screen frame + * begins with CSI H which absolutely repositions the cursor, so + * the relative-move basis is reset for free. + * + * 2. 
Ink's active `cursorDeclaration` (the target the cursor parks + * at after every frame, set by `useDeclaredCursor`). Without + * this, an unrelated component re-rendering before the deferred + * React state catches up would publish a stale declaration and + * visually undo the fast-echo's advance. This half applies to + * BOTH main-screen and alt-screen — on alt-screen the cursor- + * park branch in onRender emits an absolute CUP to + * `rect.x + decl.relativeX`, so a stale declaration there is + * still wrong even though displayCursor is skipped. + * + * `dx`/`dy` are deltas in terminal cells (positive = right/down, + * negative = left/up). The caller is responsible for ensuring the + * physical cursor really did move by that amount. + */ +export type CursorAdvanceNotifier = (dx: number, dy?: number) => void + +const CursorAdvanceContext = createContext(() => {}) + +export default CursorAdvanceContext diff --git a/ui-tui/packages/hermes-ink/src/ink/hooks/use-cursor-advance.ts b/ui-tui/packages/hermes-ink/src/ink/hooks/use-cursor-advance.ts new file mode 100644 index 00000000000..15831ed86ab --- /dev/null +++ b/ui-tui/packages/hermes-ink/src/ink/hooks/use-cursor-advance.ts @@ -0,0 +1,33 @@ +import { useContext } from 'react' + +import CursorAdvanceContext, { type CursorAdvanceNotifier } from '../components/CursorAdvanceContext.js' + +/** + * Returns a function that notifies Ink the physical terminal cursor was + * advanced out-of-band (e.g. by a direct stdout.write from the + * TextInput fast-echo bypass). + * + * Calling the returned function updates two pieces of Ink state: + * + * - `displayCursor` — the cached parked-cursor position log-update + * uses as the relative-move basis for the next frame. Skipped on + * alt-screen, where every frame's CSI H resets the cursor anyway. + * + * - The active `cursorDeclaration` — the target the cursor parks at + * after every frame. 
Bumped on BOTH main- and alt-screen, because + * onRender's alt-screen park branch emits an absolute CUP from + * this value and a stale declaration there is still visibly wrong. + * The next React commit that publishes a fresh declaration + * supersedes the bump. + * + * The caller is responsible for the stdout write itself; this hook + * only reports the resulting cursor delta. Pass `dx` and optional + * `dy` in terminal cells (positive = moved right/down, negative = + * moved left/up). + * + * If the host isn't an Ink render root (test stubs, non-Ink renderer) + * the returned callback is a safe no-op. + */ +export function useCursorAdvance(): CursorAdvanceNotifier { + return useContext(CursorAdvanceContext) +} diff --git a/ui-tui/packages/hermes-ink/src/ink/ink-cursor-advance.test.ts b/ui-tui/packages/hermes-ink/src/ink/ink-cursor-advance.test.ts new file mode 100644 index 00000000000..a3cc1757ab6 --- /dev/null +++ b/ui-tui/packages/hermes-ink/src/ink/ink-cursor-advance.test.ts @@ -0,0 +1,234 @@ +import { EventEmitter } from 'events' + +import React from 'react' +import { describe, expect, it } from 'vitest' + +import Text from './components/Text.js' +import Ink from './ink.js' + +class FakeTty extends EventEmitter { + chunks: string[] = [] + columns = 40 + rows = 8 + isTTY = true + + write(chunk: string | Uint8Array, cb?: (err?: Error | null) => void): boolean { + this.chunks.push(typeof chunk === 'string' ? 
chunk : Buffer.from(chunk).toString('utf8')) + cb?.() + + return true + } +} + +function makeInk() { + const stdout = new FakeTty() + const stdin = new FakeTty() + const stderr = new FakeTty() + + const ink = new Ink({ + exitOnCtrlC: false, + patchConsole: false, + stderr: stderr as unknown as NodeJS.WriteStream, + stdin: stdin as unknown as NodeJS.ReadStream, + stdout: stdout as unknown as NodeJS.WriteStream + }) + + return { ink, stdout, stdin, stderr } +} + +// Cast helper instead of exposing __get*ForTest methods on production Ink — +// these are internal frame/cursor caches we only inspect from tests. +type InkPrivate = { + displayCursor: { x: number; y: number } | null + cursorDeclaration: { node: unknown; relativeX: number; relativeY: number } | null + frontFrame: { cursor: { x: number; y: number } } +} +const peek = (ink: Ink): InkPrivate => ink as unknown as InkPrivate + +// Closes the cursor-drift bug: when TextInput's fast-echo path writes a +// printable character directly to stdout, the hardware cursor advances by +// one cell BUT Ink's `displayCursor` cache (used as the basis for the +// next frame's relative cursor preamble) wasn't being updated. On long +// sessions an unrelated re-render (status bar timer, streaming +// reasoning, etc.) would then park the hardware cursor N cells offset +// from the actual caret — visible as "extra whitespace between my last +// typed character and the cursor block". +describe('Ink.noteExternalCursorAdvance', () => { + it('bumps an already-tracked displayCursor by the given delta', () => { + const { ink } = makeInk() + + ink.render(React.createElement(Text, null, 'hi')) + ink.onRender() + + // Seed a known parked position directly. In production this is set by + // the cursor-park branch in onRender when a useDeclaredCursor caller + // commits a declaration; this test bypasses React for hermeticity. 
+ peek(ink).displayCursor = { x: 5, y: 0 } + + ink.noteExternalCursorAdvance(3) + expect(peek(ink).displayCursor).toEqual({ x: 8, y: 0 }) + + ink.noteExternalCursorAdvance(-1) + expect(peek(ink).displayCursor).toEqual({ x: 7, y: 0 }) + + ink.noteExternalCursorAdvance(0, 2) + expect(peek(ink).displayCursor).toEqual({ x: 7, y: 2 }) + + ink.unmount() + }) + + it('seeds displayCursor from frontFrame.cursor when nothing was parked', () => { + const { ink } = makeInk() + + ink.render(React.createElement(Text, null, 'hello')) + ink.onRender() + + expect(peek(ink).displayCursor).toBeNull() + const base = { x: peek(ink).frontFrame.cursor.x, y: peek(ink).frontFrame.cursor.y } + + ink.noteExternalCursorAdvance(4) + expect(peek(ink).displayCursor).toEqual({ x: base.x + 4, y: base.y }) + + ink.unmount() + }) + + it('is a no-op when the delta is zero', () => { + const { ink } = makeInk() + + ink.render(React.createElement(Text, null, 'hi')) + ink.onRender() + + ink.noteExternalCursorAdvance(0) + expect(peek(ink).displayCursor).toBeNull() + + ink.noteExternalCursorAdvance(0, 0) + expect(peek(ink).displayCursor).toBeNull() + + ink.unmount() + }) + + it('skips displayCursor on alt-screen — CSI H resets every frame', () => { + const { ink } = makeInk() + + ink.setAltScreenActive(true) + ink.render(React.createElement(Text, null, 'hi')) + ink.onRender() + peek(ink).displayCursor = { x: 5, y: 0 } + + ink.noteExternalCursorAdvance(3) + + expect(peek(ink).displayCursor).toEqual({ x: 5, y: 0 }) + + ink.unmount() + }) + + // Closes Copilot follow-up on PR #26717: the default TUI wraps the + // composer in , so alt-screen is the production + // path. CSI H only resets the log-update relative-move basis — the + // declared cursor target is still consulted by onRender's alt-screen + // park branch (`cursorPosition(row, col)` using rect + decl). So + // cursorDeclaration MUST advance on alt-screen too, even though + // displayCursor doesn't need to. 
+ it('still advances cursorDeclaration on alt-screen', () => { + const { ink } = makeInk() + + ink.setAltScreenActive(true) + ink.render(React.createElement(Text, null, 'hi')) + ink.onRender() + + const fakeNode = {} as unknown as Record + + peek(ink).cursorDeclaration = { node: fakeNode, relativeX: 7, relativeY: 0 } + peek(ink).displayCursor = { x: 12, y: 0 } + + ink.noteExternalCursorAdvance(3) + + // displayCursor untouched on alt-screen + expect(peek(ink).displayCursor).toEqual({ x: 12, y: 0 }) + // declaration still advanced — onRender's alt-screen park reads this + expect(peek(ink).cursorDeclaration).toEqual({ node: fakeNode, relativeX: 10, relativeY: 0 }) + + ink.unmount() + }) + + // Closes Copilot review feedback on PR #26717: even after the + // TextInput-level fix where layout reads `curRef.current` directly, + // there's still a window where a fast-echo wrote to stdout but the + // current cursor declaration on Ink (set by an earlier render's + // useDeclaredCursor commit) points at the PRE-keystroke caret + // column. If we advanced only `displayCursor`, an unrelated re-render + // in that window would re-run onRender's cursor-park branch with the + // stale declaration and visually undo the fast-echo's advance. We + // must bump BOTH so the cursor stays anchored to the physical caret + // until the next React commit publishes a fresh declaration + // (computed from `curRef.current` via the cursorLayout call in + // textInput.tsx) that supersedes the bump. 
+ it('advances the active cursorDeclaration in lock-step with displayCursor', () => { + const { ink } = makeInk() + + ink.render(React.createElement(Text, null, 'hi')) + ink.onRender() + + const fakeNode = {} as unknown as Record + + peek(ink).cursorDeclaration = { node: fakeNode, relativeX: 7, relativeY: 0 } + peek(ink).displayCursor = { x: 12, y: 0 } + + ink.noteExternalCursorAdvance(3) + + expect(peek(ink).displayCursor).toEqual({ x: 15, y: 0 }) + expect(peek(ink).cursorDeclaration).toEqual({ node: fakeNode, relativeX: 10, relativeY: 0 }) + + ink.noteExternalCursorAdvance(-1) + expect(peek(ink).displayCursor).toEqual({ x: 14, y: 0 }) + expect(peek(ink).cursorDeclaration).toEqual({ node: fakeNode, relativeX: 9, relativeY: 0 }) + + ink.unmount() + }) + + // Closes Copilot follow-up on PR #26717: the dy half of the notifier + // contract was tested for `displayCursor` but not for + // `cursorDeclaration.relativeY`. Newlines in fast-echoed text never + // hit the bypass today (canFastAppendShape rejects '\n'), but `dy` + // is part of the public API and must propagate symmetrically with + // dx so future callers (e.g. multi-line paste shortcuts) don't get + // a half-implemented contract. + it('advances cursorDeclaration.relativeY when dy is non-zero', () => { + const { ink } = makeInk() + + ink.render(React.createElement(Text, null, 'hi')) + ink.onRender() + + const fakeNode = {} as unknown as Record + + peek(ink).cursorDeclaration = { node: fakeNode, relativeX: 2, relativeY: 1 } + peek(ink).displayCursor = { x: 4, y: 2 } + + ink.noteExternalCursorAdvance(1, 3) + + expect(peek(ink).displayCursor).toEqual({ x: 5, y: 5 }) + expect(peek(ink).cursorDeclaration).toEqual({ node: fakeNode, relativeX: 3, relativeY: 4 }) + + // Negative dy too — cursor moving up across visual rows. 
+ ink.noteExternalCursorAdvance(0, -2) + expect(peek(ink).displayCursor).toEqual({ x: 5, y: 3 }) + expect(peek(ink).cursorDeclaration).toEqual({ node: fakeNode, relativeX: 3, relativeY: 2 }) + + ink.unmount() + }) + + it('leaves cursorDeclaration unchanged when no declaration is active', () => { + const { ink } = makeInk() + + ink.render(React.createElement(Text, null, 'hi')) + ink.onRender() + + expect(peek(ink).cursorDeclaration).toBeNull() + + ink.noteExternalCursorAdvance(3) + + expect(peek(ink).cursorDeclaration).toBeNull() + + ink.unmount() + }) +}) diff --git a/ui-tui/packages/hermes-ink/src/ink/ink.tsx b/ui-tui/packages/hermes-ink/src/ink/ink.tsx index 8cdfe781395..49fdf704488 100644 --- a/ui-tui/packages/hermes-ink/src/ink/ink.tsx +++ b/ui-tui/packages/hermes-ink/src/ink/ink.tsx @@ -16,6 +16,7 @@ import { logError } from '../utils/log.js' import { colorize } from './colorize.js' import App from './components/App.js' +import type { CursorAdvanceNotifier } from './components/CursorAdvanceContext.js' import type { CursorDeclaration, CursorDeclarationSetter } from './components/CursorDeclarationContext.js' import { FRAME_INTERVAL_MS } from './constants.js' import * as dom from './dom.js' @@ -2219,6 +2220,85 @@ export default class Ink { this.cursorDeclaration = decl } + // Caller writes raw bytes to stdout that move the physical terminal + // cursor (e.g. TextInput's fast-echo bypass). Without this notification, + // Ink's `displayCursor` cache and log-update's prevFrame.cursor stay + // unchanged, so the next frame's relative cursor moves compute from a + // stale position and the hardware cursor parks `dx` cells offset from + // the actual caret. Visible symptom: extra whitespace between the just- + // typed character and the cursor block, more pronounced on long + // sessions where unrelated components re-render between fast-echo and + // the deferred composer re-render. + // + // If displayCursor was already tracked, just bump it. 
Otherwise seed it + // to (prevFrame.cursor + delta) so the next frame's preamble emits a + // (-dx, -dy) relative move that brings the cursor back to log-update's + // expected start position before the diff body runs. + // + // Public so tests can drive it directly without mounting App. + // + // Bumps BOTH `displayCursor` (used by log-update's relative-move + // preamble) AND, if non-null, `cursorDeclaration.relativeX/Y` (the + // target the cursor parks at after every frame). Advancing only one + // of the two would leave the other stale: e.g. if the deferred React + // `setCur` hasn't flushed yet, the next unrelated re-render would + // re-compute `target` from the stale declaration and park the + // hardware cursor back at the old caret column. We advance both so + // the fast-echo is invisible to intervening frames until React + // catches up. + noteExternalCursorAdvance: CursorAdvanceNotifier = (dx, dy = 0) => { + if (dx === 0 && dy === 0) { + return + } + + // displayCursor / log-update relative-move basis only matters on + // main screen — alt-screen frames begin with absolute CSI H every + // frame so the next preamble naturally resets to (0,0). cursorDeclaration, + // however, IS still consulted on alt-screen — onRender's park branch + // emits an absolute CUP using `rect.x + decl.relativeX`, so a stale + // declaration in the deferred-setCur window would park the cursor + // at the pre-keystroke caret. We therefore skip ONLY the displayCursor + // half on alt-screen, not the declaration half. + if (!this.altScreenActive) { + if (this.displayCursor !== null) { + this.displayCursor = { + x: this.displayCursor.x + dx, + y: this.displayCursor.y + dy + } + } else { + // No prior parked position. Seed from frontFrame.cursor (where + // log-update parked the cursor at the end of the last frame) so + // the next preamble's relative move correctly cancels the + // external advance. 
+ const baseX = this.frontFrame.cursor.x + const baseY = this.frontFrame.cursor.y + this.displayCursor = { x: baseX + dx, y: baseY + dy } + } + } + + // Also advance the active cursor declaration if any. Without this, + // a TextInput that defers its React `cur` state update (16ms timer + // in textInput.tsx — perf optimization that batches re-renders + // during heavy typing) leaves `cursorDeclaration.relativeX` pointing + // at the pre-keystroke caret column. If an unrelated component + // re-renders before the deferred `setCur` flushes, the cursor-park + // branch at the end of onRender would move the hardware cursor back + // to that stale relativeX and visually undo the fast-echo's + // advance. Bumping relativeX here keeps the declared target in + // lock-step with the physical cursor until React state catches up. + // Applies to BOTH main-screen and alt-screen — the alt-screen park + // branch uses an absolute CUP to (rect.x + decl.relativeX), so a + // stale declaration there would still produce the wrong column. 
+ const decl = this.cursorDeclaration + + if (decl !== null) { + this.cursorDeclaration = { + node: decl.node, + relativeX: decl.relativeX + dx, + relativeY: decl.relativeY + dy + } + } + } render(node: ReactNode): void { this.currentNode = node @@ -2228,6 +2308,7 @@ export default class Ink { exitOnCtrlC={this.options.exitOnCtrlC} getHyperlinkAt={this.getHyperlinkAt} onClickAt={this.dispatchClick} + onCursorAdvance={this.noteExternalCursorAdvance} onCursorDeclaration={this.setCursorDeclaration} onExit={this.unmount} onHoverAt={this.dispatchHover} diff --git a/ui-tui/src/__tests__/textInputCursorSourceOfTruth.test.ts b/ui-tui/src/__tests__/textInputCursorSourceOfTruth.test.ts new file mode 100644 index 00000000000..b52894d1587 --- /dev/null +++ b/ui-tui/src/__tests__/textInputCursorSourceOfTruth.test.ts @@ -0,0 +1,50 @@ +import { readFileSync } from 'node:fs' +import { dirname, join } from 'node:path' +import { fileURLToPath } from 'node:url' + +import { describe, expect, it } from 'vitest' + +// Locate textInput.tsx relative to this test file so the assertion +// survives moves of the test fixture itself. +const TEXT_INPUT_PATH = join(dirname(fileURLToPath(import.meta.url)), '..', 'components', 'textInput.tsx') +const source = readFileSync(TEXT_INPUT_PATH, 'utf8') + +// Closes Copilot follow-up on PR #26717: the original cursor-drift +// fix bumped Ink's displayCursor / cursorDeclaration on fast-echo, but +// if TextInput itself re-renders before the deferred 16ms `setCur` +// flushes (parent state change, status-bar tick, spinner) the layout +// effect inside `useDeclaredCursor` re-publishes a declaration +// computed from the STALE React `cur` state and clobbers the Ink-level +// bump. The fix is structural: read `curRef.current` (always +// up-to-date) when computing the layout, not the `cur` state. +// +// This file pins that invariant. 
Switching back to `cur` state — or +// re-introducing a memo keyed on `cur` that uses `curRef.current` +// inside but stops re-computing on rerender — is a regression and +// should be caught here, not via a flaky integration test that mounts +// Ink + stdin. +describe('textInput cursor-layout source of truth', () => { + it('reads curRef.current (not the cur React state) for cursorLayout', () => { + // The line we care about. We allow whitespace / formatting drift, + // but the call itself must use `curRef.current`. + expect(source).toMatch(/cursorLayout\(\s*display\s*,\s*curRef\.current\s*,\s*columns\s*\)/) + }) + + it('does not pass the bare `cur` React state into cursorLayout', () => { + // Any `cursorLayout(display, cur, columns)` invocation would + // reintroduce the stale-declaration window. + expect(source).not.toMatch(/cursorLayout\(\s*display\s*,\s*cur\s*,\s*columns\s*\)/) + }) + + it('keeps the fast-echo notifier calls paired with the stdout writes', () => { + // Both fast-echo paths must call noteCursorAdvance, otherwise Ink + // never learns about the out-of-band write and drifts again. We + // tolerate explanatory comments in between (the rationale block is + // intentionally long), but the pairing itself must hold. 
+ const backspacePattern = /stdout!\.write\(['"`]\\b \\b['"`]\)[\s\S]{0,1000}?noteCursorAdvance\(-1\)/ + expect(source).toMatch(backspacePattern) + + const appendPattern = /stdout!\.write\(text\)[\s\S]{0,1000}?noteCursorAdvance\(text\.length\)/ + expect(source).toMatch(appendPattern) + }) +}) diff --git a/ui-tui/src/__tests__/textInputFastEcho.test.ts b/ui-tui/src/__tests__/textInputFastEcho.test.ts index 7f246f19f21..2e08111ffb4 100644 --- a/ui-tui/src/__tests__/textInputFastEcho.test.ts +++ b/ui-tui/src/__tests__/textInputFastEcho.test.ts @@ -133,4 +133,42 @@ describe('canFastBackspaceShape', () => { it('rejects deleting an emoji', () => { expect(canFastBackspaceShape('hi🙂', 'hi🙂'.length)).toBe(false) }) + + // Closes Copilot PR #26717 round 3: the "\b \b" sequence cannot move + // the terminal cursor onto the previous visual row across a + // soft-wrap boundary. When the caret sits at visual column 0 of a + // wrapped row (column == 0 in the computed cursor layout), backspace + // would leave the physical cursor in place while the logical caret + // moves up to the end of the previous visual line — desyncing both + // Ink's displayCursor model and the user-visible position. The fast + // path must fall through in that case so the normal Ink render path + // can lay out the correct cursor position. + it('rejects fast-backspace at a soft-wrap boundary when columns is known', () => { + // value width 6 in a column of 6 → cursorLayout produces (line 1, col 0) + // i.e. the caret has overflowed onto the next visual line. + const value = 'hello ' + expect(canFastBackspaceShape(value, value.length, 6)).toBe(false) + }) + + it('rejects fast-backspace at an exact multiple of columns (wide wrap)', () => { + // 12 chars at width 6 → two full visual rows, caret at (line 2, col 0). 
+ const value = 'abcdefghijkl' + expect(canFastBackspaceShape(value, value.length, 6)).toBe(false) + }) + + it('still accepts fast-backspace inside a wrapped line', () => { + // Caret mid-visual-line — "\b \b" can move the cursor one cell left + // without crossing a wrap boundary. + expect(canFastBackspaceShape('hello world', 'hello world'.length, 20)).toBe(true) + expect(canFastBackspaceShape('abcdefghi', 9, 6)).toBe(true) // visual line 1, col 3 → ok + }) + + it('skips the wrap-boundary check when columns is omitted (legacy contract)', () => { + // Callers that don't pass `columns` fall back to the pre-wrap-aware + // behavior — the function does NOT magically reject anything that + // could be a wrap boundary without the width. Production callers + // must always pass `columns`; this case is for unit tests of the + // pre-wrap shape contract. + expect(canFastBackspaceShape('hello ', 'hello '.length)).toBe(true) + }) }) diff --git a/ui-tui/src/components/textInput.tsx b/ui-tui/src/components/textInput.tsx index 91e109fa366..b3c79357368 100644 --- a/ui-tui/src/components/textInput.tsx +++ b/ui-tui/src/components/textInput.tsx @@ -16,13 +16,14 @@ import { type InkExt = typeof Ink & { stringWidth: (s: string) => number + useCursorAdvance: () => (dx: number, dy?: number) => void useDeclaredCursor: (a: { line: number; column: number; active: boolean }) => (el: any) => void useStdout: () => { stdout?: NodeJS.WriteStream } useTerminalFocus: () => boolean } const ink = Ink as unknown as InkExt -const { Box, Text, useStdin, useInput, useStdout, stringWidth, useDeclaredCursor, useTerminalFocus } = ink +const { Box, Text, useStdin, useInput, useStdout, stringWidth, useCursorAdvance, useDeclaredCursor, useTerminalFocus } = ink const ESC = '\x1b' const INV = `${ESC}[7m` @@ -238,8 +239,26 @@ export function canFastAppendShape( * ASCII. 
Anything else (combining marks, IME compositions, wide chars, * tabs, ANSI fragments) goes through the normal render path so Ink can * recompute cell widths. + * + * When `columns` is supplied, ALSO rejects when the physical cursor + * sits at visual column 0 — i.e., right after a soft-wrap boundary. + * The "\b \b" sequence cannot move the cursor onto the previous visual + * row (terminals don't back-step across line wraps), so the physical + * cursor would stay put while the logical caret moves to the end of + * the previous visual line, desyncing both Ink's `displayCursor` model + * and the user-visible position. + * + * When `columns` is OMITTED, the wrap-boundary check is skipped + * entirely and the function reverts to the legacy non-wrap-aware + * contract — values like `'hello '` will return `true` even though + * they would be unsafe at a width of 6. Production callers (the + * composer's `canFastBackspace` helper) always pass `columns`; + * `columns` is optional only so unit tests of the pre-wrap shape + * contract can keep calling the helper without threading width + * through. Do NOT omit it from any new caller that relies on the + * wrap-boundary protection. */ -export function canFastBackspaceShape(current: string, cursor: number): boolean { +export function canFastBackspaceShape(current: string, cursor: number, columns?: number): boolean { if (cursor !== current.length) { return false } @@ -252,6 +271,13 @@ export function canFastBackspaceShape(current: string, cursor: number): boolean return false } + // If we know the wrap width, reject at the soft-wrap boundary: the + // caret's visual column is 0, so "\b \b" can't represent the physical + // move back to the previous visual line. 
+ if (columns !== undefined && cursorLayout(current, cursor, columns).column === 0) { + return false + } + const removed = current.slice(prevPos(current, cursor), cursor) return ASCII_PRINTABLE_RE.test(removed) @@ -333,6 +359,7 @@ export function TextInput({ const fwdDel = useFwdDelete(focus) const termFocus = useTerminalFocus() const { stdout } = useStdout() + const noteCursorAdvance = useCursorAdvance() const curRef = useRef(cur) const selRef = useRef(null) @@ -368,7 +395,19 @@ export function TextInput({ [sel] ) - const layout = useMemo(() => cursorLayout(display, cur, columns), [columns, cur, display]) + // Read `curRef.current` (always up-to-date) rather than the `cur` + // React state. The fast-echo path defers the React `setCur` by 16ms + // to batch re-renders during heavy typing; if an unrelated render + // flushes this component during that window and we used the stale + // `cur` state here, the layout effect inside `useDeclaredCursor` + // would publish a stale cursor declaration and clobber the Ink-level + // bump from `noteCursorAdvance(...)`. `cur` is still in scope and + // referenced by setSel/setCur paths below, so React tracks the + // dependency naturally — we just don't use it as the source of truth + // for layout. The cursorLayout call is cheap (one wrap-text pass + // over a single-line string in the common case), so dropping useMemo + // is fine. 
+ const layout = cursorLayout(display, curRef.current, columns) const boxRef = useDeclaredCursor({ line: layout.line, @@ -526,7 +565,7 @@ export function TextInput({ canFastEchoBase() && canFastAppendShape(current, cursor, text, columns, lineWidthRef.current) const canFastBackspace = (current: string, cursor: number) => - canFastEchoBase() && canFastBackspaceShape(current, cursor) + canFastEchoBase() && canFastBackspaceShape(current, cursor, columns) const commit = ( next: string, @@ -911,6 +950,12 @@ export function TextInput({ v = v.slice(0, t) + v.slice(c) c = t stdout!.write('\b \b') + // The "\b \b" sequence ends with the cursor one column to the + // LEFT of where Ink last parked it. Tell Ink so its `displayCursor` + // (and log-update's relative-move basis on the next frame) stays + // in sync — otherwise the cursor parks one cell to the right of + // the caret on the next unrelated re-render. + noteCursorAdvance(-1) commit(v, c, true, false, false, Math.max(0, lineWidthRef.current - 1)) return @@ -998,6 +1043,14 @@ export function TextInput({ if (simpleAppend) { stdout!.write(text) + // ASCII-printable text advances the physical cursor by exactly + // text.length cells (canFastAppendShape rejects non-ASCII, + // wide chars, newlines). Notify Ink so the cached displayCursor + // / log-update relative-move basis advances with it; otherwise + // any unrelated re-render that happens before the 16ms + // setCur/setParent flush parks the cursor text.length cells + // too far right (#cursor-drift). 
+ noteCursorAdvance(text.length) commit(v, c, true, false, false, lineWidthRef.current + stringWidth(text)) return diff --git a/ui-tui/src/types/hermes-ink.d.ts b/ui-tui/src/types/hermes-ink.d.ts index b84f843d322..ca2a05dc449 100644 --- a/ui-tui/src/types/hermes-ink.d.ts +++ b/ui-tui/src/types/hermes-ink.d.ts @@ -164,6 +164,7 @@ declare module '@hermes/ink' { readonly column: number readonly active: boolean }): (el: unknown) => void + export function useCursorAdvance(): (dx: number, dy?: number) => void export function useStdin(): { readonly stdin: NodeJS.ReadStream readonly setRawMode: (value: boolean) => void diff --git a/uv.lock b/uv.lock index 2508637a081..eca62880304 100644 --- a/uv.lock +++ b/uv.lock @@ -40,7 +40,7 @@ wheels = [ [[package]] name = "aiohttp" -version = "3.13.3" +version = "3.13.4" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "aiohappyeyeballs" }, @@ -51,93 +51,93 @@ dependencies = [ { name = "propcache" }, { name = "yarl" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/50/42/32cf8e7704ceb4481406eb87161349abb46a57fee3f008ba9cb610968646/aiohttp-3.13.3.tar.gz", hash = "sha256:a949eee43d3782f2daae4f4a2819b2cb9b0c5d3b7f7a927067cc84dafdbb9f88", size = 7844556, upload-time = "2026-01-03T17:33:05.204Z" } +sdist = { url = "https://files.pythonhosted.org/packages/45/4a/064321452809dae953c1ed6e017504e72551a26b6f5708a5a80e4bf556ff/aiohttp-3.13.4.tar.gz", hash = "sha256:d97a6d09c66087890c2ab5d49069e1e570583f7ac0314ecf98294c1b6aaebd38", size = 7859748, upload-time = "2026-03-28T17:19:40.6Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/f1/4c/a164164834f03924d9a29dc3acd9e7ee58f95857e0b467f6d04298594ebb/aiohttp-3.13.3-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:5b6073099fb654e0a068ae678b10feff95c5cae95bbfcbfa7af669d361a8aa6b", size = 746051, upload-time = "2026-01-03T17:29:43.287Z" }, - { url = 
"https://files.pythonhosted.org/packages/82/71/d5c31390d18d4f58115037c432b7e0348c60f6f53b727cad33172144a112/aiohttp-3.13.3-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:1cb93e166e6c28716c8c6aeb5f99dfb6d5ccf482d29fe9bf9a794110e6d0ab64", size = 499234, upload-time = "2026-01-03T17:29:44.822Z" }, - { url = "https://files.pythonhosted.org/packages/0e/c9/741f8ac91e14b1d2e7100690425a5b2b919a87a5075406582991fb7de920/aiohttp-3.13.3-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:28e027cf2f6b641693a09f631759b4d9ce9165099d2b5d92af9bd4e197690eea", size = 494979, upload-time = "2026-01-03T17:29:46.405Z" }, - { url = "https://files.pythonhosted.org/packages/75/b5/31d4d2e802dfd59f74ed47eba48869c1c21552c586d5e81a9d0d5c2ad640/aiohttp-3.13.3-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:3b61b7169ababd7802f9568ed96142616a9118dd2be0d1866e920e77ec8fa92a", size = 1748297, upload-time = "2026-01-03T17:29:48.083Z" }, - { url = "https://files.pythonhosted.org/packages/1a/3e/eefad0ad42959f226bb79664826883f2687d602a9ae2941a18e0484a74d3/aiohttp-3.13.3-cp311-cp311-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:80dd4c21b0f6237676449c6baaa1039abae86b91636b6c91a7f8e61c87f89540", size = 1707172, upload-time = "2026-01-03T17:29:49.648Z" }, - { url = "https://files.pythonhosted.org/packages/c5/3a/54a64299fac2891c346cdcf2aa6803f994a2e4beeaf2e5a09dcc54acc842/aiohttp-3.13.3-cp311-cp311-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:65d2ccb7eabee90ce0503c17716fc77226be026dcc3e65cce859a30db715025b", size = 1805405, upload-time = "2026-01-03T17:29:51.244Z" }, - { url = "https://files.pythonhosted.org/packages/6c/70/ddc1b7169cf64075e864f64595a14b147a895a868394a48f6a8031979038/aiohttp-3.13.3-cp311-cp311-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:5b179331a481cb5529fca8b432d8d3c7001cb217513c94cd72d668d1248688a3", size = 
1899449, upload-time = "2026-01-03T17:29:53.938Z" }, - { url = "https://files.pythonhosted.org/packages/a1/7e/6815aab7d3a56610891c76ef79095677b8b5be6646aaf00f69b221765021/aiohttp-3.13.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:9d4c940f02f49483b18b079d1c27ab948721852b281f8b015c058100e9421dd1", size = 1748444, upload-time = "2026-01-03T17:29:55.484Z" }, - { url = "https://files.pythonhosted.org/packages/6b/f2/073b145c4100da5511f457dc0f7558e99b2987cf72600d42b559db856fbc/aiohttp-3.13.3-cp311-cp311-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:f9444f105664c4ce47a2a7171a2418bce5b7bae45fb610f4e2c36045d85911d3", size = 1606038, upload-time = "2026-01-03T17:29:57.179Z" }, - { url = "https://files.pythonhosted.org/packages/0a/c1/778d011920cae03ae01424ec202c513dc69243cf2db303965615b81deeea/aiohttp-3.13.3-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:694976222c711d1d00ba131904beb60534f93966562f64440d0c9d41b8cdb440", size = 1724156, upload-time = "2026-01-03T17:29:58.914Z" }, - { url = "https://files.pythonhosted.org/packages/0e/cb/3419eabf4ec1e9ec6f242c32b689248365a1cf621891f6f0386632525494/aiohttp-3.13.3-cp311-cp311-musllinux_1_2_armv7l.whl", hash = "sha256:f33ed1a2bf1997a36661874b017f5c4b760f41266341af36febaf271d179f6d7", size = 1722340, upload-time = "2026-01-03T17:30:01.962Z" }, - { url = "https://files.pythonhosted.org/packages/7a/e5/76cf77bdbc435bf233c1f114edad39ed4177ccbfab7c329482b179cff4f4/aiohttp-3.13.3-cp311-cp311-musllinux_1_2_ppc64le.whl", hash = "sha256:e636b3c5f61da31a92bf0d91da83e58fdfa96f178ba682f11d24f31944cdd28c", size = 1783041, upload-time = "2026-01-03T17:30:03.609Z" }, - { url = "https://files.pythonhosted.org/packages/9d/d4/dd1ca234c794fd29c057ce8c0566b8ef7fd6a51069de5f06fa84b9a1971c/aiohttp-3.13.3-cp311-cp311-musllinux_1_2_riscv64.whl", hash = "sha256:5d2d94f1f5fcbe40838ac51a6ab5704a6f9ea42e72ceda48de5e6b898521da51", size = 1596024, upload-time = 
"2026-01-03T17:30:05.132Z" }, - { url = "https://files.pythonhosted.org/packages/55/58/4345b5f26661a6180afa686c473620c30a66afdf120ed3dd545bbc809e85/aiohttp-3.13.3-cp311-cp311-musllinux_1_2_s390x.whl", hash = "sha256:2be0e9ccf23e8a94f6f0650ce06042cefc6ac703d0d7ab6c7a917289f2539ad4", size = 1804590, upload-time = "2026-01-03T17:30:07.135Z" }, - { url = "https://files.pythonhosted.org/packages/7b/06/05950619af6c2df7e0a431d889ba2813c9f0129cec76f663e547a5ad56f2/aiohttp-3.13.3-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:9af5e68ee47d6534d36791bbe9b646d2a7c7deb6fc24d7943628edfbb3581f29", size = 1740355, upload-time = "2026-01-03T17:30:09.083Z" }, - { url = "https://files.pythonhosted.org/packages/3e/80/958f16de79ba0422d7c1e284b2abd0c84bc03394fbe631d0a39ffa10e1eb/aiohttp-3.13.3-cp311-cp311-win32.whl", hash = "sha256:a2212ad43c0833a873d0fb3c63fa1bacedd4cf6af2fee62bf4b739ceec3ab239", size = 433701, upload-time = "2026-01-03T17:30:10.869Z" }, - { url = "https://files.pythonhosted.org/packages/dc/f2/27cdf04c9851712d6c1b99df6821a6623c3c9e55956d4b1e318c337b5a48/aiohttp-3.13.3-cp311-cp311-win_amd64.whl", hash = "sha256:642f752c3eb117b105acbd87e2c143de710987e09860d674e068c4c2c441034f", size = 457678, upload-time = "2026-01-03T17:30:12.719Z" }, - { url = "https://files.pythonhosted.org/packages/a0/be/4fc11f202955a69e0db803a12a062b8379c970c7c84f4882b6da17337cc1/aiohttp-3.13.3-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:b903a4dfee7d347e2d87697d0713be59e0b87925be030c9178c5faa58ea58d5c", size = 739732, upload-time = "2026-01-03T17:30:14.23Z" }, - { url = "https://files.pythonhosted.org/packages/97/2c/621d5b851f94fa0bb7430d6089b3aa970a9d9b75196bc93bb624b0db237a/aiohttp-3.13.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:a45530014d7a1e09f4a55f4f43097ba0fd155089372e105e4bff4ca76cb1b168", size = 494293, upload-time = "2026-01-03T17:30:15.96Z" }, - { url = 
"https://files.pythonhosted.org/packages/5d/43/4be01406b78e1be8320bb8316dc9c42dbab553d281c40364e0f862d5661c/aiohttp-3.13.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:27234ef6d85c914f9efeb77ff616dbf4ad2380be0cda40b4db086ffc7ddd1b7d", size = 493533, upload-time = "2026-01-03T17:30:17.431Z" }, - { url = "https://files.pythonhosted.org/packages/8d/a8/5a35dc56a06a2c90d4742cbf35294396907027f80eea696637945a106f25/aiohttp-3.13.3-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d32764c6c9aafb7fb55366a224756387cd50bfa720f32b88e0e6fa45b27dcf29", size = 1737839, upload-time = "2026-01-03T17:30:19.422Z" }, - { url = "https://files.pythonhosted.org/packages/bf/62/4b9eeb331da56530bf2e198a297e5303e1c1ebdceeb00fe9b568a65c5a0c/aiohttp-3.13.3-cp312-cp312-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:b1a6102b4d3ebc07dad44fbf07b45bb600300f15b552ddf1851b5390202ea2e3", size = 1703932, upload-time = "2026-01-03T17:30:21.756Z" }, - { url = "https://files.pythonhosted.org/packages/7c/f6/af16887b5d419e6a367095994c0b1332d154f647e7dc2bd50e61876e8e3d/aiohttp-3.13.3-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:c014c7ea7fb775dd015b2d3137378b7be0249a448a1612268b5a90c2d81de04d", size = 1771906, upload-time = "2026-01-03T17:30:23.932Z" }, - { url = "https://files.pythonhosted.org/packages/ce/83/397c634b1bcc24292fa1e0c7822800f9f6569e32934bdeef09dae7992dfb/aiohttp-3.13.3-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:2b8d8ddba8f95ba17582226f80e2de99c7a7948e66490ef8d947e272a93e9463", size = 1871020, upload-time = "2026-01-03T17:30:26Z" }, - { url = "https://files.pythonhosted.org/packages/86/f6/a62cbbf13f0ac80a70f71b1672feba90fdb21fd7abd8dbf25c0105fb6fa3/aiohttp-3.13.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = 
"sha256:9ae8dd55c8e6c4257eae3a20fd2c8f41edaea5992ed67156642493b8daf3cecc", size = 1755181, upload-time = "2026-01-03T17:30:27.554Z" }, - { url = "https://files.pythonhosted.org/packages/0a/87/20a35ad487efdd3fba93d5843efdfaa62d2f1479eaafa7453398a44faf13/aiohttp-3.13.3-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:01ad2529d4b5035578f5081606a465f3b814c542882804e2e8cda61adf5c71bf", size = 1561794, upload-time = "2026-01-03T17:30:29.254Z" }, - { url = "https://files.pythonhosted.org/packages/de/95/8fd69a66682012f6716e1bc09ef8a1a2a91922c5725cb904689f112309c4/aiohttp-3.13.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:bb4f7475e359992b580559e008c598091c45b5088f28614e855e42d39c2f1033", size = 1697900, upload-time = "2026-01-03T17:30:31.033Z" }, - { url = "https://files.pythonhosted.org/packages/e5/66/7b94b3b5ba70e955ff597672dad1691333080e37f50280178967aff68657/aiohttp-3.13.3-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:c19b90316ad3b24c69cd78d5c9b4f3aa4497643685901185b65166293d36a00f", size = 1728239, upload-time = "2026-01-03T17:30:32.703Z" }, - { url = "https://files.pythonhosted.org/packages/47/71/6f72f77f9f7d74719692ab65a2a0252584bf8d5f301e2ecb4c0da734530a/aiohttp-3.13.3-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:96d604498a7c782cb15a51c406acaea70d8c027ee6b90c569baa6e7b93073679", size = 1740527, upload-time = "2026-01-03T17:30:34.695Z" }, - { url = "https://files.pythonhosted.org/packages/fa/b4/75ec16cbbd5c01bdaf4a05b19e103e78d7ce1ef7c80867eb0ace42ff4488/aiohttp-3.13.3-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:084911a532763e9d3dd95adf78a78f4096cd5f58cdc18e6fdbc1b58417a45423", size = 1554489, upload-time = "2026-01-03T17:30:36.864Z" }, - { url = "https://files.pythonhosted.org/packages/52/8f/bc518c0eea29f8406dcf7ed1f96c9b48e3bc3995a96159b3fc11f9e08321/aiohttp-3.13.3-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:7a4a94eb787e606d0a09404b9c38c113d3b099d508021faa615d70a0131907ce", size = 
1767852, upload-time = "2026-01-03T17:30:39.433Z" }, - { url = "https://files.pythonhosted.org/packages/9d/f2/a07a75173124f31f11ea6f863dc44e6f09afe2bca45dd4e64979490deab1/aiohttp-3.13.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:87797e645d9d8e222e04160ee32aa06bc5c163e8499f24db719e7852ec23093a", size = 1722379, upload-time = "2026-01-03T17:30:41.081Z" }, - { url = "https://files.pythonhosted.org/packages/3c/4a/1a3fee7c21350cac78e5c5cef711bac1b94feca07399f3d406972e2d8fcd/aiohttp-3.13.3-cp312-cp312-win32.whl", hash = "sha256:b04be762396457bef43f3597c991e192ee7da460a4953d7e647ee4b1c28e7046", size = 428253, upload-time = "2026-01-03T17:30:42.644Z" }, - { url = "https://files.pythonhosted.org/packages/d9/b7/76175c7cb4eb73d91ad63c34e29fc4f77c9386bba4a65b53ba8e05ee3c39/aiohttp-3.13.3-cp312-cp312-win_amd64.whl", hash = "sha256:e3531d63d3bdfa7e3ac5e9b27b2dd7ec9df3206a98e0b3445fa906f233264c57", size = 455407, upload-time = "2026-01-03T17:30:44.195Z" }, - { url = "https://files.pythonhosted.org/packages/97/8a/12ca489246ca1faaf5432844adbfce7ff2cc4997733e0af120869345643a/aiohttp-3.13.3-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:5dff64413671b0d3e7d5918ea490bdccb97a4ad29b3f311ed423200b2203e01c", size = 734190, upload-time = "2026-01-03T17:30:45.832Z" }, - { url = "https://files.pythonhosted.org/packages/32/08/de43984c74ed1fca5c014808963cc83cb00d7bb06af228f132d33862ca76/aiohttp-3.13.3-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:87b9aab6d6ed88235aa2970294f496ff1a1f9adcd724d800e9b952395a80ffd9", size = 491783, upload-time = "2026-01-03T17:30:47.466Z" }, - { url = "https://files.pythonhosted.org/packages/17/f8/8dd2cf6112a5a76f81f81a5130c57ca829d101ad583ce57f889179accdda/aiohttp-3.13.3-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:425c126c0dc43861e22cb1c14ba4c8e45d09516d0a3ae0a3f7494b79f5f233a3", size = 490704, upload-time = "2026-01-03T17:30:49.373Z" }, - { url = 
"https://files.pythonhosted.org/packages/6d/40/a46b03ca03936f832bc7eaa47cfbb1ad012ba1be4790122ee4f4f8cba074/aiohttp-3.13.3-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7f9120f7093c2a32d9647abcaf21e6ad275b4fbec5b55969f978b1a97c7c86bf", size = 1720652, upload-time = "2026-01-03T17:30:50.974Z" }, - { url = "https://files.pythonhosted.org/packages/f7/7e/917fe18e3607af92657e4285498f500dca797ff8c918bd7d90b05abf6c2a/aiohttp-3.13.3-cp313-cp313-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:697753042d57f4bf7122cab985bf15d0cef23c770864580f5af4f52023a56bd6", size = 1692014, upload-time = "2026-01-03T17:30:52.729Z" }, - { url = "https://files.pythonhosted.org/packages/71/b6/cefa4cbc00d315d68973b671cf105b21a609c12b82d52e5d0c9ae61d2a09/aiohttp-3.13.3-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:6de499a1a44e7de70735d0b39f67c8f25eb3d91eb3103be99ca0fa882cdd987d", size = 1759777, upload-time = "2026-01-03T17:30:54.537Z" }, - { url = "https://files.pythonhosted.org/packages/fb/e3/e06ee07b45e59e6d81498b591fc589629be1553abb2a82ce33efe2a7b068/aiohttp-3.13.3-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:37239e9f9a7ea9ac5bf6b92b0260b01f8a22281996da609206a84df860bc1261", size = 1861276, upload-time = "2026-01-03T17:30:56.512Z" }, - { url = "https://files.pythonhosted.org/packages/7c/24/75d274228acf35ceeb2850b8ce04de9dd7355ff7a0b49d607ee60c29c518/aiohttp-3.13.3-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:f76c1e3fe7d7c8afad7ed193f89a292e1999608170dcc9751a7462a87dfd5bc0", size = 1743131, upload-time = "2026-01-03T17:30:58.256Z" }, - { url = "https://files.pythonhosted.org/packages/04/98/3d21dde21889b17ca2eea54fdcff21b27b93f45b7bb94ca029c31ab59dc3/aiohttp-3.13.3-cp313-cp313-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = 
"sha256:fc290605db2a917f6e81b0e1e0796469871f5af381ce15c604a3c5c7e51cb730", size = 1556863, upload-time = "2026-01-03T17:31:00.445Z" }, - { url = "https://files.pythonhosted.org/packages/9e/84/da0c3ab1192eaf64782b03971ab4055b475d0db07b17eff925e8c93b3aa5/aiohttp-3.13.3-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:4021b51936308aeea0367b8f006dc999ca02bc118a0cc78c303f50a2ff6afb91", size = 1682793, upload-time = "2026-01-03T17:31:03.024Z" }, - { url = "https://files.pythonhosted.org/packages/ff/0f/5802ada182f575afa02cbd0ec5180d7e13a402afb7c2c03a9aa5e5d49060/aiohttp-3.13.3-cp313-cp313-musllinux_1_2_armv7l.whl", hash = "sha256:49a03727c1bba9a97d3e93c9f93ca03a57300f484b6e935463099841261195d3", size = 1716676, upload-time = "2026-01-03T17:31:04.842Z" }, - { url = "https://files.pythonhosted.org/packages/3f/8c/714d53bd8b5a4560667f7bbbb06b20c2382f9c7847d198370ec6526af39c/aiohttp-3.13.3-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:3d9908a48eb7416dc1f4524e69f1d32e5d90e3981e4e37eb0aa1cd18f9cfa2a4", size = 1733217, upload-time = "2026-01-03T17:31:06.868Z" }, - { url = "https://files.pythonhosted.org/packages/7d/79/e2176f46d2e963facea939f5be2d26368ce543622be6f00a12844d3c991f/aiohttp-3.13.3-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:2712039939ec963c237286113c68dbad80a82a4281543f3abf766d9d73228998", size = 1552303, upload-time = "2026-01-03T17:31:08.958Z" }, - { url = "https://files.pythonhosted.org/packages/ab/6a/28ed4dea1759916090587d1fe57087b03e6c784a642b85ef48217b0277ae/aiohttp-3.13.3-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:7bfdc049127717581866fa4708791220970ce291c23e28ccf3922c700740fdc0", size = 1763673, upload-time = "2026-01-03T17:31:10.676Z" }, - { url = "https://files.pythonhosted.org/packages/e8/35/4a3daeb8b9fab49240d21c04d50732313295e4bd813a465d840236dd0ce1/aiohttp-3.13.3-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:8057c98e0c8472d8846b9c79f56766bcc57e3e8ac7bfd510482332366c56c591", size = 1721120, upload-time = 
"2026-01-03T17:31:12.575Z" }, - { url = "https://files.pythonhosted.org/packages/bc/9f/d643bb3c5fb99547323e635e251c609fbbc660d983144cfebec529e09264/aiohttp-3.13.3-cp313-cp313-win32.whl", hash = "sha256:1449ceddcdbcf2e0446957863af03ebaaa03f94c090f945411b61269e2cb5daf", size = 427383, upload-time = "2026-01-03T17:31:14.382Z" }, - { url = "https://files.pythonhosted.org/packages/4e/f1/ab0395f8a79933577cdd996dd2f9aa6014af9535f65dddcf88204682fe62/aiohttp-3.13.3-cp313-cp313-win_amd64.whl", hash = "sha256:693781c45a4033d31d4187d2436f5ac701e7bbfe5df40d917736108c1cc7436e", size = 453899, upload-time = "2026-01-03T17:31:15.958Z" }, - { url = "https://files.pythonhosted.org/packages/99/36/5b6514a9f5d66f4e2597e40dea2e3db271e023eb7a5d22defe96ba560996/aiohttp-3.13.3-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:ea37047c6b367fd4bd632bff8077449b8fa034b69e812a18e0132a00fae6e808", size = 737238, upload-time = "2026-01-03T17:31:17.909Z" }, - { url = "https://files.pythonhosted.org/packages/f7/49/459327f0d5bcd8c6c9ca69e60fdeebc3622861e696490d8674a6d0cb90a6/aiohttp-3.13.3-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:6fc0e2337d1a4c3e6acafda6a78a39d4c14caea625124817420abceed36e2415", size = 492292, upload-time = "2026-01-03T17:31:19.919Z" }, - { url = "https://files.pythonhosted.org/packages/e8/0b/b97660c5fd05d3495b4eb27f2d0ef18dc1dc4eff7511a9bf371397ff0264/aiohttp-3.13.3-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:c685f2d80bb67ca8c3837823ad76196b3694b0159d232206d1e461d3d434666f", size = 493021, upload-time = "2026-01-03T17:31:21.636Z" }, - { url = "https://files.pythonhosted.org/packages/54/d4/438efabdf74e30aeceb890c3290bbaa449780583b1270b00661126b8aae4/aiohttp-3.13.3-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:48e377758516d262bde50c2584fc6c578af272559c409eecbdd2bae1601184d6", size = 1717263, upload-time = "2026-01-03T17:31:23.296Z" }, - { url = 
"https://files.pythonhosted.org/packages/71/f2/7bddc7fd612367d1459c5bcf598a9e8f7092d6580d98de0e057eb42697ad/aiohttp-3.13.3-cp314-cp314-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:34749271508078b261c4abb1767d42b8d0c0cc9449c73a4df494777dc55f0687", size = 1669107, upload-time = "2026-01-03T17:31:25.334Z" }, - { url = "https://files.pythonhosted.org/packages/00/5a/1aeaecca40e22560f97610a329e0e5efef5e0b5afdf9f857f0d93839ab2e/aiohttp-3.13.3-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:82611aeec80eb144416956ec85b6ca45a64d76429c1ed46ae1b5f86c6e0c9a26", size = 1760196, upload-time = "2026-01-03T17:31:27.394Z" }, - { url = "https://files.pythonhosted.org/packages/f8/f8/0ff6992bea7bd560fc510ea1c815f87eedd745fe035589c71ce05612a19a/aiohttp-3.13.3-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:2fff83cfc93f18f215896e3a190e8e5cb413ce01553901aca925176e7568963a", size = 1843591, upload-time = "2026-01-03T17:31:29.238Z" }, - { url = "https://files.pythonhosted.org/packages/e3/d1/e30e537a15f53485b61f5be525f2157da719819e8377298502aebac45536/aiohttp-3.13.3-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:bbe7d4cecacb439e2e2a8a1a7b935c25b812af7a5fd26503a66dadf428e79ec1", size = 1720277, upload-time = "2026-01-03T17:31:31.053Z" }, - { url = "https://files.pythonhosted.org/packages/84/45/23f4c451d8192f553d38d838831ebbc156907ea6e05557f39563101b7717/aiohttp-3.13.3-cp314-cp314-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:b928f30fe49574253644b1ca44b1b8adbd903aa0da4b9054a6c20fc7f4092a25", size = 1548575, upload-time = "2026-01-03T17:31:32.87Z" }, - { url = "https://files.pythonhosted.org/packages/6a/ed/0a42b127a43712eda7807e7892c083eadfaf8429ca8fb619662a530a3aab/aiohttp-3.13.3-cp314-cp314-musllinux_1_2_aarch64.whl", hash = 
"sha256:7b5e8fe4de30df199155baaf64f2fcd604f4c678ed20910db8e2c66dc4b11603", size = 1679455, upload-time = "2026-01-03T17:31:34.76Z" }, - { url = "https://files.pythonhosted.org/packages/2e/b5/c05f0c2b4b4fe2c9d55e73b6d3ed4fd6c9dc2684b1d81cbdf77e7fad9adb/aiohttp-3.13.3-cp314-cp314-musllinux_1_2_armv7l.whl", hash = "sha256:8542f41a62bcc58fc7f11cf7c90e0ec324ce44950003feb70640fc2a9092c32a", size = 1687417, upload-time = "2026-01-03T17:31:36.699Z" }, - { url = "https://files.pythonhosted.org/packages/c9/6b/915bc5dad66aef602b9e459b5a973529304d4e89ca86999d9d75d80cbd0b/aiohttp-3.13.3-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:5e1d8c8b8f1d91cd08d8f4a3c2b067bfca6ec043d3ff36de0f3a715feeedf926", size = 1729968, upload-time = "2026-01-03T17:31:38.622Z" }, - { url = "https://files.pythonhosted.org/packages/11/3b/e84581290a9520024a08640b63d07673057aec5ca548177a82026187ba73/aiohttp-3.13.3-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:90455115e5da1c3c51ab619ac57f877da8fd6d73c05aacd125c5ae9819582aba", size = 1545690, upload-time = "2026-01-03T17:31:40.57Z" }, - { url = "https://files.pythonhosted.org/packages/f5/04/0c3655a566c43fd647c81b895dfe361b9f9ad6d58c19309d45cff52d6c3b/aiohttp-3.13.3-cp314-cp314-musllinux_1_2_s390x.whl", hash = "sha256:042e9e0bcb5fba81886c8b4fbb9a09d6b8a00245fd8d88e4d989c1f96c74164c", size = 1746390, upload-time = "2026-01-03T17:31:42.857Z" }, - { url = "https://files.pythonhosted.org/packages/1f/53/71165b26978f719c3419381514c9690bd5980e764a09440a10bb816ea4ab/aiohttp-3.13.3-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:2eb752b102b12a76ca02dff751a801f028b4ffbbc478840b473597fc91a9ed43", size = 1702188, upload-time = "2026-01-03T17:31:44.984Z" }, - { url = "https://files.pythonhosted.org/packages/29/a7/cbe6c9e8e136314fa1980da388a59d2f35f35395948a08b6747baebb6aa6/aiohttp-3.13.3-cp314-cp314-win32.whl", hash = "sha256:b556c85915d8efaed322bf1bdae9486aa0f3f764195a0fb6ee962e5c71ef5ce1", size = 433126, upload-time = "2026-01-03T17:31:47.463Z" 
}, - { url = "https://files.pythonhosted.org/packages/de/56/982704adea7d3b16614fc5936014e9af85c0e34b58f9046655817f04306e/aiohttp-3.13.3-cp314-cp314-win_amd64.whl", hash = "sha256:9bf9f7a65e7aa20dd764151fb3d616c81088f91f8df39c3893a536e279b4b984", size = 459128, upload-time = "2026-01-03T17:31:49.2Z" }, - { url = "https://files.pythonhosted.org/packages/6c/2a/3c79b638a9c3d4658d345339d22070241ea341ed4e07b5ac60fb0f418003/aiohttp-3.13.3-cp314-cp314t-macosx_10_13_universal2.whl", hash = "sha256:05861afbbec40650d8a07ea324367cb93e9e8cc7762e04dd4405df99fa65159c", size = 769512, upload-time = "2026-01-03T17:31:51.134Z" }, - { url = "https://files.pythonhosted.org/packages/29/b9/3e5014d46c0ab0db8707e0ac2711ed28c4da0218c358a4e7c17bae0d8722/aiohttp-3.13.3-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:2fc82186fadc4a8316768d61f3722c230e2c1dcab4200d52d2ebdf2482e47592", size = 506444, upload-time = "2026-01-03T17:31:52.85Z" }, - { url = "https://files.pythonhosted.org/packages/90/03/c1d4ef9a054e151cd7839cdc497f2638f00b93cbe8043983986630d7a80c/aiohttp-3.13.3-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:0add0900ff220d1d5c5ebbf99ed88b0c1bbf87aa7e4262300ed1376a6b13414f", size = 510798, upload-time = "2026-01-03T17:31:54.91Z" }, - { url = "https://files.pythonhosted.org/packages/ea/76/8c1e5abbfe8e127c893fe7ead569148a4d5a799f7cf958d8c09f3eedf097/aiohttp-3.13.3-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:568f416a4072fbfae453dcf9a99194bbb8bdeab718e08ee13dfa2ba0e4bebf29", size = 1868835, upload-time = "2026-01-03T17:31:56.733Z" }, - { url = "https://files.pythonhosted.org/packages/8e/ac/984c5a6f74c363b01ff97adc96a3976d9c98940b8969a1881575b279ac5d/aiohttp-3.13.3-cp314-cp314t-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:add1da70de90a2569c5e15249ff76a631ccacfe198375eead4aadf3b8dc849dc", size = 1720486, upload-time = "2026-01-03T17:31:58.65Z" }, - { url = 
"https://files.pythonhosted.org/packages/b2/9a/b7039c5f099c4eb632138728828b33428585031a1e658d693d41d07d89d1/aiohttp-3.13.3-cp314-cp314t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:10b47b7ba335d2e9b1239fa571131a87e2d8ec96b333e68b2a305e7a98b0bae2", size = 1847951, upload-time = "2026-01-03T17:32:00.989Z" }, - { url = "https://files.pythonhosted.org/packages/3c/02/3bec2b9a1ba3c19ff89a43a19324202b8eb187ca1e928d8bdac9bbdddebd/aiohttp-3.13.3-cp314-cp314t-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:3dd4dce1c718e38081c8f35f323209d4c1df7d4db4bab1b5c88a6b4d12b74587", size = 1941001, upload-time = "2026-01-03T17:32:03.122Z" }, - { url = "https://files.pythonhosted.org/packages/37/df/d879401cedeef27ac4717f6426c8c36c3091c6e9f08a9178cc87549c537f/aiohttp-3.13.3-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:34bac00a67a812570d4a460447e1e9e06fae622946955f939051e7cc895cfab8", size = 1797246, upload-time = "2026-01-03T17:32:05.255Z" }, - { url = "https://files.pythonhosted.org/packages/8d/15/be122de1f67e6953add23335c8ece6d314ab67c8bebb3f181063010795a7/aiohttp-3.13.3-cp314-cp314t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:a19884d2ee70b06d9204b2727a7b9f983d0c684c650254679e716b0b77920632", size = 1627131, upload-time = "2026-01-03T17:32:07.607Z" }, - { url = "https://files.pythonhosted.org/packages/12/12/70eedcac9134cfa3219ab7af31ea56bc877395b1ac30d65b1bc4b27d0438/aiohttp-3.13.3-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:5f8ca7f2bb6ba8348a3614c7918cc4bb73268c5ac2a207576b7afea19d3d9f64", size = 1795196, upload-time = "2026-01-03T17:32:09.59Z" }, - { url = "https://files.pythonhosted.org/packages/32/11/b30e1b1cd1f3054af86ebe60df96989c6a414dd87e27ad16950eee420bea/aiohttp-3.13.3-cp314-cp314t-musllinux_1_2_armv7l.whl", hash = "sha256:b0d95340658b9d2f11d9697f59b3814a9d3bb4b7a7c20b131df4bcef464037c0", size = 1782841, 
upload-time = "2026-01-03T17:32:11.445Z" }, - { url = "https://files.pythonhosted.org/packages/88/0d/d98a9367b38912384a17e287850f5695c528cff0f14f791ce8ee2e4f7796/aiohttp-3.13.3-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:a1e53262fd202e4b40b70c3aff944a8155059beedc8a89bba9dc1f9ef06a1b56", size = 1795193, upload-time = "2026-01-03T17:32:13.705Z" }, - { url = "https://files.pythonhosted.org/packages/43/a5/a2dfd1f5ff5581632c7f6a30e1744deda03808974f94f6534241ef60c751/aiohttp-3.13.3-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:d60ac9663f44168038586cab2157e122e46bdef09e9368b37f2d82d354c23f72", size = 1621979, upload-time = "2026-01-03T17:32:15.965Z" }, - { url = "https://files.pythonhosted.org/packages/fa/f0/12973c382ae7c1cccbc4417e129c5bf54c374dfb85af70893646e1f0e749/aiohttp-3.13.3-cp314-cp314t-musllinux_1_2_s390x.whl", hash = "sha256:90751b8eed69435bac9ff4e3d2f6b3af1f57e37ecb0fbeee59c0174c9e2d41df", size = 1822193, upload-time = "2026-01-03T17:32:18.219Z" }, - { url = "https://files.pythonhosted.org/packages/3c/5f/24155e30ba7f8c96918af1350eb0663e2430aad9e001c0489d89cd708ab1/aiohttp-3.13.3-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:fc353029f176fd2b3ec6cfc71be166aba1936fe5d73dd1992ce289ca6647a9aa", size = 1769801, upload-time = "2026-01-03T17:32:20.25Z" }, - { url = "https://files.pythonhosted.org/packages/eb/f8/7314031ff5c10e6ece114da79b338ec17eeff3a079e53151f7e9f43c4723/aiohttp-3.13.3-cp314-cp314t-win32.whl", hash = "sha256:2e41b18a58da1e474a057b3d35248d8320029f61d70a37629535b16a0c8f3767", size = 466523, upload-time = "2026-01-03T17:32:22.215Z" }, - { url = "https://files.pythonhosted.org/packages/b4/63/278a98c715ae467624eafe375542d8ba9b4383a016df8fdefe0ae28382a7/aiohttp-3.13.3-cp314-cp314t-win_amd64.whl", hash = "sha256:44531a36aa2264a1860089ffd4dce7baf875ee5a6079d5fb42e261c704ef7344", size = 499694, upload-time = "2026-01-03T17:32:24.546Z" }, + { url = 
"https://files.pythonhosted.org/packages/d4/7e/cb94129302d78c46662b47f9897d642fd0b33bdfef4b73b20c6ced35aa4c/aiohttp-3.13.4-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:8ea0c64d1bcbf201b285c2246c51a0c035ba3bbd306640007bc5844a3b4658c1", size = 760027, upload-time = "2026-03-28T17:15:33.022Z" }, + { url = "https://files.pythonhosted.org/packages/5e/cd/2db3c9397c3bd24216b203dd739945b04f8b87bb036c640da7ddb63c75ef/aiohttp-3.13.4-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:6f742e1fa45c0ed522b00ede565e18f97e4cf8d1883a712ac42d0339dfb0cce7", size = 508325, upload-time = "2026-03-28T17:15:34.714Z" }, + { url = "https://files.pythonhosted.org/packages/36/a3/d28b2722ec13107f2e37a86b8a169897308bab6a3b9e071ecead9d67bd9b/aiohttp-3.13.4-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:6dcfb50ee25b3b7a1222a9123be1f9f89e56e67636b561441f0b304e25aaef8f", size = 502402, upload-time = "2026-03-28T17:15:36.409Z" }, + { url = "https://files.pythonhosted.org/packages/fa/d6/acd47b5f17c4430e555590990a4746efbcb2079909bb865516892bf85f37/aiohttp-3.13.4-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:3262386c4ff370849863ea93b9ea60fd59c6cf56bf8f93beac625cf4d677c04d", size = 1771224, upload-time = "2026-03-28T17:15:38.223Z" }, + { url = "https://files.pythonhosted.org/packages/98/af/af6e20113ba6a48fd1cd9e5832c4851e7613ef50c7619acdaee6ec5f1aff/aiohttp-3.13.4-cp311-cp311-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:473bb5aa4218dd254e9ae4834f20e31f5a0083064ac0136a01a62ddbae2eaa42", size = 1731530, upload-time = "2026-03-28T17:15:39.988Z" }, + { url = "https://files.pythonhosted.org/packages/81/16/78a2f5d9c124ad05d5ce59a9af94214b6466c3491a25fb70760e98e9f762/aiohttp-3.13.4-cp311-cp311-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:e56423766399b4c77b965f6aaab6c9546617b8994a956821cc507d00b91d978c", size = 1827925, upload-time = 
"2026-03-28T17:15:41.944Z" }, + { url = "https://files.pythonhosted.org/packages/2a/1f/79acf0974ced805e0e70027389fccbb7d728e6f30fcac725fb1071e63075/aiohttp-3.13.4-cp311-cp311-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:8af249343fafd5ad90366a16d230fc265cf1149f26075dc9fe93cfd7c7173942", size = 1923579, upload-time = "2026-03-28T17:15:44.071Z" }, + { url = "https://files.pythonhosted.org/packages/af/53/29f9e2054ea6900413f3b4c3eb9d8331f60678ec855f13ba8714c47fd48d/aiohttp-3.13.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0bc0a5cf4f10ef5a2c94fdde488734b582a3a7a000b131263e27c9295bd682d9", size = 1767655, upload-time = "2026-03-28T17:15:45.911Z" }, + { url = "https://files.pythonhosted.org/packages/f3/57/462fe1d3da08109ba4aa8590e7aed57c059af2a7e80ec21f4bac5cfe1094/aiohttp-3.13.4-cp311-cp311-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:5c7ff1028e3c9fc5123a865ce17df1cb6424d180c503b8517afbe89aa566e6be", size = 1630439, upload-time = "2026-03-28T17:15:48.11Z" }, + { url = "https://files.pythonhosted.org/packages/d7/4b/4813344aacdb8127263e3eec343d24e973421143826364fa9fc847f6283f/aiohttp-3.13.4-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:ba5cf98b5dcb9bddd857da6713a503fa6d341043258ca823f0f5ab7ab4a94ee8", size = 1745557, upload-time = "2026-03-28T17:15:50.13Z" }, + { url = "https://files.pythonhosted.org/packages/d4/01/1ef1adae1454341ec50a789f03cfafe4c4ac9c003f6a64515ecd32fe4210/aiohttp-3.13.4-cp311-cp311-musllinux_1_2_armv7l.whl", hash = "sha256:d85965d3ba21ee4999e83e992fecb86c4614d6920e40705501c0a1f80a583c12", size = 1741796, upload-time = "2026-03-28T17:15:52.351Z" }, + { url = "https://files.pythonhosted.org/packages/22/04/8cdd99af988d2aa6922714d957d21383c559835cbd43fbf5a47ddf2e0f05/aiohttp-3.13.4-cp311-cp311-musllinux_1_2_ppc64le.whl", hash = "sha256:49f0b18a9b05d79f6f37ddd567695943fcefb834ef480f17a4211987302b2dc7", size = 1805312, upload-time = 
"2026-03-28T17:15:54.407Z" }, + { url = "https://files.pythonhosted.org/packages/fb/7f/b48d5577338d4b25bbdbae35c75dbfd0493cb8886dc586fbfb2e90862239/aiohttp-3.13.4-cp311-cp311-musllinux_1_2_riscv64.whl", hash = "sha256:7f78cb080c86fbf765920e5f1ef35af3f24ec4314d6675d0a21eaf41f6f2679c", size = 1621751, upload-time = "2026-03-28T17:15:56.564Z" }, + { url = "https://files.pythonhosted.org/packages/bc/89/4eecad8c1858e6d0893c05929e22343e0ebe3aec29a8a399c65c3cc38311/aiohttp-3.13.4-cp311-cp311-musllinux_1_2_s390x.whl", hash = "sha256:67a3ec705534a614b68bbf1c70efa777a21c3da3895d1c44510a41f5a7ae0453", size = 1826073, upload-time = "2026-03-28T17:15:58.489Z" }, + { url = "https://files.pythonhosted.org/packages/f5/5c/9dc8293ed31b46c39c9c513ac7ca152b3c3d38e0ea111a530ad12001b827/aiohttp-3.13.4-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:d6630ec917e85c5356b2295744c8a97d40f007f96a1c76bf1928dc2e27465393", size = 1760083, upload-time = "2026-03-28T17:16:00.677Z" }, + { url = "https://files.pythonhosted.org/packages/1e/19/8bbf6a4994205d96831f97b7d21a0feed120136e6267b5b22d229c6dc4dc/aiohttp-3.13.4-cp311-cp311-win32.whl", hash = "sha256:54049021bc626f53a5394c29e8c444f726ee5a14b6e89e0ad118315b1f90f5e3", size = 439690, upload-time = "2026-03-28T17:16:02.902Z" }, + { url = "https://files.pythonhosted.org/packages/0c/f5/ac409ecd1007528d15c3e8c3a57d34f334c70d76cfb7128a28cffdebd4c1/aiohttp-3.13.4-cp311-cp311-win_amd64.whl", hash = "sha256:c033f2bc964156030772d31cbf7e5defea181238ce1f87b9455b786de7d30145", size = 463824, upload-time = "2026-03-28T17:16:05.058Z" }, + { url = "https://files.pythonhosted.org/packages/1e/bd/ede278648914cabbabfdf95e436679b5d4156e417896a9b9f4587169e376/aiohttp-3.13.4-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:ee62d4471ce86b108b19c3364db4b91180d13fe3510144872d6bad5401957360", size = 752158, upload-time = "2026-03-28T17:16:06.901Z" }, + { url = 
"https://files.pythonhosted.org/packages/90/de/581c053253c07b480b03785196ca5335e3c606a37dc73e95f6527f1591fe/aiohttp-3.13.4-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:c0fd8f41b54b58636402eb493afd512c23580456f022c1ba2db0f810c959ed0d", size = 501037, upload-time = "2026-03-28T17:16:08.82Z" }, + { url = "https://files.pythonhosted.org/packages/fa/f9/a5ede193c08f13cc42c0a5b50d1e246ecee9115e4cf6e900d8dbd8fd6acb/aiohttp-3.13.4-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:4baa48ce49efd82d6b1a0be12d6a36b35e5594d1dd42f8bfba96ea9f8678b88c", size = 501556, upload-time = "2026-03-28T17:16:10.63Z" }, + { url = "https://files.pythonhosted.org/packages/d6/10/88ff67cd48a6ec36335b63a640abe86135791544863e0cfe1f065d6cef7a/aiohttp-3.13.4-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d738ebab9f71ee652d9dbd0211057690022201b11197f9a7324fd4dba128aa97", size = 1757314, upload-time = "2026-03-28T17:16:12.498Z" }, + { url = "https://files.pythonhosted.org/packages/8b/15/fdb90a5cf5a1f52845c276e76298c75fbbcc0ac2b4a86551906d54529965/aiohttp-3.13.4-cp312-cp312-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:0ce692c3468fa831af7dceed52edf51ac348cebfc8d3feb935927b63bd3e8576", size = 1731819, upload-time = "2026-03-28T17:16:14.558Z" }, + { url = "https://files.pythonhosted.org/packages/ec/df/28146785a007f7820416be05d4f28cc207493efd1e8c6c1068e9bdc29198/aiohttp-3.13.4-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:8e08abcfe752a454d2cb89ff0c08f2d1ecd057ae3e8cc6d84638de853530ebab", size = 1793279, upload-time = "2026-03-28T17:16:16.594Z" }, + { url = "https://files.pythonhosted.org/packages/10/47/689c743abf62ea7a77774d5722f220e2c912a77d65d368b884d9779ef41b/aiohttp-3.13.4-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:5977f701b3fff36367a11087f30ea73c212e686d41cd363c50c022d48b011d8d", size = 
1891082, upload-time = "2026-03-28T17:16:18.71Z" }, + { url = "https://files.pythonhosted.org/packages/b0/b6/f7f4f318c7e58c23b761c9b13b9a3c9b394e0f9d5d76fbc6622fa98509f6/aiohttp-3.13.4-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:54203e10405c06f8b6020bd1e076ae0fe6c194adcee12a5a78af3ffa3c57025e", size = 1773938, upload-time = "2026-03-28T17:16:21.125Z" }, + { url = "https://files.pythonhosted.org/packages/aa/06/f207cb3121852c989586a6fc16ff854c4fcc8651b86c5d3bd1fc83057650/aiohttp-3.13.4-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:358a6af0145bc4dda037f13167bef3cce54b132087acc4c295c739d05d16b1c3", size = 1579548, upload-time = "2026-03-28T17:16:23.588Z" }, + { url = "https://files.pythonhosted.org/packages/6c/58/e1289661a32161e24c1fe479711d783067210d266842523752869cc1d9c2/aiohttp-3.13.4-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:898ea1850656d7d61832ef06aa9846ab3ddb1621b74f46de78fbc5e1a586ba83", size = 1714669, upload-time = "2026-03-28T17:16:25.713Z" }, + { url = "https://files.pythonhosted.org/packages/96/0a/3e86d039438a74a86e6a948a9119b22540bae037d6ba317a042ae3c22711/aiohttp-3.13.4-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:7bc30cceb710cf6a44e9617e43eebb6e3e43ad855a34da7b4b6a73537d8a6763", size = 1754175, upload-time = "2026-03-28T17:16:28.18Z" }, + { url = "https://files.pythonhosted.org/packages/f4/30/e717fc5df83133ba467a560b6d8ef20197037b4bb5d7075b90037de1018e/aiohttp-3.13.4-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:4a31c0c587a8a038f19a4c7e60654a6c899c9de9174593a13e7cc6e15ff271f9", size = 1762049, upload-time = "2026-03-28T17:16:30.941Z" }, + { url = "https://files.pythonhosted.org/packages/e4/28/8f7a2d4492e336e40005151bdd94baf344880a4707573378579f833a64c1/aiohttp-3.13.4-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:2062f675f3fe6e06d6113eb74a157fb9df58953ffed0cdb4182554b116545758", size = 1570861, upload-time = 
"2026-03-28T17:16:32.953Z" }, + { url = "https://files.pythonhosted.org/packages/78/45/12e1a3d0645968b1c38de4b23fdf270b8637735ea057d4f84482ff918ad9/aiohttp-3.13.4-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:3d1ba8afb847ff80626d5e408c1fdc99f942acc877d0702fe137015903a220a9", size = 1790003, upload-time = "2026-03-28T17:16:35.468Z" }, + { url = "https://files.pythonhosted.org/packages/eb/0f/60374e18d590de16dcb39d6ff62f39c096c1b958e6f37727b5870026ea30/aiohttp-3.13.4-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:b08149419994cdd4d5eecf7fd4bc5986b5a9380285bcd01ab4c0d6bfca47b79d", size = 1737289, upload-time = "2026-03-28T17:16:38.187Z" }, + { url = "https://files.pythonhosted.org/packages/02/bf/535e58d886cfbc40a8b0013c974afad24ef7632d645bca0b678b70033a60/aiohttp-3.13.4-cp312-cp312-win32.whl", hash = "sha256:fc432f6a2c4f720180959bc19aa37259651c1a4ed8af8afc84dd41c60f15f791", size = 434185, upload-time = "2026-03-28T17:16:40.735Z" }, + { url = "https://files.pythonhosted.org/packages/1e/1a/d92e3325134ebfff6f4069f270d3aac770d63320bd1fcd0eca023e74d9a8/aiohttp-3.13.4-cp312-cp312-win_amd64.whl", hash = "sha256:6148c9ae97a3e8bff9a1fc9c757fa164116f86c100468339730e717590a3fb77", size = 461285, upload-time = "2026-03-28T17:16:42.713Z" }, + { url = "https://files.pythonhosted.org/packages/e3/ac/892f4162df9b115b4758d615f32ec63d00f3084c705ff5526630887b9b42/aiohttp-3.13.4-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:63dd5e5b1e43b8fb1e91b79b7ceba1feba588b317d1edff385084fcc7a0a4538", size = 745744, upload-time = "2026-03-28T17:16:44.67Z" }, + { url = "https://files.pythonhosted.org/packages/97/a9/c5b87e4443a2f0ea88cb3000c93a8fdad1ee63bffc9ded8d8c8e0d66efc6/aiohttp-3.13.4-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:746ac3cc00b5baea424dacddea3ec2c2702f9590de27d837aa67004db1eebc6e", size = 498178, upload-time = "2026-03-28T17:16:46.766Z" }, + { url = 
"https://files.pythonhosted.org/packages/94/42/07e1b543a61250783650df13da8ddcdc0d0a5538b2bd15cef6e042aefc61/aiohttp-3.13.4-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:bda8f16ea99d6a6705e5946732e48487a448be874e54a4f73d514660ff7c05d3", size = 498331, upload-time = "2026-03-28T17:16:48.9Z" }, + { url = "https://files.pythonhosted.org/packages/20/d6/492f46bf0328534124772d0cf58570acae5b286ea25006900650f69dae0e/aiohttp-3.13.4-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4b061e7b5f840391e3f64d0ddf672973e45c4cfff7a0feea425ea24e51530fc2", size = 1744414, upload-time = "2026-03-28T17:16:50.968Z" }, + { url = "https://files.pythonhosted.org/packages/e2/4d/e02627b2683f68051246215d2d62b2d2f249ff7a285e7a858dc47d6b6a14/aiohttp-3.13.4-cp313-cp313-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:b252e8d5cd66184b570d0d010de742736e8a4fab22c58299772b0c5a466d4b21", size = 1719226, upload-time = "2026-03-28T17:16:53.173Z" }, + { url = "https://files.pythonhosted.org/packages/7b/6c/5d0a3394dd2b9f9aeba6e1b6065d0439e4b75d41f1fb09a3ec010b43552b/aiohttp-3.13.4-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:20af8aad61d1803ff11152a26146d8d81c266aa8c5aa9b4504432abb965c36a0", size = 1782110, upload-time = "2026-03-28T17:16:55.362Z" }, + { url = "https://files.pythonhosted.org/packages/0d/2d/c20791e3437700a7441a7edfb59731150322424f5aadf635602d1d326101/aiohttp-3.13.4-cp313-cp313-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:13a5cc924b59859ad2adb1478e31f410a7ed46e92a2a619d6d1dd1a63c1a855e", size = 1884809, upload-time = "2026-03-28T17:16:57.734Z" }, + { url = "https://files.pythonhosted.org/packages/c8/94/d99dbfbd1924a87ef643833932eb2a3d9e5eee87656efea7d78058539eff/aiohttp-3.13.4-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = 
"sha256:534913dfb0a644d537aebb4123e7d466d94e3be5549205e6a31f72368980a81a", size = 1764938, upload-time = "2026-03-28T17:17:00.221Z" }, + { url = "https://files.pythonhosted.org/packages/49/61/3ce326a1538781deb89f6cf5e094e2029cd308ed1e21b2ba2278b08426f6/aiohttp-3.13.4-cp313-cp313-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:320e40192a2dcc1cf4b5576936e9652981ab596bf81eb309535db7e2f5b5672f", size = 1570697, upload-time = "2026-03-28T17:17:02.985Z" }, + { url = "https://files.pythonhosted.org/packages/b6/77/4ab5a546857bb3028fbaf34d6eea180267bdab022ee8b1168b1fcde4bfdd/aiohttp-3.13.4-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:9e587fcfce2bcf06526a43cb705bdee21ac089096f2e271d75de9c339db3100c", size = 1702258, upload-time = "2026-03-28T17:17:05.28Z" }, + { url = "https://files.pythonhosted.org/packages/79/63/d8f29021e39bc5af8e5d5e9da1b07976fb9846487a784e11e4f4eeda4666/aiohttp-3.13.4-cp313-cp313-musllinux_1_2_armv7l.whl", hash = "sha256:9eb9c2eea7278206b5c6c1441fdd9dc420c278ead3f3b2cc87f9b693698cc500", size = 1740287, upload-time = "2026-03-28T17:17:07.712Z" }, + { url = "https://files.pythonhosted.org/packages/55/3a/cbc6b3b124859a11bc8055d3682c26999b393531ef926754a3445b99dfef/aiohttp-3.13.4-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:29be00c51972b04bf9d5c8f2d7f7314f48f96070ca40a873a53056e652e805f7", size = 1753011, upload-time = "2026-03-28T17:17:10.053Z" }, + { url = "https://files.pythonhosted.org/packages/e0/30/836278675205d58c1368b21520eab9572457cf19afd23759216c04483048/aiohttp-3.13.4-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:90c06228a6c3a7c9f776fe4fc0b7ff647fffd3bed93779a6913c804ae00c1073", size = 1566359, upload-time = "2026-03-28T17:17:12.433Z" }, + { url = "https://files.pythonhosted.org/packages/50/b4/8032cc9b82d17e4277704ba30509eaccb39329dc18d6a35f05e424439e32/aiohttp-3.13.4-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:a533ec132f05fd9a1d959e7f34184cd7d5e8511584848dab85faefbaac573069", size = 
1785537, upload-time = "2026-03-28T17:17:14.721Z" }, + { url = "https://files.pythonhosted.org/packages/17/7d/5873e98230bde59f493bf1f7c3e327486a4b5653fa401144704df5d00211/aiohttp-3.13.4-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:1c946f10f413836f82ea4cfb90200d2a59578c549f00857e03111cf45ad01ca5", size = 1740752, upload-time = "2026-03-28T17:17:17.387Z" }, + { url = "https://files.pythonhosted.org/packages/7b/f2/13e46e0df051494d7d3c68b7f72d071f48c384c12716fc294f75d5b1a064/aiohttp-3.13.4-cp313-cp313-win32.whl", hash = "sha256:48708e2706106da6967eff5908c78ca3943f005ed6bcb75da2a7e4da94ef8c70", size = 433187, upload-time = "2026-03-28T17:17:19.523Z" }, + { url = "https://files.pythonhosted.org/packages/ea/c0/649856ee655a843c8f8664592cfccb73ac80ede6a8c8db33a25d810c12db/aiohttp-3.13.4-cp313-cp313-win_amd64.whl", hash = "sha256:74a2eb058da44fa3a877a49e2095b591d4913308bb424c418b77beb160c55ce3", size = 459778, upload-time = "2026-03-28T17:17:21.964Z" }, + { url = "https://files.pythonhosted.org/packages/6d/29/6657cc37ae04cacc2dbf53fb730a06b6091cc4cbe745028e047c53e6d840/aiohttp-3.13.4-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:e0a2c961fc92abeff61d6444f2ce6ad35bb982db9fc8ff8a47455beacf454a57", size = 749363, upload-time = "2026-03-28T17:17:24.044Z" }, + { url = "https://files.pythonhosted.org/packages/90/7f/30ccdf67ca3d24b610067dc63d64dcb91e5d88e27667811640644aa4a85d/aiohttp-3.13.4-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:153274535985a0ff2bff1fb6c104ed547cec898a09213d21b0f791a44b14d933", size = 499317, upload-time = "2026-03-28T17:17:26.199Z" }, + { url = "https://files.pythonhosted.org/packages/93/13/e372dd4e68ad04ee25dafb050c7f98b0d91ea643f7352757e87231102555/aiohttp-3.13.4-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:351f3171e2458da3d731ce83f9e6b9619e325c45cbd534c7759750cabf453ad7", size = 500477, upload-time = "2026-03-28T17:17:28.279Z" }, + { url = 
"https://files.pythonhosted.org/packages/e5/fe/ee6298e8e586096fb6f5eddd31393d8544f33ae0792c71ecbb4c2bef98ac/aiohttp-3.13.4-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:f989ac8bc5595ff761a5ccd32bdb0768a117f36dd1504b1c2c074ed5d3f4df9c", size = 1737227, upload-time = "2026-03-28T17:17:30.587Z" }, + { url = "https://files.pythonhosted.org/packages/b0/b9/a7a0463a09e1a3fe35100f74324f23644bfc3383ac5fd5effe0722a5f0b7/aiohttp-3.13.4-cp314-cp314-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:d36fc1709110ec1e87a229b201dd3ddc32aa01e98e7868083a794609b081c349", size = 1694036, upload-time = "2026-03-28T17:17:33.29Z" }, + { url = "https://files.pythonhosted.org/packages/57/7c/8972ae3fb7be00a91aee6b644b2a6a909aedb2c425269a3bfd90115e6f8f/aiohttp-3.13.4-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:42adaeea83cbdf069ab94f5103ce0787c21fb1a0153270da76b59d5578302329", size = 1786814, upload-time = "2026-03-28T17:17:36.035Z" }, + { url = "https://files.pythonhosted.org/packages/93/01/c81e97e85c774decbaf0d577de7d848934e8166a3a14ad9f8aa5be329d28/aiohttp-3.13.4-cp314-cp314-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:92deb95469928cc41fd4b42a95d8012fa6df93f6b1c0a83af0ffbc4a5e218cde", size = 1866676, upload-time = "2026-03-28T17:17:38.441Z" }, + { url = "https://files.pythonhosted.org/packages/5a/5f/5b46fe8694a639ddea2cd035bf5729e4677ea882cb251396637e2ef1590d/aiohttp-3.13.4-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0c0c7c07c4257ef3a1df355f840bc62d133bcdef5c1c5ba75add3c08553e2eed", size = 1740842, upload-time = "2026-03-28T17:17:40.783Z" }, + { url = "https://files.pythonhosted.org/packages/20/a2/0d4b03d011cca6b6b0acba8433193c1e484efa8d705ea58295590fe24203/aiohttp-3.13.4-cp314-cp314-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = 
"sha256:f062c45de8a1098cb137a1898819796a2491aec4e637a06b03f149315dff4d8f", size = 1566508, upload-time = "2026-03-28T17:17:43.235Z" }, + { url = "https://files.pythonhosted.org/packages/98/17/e689fd500da52488ec5f889effd6404dece6a59de301e380f3c64f167beb/aiohttp-3.13.4-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:76093107c531517001114f0ebdb4f46858ce818590363e3e99a4a2280334454a", size = 1700569, upload-time = "2026-03-28T17:17:46.165Z" }, + { url = "https://files.pythonhosted.org/packages/d8/0d/66402894dbcf470ef7db99449e436105ea862c24f7ea4c95c683e635af35/aiohttp-3.13.4-cp314-cp314-musllinux_1_2_armv7l.whl", hash = "sha256:6f6ec32162d293b82f8b63a16edc80769662fbd5ae6fbd4936d3206a2c2cc63b", size = 1707407, upload-time = "2026-03-28T17:17:48.825Z" }, + { url = "https://files.pythonhosted.org/packages/2f/eb/af0ab1a3650092cbd8e14ef29e4ab0209e1460e1c299996c3f8288b3f1ff/aiohttp-3.13.4-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:5903e2db3d202a00ad9f0ec35a122c005e85d90c9836ab4cda628f01edf425e2", size = 1752214, upload-time = "2026-03-28T17:17:51.206Z" }, + { url = "https://files.pythonhosted.org/packages/5a/bf/72326f8a98e4c666f292f03c385545963cc65e358835d2a7375037a97b57/aiohttp-3.13.4-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:2d5bea57be7aca98dbbac8da046d99b5557c5cf4e28538c4c786313078aca09e", size = 1562162, upload-time = "2026-03-28T17:17:53.634Z" }, + { url = "https://files.pythonhosted.org/packages/67/9f/13b72435f99151dd9a5469c96b3b5f86aa29b7e785ca7f35cf5e538f74c0/aiohttp-3.13.4-cp314-cp314-musllinux_1_2_s390x.whl", hash = "sha256:bcf0c9902085976edc0232b75006ef38f89686901249ce14226b6877f88464fb", size = 1768904, upload-time = "2026-03-28T17:17:55.991Z" }, + { url = "https://files.pythonhosted.org/packages/18/bc/28d4970e7d5452ac7776cdb5431a1164a0d9cf8bd2fffd67b4fb463aa56d/aiohttp-3.13.4-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:c3295f98bfeed2e867cab588f2a146a9db37a85e3ae9062abf46ba062bd29165", size = 1723378, upload-time = 
"2026-03-28T17:17:58.348Z" }, + { url = "https://files.pythonhosted.org/packages/53/74/b32458ca1a7f34d65bdee7aef2036adbe0438123d3d53e2b083c453c24dd/aiohttp-3.13.4-cp314-cp314-win32.whl", hash = "sha256:a598a5c5767e1369d8f5b08695cab1d8160040f796c4416af76fd773d229b3c9", size = 438711, upload-time = "2026-03-28T17:18:00.728Z" }, + { url = "https://files.pythonhosted.org/packages/40/b2/54b487316c2df3e03a8f3435e9636f8a81a42a69d942164830d193beb56a/aiohttp-3.13.4-cp314-cp314-win_amd64.whl", hash = "sha256:c555db4bc7a264bead5a7d63d92d41a1122fcd39cc62a4db815f45ad46f9c2c8", size = 464977, upload-time = "2026-03-28T17:18:03.367Z" }, + { url = "https://files.pythonhosted.org/packages/47/fb/e41b63c6ce71b07a59243bb8f3b457ee0c3402a619acb9d2c0d21ef0e647/aiohttp-3.13.4-cp314-cp314t-macosx_10_13_universal2.whl", hash = "sha256:45abbbf09a129825d13c18c7d3182fecd46d9da3cfc383756145394013604ac1", size = 781549, upload-time = "2026-03-28T17:18:05.779Z" }, + { url = "https://files.pythonhosted.org/packages/97/53/532b8d28df1e17e44c4d9a9368b78dcb6bf0b51037522136eced13afa9e8/aiohttp-3.13.4-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:74c80b2bc2c2adb7b3d1941b2b60701ee2af8296fc8aad8b8bc48bc25767266c", size = 514383, upload-time = "2026-03-28T17:18:08.096Z" }, + { url = "https://files.pythonhosted.org/packages/1b/1f/62e5d400603e8468cd635812d99cb81cfdc08127a3dc474c647615f31339/aiohttp-3.13.4-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:c97989ae40a9746650fa196894f317dafc12227c808c774929dda0ff873a5954", size = 518304, upload-time = "2026-03-28T17:18:10.642Z" }, + { url = "https://files.pythonhosted.org/packages/90/57/2326b37b10896447e3c6e0cbef4fe2486d30913639a5cfd1332b5d870f82/aiohttp-3.13.4-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:dae86be9811493f9990ef44fff1685f5c1a3192e9061a71a109d527944eed551", size = 1893433, upload-time = "2026-03-28T17:18:13.121Z" }, + { url = 
"https://files.pythonhosted.org/packages/d2/b4/a24d82112c304afdb650167ef2fe190957d81cbddac7460bedd245f765aa/aiohttp-3.13.4-cp314-cp314t-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:1db491abe852ca2fa6cc48a3341985b0174b3741838e1341b82ac82c8bd9e871", size = 1755901, upload-time = "2026-03-28T17:18:16.21Z" }, + { url = "https://files.pythonhosted.org/packages/9e/2d/0883ef9d878d7846287f036c162a951968f22aabeef3ac97b0bea6f76d5d/aiohttp-3.13.4-cp314-cp314t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:0e5d701c0aad02a7dce72eef6b93226cf3734330f1a31d69ebbf69f33b86666e", size = 1876093, upload-time = "2026-03-28T17:18:18.703Z" }, + { url = "https://files.pythonhosted.org/packages/ad/52/9204bb59c014869b71971addad6778f005daa72a96eed652c496789d7468/aiohttp-3.13.4-cp314-cp314t-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:8ac32a189081ae0a10ba18993f10f338ec94341f0d5df8fff348043962f3c6f8", size = 1970815, upload-time = "2026-03-28T17:18:21.858Z" }, + { url = "https://files.pythonhosted.org/packages/d6/b5/e4eb20275a866dde0f570f411b36c6b48f7b53edfe4f4071aa1b0728098a/aiohttp-3.13.4-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:98e968cdaba43e45c73c3f306fca418c8009a957733bac85937c9f9cf3f4de27", size = 1816223, upload-time = "2026-03-28T17:18:24.729Z" }, + { url = "https://files.pythonhosted.org/packages/d8/23/e98075c5bb146aa61a1239ee1ac7714c85e814838d6cebbe37d3fe19214a/aiohttp-3.13.4-cp314-cp314t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:ca114790c9144c335d538852612d3e43ea0f075288f4849cf4b05d6cd2238ce7", size = 1649145, upload-time = "2026-03-28T17:18:27.269Z" }, + { url = "https://files.pythonhosted.org/packages/d6/c1/7bad8be33bb06c2bb224b6468874346026092762cbec388c3bdb65a368ee/aiohttp-3.13.4-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = 
"sha256:ea2e071661ba9cfe11eabbc81ac5376eaeb3061f6e72ec4cc86d7cdd1ffbdbbb", size = 1816562, upload-time = "2026-03-28T17:18:29.847Z" }, + { url = "https://files.pythonhosted.org/packages/5c/10/c00323348695e9a5e316825969c88463dcc24c7e9d443244b8a2c9cf2eae/aiohttp-3.13.4-cp314-cp314t-musllinux_1_2_armv7l.whl", hash = "sha256:34e89912b6c20e0fd80e07fa401fd218a410aa1ce9f1c2f1dad6db1bd0ce0927", size = 1800333, upload-time = "2026-03-28T17:18:32.269Z" }, + { url = "https://files.pythonhosted.org/packages/84/43/9b2147a1df3559f49bd723e22905b46a46c068a53adb54abdca32c4de180/aiohttp-3.13.4-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:0e217cf9f6a42908c52b46e42c568bd57adc39c9286ced31aaace614b6087965", size = 1820617, upload-time = "2026-03-28T17:18:35.238Z" }, + { url = "https://files.pythonhosted.org/packages/a9/7f/b3481a81e7a586d02e99387b18c6dafff41285f6efd3daa2124c01f87eae/aiohttp-3.13.4-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:0c296f1221e21ba979f5ac1964c3b78cfde15c5c5f855ffd2caab337e9cd9182", size = 1643417, upload-time = "2026-03-28T17:18:37.949Z" }, + { url = "https://files.pythonhosted.org/packages/8f/72/07181226bc99ce1124e0f89280f5221a82d3ae6a6d9d1973ce429d48e52b/aiohttp-3.13.4-cp314-cp314t-musllinux_1_2_s390x.whl", hash = "sha256:d99a9d168ebaffb74f36d011750e490085ac418f4db926cce3989c8fe6cb6b1b", size = 1849286, upload-time = "2026-03-28T17:18:40.534Z" }, + { url = "https://files.pythonhosted.org/packages/1a/e6/1b3566e103eca6da5be4ae6713e112a053725c584e96574caf117568ffef/aiohttp-3.13.4-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:cb19177205d93b881f3f89e6081593676043a6828f59c78c17a0fd6c1fbed2ba", size = 1782635, upload-time = "2026-03-28T17:18:43.073Z" }, + { url = "https://files.pythonhosted.org/packages/37/58/1b11c71904b8d079eb0c39fe664180dd1e14bebe5608e235d8bfbadc8929/aiohttp-3.13.4-cp314-cp314t-win32.whl", hash = "sha256:c606aa5656dab6552e52ca368e43869c916338346bfaf6304e15c58fb113ea30", size = 472537, upload-time = 
"2026-03-28T17:18:46.286Z" }, + { url = "https://files.pythonhosted.org/packages/bc/8f/87c56a1a1977d7dddea5b31e12189665a140fdb48a71e9038ff90bb564ec/aiohttp-3.13.4-cp314-cp314t-win_amd64.whl", hash = "sha256:014dcc10ec8ab8db681f0d68e939d1e9286a5aa2b993cbbdb0db130853e02144", size = 506381, upload-time = "2026-03-28T17:18:48.74Z" }, ] [[package]] @@ -321,7 +321,7 @@ wheels = [ [[package]] name = "anthropic" -version = "0.86.0" +version = "0.87.0" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "anyio" }, @@ -333,9 +333,9 @@ dependencies = [ { name = "sniffio" }, { name = "typing-extensions" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/37/7a/8b390dc47945d3169875d342847431e5f7d5fa716b2e37494d57cfc1db10/anthropic-0.86.0.tar.gz", hash = "sha256:60023a7e879aa4fbb1fed99d487fe407b2ebf6569603e5047cfe304cebdaa0e5", size = 583820, upload-time = "2026-03-18T18:43:08.017Z" } +sdist = { url = "https://files.pythonhosted.org/packages/d6/8f/3281edf7c35cbac169810e5388eb9b38678c7ea9867c2d331237bd5dff08/anthropic-0.87.0.tar.gz", hash = "sha256:098fef3753cdd3c0daa86f95efb9c8d03a798d45c5170329525bb4653f6702d0", size = 588982, upload-time = "2026-03-31T17:52:41.697Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/63/5f/67db29c6e5d16c8c9c4652d3efb934d89cb750cad201539141781d8eae14/anthropic-0.86.0-py3-none-any.whl", hash = "sha256:9d2bbd339446acce98858c5627d33056efe01f70435b22b63546fe7edae0cd57", size = 469400, upload-time = "2026-03-18T18:43:06.526Z" }, + { url = "https://files.pythonhosted.org/packages/0d/02/99bf351933bdea0545a2b6e2d812ed878899e9a95f618351dfa3d0de0e69/anthropic-0.87.0-py3-none-any.whl", hash = "sha256:e2669b86d42c739d3df163f873c51719552e263a3d85179297180fb4fa00a236", size = 472126, upload-time = "2026-03-31T17:52:40.174Z" }, ] [[package]] @@ -787,61 +787,61 @@ wheels = [ [[package]] name = "cryptography" -version = "46.0.5" +version = "46.0.7" source = { registry = "https://pypi.org/simple" } dependencies = [ 
{ name = "cffi", marker = "platform_python_implementation != 'PyPy'" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/60/04/ee2a9e8542e4fa2773b81771ff8349ff19cdd56b7258a0cc442639052edb/cryptography-46.0.5.tar.gz", hash = "sha256:abace499247268e3757271b2f1e244b36b06f8515cf27c4d49468fc9eb16e93d", size = 750064, upload-time = "2026-02-10T19:18:38.255Z" } +sdist = { url = "https://files.pythonhosted.org/packages/47/93/ac8f3d5ff04d54bc814e961a43ae5b0b146154c89c61b47bb07557679b18/cryptography-46.0.7.tar.gz", hash = "sha256:e4cfd68c5f3e0bfdad0d38e023239b96a2fe84146481852dffbcca442c245aa5", size = 750652, upload-time = "2026-04-08T01:57:54.692Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/f7/81/b0bb27f2ba931a65409c6b8a8b358a7f03c0e46eceacddff55f7c84b1f3b/cryptography-46.0.5-cp311-abi3-macosx_10_9_universal2.whl", hash = "sha256:351695ada9ea9618b3500b490ad54c739860883df6c1f555e088eaf25b1bbaad", size = 7176289, upload-time = "2026-02-10T19:17:08.274Z" }, - { url = "https://files.pythonhosted.org/packages/ff/9e/6b4397a3e3d15123de3b1806ef342522393d50736c13b20ec4c9ea6693a6/cryptography-46.0.5-cp311-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:c18ff11e86df2e28854939acde2d003f7984f721eba450b56a200ad90eeb0e6b", size = 4275637, upload-time = "2026-02-10T19:17:10.53Z" }, - { url = "https://files.pythonhosted.org/packages/63/e7/471ab61099a3920b0c77852ea3f0ea611c9702f651600397ac567848b897/cryptography-46.0.5-cp311-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:4d7e3d356b8cd4ea5aff04f129d5f66ebdc7b6f8eae802b93739ed520c47c79b", size = 4424742, upload-time = "2026-02-10T19:17:12.388Z" }, - { url = "https://files.pythonhosted.org/packages/37/53/a18500f270342d66bf7e4d9f091114e31e5ee9e7375a5aba2e85a91e0044/cryptography-46.0.5-cp311-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:50bfb6925eff619c9c023b967d5b77a54e04256c4281b0e21336a130cd7fc263", size = 4277528, upload-time = "2026-02-10T19:17:13.853Z" }, - { 
url = "https://files.pythonhosted.org/packages/22/29/c2e812ebc38c57b40e7c583895e73c8c5adb4d1e4a0cc4c5a4fdab2b1acc/cryptography-46.0.5-cp311-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:803812e111e75d1aa73690d2facc295eaefd4439be1023fefc4995eaea2af90d", size = 4947993, upload-time = "2026-02-10T19:17:15.618Z" }, - { url = "https://files.pythonhosted.org/packages/6b/e7/237155ae19a9023de7e30ec64e5d99a9431a567407ac21170a046d22a5a3/cryptography-46.0.5-cp311-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:3ee190460e2fbe447175cda91b88b84ae8322a104fc27766ad09428754a618ed", size = 4456855, upload-time = "2026-02-10T19:17:17.221Z" }, - { url = "https://files.pythonhosted.org/packages/2d/87/fc628a7ad85b81206738abbd213b07702bcbdada1dd43f72236ef3cffbb5/cryptography-46.0.5-cp311-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:f145bba11b878005c496e93e257c1e88f154d278d2638e6450d17e0f31e558d2", size = 3984635, upload-time = "2026-02-10T19:17:18.792Z" }, - { url = "https://files.pythonhosted.org/packages/84/29/65b55622bde135aedf4565dc509d99b560ee4095e56989e815f8fd2aa910/cryptography-46.0.5-cp311-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:e9251e3be159d1020c4030bd2e5f84d6a43fe54b6c19c12f51cde9542a2817b2", size = 4277038, upload-time = "2026-02-10T19:17:20.256Z" }, - { url = "https://files.pythonhosted.org/packages/bc/36/45e76c68d7311432741faf1fbf7fac8a196a0a735ca21f504c75d37e2558/cryptography-46.0.5-cp311-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:47fb8a66058b80e509c47118ef8a75d14c455e81ac369050f20ba0d23e77fee0", size = 4912181, upload-time = "2026-02-10T19:17:21.825Z" }, - { url = "https://files.pythonhosted.org/packages/6d/1a/c1ba8fead184d6e3d5afcf03d569acac5ad063f3ac9fb7258af158f7e378/cryptography-46.0.5-cp311-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:4c3341037c136030cb46e4b1e17b7418ea4cbd9dd207e4a6f3b2b24e0d4ac731", size = 4456482, upload-time = "2026-02-10T19:17:25.133Z" }, - { url = 
"https://files.pythonhosted.org/packages/f9/e5/3fb22e37f66827ced3b902cf895e6a6bc1d095b5b26be26bd13c441fdf19/cryptography-46.0.5-cp311-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:890bcb4abd5a2d3f852196437129eb3667d62630333aacc13dfd470fad3aaa82", size = 4405497, upload-time = "2026-02-10T19:17:26.66Z" }, - { url = "https://files.pythonhosted.org/packages/1a/df/9d58bb32b1121a8a2f27383fabae4d63080c7ca60b9b5c88be742be04ee7/cryptography-46.0.5-cp311-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:80a8d7bfdf38f87ca30a5391c0c9ce4ed2926918e017c29ddf643d0ed2778ea1", size = 4667819, upload-time = "2026-02-10T19:17:28.569Z" }, - { url = "https://files.pythonhosted.org/packages/ea/ed/325d2a490c5e94038cdb0117da9397ece1f11201f425c4e9c57fe5b9f08b/cryptography-46.0.5-cp311-abi3-win32.whl", hash = "sha256:60ee7e19e95104d4c03871d7d7dfb3d22ef8a9b9c6778c94e1c8fcc8365afd48", size = 3028230, upload-time = "2026-02-10T19:17:30.518Z" }, - { url = "https://files.pythonhosted.org/packages/e9/5a/ac0f49e48063ab4255d9e3b79f5def51697fce1a95ea1370f03dc9db76f6/cryptography-46.0.5-cp311-abi3-win_amd64.whl", hash = "sha256:38946c54b16c885c72c4f59846be9743d699eee2b69b6988e0a00a01f46a61a4", size = 3480909, upload-time = "2026-02-10T19:17:32.083Z" }, - { url = "https://files.pythonhosted.org/packages/00/13/3d278bfa7a15a96b9dc22db5a12ad1e48a9eb3d40e1827ef66a5df75d0d0/cryptography-46.0.5-cp314-cp314t-macosx_10_9_universal2.whl", hash = "sha256:94a76daa32eb78d61339aff7952ea819b1734b46f73646a07decb40e5b3448e2", size = 7119287, upload-time = "2026-02-10T19:17:33.801Z" }, - { url = "https://files.pythonhosted.org/packages/67/c8/581a6702e14f0898a0848105cbefd20c058099e2c2d22ef4e476dfec75d7/cryptography-46.0.5-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:5be7bf2fb40769e05739dd0046e7b26f9d4670badc7b032d6ce4db64dddc0678", size = 4265728, upload-time = "2026-02-10T19:17:35.569Z" }, - { url = 
"https://files.pythonhosted.org/packages/dd/4a/ba1a65ce8fc65435e5a849558379896c957870dd64fecea97b1ad5f46a37/cryptography-46.0.5-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:fe346b143ff9685e40192a4960938545c699054ba11d4f9029f94751e3f71d87", size = 4408287, upload-time = "2026-02-10T19:17:36.938Z" }, - { url = "https://files.pythonhosted.org/packages/f8/67/8ffdbf7b65ed1ac224d1c2df3943553766914a8ca718747ee3871da6107e/cryptography-46.0.5-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:c69fd885df7d089548a42d5ec05be26050ebcd2283d89b3d30676eb32ff87dee", size = 4270291, upload-time = "2026-02-10T19:17:38.748Z" }, - { url = "https://files.pythonhosted.org/packages/f8/e5/f52377ee93bc2f2bba55a41a886fd208c15276ffbd2569f2ddc89d50e2c5/cryptography-46.0.5-cp314-cp314t-manylinux_2_28_ppc64le.whl", hash = "sha256:8293f3dea7fc929ef7240796ba231413afa7b68ce38fd21da2995549f5961981", size = 4927539, upload-time = "2026-02-10T19:17:40.241Z" }, - { url = "https://files.pythonhosted.org/packages/3b/02/cfe39181b02419bbbbcf3abdd16c1c5c8541f03ca8bda240debc467d5a12/cryptography-46.0.5-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:1abfdb89b41c3be0365328a410baa9df3ff8a9110fb75e7b52e66803ddabc9a9", size = 4442199, upload-time = "2026-02-10T19:17:41.789Z" }, - { url = "https://files.pythonhosted.org/packages/c0/96/2fcaeb4873e536cf71421a388a6c11b5bc846e986b2b069c79363dc1648e/cryptography-46.0.5-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:d66e421495fdb797610a08f43b05269e0a5ea7f5e652a89bfd5a7d3c1dee3648", size = 3960131, upload-time = "2026-02-10T19:17:43.379Z" }, - { url = "https://files.pythonhosted.org/packages/d8/d2/b27631f401ddd644e94c5cf33c9a4069f72011821cf3dc7309546b0642a0/cryptography-46.0.5-cp314-cp314t-manylinux_2_34_aarch64.whl", hash = "sha256:4e817a8920bfbcff8940ecfd60f23d01836408242b30f1a708d93198393a80b4", size = 4270072, upload-time = "2026-02-10T19:17:45.481Z" }, - { url = 
"https://files.pythonhosted.org/packages/f4/a7/60d32b0370dae0b4ebe55ffa10e8599a2a59935b5ece1b9f06edb73abdeb/cryptography-46.0.5-cp314-cp314t-manylinux_2_34_ppc64le.whl", hash = "sha256:68f68d13f2e1cb95163fa3b4db4bf9a159a418f5f6e7242564fc75fcae667fd0", size = 4892170, upload-time = "2026-02-10T19:17:46.997Z" }, - { url = "https://files.pythonhosted.org/packages/d2/b9/cf73ddf8ef1164330eb0b199a589103c363afa0cf794218c24d524a58eab/cryptography-46.0.5-cp314-cp314t-manylinux_2_34_x86_64.whl", hash = "sha256:a3d1fae9863299076f05cb8a778c467578262fae09f9dc0ee9b12eb4268ce663", size = 4441741, upload-time = "2026-02-10T19:17:48.661Z" }, - { url = "https://files.pythonhosted.org/packages/5f/eb/eee00b28c84c726fe8fa0158c65afe312d9c3b78d9d01daf700f1f6e37ff/cryptography-46.0.5-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:c4143987a42a2397f2fc3b4d7e3a7d313fbe684f67ff443999e803dd75a76826", size = 4396728, upload-time = "2026-02-10T19:17:50.058Z" }, - { url = "https://files.pythonhosted.org/packages/65/f4/6bc1a9ed5aef7145045114b75b77c2a8261b4d38717bd8dea111a63c3442/cryptography-46.0.5-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:7d731d4b107030987fd61a7f8ab512b25b53cef8f233a97379ede116f30eb67d", size = 4652001, upload-time = "2026-02-10T19:17:51.54Z" }, - { url = "https://files.pythonhosted.org/packages/86/ef/5d00ef966ddd71ac2e6951d278884a84a40ffbd88948ef0e294b214ae9e4/cryptography-46.0.5-cp314-cp314t-win32.whl", hash = "sha256:c3bcce8521d785d510b2aad26ae2c966092b7daa8f45dd8f44734a104dc0bc1a", size = 3003637, upload-time = "2026-02-10T19:17:52.997Z" }, - { url = "https://files.pythonhosted.org/packages/b7/57/f3f4160123da6d098db78350fdfd9705057aad21de7388eacb2401dceab9/cryptography-46.0.5-cp314-cp314t-win_amd64.whl", hash = "sha256:4d8ae8659ab18c65ced284993c2265910f6c9e650189d4e3f68445ef82a810e4", size = 3469487, upload-time = "2026-02-10T19:17:54.549Z" }, - { url = 
"https://files.pythonhosted.org/packages/e2/fa/a66aa722105ad6a458bebd64086ca2b72cdd361fed31763d20390f6f1389/cryptography-46.0.5-cp38-abi3-macosx_10_9_universal2.whl", hash = "sha256:4108d4c09fbbf2789d0c926eb4152ae1760d5a2d97612b92d508d96c861e4d31", size = 7170514, upload-time = "2026-02-10T19:17:56.267Z" }, - { url = "https://files.pythonhosted.org/packages/0f/04/c85bdeab78c8bc77b701bf0d9bdcf514c044e18a46dcff330df5448631b0/cryptography-46.0.5-cp38-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:7d1f30a86d2757199cb2d56e48cce14deddf1f9c95f1ef1b64ee91ea43fe2e18", size = 4275349, upload-time = "2026-02-10T19:17:58.419Z" }, - { url = "https://files.pythonhosted.org/packages/5c/32/9b87132a2f91ee7f5223b091dc963055503e9b442c98fc0b8a5ca765fab0/cryptography-46.0.5-cp38-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:039917b0dc418bb9f6edce8a906572d69e74bd330b0b3fea4f79dab7f8ddd235", size = 4420667, upload-time = "2026-02-10T19:18:00.619Z" }, - { url = "https://files.pythonhosted.org/packages/a1/a6/a7cb7010bec4b7c5692ca6f024150371b295ee1c108bdc1c400e4c44562b/cryptography-46.0.5-cp38-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:ba2a27ff02f48193fc4daeadf8ad2590516fa3d0adeeb34336b96f7fa64c1e3a", size = 4276980, upload-time = "2026-02-10T19:18:02.379Z" }, - { url = "https://files.pythonhosted.org/packages/8e/7c/c4f45e0eeff9b91e3f12dbd0e165fcf2a38847288fcfd889deea99fb7b6d/cryptography-46.0.5-cp38-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:61aa400dce22cb001a98014f647dc21cda08f7915ceb95df0c9eaf84b4b6af76", size = 4939143, upload-time = "2026-02-10T19:18:03.964Z" }, - { url = "https://files.pythonhosted.org/packages/37/19/e1b8f964a834eddb44fa1b9a9976f4e414cbb7aa62809b6760c8803d22d1/cryptography-46.0.5-cp38-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:3ce58ba46e1bc2aac4f7d9290223cead56743fa6ab94a5d53292ffaac6a91614", size = 4453674, upload-time = "2026-02-10T19:18:05.588Z" }, - { url = 
"https://files.pythonhosted.org/packages/db/ed/db15d3956f65264ca204625597c410d420e26530c4e2943e05a0d2f24d51/cryptography-46.0.5-cp38-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:420d0e909050490d04359e7fdb5ed7e667ca5c3c402b809ae2563d7e66a92229", size = 3978801, upload-time = "2026-02-10T19:18:07.167Z" }, - { url = "https://files.pythonhosted.org/packages/41/e2/df40a31d82df0a70a0daf69791f91dbb70e47644c58581d654879b382d11/cryptography-46.0.5-cp38-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:582f5fcd2afa31622f317f80426a027f30dc792e9c80ffee87b993200ea115f1", size = 4276755, upload-time = "2026-02-10T19:18:09.813Z" }, - { url = "https://files.pythonhosted.org/packages/33/45/726809d1176959f4a896b86907b98ff4391a8aa29c0aaaf9450a8a10630e/cryptography-46.0.5-cp38-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:bfd56bb4b37ed4f330b82402f6f435845a5f5648edf1ad497da51a8452d5d62d", size = 4901539, upload-time = "2026-02-10T19:18:11.263Z" }, - { url = "https://files.pythonhosted.org/packages/99/0f/a3076874e9c88ecb2ecc31382f6e7c21b428ede6f55aafa1aa272613e3cd/cryptography-46.0.5-cp38-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:a3d507bb6a513ca96ba84443226af944b0f7f47dcc9a399d110cd6146481d24c", size = 4452794, upload-time = "2026-02-10T19:18:12.914Z" }, - { url = "https://files.pythonhosted.org/packages/02/ef/ffeb542d3683d24194a38f66ca17c0a4b8bf10631feef44a7ef64e631b1a/cryptography-46.0.5-cp38-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:9f16fbdf4da055efb21c22d81b89f155f02ba420558db21288b3d0035bafd5f4", size = 4404160, upload-time = "2026-02-10T19:18:14.375Z" }, - { url = "https://files.pythonhosted.org/packages/96/93/682d2b43c1d5f1406ed048f377c0fc9fc8f7b0447a478d5c65ab3d3a66eb/cryptography-46.0.5-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:ced80795227d70549a411a4ab66e8ce307899fad2220ce5ab2f296e687eacde9", size = 4667123, upload-time = "2026-02-10T19:18:15.886Z" }, - { url = 
"https://files.pythonhosted.org/packages/45/2d/9c5f2926cb5300a8eefc3f4f0b3f3df39db7f7ce40c8365444c49363cbda/cryptography-46.0.5-cp38-abi3-win32.whl", hash = "sha256:02f547fce831f5096c9a567fd41bc12ca8f11df260959ecc7c3202555cc47a72", size = 3010220, upload-time = "2026-02-10T19:18:17.361Z" }, - { url = "https://files.pythonhosted.org/packages/48/ef/0c2f4a8e31018a986949d34a01115dd057bf536905dca38897bacd21fac3/cryptography-46.0.5-cp38-abi3-win_amd64.whl", hash = "sha256:556e106ee01aa13484ce9b0239bca667be5004efb0aabbed28d353df86445595", size = 3467050, upload-time = "2026-02-10T19:18:18.899Z" }, - { url = "https://files.pythonhosted.org/packages/eb/dd/2d9fdb07cebdf3d51179730afb7d5e576153c6744c3ff8fded23030c204e/cryptography-46.0.5-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:3b4995dc971c9fb83c25aa44cf45f02ba86f71ee600d81091c2f0cbae116b06c", size = 3476964, upload-time = "2026-02-10T19:18:20.687Z" }, - { url = "https://files.pythonhosted.org/packages/e9/6f/6cc6cc9955caa6eaf83660b0da2b077c7fe8ff9950a3c5e45d605038d439/cryptography-46.0.5-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl", hash = "sha256:bc84e875994c3b445871ea7181d424588171efec3e185dced958dad9e001950a", size = 4218321, upload-time = "2026-02-10T19:18:22.349Z" }, - { url = "https://files.pythonhosted.org/packages/3e/5d/c4da701939eeee699566a6c1367427ab91a8b7088cc2328c09dbee940415/cryptography-46.0.5-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl", hash = "sha256:2ae6971afd6246710480e3f15824ed3029a60fc16991db250034efd0b9fb4356", size = 4381786, upload-time = "2026-02-10T19:18:24.529Z" }, - { url = "https://files.pythonhosted.org/packages/ac/97/a538654732974a94ff96c1db621fa464f455c02d4bb7d2652f4edc21d600/cryptography-46.0.5-pp311-pypy311_pp73-manylinux_2_34_aarch64.whl", hash = "sha256:d861ee9e76ace6cf36a6a89b959ec08e7bc2493ee39d07ffe5acb23ef46d27da", size = 4217990, upload-time = "2026-02-10T19:18:25.957Z" }, - { url = 
"https://files.pythonhosted.org/packages/ae/11/7e500d2dd3ba891197b9efd2da5454b74336d64a7cc419aa7327ab74e5f6/cryptography-46.0.5-pp311-pypy311_pp73-manylinux_2_34_x86_64.whl", hash = "sha256:2b7a67c9cd56372f3249b39699f2ad479f6991e62ea15800973b956f4b73e257", size = 4381252, upload-time = "2026-02-10T19:18:27.496Z" }, - { url = "https://files.pythonhosted.org/packages/bc/58/6b3d24e6b9bc474a2dcdee65dfd1f008867015408a271562e4b690561a4d/cryptography-46.0.5-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:8456928655f856c6e1533ff59d5be76578a7157224dbd9ce6872f25055ab9ab7", size = 3407605, upload-time = "2026-02-10T19:18:29.233Z" }, + { url = "https://files.pythonhosted.org/packages/0b/5d/4a8f770695d73be252331e60e526291e3df0c9b27556a90a6b47bccca4c2/cryptography-46.0.7-cp311-abi3-macosx_10_9_universal2.whl", hash = "sha256:ea42cbe97209df307fdc3b155f1b6fa2577c0defa8f1f7d3be7d31d189108ad4", size = 7179869, upload-time = "2026-04-08T01:56:17.157Z" }, + { url = "https://files.pythonhosted.org/packages/5f/45/6d80dc379b0bbc1f9d1e429f42e4cb9e1d319c7a8201beffd967c516ea01/cryptography-46.0.7-cp311-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:b36a4695e29fe69215d75960b22577197aca3f7a25b9cf9d165dcfe9d80bc325", size = 4275492, upload-time = "2026-04-08T01:56:19.36Z" }, + { url = "https://files.pythonhosted.org/packages/4a/9a/1765afe9f572e239c3469f2cb429f3ba7b31878c893b246b4b2994ffe2fe/cryptography-46.0.7-cp311-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:5ad9ef796328c5e3c4ceed237a183f5d41d21150f972455a9d926593a1dcb308", size = 4426670, upload-time = "2026-04-08T01:56:21.415Z" }, + { url = "https://files.pythonhosted.org/packages/8f/3e/af9246aaf23cd4ee060699adab1e47ced3f5f7e7a8ffdd339f817b446462/cryptography-46.0.7-cp311-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:73510b83623e080a2c35c62c15298096e2a5dc8d51c3b4e1740211839d0dea77", size = 4280275, upload-time = "2026-04-08T01:56:23.539Z" }, + { url = 
"https://files.pythonhosted.org/packages/0f/54/6bbbfc5efe86f9d71041827b793c24811a017c6ac0fd12883e4caa86b8ed/cryptography-46.0.7-cp311-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:cbd5fb06b62bd0721e1170273d3f4d5a277044c47ca27ee257025146c34cbdd1", size = 4928402, upload-time = "2026-04-08T01:56:25.624Z" }, + { url = "https://files.pythonhosted.org/packages/2d/cf/054b9d8220f81509939599c8bdbc0c408dbd2bdd41688616a20731371fe0/cryptography-46.0.7-cp311-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:420b1e4109cc95f0e5700eed79908cef9268265c773d3a66f7af1eef53d409ef", size = 4459985, upload-time = "2026-04-08T01:56:27.309Z" }, + { url = "https://files.pythonhosted.org/packages/f9/46/4e4e9c6040fb01c7467d47217d2f882daddeb8828f7df800cb806d8a2288/cryptography-46.0.7-cp311-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:24402210aa54baae71d99441d15bb5a1919c195398a87b563df84468160a65de", size = 3990652, upload-time = "2026-04-08T01:56:29.095Z" }, + { url = "https://files.pythonhosted.org/packages/36/5f/313586c3be5a2fbe87e4c9a254207b860155a8e1f3cca99f9910008e7d08/cryptography-46.0.7-cp311-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:8a469028a86f12eb7d2fe97162d0634026d92a21f3ae0ac87ed1c4a447886c83", size = 4279805, upload-time = "2026-04-08T01:56:30.928Z" }, + { url = "https://files.pythonhosted.org/packages/69/33/60dfc4595f334a2082749673386a4d05e4f0cf4df8248e63b2c3437585f2/cryptography-46.0.7-cp311-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:9694078c5d44c157ef3162e3bf3946510b857df5a3955458381d1c7cfc143ddb", size = 4892883, upload-time = "2026-04-08T01:56:32.614Z" }, + { url = "https://files.pythonhosted.org/packages/c7/0b/333ddab4270c4f5b972f980adef4faa66951a4aaf646ca067af597f15563/cryptography-46.0.7-cp311-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:42a1e5f98abb6391717978baf9f90dc28a743b7d9be7f0751a6f56a75d14065b", size = 4459756, upload-time = "2026-04-08T01:56:34.306Z" }, + { url = 
"https://files.pythonhosted.org/packages/d2/14/633913398b43b75f1234834170947957c6b623d1701ffc7a9600da907e89/cryptography-46.0.7-cp311-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:91bbcb08347344f810cbe49065914fe048949648f6bd5c2519f34619142bbe85", size = 4410244, upload-time = "2026-04-08T01:56:35.977Z" }, + { url = "https://files.pythonhosted.org/packages/10/f2/19ceb3b3dc14009373432af0c13f46aa08e3ce334ec6eff13492e1812ccd/cryptography-46.0.7-cp311-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:5d1c02a14ceb9148cc7816249f64f623fbfee39e8c03b3650d842ad3f34d637e", size = 4674868, upload-time = "2026-04-08T01:56:38.034Z" }, + { url = "https://files.pythonhosted.org/packages/1a/bb/a5c213c19ee94b15dfccc48f363738633a493812687f5567addbcbba9f6f/cryptography-46.0.7-cp311-abi3-win32.whl", hash = "sha256:d23c8ca48e44ee015cd0a54aeccdf9f09004eba9fc96f38c911011d9ff1bd457", size = 3026504, upload-time = "2026-04-08T01:56:39.666Z" }, + { url = "https://files.pythonhosted.org/packages/2b/02/7788f9fefa1d060ca68717c3901ae7fffa21ee087a90b7f23c7a603c32ae/cryptography-46.0.7-cp311-abi3-win_amd64.whl", hash = "sha256:397655da831414d165029da9bc483bed2fe0e75dde6a1523ec2fe63f3c46046b", size = 3488363, upload-time = "2026-04-08T01:56:41.893Z" }, + { url = "https://files.pythonhosted.org/packages/7b/56/15619b210e689c5403bb0540e4cb7dbf11a6bf42e483b7644e471a2812b3/cryptography-46.0.7-cp314-cp314t-macosx_10_9_universal2.whl", hash = "sha256:d151173275e1728cf7839aaa80c34fe550c04ddb27b34f48c232193df8db5842", size = 7119671, upload-time = "2026-04-08T01:56:44Z" }, + { url = "https://files.pythonhosted.org/packages/74/66/e3ce040721b0b5599e175ba91ab08884c75928fbeb74597dd10ef13505d2/cryptography-46.0.7-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:db0f493b9181c7820c8134437eb8b0b4792085d37dbb24da050476ccb664e59c", size = 4268551, upload-time = "2026-04-08T01:56:46.071Z" }, + { url = 
"https://files.pythonhosted.org/packages/03/11/5e395f961d6868269835dee1bafec6a1ac176505a167f68b7d8818431068/cryptography-46.0.7-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:ebd6daf519b9f189f85c479427bbd6e9c9037862cf8fe89ee35503bd209ed902", size = 4408887, upload-time = "2026-04-08T01:56:47.718Z" }, + { url = "https://files.pythonhosted.org/packages/40/53/8ed1cf4c3b9c8e611e7122fb56f1c32d09e1fff0f1d77e78d9ff7c82653e/cryptography-46.0.7-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:b7b412817be92117ec5ed95f880defe9cf18a832e8cafacf0a22337dc1981b4d", size = 4271354, upload-time = "2026-04-08T01:56:49.312Z" }, + { url = "https://files.pythonhosted.org/packages/50/46/cf71e26025c2e767c5609162c866a78e8a2915bbcfa408b7ca495c6140c4/cryptography-46.0.7-cp314-cp314t-manylinux_2_28_ppc64le.whl", hash = "sha256:fbfd0e5f273877695cb93baf14b185f4878128b250cc9f8e617ea0c025dfb022", size = 4905845, upload-time = "2026-04-08T01:56:50.916Z" }, + { url = "https://files.pythonhosted.org/packages/c0/ea/01276740375bac6249d0a971ebdf6b4dc9ead0ee0a34ef3b5a88c1a9b0d4/cryptography-46.0.7-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:ffca7aa1d00cf7d6469b988c581598f2259e46215e0140af408966a24cf086ce", size = 4444641, upload-time = "2026-04-08T01:56:52.882Z" }, + { url = "https://files.pythonhosted.org/packages/3d/4c/7d258f169ae71230f25d9f3d06caabcff8c3baf0978e2b7d65e0acac3827/cryptography-46.0.7-cp314-cp314t-manylinux_2_31_armv7l.whl", hash = "sha256:60627cf07e0d9274338521205899337c5d18249db56865f943cbe753aa96f40f", size = 3967749, upload-time = "2026-04-08T01:56:54.597Z" }, + { url = "https://files.pythonhosted.org/packages/b5/2a/2ea0767cad19e71b3530e4cad9605d0b5e338b6a1e72c37c9c1ceb86c333/cryptography-46.0.7-cp314-cp314t-manylinux_2_34_aarch64.whl", hash = "sha256:80406c3065e2c55d7f49a9550fe0c49b3f12e5bfff5dedb727e319e1afb9bf99", size = 4270942, upload-time = "2026-04-08T01:56:56.416Z" }, + { url = 
"https://files.pythonhosted.org/packages/41/3d/fe14df95a83319af25717677e956567a105bb6ab25641acaa093db79975d/cryptography-46.0.7-cp314-cp314t-manylinux_2_34_ppc64le.whl", hash = "sha256:c5b1ccd1239f48b7151a65bc6dd54bcfcc15e028c8ac126d3fada09db0e07ef1", size = 4871079, upload-time = "2026-04-08T01:56:58.31Z" }, + { url = "https://files.pythonhosted.org/packages/9c/59/4a479e0f36f8f378d397f4eab4c850b4ffb79a2f0d58704b8fa0703ddc11/cryptography-46.0.7-cp314-cp314t-manylinux_2_34_x86_64.whl", hash = "sha256:d5f7520159cd9c2154eb61eb67548ca05c5774d39e9c2c4339fd793fe7d097b2", size = 4443999, upload-time = "2026-04-08T01:57:00.508Z" }, + { url = "https://files.pythonhosted.org/packages/28/17/b59a741645822ec6d04732b43c5d35e4ef58be7bfa84a81e5ae6f05a1d33/cryptography-46.0.7-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:fcd8eac50d9138c1d7fc53a653ba60a2bee81a505f9f8850b6b2888555a45d0e", size = 4399191, upload-time = "2026-04-08T01:57:02.654Z" }, + { url = "https://files.pythonhosted.org/packages/59/6a/bb2e166d6d0e0955f1e9ff70f10ec4b2824c9cfcdb4da772c7dd69cc7d80/cryptography-46.0.7-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:65814c60f8cc400c63131584e3e1fad01235edba2614b61fbfbfa954082db0ee", size = 4655782, upload-time = "2026-04-08T01:57:04.592Z" }, + { url = "https://files.pythonhosted.org/packages/95/b6/3da51d48415bcb63b00dc17c2eff3a651b7c4fed484308d0f19b30e8cb2c/cryptography-46.0.7-cp314-cp314t-win32.whl", hash = "sha256:fdd1736fed309b4300346f88f74cd120c27c56852c3838cab416e7a166f67298", size = 3002227, upload-time = "2026-04-08T01:57:06.91Z" }, + { url = "https://files.pythonhosted.org/packages/32/a8/9f0e4ed57ec9cebe506e58db11ae472972ecb0c659e4d52bbaee80ca340a/cryptography-46.0.7-cp314-cp314t-win_amd64.whl", hash = "sha256:e06acf3c99be55aa3b516397fe42f5855597f430add9c17fa46bf2e0fb34c9bb", size = 3475332, upload-time = "2026-04-08T01:57:08.807Z" }, + { url = 
"https://files.pythonhosted.org/packages/a7/7f/cd42fc3614386bc0c12f0cb3c4ae1fc2bbca5c9662dfed031514911d513d/cryptography-46.0.7-cp38-abi3-macosx_10_9_universal2.whl", hash = "sha256:462ad5cb1c148a22b2e3bcc5ad52504dff325d17daf5df8d88c17dda1f75f2a4", size = 7165618, upload-time = "2026-04-08T01:57:10.645Z" }, + { url = "https://files.pythonhosted.org/packages/a5/d0/36a49f0262d2319139d2829f773f1b97ef8aef7f97e6e5bd21455e5a8fb5/cryptography-46.0.7-cp38-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:84d4cced91f0f159a7ddacad249cc077e63195c36aac40b4150e7a57e84fffe7", size = 4270628, upload-time = "2026-04-08T01:57:12.885Z" }, + { url = "https://files.pythonhosted.org/packages/8a/6c/1a42450f464dda6ffbe578a911f773e54dd48c10f9895a23a7e88b3e7db5/cryptography-46.0.7-cp38-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:128c5edfe5e5938b86b03941e94fac9ee793a94452ad1365c9fc3f4f62216832", size = 4415405, upload-time = "2026-04-08T01:57:14.923Z" }, + { url = "https://files.pythonhosted.org/packages/9a/92/4ed714dbe93a066dc1f4b4581a464d2d7dbec9046f7c8b7016f5286329e2/cryptography-46.0.7-cp38-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:5e51be372b26ef4ba3de3c167cd3d1022934bc838ae9eaad7e644986d2a3d163", size = 4272715, upload-time = "2026-04-08T01:57:16.638Z" }, + { url = "https://files.pythonhosted.org/packages/b7/e6/a26b84096eddd51494bba19111f8fffe976f6a09f132706f8f1bf03f51f7/cryptography-46.0.7-cp38-abi3-manylinux_2_28_ppc64le.whl", hash = "sha256:cdf1a610ef82abb396451862739e3fc93b071c844399e15b90726ef7470eeaf2", size = 4918400, upload-time = "2026-04-08T01:57:19.021Z" }, + { url = "https://files.pythonhosted.org/packages/c7/08/ffd537b605568a148543ac3c2b239708ae0bd635064bab41359252ef88ed/cryptography-46.0.7-cp38-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:1d25aee46d0c6f1a501adcddb2d2fee4b979381346a78558ed13e50aa8a59067", size = 4450634, upload-time = "2026-04-08T01:57:21.185Z" }, + { url = 
"https://files.pythonhosted.org/packages/16/01/0cd51dd86ab5b9befe0d031e276510491976c3a80e9f6e31810cce46c4ad/cryptography-46.0.7-cp38-abi3-manylinux_2_31_armv7l.whl", hash = "sha256:cdfbe22376065ffcf8be74dc9a909f032df19bc58a699456a21712d6e5eabfd0", size = 3985233, upload-time = "2026-04-08T01:57:22.862Z" }, + { url = "https://files.pythonhosted.org/packages/92/49/819d6ed3a7d9349c2939f81b500a738cb733ab62fbecdbc1e38e83d45e12/cryptography-46.0.7-cp38-abi3-manylinux_2_34_aarch64.whl", hash = "sha256:abad9dac36cbf55de6eb49badd4016806b3165d396f64925bf2999bcb67837ba", size = 4271955, upload-time = "2026-04-08T01:57:24.814Z" }, + { url = "https://files.pythonhosted.org/packages/80/07/ad9b3c56ebb95ed2473d46df0847357e01583f4c52a85754d1a55e29e4d0/cryptography-46.0.7-cp38-abi3-manylinux_2_34_ppc64le.whl", hash = "sha256:935ce7e3cfdb53e3536119a542b839bb94ec1ad081013e9ab9b7cfd478b05006", size = 4879888, upload-time = "2026-04-08T01:57:26.88Z" }, + { url = "https://files.pythonhosted.org/packages/b8/c7/201d3d58f30c4c2bdbe9b03844c291feb77c20511cc3586daf7edc12a47b/cryptography-46.0.7-cp38-abi3-manylinux_2_34_x86_64.whl", hash = "sha256:35719dc79d4730d30f1c2b6474bd6acda36ae2dfae1e3c16f2051f215df33ce0", size = 4449961, upload-time = "2026-04-08T01:57:29.068Z" }, + { url = "https://files.pythonhosted.org/packages/a5/ef/649750cbf96f3033c3c976e112265c33906f8e462291a33d77f90356548c/cryptography-46.0.7-cp38-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:7bbc6ccf49d05ac8f7d7b5e2e2c33830d4fe2061def88210a126d130d7f71a85", size = 4401696, upload-time = "2026-04-08T01:57:31.029Z" }, + { url = "https://files.pythonhosted.org/packages/41/52/a8908dcb1a389a459a29008c29966c1d552588d4ae6d43f3a1a4512e0ebe/cryptography-46.0.7-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:a1529d614f44b863a7b480c6d000fe93b59acee9c82ffa027cfadc77521a9f5e", size = 4664256, upload-time = "2026-04-08T01:57:33.144Z" }, + { url = 
"https://files.pythonhosted.org/packages/4b/fa/f0ab06238e899cc3fb332623f337a7364f36f4bb3f2534c2bb95a35b132c/cryptography-46.0.7-cp38-abi3-win32.whl", hash = "sha256:f247c8c1a1fb45e12586afbb436ef21ff1e80670b2861a90353d9b025583d246", size = 3013001, upload-time = "2026-04-08T01:57:34.933Z" }, + { url = "https://files.pythonhosted.org/packages/d2/f1/00ce3bde3ca542d1acd8f8cfa38e446840945aa6363f9b74746394b14127/cryptography-46.0.7-cp38-abi3-win_amd64.whl", hash = "sha256:506c4ff91eff4f82bdac7633318a526b1d1309fc07ca76a3ad182cb5b686d6d3", size = 3472985, upload-time = "2026-04-08T01:57:36.714Z" }, + { url = "https://files.pythonhosted.org/packages/63/0c/dca8abb64e7ca4f6b2978769f6fea5ad06686a190cec381f0a796fdcaaba/cryptography-46.0.7-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:fc9ab8856ae6cf7c9358430e49b368f3108f050031442eaeb6b9d87e4dcf4e4f", size = 3476879, upload-time = "2026-04-08T01:57:38.664Z" }, + { url = "https://files.pythonhosted.org/packages/3a/ea/075aac6a84b7c271578d81a2f9968acb6e273002408729f2ddff517fed4a/cryptography-46.0.7-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl", hash = "sha256:d3b99c535a9de0adced13d159c5a9cf65c325601aa30f4be08afd680643e9c15", size = 4219700, upload-time = "2026-04-08T01:57:40.625Z" }, + { url = "https://files.pythonhosted.org/packages/6c/7b/1c55db7242b5e5612b29fc7a630e91ee7a6e3c8e7bf5406d22e206875fbd/cryptography-46.0.7-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl", hash = "sha256:d02c738dacda7dc2a74d1b2b3177042009d5cab7c7079db74afc19e56ca1b455", size = 4385982, upload-time = "2026-04-08T01:57:42.725Z" }, + { url = "https://files.pythonhosted.org/packages/cb/da/9870eec4b69c63ef5925bf7d8342b7e13bc2ee3d47791461c4e49ca212f4/cryptography-46.0.7-pp311-pypy311_pp73-manylinux_2_34_aarch64.whl", hash = "sha256:04959522f938493042d595a736e7dbdff6eb6cc2339c11465b3ff89343b65f65", size = 4219115, upload-time = "2026-04-08T01:57:44.939Z" }, + { url = 
"https://files.pythonhosted.org/packages/f4/72/05aa5832b82dd341969e9a734d1812a6aadb088d9eb6f0430fc337cc5a8f/cryptography-46.0.7-pp311-pypy311_pp73-manylinux_2_34_x86_64.whl", hash = "sha256:3986ac1dee6def53797289999eabe84798ad7817f3e97779b5061a95b0ee4968", size = 4385479, upload-time = "2026-04-08T01:57:46.86Z" }, + { url = "https://files.pythonhosted.org/packages/20/2a/1b016902351a523aa2bd446b50a5bc1175d7a7d1cf90fe2ef904f9b84ebc/cryptography-46.0.7-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:258514877e15963bd43b558917bc9f54cf7cf866c38aa576ebf47a77ddbc43a4", size = 3412829, upload-time = "2026-04-08T01:57:48.874Z" }, ] [[package]] @@ -1573,6 +1573,7 @@ version = "0.13.0" source = { editable = "." } dependencies = [ { name = "croniter" }, + { name = "cryptography" }, { name = "fire" }, { name = "httpx", extra = ["socks"] }, { name = "jinja2" }, @@ -1758,18 +1759,19 @@ youtube = [ [package.metadata] requires-dist = [ { name = "agent-client-protocol", marker = "extra == 'acp'", specifier = "==0.9.0" }, - { name = "aiohttp", marker = "extra == 'homeassistant'", specifier = "==3.13.3" }, - { name = "aiohttp", marker = "extra == 'messaging'", specifier = "==3.13.3" }, - { name = "aiohttp", marker = "extra == 'slack'", specifier = "==3.13.3" }, - { name = "aiohttp", marker = "extra == 'sms'", specifier = "==3.13.3" }, + { name = "aiohttp", marker = "extra == 'homeassistant'", specifier = "==3.13.4" }, + { name = "aiohttp", marker = "extra == 'messaging'", specifier = "==3.13.4" }, + { name = "aiohttp", marker = "extra == 'slack'", specifier = "==3.13.4" }, + { name = "aiohttp", marker = "extra == 'sms'", specifier = "==3.13.4" }, { name = "aiohttp-socks", marker = "extra == 'matrix'", specifier = "==0.11.0" }, { name = "aiosqlite", marker = "extra == 'matrix'", specifier = "==0.22.1" }, { name = "alibabacloud-dingtalk", marker = "extra == 'dingtalk'", specifier = "==2.2.42" }, - { name = "anthropic", marker = "extra == 'anthropic'", specifier = "==0.86.0" }, + { 
name = "anthropic", marker = "extra == 'anthropic'", specifier = "==0.87.0" }, { name = "asyncpg", marker = "extra == 'matrix'", specifier = "==0.31.0" }, { name = "boto3", marker = "extra == 'bedrock'", specifier = "==1.42.89" }, { name = "brotlicffi", marker = "extra == 'messaging'", specifier = "==1.2.0.1" }, { name = "croniter", specifier = "==6.0.0" }, + { name = "cryptography", specifier = "==46.0.7" }, { name = "daytona", marker = "extra == 'daytona'", specifier = "==0.155.0" }, { name = "debugpy", marker = "extra == 'dev'", specifier = "==1.8.20" }, { name = "dingtalk-stream", marker = "extra == 'dingtalk'", specifier = "==0.24.3" }, diff --git a/website/docs/developer-guide/programmatic-integration.md b/website/docs/developer-guide/programmatic-integration.md new file mode 100644 index 00000000000..1ad0b13ef91 --- /dev/null +++ b/website/docs/developer-guide/programmatic-integration.md @@ -0,0 +1,126 @@ +--- +sidebar_position: 8 +title: "Programmatic Integration" +description: "Three protocols for driving hermes-agent from external programs: ACP, the TUI gateway JSON-RPC, and the OpenAI-compatible HTTP API" +--- + +# Programmatic Integration + +Hermes ships three protocols for driving the agent from external programs — IDE plugins, custom UIs, CI pipelines, embedded sub-agents. Pick the one that matches your transport and consumer. 
+ +| Protocol | Transport | Best for | Defined by | +|----------|-----------|----------|------------| +| **ACP** | JSON-RPC over stdio | IDE clients (VS Code, Zed, JetBrains) that already speak the [Agent Client Protocol](https://github.com/zed-industries/agent-client-protocol) | `acp_adapter/` | +| **TUI gateway** | JSON-RPC over stdio (or WebSocket) | Custom hosts that want fine-grained control of sessions, slash commands, approvals, and streaming events | `tui_gateway/server.py` | +| **API server** | HTTP + Server-Sent Events | OpenAI-compatible frontends (Open WebUI, LobeChat, LibreChat…) and language-agnostic web clients | `gateway/platforms/api_server.py` | + +All three drive the same `AIAgent` core. They differ only in wire format and which set of features they expose. + +--- + +## ACP (Agent Client Protocol) + +`hermes acp` starts a stdio JSON-RPC server speaking ACP. Used in production by VS Code (Zed Industries' ACP extension), Zed, and any JetBrains IDE with an ACP plugin. + +Capabilities exposed: session creation, prompt submission, streaming agent message chunks, tool-call events, permission requests, session fork, cancel, and authentication. Tool output is rendered into ACP `Diff`/`ToolCall` content blocks the IDE understands. + +Full lifecycle, event bridge, and approval flow: [ACP Internals](./acp-internals). + +```bash +hermes acp # serve ACP on stdio +hermes acp --bootstrap # print install snippet for an ACP-capable IDE +``` + +--- + +## TUI Gateway JSON-RPC + +`tui_gateway/server.py` is the protocol the Ink TUI (`hermes --tui`) and the embedded dashboard PTY bridge talk to. Any external host can speak the same protocol over stdio (or WebSocket via `tui_gateway/ws.py`). 
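
As a rough illustration of what an external host might write to the gateway's stdin, here is a minimal Python sketch. It assumes a JSON-RPC 2.0 envelope sent as one JSON object per line. That framing and the `session_id` / `text` parameter names are assumptions made for illustration, not taken from `tui_gateway/server.py`.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing request ids


def make_request(method: str, params: dict) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line of JSON."""
    envelope = {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}
    return json.dumps(envelope)


def classify(line: str) -> str:
    """Return 'event' for notifications, 'response' for replies to a request."""
    message = json.loads(line)
    # In JSON-RPC 2.0 a notification carries a method but no id.
    if "method" in message and "id" not in message:
        return "event"
    return "response"


# Hypothetical params: check the server source for the real field names.
request_line = make_request("prompt.submit", {"session_id": "s1", "text": "hello"})
print(request_line)
```

A host would typically write `request_line` plus a newline to the subprocess's stdin, then read stdout line by line and route each message through something like `classify`.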
+ +### Method catalog (selected) + +``` +prompt.submit prompt.background session.steer +session.create session.list session.interrupt +session.history session.compress session.branch +session.title session.usage session.status +clarify.respond sudo.respond secret.respond +approval.respond config.set / config.get commands.catalog +command.resolve command.dispatch cli.exec +reload.mcp reload.env process.stop +delegation.status subagent.interrupt spawn_tree.save / list / load +terminal.resize clipboard.paste image.attach +``` + +### Events streamed back + +`message.delta`, `message.complete`, `tool.start`, `tool.progress`, `tool.complete`, `approval.request`, `clarify.request`, `sudo.request`, `secret.request`, `gateway.ready`, plus session lifecycle and error events. + +### Pi-style RPC mapping + +Every command in the Pi-mono RPC spec ([issue #360](https://github.com/NousResearch/hermes-agent/issues/360)) has a TUI-gateway equivalent: + +| Pi command | Hermes equivalent | +|------------|-------------------| +| `prompt` | `prompt.submit` (or ACP `session/prompt`) | +| `steer` | `session.steer` | +| `follow_up` | `prompt.submit` queued after current turn | +| `abort` | `session.interrupt` | +| `set_model` | `command.dispatch` for `/model ` (mid-session, persistent) | +| `compact` | `session.compress` | +| `get_state` | `session.status` | +| `get_messages` | `session.history` | +| `switch_session` | `session.resume` | +| `fork` | `session.branch` | +| `ui_request` / `ui_response` | `clarify.respond` / `sudo.respond` / `secret.respond` / `approval.respond` | + +--- + +## OpenAI-Compatible API Server + +`gateway/platforms/api_server.py` exposes hermes over HTTP for any client that already speaks the OpenAI format. Useful when you want a web frontend, a curl-driven CI runner, or a non-Python consumer. 
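
For a non-curl consumer, a request to the `/v1/chat/completions` endpoint listed in this section could be built roughly as follows. This is a sketch under stated assumptions: `BASE_URL` is a hypothetical address, the body follows the public OpenAI Chat Completions shape, and sending `X-Hermes-Session-Id` as a request header is assumed from the header names this page mentions. The request is only constructed here, not sent.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical address: use whatever host and port your API server listens on.
BASE_URL = "http://127.0.0.1:8000"


def build_chat_request(prompt: str, session_id: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST to the Chat Completions endpoint."""
    body = {
        "model": "hermes-agent",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    headers = {"Content-Type": "application/json"}
    if session_id is not None:
        # Assumed usage of the session header named in the docs.
        headers["X-Hermes-Session-Id"] = session_id
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_chat_request("Summarize the README", session_id="demo")
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen(req)` would send it. Any OpenAI-compatible client library pointed at the same base URL should be able to produce an equivalent request.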
+ +Endpoints: + +``` +POST /v1/chat/completions OpenAI Chat Completions (streaming via SSE) +POST /v1/responses OpenAI Responses API (stateful) +POST /v1/runs Start a run, returns run_id (202) +GET /v1/runs/{id} Run status +GET /v1/runs/{id}/events SSE stream of lifecycle events +POST /v1/runs/{id}/approval Resolve a pending approval +POST /v1/runs/{id}/stop Interrupt the run +GET /v1/capabilities Machine-readable feature flags +GET /v1/models Lists hermes-agent +GET /health, /health/detailed +``` + +Setup, headers (`X-Hermes-Session-Id`, `X-Hermes-Session-Key`), and frontend wiring: [API Server](../user-guide/features/api-server). + +--- + +## Which one should I use? + +- **You're writing an IDE plugin and the IDE already speaks ACP** → ACP. Zero protocol work on the IDE side. +- **You're writing a custom desktop / web / TUI host and want every Hermes feature** (slash commands, approvals, clarify, multi-agent, session branching) → TUI gateway JSON-RPC. +- **You want any OpenAI-compatible frontend, a language-agnostic HTTP client, or curl-driven automation** → API server. +- **You want a Python in-process embed without a subprocess** → import `run_agent.AIAgent` directly. See [Agent Loop](./agent-loop). + +--- + +## Model hot-swapping + +Mid-session model switching works on every surface — it's the `/model` slash command under the hood. + +- **CLI / TUI:** `/model claude-sonnet-4` or `/model openrouter:anthropic/claude-sonnet-4.6` +- **TUI gateway RPC:** `command.dispatch` with `{"command": "/model claude-sonnet-4"}` +- **ACP:** the IDE sends the slash command as a prompt; the agent dispatches it +- **API server:** include a `model` field in the request body or set `X-Hermes-Model` + +Provider-aware resolution (the same model name picks the right format for whatever provider you're on) is built in. See `hermes_cli/model_switch.py`. + +--- + +## A note on `--mode rpc` + +Hermes does not have a `--mode rpc` flag. 
The three protocols above already cover the use cases — ACP for IDE-protocol clients, the TUI gateway for stdio JSON-RPC hosts, and the API server for HTTP. If you find a real gap that none of them fill, open an issue with the concrete consumer you're building. diff --git a/website/docs/guides/build-a-hermes-plugin.md b/website/docs/guides/build-a-hermes-plugin.md index ee74e23ac5e..3135c68daaf 100644 --- a/website/docs/guides/build-a-hermes-plugin.md +++ b/website/docs/guides/build-a-hermes-plugin.md @@ -465,6 +465,30 @@ ctx.register_tool( ) ``` +### Overriding a built-in tool + +To replace a built-in tool with your own implementation (e.g. swap the +default browser tool for a headed-Chrome CDP backend, or replace +`web_search` with a custom corporate index), pass `override=True`: + +```python +def register(ctx): + ctx.register_tool( + name="browser_navigate", # same name as the built-in + toolset="plugin_my_browser", # your own toolset namespace + schema={...}, + handler=my_custom_navigate, + override=True, # explicit opt-in + ) +``` + +Without `override=True`, the registry rejects any registration that would +shadow an existing tool from a different toolset — this prevents +accidental overwrites. The override is logged at INFO level so it's +auditable in `~/.hermes/logs/agent.log`. Plugins load after built-in +tools, so the registration order is correct: your handler replaces the +built-in one. 
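
The shadowing rule described above can be modeled with a toy registry. This is an illustrative sketch only: `ToyRegistry` is a hypothetical class written for this example, not Hermes's actual registry, and the log message format is invented.

```python
import logging

logger = logging.getLogger("toy_registry")


class ToyRegistry:
    """Hypothetical stand-in for a tool registry; not Hermes's real class."""

    def __init__(self):
        self._tools = {}  # name -> (toolset, handler)

    def register_tool(self, name, toolset, handler, override=False):
        existing = self._tools.get(name)
        # Reject shadowing a tool owned by a different toolset unless opted in.
        if existing is not None and existing[0] != toolset:
            if not override:
                raise ValueError(
                    f"tool {name!r} already registered by toolset {existing[0]!r}"
                )
            logger.info("tool %r overridden by toolset %r", name, toolset)
        self._tools[name] = (toolset, handler)

    def call(self, name, *args):
        return self._tools[name][1](*args)


registry = ToyRegistry()
# The built-in registers first; the plugin registers later with override=True.
registry.register_tool("browser_navigate", "browser", lambda url: f"built-in:{url}")
registry.register_tool(
    "browser_navigate", "plugin_my_browser", lambda url: f"plugin:{url}", override=True
)
print(registry.call("browser_navigate", "https://example.com"))
```

The point of the explicit flag is that the later registration wins only when the plugin author asked for it. Without the flag, the second `register_tool` call would raise instead of silently replacing the built-in handler.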
+ ### Register multiple hooks ```python diff --git a/website/docs/guides/xai-grok-oauth.md b/website/docs/guides/xai-grok-oauth.md index 67d31c929ad..d85aa4c64bf 100644 --- a/website/docs/guides/xai-grok-oauth.md +++ b/website/docs/guides/xai-grok-oauth.md @@ -128,7 +128,7 @@ hermes --provider x-ai-oauth # alias hermes --provider xai-grok-oauth # alias ``` -## Direct-to-xAI Tools (TTS / Image / Video / Transcription) +## Direct-to-xAI Tools (TTS / Image / Video / Transcription / X Search) Once you're logged in via OAuth, every direct-to-xAI tool reuses the same bearer token automatically — there is **no separate setup** unless you'd rather use an API key. @@ -139,6 +139,7 @@ hermes tools # → Text-to-Speech → "xAI TTS" # → Image Generation → "xAI Grok Imagine (image)" # → Video Generation → "xAI Grok Imagine" +# → X (Twitter) Search → "xAI Grok OAuth (SuperGrok Subscription)" ``` If OAuth tokens are already stored, the picker confirms it and skips the credential prompt. If neither OAuth nor `XAI_API_KEY` is set, the picker offers a 3-choice menu: OAuth login, paste API key, or skip. @@ -147,6 +148,10 @@ If OAuth tokens are already stored, the picker confirms it and skips the credent The `video_gen` toolset is disabled by default. Enable it in `hermes tools` → `🎬 Video Generation` (press space) before the agent can call `video_generate`. Otherwise the agent may fall back to the bundled ComfyUI skill, which is also tagged for video generation. ::: +:::note X search is off by default +The `x_search` toolset is disabled by default. Enable it in `hermes tools` → `🐦 X (Twitter) Search` (press space) before the agent can call `x_search`. The tool routes through xAI's built-in `x_search` Responses API — it works with **either** your SuperGrok OAuth login or a paid `XAI_API_KEY`, and prefers OAuth when both are configured (uses your subscription quota instead of API spend). 
The tool schema is hidden from the model when no xAI credentials are configured, regardless of whether the toolset is enabled. +::: + ### Models | Tool | Model | Notes | diff --git a/website/docs/reference/mcp-config-reference.md b/website/docs/reference/mcp-config-reference.md index a87478f91fa..ecd6ad2c1a4 100644 --- a/website/docs/reference/mcp-config-reference.md +++ b/website/docs/reference/mcp-config-reference.md @@ -28,6 +28,7 @@ mcp_servers: enabled: true timeout: 120 connect_timeout: 60 + supports_parallel_tool_calls: false tools: include: [] exclude: [] @@ -47,6 +48,7 @@ mcp_servers: | `enabled` | bool | both | Skip the server entirely when false | | `timeout` | number | both | Tool call timeout | | `connect_timeout` | number | both | Initial connection timeout | +| `supports_parallel_tool_calls` | bool | both | Allow tools from this server to run concurrently | | `tools` | mapping | both | Filtering and utility-tool policy | | `auth` | string | HTTP | Authentication method. Set to `oauth` to enable OAuth 2.1 with PKCE | | `sampling` | mapping | both | Server-initiated LLM request policy (see MCP guide) | diff --git a/website/docs/reference/optional-skills-catalog.md b/website/docs/reference/optional-skills-catalog.md index d5839f846d1..ce1861431a6 100644 --- a/website/docs/reference/optional-skills-catalog.md +++ b/website/docs/reference/optional-skills-catalog.md @@ -64,6 +64,7 @@ hermes skills uninstall |-------|-------------| | [**inference-sh-cli**](/docs/user-guide/skills/optional/devops/devops-cli) | Run 150+ AI apps via inference.sh CLI (infsh) — image generation, video creation, LLMs, search, 3D, social automation. Uses the terminal tool. Triggers: inference.sh, infsh, ai apps, flux, veo, image generation, video generation, seedrea... 
| | [**docker-management**](/docs/user-guide/skills/optional/devops/devops-docker-management) | Manage Docker containers, images, volumes, networks, and Compose stacks — lifecycle ops, debugging, cleanup, and Dockerfile optimization. | +| [**pinggy-tunnel**](/docs/user-guide/skills/optional/devops/devops-pinggy-tunnel) | Zero-install localhost tunnels over SSH via Pinggy. | | [**watchers**](/docs/user-guide/skills/optional/devops/devops-watchers) | Poll RSS, JSON APIs, and GitHub with watermark dedup. | ## dogfood @@ -161,10 +162,12 @@ hermes skills uninstall | Skill | Description | |-------|-------------| | [**bioinformatics**](/docs/user-guide/skills/optional/research/research-bioinformatics) | Gateway to 400+ bioinformatics skills from bioSkills and ClawBio. Covers genomics, transcriptomics, single-cell, variant calling, pharmacogenomics, metagenomics, structural biology, and more. Fetches domain-specific reference material on... | +| [**darwinian-evolver**](/docs/user-guide/skills/optional/research/research-darwinian-evolver) | Evolve prompts/regex/SQL/code with Imbue's evolution loop. | | [**domain-intel**](/docs/user-guide/skills/optional/research/research-domain-intel) | Passive domain reconnaissance using Python stdlib. Subdomain discovery, SSL certificate inspection, WHOIS lookups, DNS records, domain availability checks, and bulk multi-domain analysis. No API keys required. | | [**drug-discovery**](/docs/user-guide/skills/optional/research/research-drug-discovery) | Pharmaceutical research assistant for drug discovery workflows. Search bioactive compounds on ChEMBL, calculate drug-likeness (Lipinski Ro5, QED, TPSA, synthetic accessibility), look up drug-drug interactions via OpenFDA, interpret ADMET... | | [**duckduckgo-search**](/docs/user-guide/skills/optional/research/research-duckduckgo-search) | Free web search via DuckDuckGo — text, news, images, videos. No API key needed. 
Prefer the `ddgs` CLI when installed; use the Python DDGS library only after verifying that `ddgs` is available in the current runtime. | | [**gitnexus-explorer**](/docs/user-guide/skills/optional/research/research-gitnexus-explorer) | Index a codebase with GitNexus and serve an interactive knowledge graph via web UI + Cloudflare tunnel. | +| [**osint-investigation**](/docs/user-guide/skills/optional/research/research-osint-investigation) | Public-records OSINT investigation framework — SEC EDGAR filings, USAspending contracts, Senate lobbying, OFAC sanctions, ICIJ offshore leaks, NYC property records (ACRIS), OpenCorporates registries, CourtListener court records, Wayback... | | [**parallel-cli**](/docs/user-guide/skills/optional/research/research-parallel-cli) | Optional vendor skill for Parallel CLI — agent-native web search, extraction, deep research, enrichment, FindAll, and monitoring. Prefer JSON output and non-interactive flows. | | [**qmd**](/docs/user-guide/skills/optional/research/research-qmd) | Search personal knowledge bases, notes, docs, and meeting transcripts locally using qmd — a hybrid retrieval engine with BM25, vector search, and LLM reranking. Supports CLI and MCP integration. | | [**scrapling**](/docs/user-guide/skills/optional/research/research-scrapling) | Web scraping with Scrapling - HTTP fetching, stealth browser automation, Cloudflare bypass, and spider crawling via CLI and Python. | diff --git a/website/docs/reference/tools-reference.md b/website/docs/reference/tools-reference.md index 03930264f8c..507bd307afb 100644 --- a/website/docs/reference/tools-reference.md +++ b/website/docs/reference/tools-reference.md @@ -196,6 +196,12 @@ Opt-in toolset (not loaded in the default `hermes-cli` set). Add via `--toolsets | `web_search` | Search the web for information. Returns up to 5 results by default with titles, URLs, and descriptions. Accepts an optional `limit` (1-100, default 5). 
The query is passed through to the configured backend, so operators such as `site:domain`, `filetype:pdf`, `intitle:word`, `-term`, and `"exact phrase"` may work when the backend supports them. | EXA_API_KEY or PARALLEL_API_KEY or FIRECRAWL_API_KEY or TAVILY_API_KEY | | `web_extract` | Extract content from web page URLs. Returns page content in markdown format. Also works with PDF URLs — pass the PDF link directly and it converts to markdown text. Pages under 5000 chars return full markdown; larger pages are LLM-summarized. | EXA_API_KEY or PARALLEL_API_KEY or FIRECRAWL_API_KEY or TAVILY_API_KEY | +## `x_search` toolset + +| Tool | Description | Requires environment | +|------|-------------|----------------------| +| `x_search` | Search X (Twitter) posts, profiles, and threads using xAI's built-in `x_search` Responses tool. Use this for current discussion, reactions, or claims on X rather than general web pages. Off by default — opt in via `hermes tools` → 🐦 X (Twitter) Search. Schema is only registered when xAI credentials are configured (check_fn-gated). | XAI_API_KEY **or** xAI Grok OAuth (SuperGrok Subscription) login | + ## `tts` toolset | Tool | Description | Requires environment | diff --git a/website/docs/reference/toolsets-reference.md b/website/docs/reference/toolsets-reference.md index 5bf1f14260e..61b51e4e30e 100644 --- a/website/docs/reference/toolsets-reference.md +++ b/website/docs/reference/toolsets-reference.md @@ -82,6 +82,7 @@ Or in-session: | `vision` | `vision_analyze` | Image analysis via vision-capable models. | | `video` | `video_analyze` | Video analysis and understanding tools (opt-in, not in the default toolset — add explicitly via `--toolsets`). | | `web` | `web_extract`, `web_search` | Web search and page content extraction. | +| `x_search` | `x_search` | Search X (Twitter) posts and threads via xAI's built-in `x_search` Responses tool. Off by default; opt in via `hermes tools`. 
Schema only registered when xAI credentials (SuperGrok OAuth or `XAI_API_KEY`) are configured. | | `yuanbao` | `yb_query_group_info`, `yb_query_group_members`, `yb_search_sticker`, `yb_send_dm`, `yb_send_sticker` | Yuanbao DM/group actions and sticker search. Registered only on `hermes-yuanbao`. | ## Platform Toolsets diff --git a/website/docs/user-guide/features/mcp.md b/website/docs/user-guide/features/mcp.md index b136af15c66..c1711a9f3ae 100644 --- a/website/docs/user-guide/features/mcp.md +++ b/website/docs/user-guide/features/mcp.md @@ -105,6 +105,7 @@ Hermes reads MCP config from `~/.hermes/config.yaml` under `mcp_servers`. | `timeout` | number | Tool call timeout | | `connect_timeout` | number | Initial connection timeout | | `enabled` | bool | If `false`, Hermes skips the server entirely | +| `supports_parallel_tool_calls` | bool | If `true`, tools from this server may run concurrently | | `tools` | mapping | Per-server tool filtering and utility policy | ### Minimal stdio example @@ -409,6 +410,23 @@ Because Hermes now only registers those wrappers when both are true: This is intentional and keeps the tool list honest. +## Parallel Tool Calls + +By default, MCP tools run sequentially — one at a time. If your MCP server exposes tools that are safe to run concurrently (e.g. read-only queries, independent API calls), you can opt-in to parallel execution: + +```yaml +mcp_servers: + docs: + command: "docs-server" + supports_parallel_tool_calls: true +``` + +When `supports_parallel_tool_calls` is `true`, Hermes may execute multiple tools from that server at the same time within a single tool-call batch, just like it does for built-in read-only tools (web_search, read_file, etc.). + +:::caution +Only enable parallel calls for MCP servers whose tools are safe to run at the same time. If tools read and write shared state, files, databases, or external resources, review the read/write race conditions before enabling this setting. 
+::: + ## MCP Sampling Support MCP servers can request LLM inference from Hermes via the `sampling/createMessage` protocol. This allows an MCP server to ask Hermes to generate text on its behalf — useful for servers that need LLM capabilities but don't have their own model access. diff --git a/website/docs/user-guide/features/tools.md b/website/docs/user-guide/features/tools.md index 9f9eddbb513..0c5dd30cb2c 100644 --- a/website/docs/user-guide/features/tools.md +++ b/website/docs/user-guide/features/tools.md @@ -21,6 +21,7 @@ High-level categories: | Category | Examples | Description | |----------|----------|-------------| | **Web** | `web_search`, `web_extract` | Search the web and extract page content. | +| **X Search** | `x_search` | Search X (Twitter) posts and threads via xAI's built-in `x_search` Responses tool — gated on xAI credentials (SuperGrok OAuth or `XAI_API_KEY`); off by default, opt in via `hermes tools` → 🐦 X (Twitter) Search. | | **Terminal & Files** | `terminal`, `process`, `read_file`, `patch` | Execute commands and manipulate files. | | **Browser** | `browser_navigate`, `browser_snapshot`, `browser_vision` | Interactive browser automation with text and vision support. | | **Media** | `vision_analyze`, `image_generate`, `text_to_speech` | Multimodal analysis and generation. | diff --git a/website/docs/user-guide/features/x-search.md b/website/docs/user-guide/features/x-search.md new file mode 100644 index 00000000000..c01bb8adf6d --- /dev/null +++ b/website/docs/user-guide/features/x-search.md @@ -0,0 +1,117 @@ +--- +title: X (Twitter) Search +description: Search X (Twitter) posts and threads from within the agent using xAI's built-in x_search Responses tool — works with either a SuperGrok OAuth login or an XAI_API_KEY. +sidebar_label: X (Twitter) Search +sidebar_position: 7 +--- + +# X (Twitter) Search + +The `x_search` tool lets the agent search X (Twitter) posts, profiles, and threads directly. 
It's backed by xAI's built-in `x_search` tool on the Responses API at `https://api.x.ai/v1/responses` — Grok itself runs the search server-side and returns synthesized results with citations to the originating posts. + +**Use this instead of `web_search`** when you specifically want current discussion, reactions, or claims **on X**. For general web pages, keep using `web_search` / `web_extract`. + +## Authentication + +`x_search` registers when **either** xAI credential path is available: + +| Credential | Source | Setup | +|------------|--------|-------| +| **SuperGrok OAuth** (preferred) | Browser login at `accounts.x.ai`, refreshed automatically | `hermes auth add xai-oauth` — see [xAI Grok OAuth (SuperGrok Subscription)](../../guides/xai-grok-oauth.md) | +| **`XAI_API_KEY`** | Paid xAI API key | Set in `~/.hermes/.env` | + +Both hit the same endpoint with the same payload — the only difference is the bearer token. **When both are configured, SuperGrok OAuth wins** so x_search runs against your subscription quota instead of paid API spend. + +The tool's `check_fn` runs the xAI credential resolver every time the model's tool list is rebuilt. A `True` return means the bearer is fetchable AND non-empty AND (if it had expired) successfully refreshed. Revoked tokens with a failed refresh hide the tool from the schema; the model simply can't see it. + +## Enabling the tool + +Off by default. Enable in `hermes tools`: + +```bash +hermes tools +# → 🐦 X (Twitter) Search (press space to toggle on) +``` + +The picker offers two credential choices: + +1. **xAI Grok OAuth (SuperGrok Subscription)** — opens the browser to `accounts.x.ai` if you're not already logged in +2. **xAI API key** — prompts for `XAI_API_KEY` + +Either choice satisfies the gating. You can pick whichever credentials you already have; the tool works identically with both. If both end up configured, OAuth is preferred at call time. 
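The precedence rule from the Authentication section can be sketched in a few lines of Python. This is an illustration only, not Hermes' actual resolver: the function name `pick_xai_credential` and its arguments are hypothetical, and the returned labels follow the `credential_source` values this page lists for the tool's JSON result.

```python
from typing import Optional, Tuple


def pick_xai_credential(
    oauth_token: Optional[str], api_key: Optional[str]
) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (source, bearer), or None if nothing is usable."""
    # SuperGrok OAuth wins when both credentials are configured.
    if oauth_token and oauth_token.strip():
        return ("xai-oauth", oauth_token.strip())
    # Otherwise fall back to the paid API key.
    if api_key and api_key.strip():
        return ("xai", api_key.strip())
    # Nothing usable: in this sketch the tool would stay hidden from the schema.
    return None
```

An empty or whitespace-only token counts as missing here, which mirrors the "fetchable AND non-empty" condition described above. Token refresh is deliberately left out of the sketch.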
+ +## Configuration + +```yaml +# ~/.hermes/config.yaml +x_search: + # xAI model used for the Responses call. + # grok-4.20-reasoning is the recommended default; any Grok model + # with x_search tool access works. + model: grok-4.20-reasoning + + # Request timeout in seconds. x_search can take 60–120s for + # complex queries — the default is generous. Minimum: 30. + timeout_seconds: 180 + + # Number of automatic retries on 5xx / ReadTimeout / ConnectionError. + # Each retry backs off (1.5x attempt seconds, capped at 5s). + retries: 2 +``` + +## Tool parameters + +The agent calls `x_search` with these arguments: + +| Parameter | Type | Description | +|-----------|------|-------------| +| `query` | string (required) | What to look up on X. | +| `allowed_x_handles` | string array | Optional list of handles to include **exclusively** (max 10). Leading `@` is stripped. | +| `excluded_x_handles` | string array | Optional list of handles to exclude (max 10). Mutually exclusive with `allowed_x_handles`. | +| `from_date` | string | Optional `YYYY-MM-DD` start date. | +| `to_date` | string | Optional `YYYY-MM-DD` end date. | +| `enable_image_understanding` | boolean | Ask xAI to analyze images attached to matching posts. | +| `enable_video_understanding` | boolean | Ask xAI to analyze videos attached to matching posts. | + +The tool returns JSON with: + +- `answer` — synthesized text response from Grok +- `citations` — citations returned by the Responses API top-level field +- `inline_citations` — `url_citation` annotations extracted from the message body (each with `url`, `title`, `start_index`, `end_index`) +- `credential_source` — `"xai-oauth"` if OAuth resolved, `"xai"` if API key resolved +- `model`, `query`, `provider`, `tool`, `success` + +## Example + +Talking to the agent: + +> What are people on X saying about the new Grok image features? Focus on responses from @xai. + +The agent will: + +1. 
Call `x_search` with `query="reactions to new Grok image features"`, `allowed_x_handles=["xai"]` +2. Get back a synthesized answer plus a list of citations linking to specific posts +3. Reply with the answer and references + +## Troubleshooting + +### "No xAI credentials available" + +The tool surfaces this when both auth paths fail. Either set `XAI_API_KEY` in `~/.hermes/.env` or run `hermes auth add xai-oauth` and complete the browser login. Then restart your session so the agent re-reads the tool registry. + +### "`x_search` is not enabled for this model" + +The configured `x_search.model` doesn't have access to the server-side `x_search` tool. Switch to `grok-4.20-reasoning` (the default) or another Grok model that supports it. Check the [xAI documentation](https://docs.x.ai/) for the current list. + +### Tool doesn't appear in the schema + +Two possible causes: + +1. **Toolset not enabled.** Run `hermes tools` and confirm `🐦 X (Twitter) Search` is checked. +2. **No xAI credentials.** The check_fn returns False, so the schema stays hidden. Run `hermes auth status` to confirm xai-oauth login state, and check that `XAI_API_KEY` is set (if you're using the API-key path). 
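As a recap of the constraints in the Tool parameters table, the sketch below checks a set of `x_search` arguments before a call. It is an assumption-laden illustration: `validate_x_search_args` and `normalize_handles` are hypothetical names, and raising `ValueError` is a choice made for the example, not a description of how Hermes reports invalid arguments.

```python
import re
from datetime import date
from typing import Dict, List, Optional

MAX_HANDLES = 10  # limit stated in the parameter table
DATE_SHAPE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # YYYY-MM-DD


def normalize_handles(handles: Optional[List[str]]) -> List[str]:
    """Strip whitespace and a leading '@'; drop entries that end up empty."""
    cleaned = []
    for handle in handles or []:
        handle = handle.strip().lstrip("@")
        if handle:
            cleaned.append(handle)
    return cleaned


def validate_x_search_args(args: Dict) -> Dict:
    """Hypothetical pre-flight check mirroring the documented constraints."""
    query = (args.get("query") or "").strip()
    if not query:
        raise ValueError("query is required")

    allowed = normalize_handles(args.get("allowed_x_handles"))
    excluded = normalize_handles(args.get("excluded_x_handles"))
    if allowed and excluded:
        raise ValueError("allowed_x_handles and excluded_x_handles are mutually exclusive")
    if len(allowed) > MAX_HANDLES or len(excluded) > MAX_HANDLES:
        raise ValueError(f"at most {MAX_HANDLES} handles per list")

    out: Dict = {"query": query}
    if allowed:
        out["allowed_x_handles"] = allowed
    if excluded:
        out["excluded_x_handles"] = excluded

    for key in ("from_date", "to_date"):
        value = args.get(key)
        if value:
            if not DATE_SHAPE.match(value):
                raise ValueError(f"{key} must be YYYY-MM-DD")
            date.fromisoformat(value)  # raises ValueError for impossible dates
            out[key] = value
    return out
```

Running the example request from this page through it, `allowed_x_handles=["@xai"]` comes back as `["xai"]`, and passing both handle lists at once is rejected.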
+ +## See Also + +- [xAI Grok OAuth (SuperGrok Subscription)](../../guides/xai-grok-oauth.md) — the OAuth setup guide +- [Web Search & Extract](web-search.md) — for general (non-X) web search +- [Tools Reference](../../reference/tools-reference.md) — full tool catalog diff --git a/website/docs/user-guide/skills/optional/devops/devops-pinggy-tunnel.md b/website/docs/user-guide/skills/optional/devops/devops-pinggy-tunnel.md new file mode 100644 index 00000000000..19f431f1967 --- /dev/null +++ b/website/docs/user-guide/skills/optional/devops/devops-pinggy-tunnel.md @@ -0,0 +1,327 @@ +--- +title: "Pinggy Tunnel — Zero-install localhost tunnels over SSH via Pinggy" +sidebar_label: "Pinggy Tunnel" +description: "Zero-install localhost tunnels over SSH via Pinggy" +--- + +{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */} + +# Pinggy Tunnel + +Zero-install localhost tunnels over SSH via Pinggy. + +## Skill metadata + +| | | +|---|---| +| Source | Optional — install with `hermes skills install official/devops/pinggy-tunnel` | +| Path | `optional-skills/devops/pinggy-tunnel` | +| Version | `0.1.0` | +| Author | Teknium (teknium1), Hermes Agent | +| License | MIT | +| Platforms | linux, macos, windows | +| Tags | `Pinggy`, `Tunnel`, `Networking`, `SSH`, `Webhook`, `Localhost` | +| Related skills | `cloudflared-quick-tunnel`, [`webhook-subscriptions`](/docs/user-guide/skills/bundled/devops/devops-webhook-subscriptions) | + +## Reference: full SKILL.md + +:::info +The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active. +::: + +# Pinggy Tunnel Skill + +Expose a local service (dev server, webhook receiver, MCP endpoint, demo) to the public internet using a Pinggy SSH reverse tunnel. 
No daemon to install — the user's stock SSH client connects to `a.pinggy.io:443` and Pinggy hands back a public HTTP/HTTPS URL. + +Free tier: 60-minute tunnels, random subdomain, no signup. Pro tier ($3/mo) is an opt-in with a token. + +## When to Use + +- User asks to "expose this locally", "share my dev server", "make this URL public", "tunnel port N", "get a public URL for a webhook" +- Need to receive a webhook callback during a local task (Stripe, GitHub, Discord, AgentMail) +- Sharing a one-off HTTP demo (MCP server, Ollama/vLLM endpoint, dashboard) with a remote party +- The host has SSH but no `cloudflared` / `ngrok` binary, and installing one would be overkill + +If the host already has `cloudflared` configured, prefer the `cloudflared-quick-tunnel` skill — Cloudflare quick tunnels don't expire after 60 minutes. + +## Prerequisites + +- `ssh` on PATH (`ssh -V`). Default on Linux, macOS, and Windows 10+. No other install. +- A local service listening on `127.0.0.1:` before the tunnel starts. Pinggy will return URLs but they'll 502 until the local origin is up. + +Optional: + +- `PINGGY_TOKEN` env var for paid Pro features (persistent subdomain, custom domain, multiple tunnels, no 60-minute cap). Free tier needs no credentials. + +## Quick Reference + +```bash +# Plain HTTP/HTTPS tunnel for port 8000 (free tier) +ssh -p 443 -o StrictHostKeyChecking=no -o ServerAliveInterval=30 \ + -R0:localhost:8000 free@a.pinggy.io + +# TCP tunnel (databases, raw SSH, etc.) 
+ssh -p 443 -o StrictHostKeyChecking=no -R0:localhost:5432 tcp@a.pinggy.io + +# TLS tunnel (Pinggy can't decrypt — bring your own certs at origin) +ssh -p 443 -o StrictHostKeyChecking=no -R0:localhost:443 tls@a.pinggy.io + +# Basic auth gate (b:user:pass) +ssh -p 443 -o StrictHostKeyChecking=no -R0:localhost:8000 \ + "b:admin:secret+free@a.pinggy.io" + +# Bearer token gate (k:token) +ssh -p 443 -o StrictHostKeyChecking=no -R0:localhost:8000 \ + "k:mysecrettoken+free@a.pinggy.io" + +# IP whitelist (w:CIDR) +ssh -p 443 -o StrictHostKeyChecking=no -R0:localhost:8000 \ + "w:203.0.113.0/24+free@a.pinggy.io" + +# Enable CORS + force HTTPS redirect +ssh -p 443 -o StrictHostKeyChecking=no -R0:localhost:8000 \ + "co+x:https+free@a.pinggy.io" + +# Pro tier (persistent URL, no 60-min cap) +ssh -p 443 -o StrictHostKeyChecking=no -R0:localhost:8000 "$PINGGY_TOKEN+a.pinggy.io" +``` + +## Procedure — Start a Tunnel and Get the URL + +The model SHOULD use the `terminal` tool. The tunnel must stay alive for the duration of the share, so run it as a background process and parse the public URL from stdout. + +### 1. Confirm a local origin is up + +```bash +curl -sI http://127.0.0.1:8000/ | head -1 +# expect HTTP/1.x 200 (or any non-connection-refused response) +``` + +If nothing is listening yet, start it first (e.g. `python3 -m http.server 8000 --bind 127.0.0.1`). Pinggy will happily return a URL pointed at nothing — the user will see 502 until the origin comes up. + +### 2. Launch the tunnel as a background process + +Use `terminal(background=True)` and capture output to a logfile (Pinggy prints the URLs on stdout, then keeps the connection open): + +```bash +LOG=/tmp/pinggy-8000.log +nohup ssh -p 443 \ + -o StrictHostKeyChecking=no \ + -o UserKnownHostsFile=/dev/null \ + -o ServerAliveInterval=30 \ + -o ServerAliveCountMax=3 \ + -R0:localhost:8000 free@a.pinggy.io \ + > "$LOG" 2>&1 & +echo $! 
> /tmp/pinggy-8000.pid +``` + +`StrictHostKeyChecking=no` + `UserKnownHostsFile=/dev/null` skips the first-run host-key prompt. `ServerAliveInterval=30` keeps the SSH session from getting torn down by an idle NAT. + +### 3. Parse the URL out of the log + +```bash +sleep 4 +grep -oE 'https://[a-z0-9-]+\.[a-z]+\.pinggy\.link' /tmp/pinggy-8000.log | head -1 +``` + +Expected output looks like: + +``` +You are not authenticated. +Your tunnel will expire in 60 minutes. +http://yqycl-98-162-69-48.a.free.pinggy.link +https://yqycl-98-162-69-48.a.free.pinggy.link +``` + +Hand the `https://...pinggy.link` URL to the user. + +### 4. Verify + +```bash +curl -sI https:/// | head -3 +# expect 200/302/whatever the local origin actually returns +``` + +If you get `502 Bad Gateway`, the SSH session is up but the local origin isn't listening — fix step 1 first. + +### 5. Teardown + +```bash +kill "$(cat /tmp/pinggy-8000.pid)" +# or, if the pid file got lost: +pkill -f 'ssh -p 443 .* free@a\.pinggy\.io' +``` + +If you have a session_id from `terminal(background=True)`, prefer `process(action='kill', session_id=...)`. + +## Access Control via Username Keywords + +Pinggy stacks control flags into the SSH username separated by `+`. Always quote the whole `user@host` argument when it contains a `+`: + +| Keyword | Effect | +|---------|--------| +| `b:user:pass` | HTTP Basic auth gate | +| `k:token` | Bearer-token header gate (`Authorization: Bearer `) | +| `w:CIDR` | IP whitelist (single IP or CIDR, repeatable) | +| `co` | Add `Access-Control-Allow-Origin: *` (CORS) | +| `x:https` | Force HTTPS — auto-redirect HTTP to HTTPS | +| `a:Name:Value` | Add request header | +| `u:Name:Value` | Update request header | +| `r:Name` | Remove request header | +| `qr` | Print a QR code of the URL to stdout (handy for mobile sharing) | + +Combine freely: `"b:admin:secret+co+x:https+free@a.pinggy.io"`. 
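When the SSH command is assembled programmatically, the keyword stacking above can be sketched as follows. The helper names `build_pinggy_target` and `build_ssh_argv` are hypothetical, and the sketch covers only the free-tier `free@a.pinggy.io` form shown in this skill.

```python
from typing import Iterable, List


def build_pinggy_target(
    keywords: Iterable[str], account: str = "free", host: str = "a.pinggy.io"
) -> str:
    """Join control keywords and the account with '+', then append '@host'."""
    parts = [k for k in keywords if k] + [account]
    return "+".join(parts) + "@" + host


def build_ssh_argv(local_port: int, keywords: Iterable[str] = ()) -> List[str]:
    """Argument list for subprocess; list form needs no shell quoting of '+'."""
    return [
        "ssh", "-p", "443",
        "-o", "StrictHostKeyChecking=no",
        "-o", "ServerAliveInterval=30",
        f"-R0:localhost:{local_port}",
        build_pinggy_target(keywords),
    ]
```

Passing the list to `subprocess.Popen` without `shell=True` sidesteps the quoting concern entirely, since the target string reaches `ssh` as a single argument. If the command has to go through a shell instead, `shlex.quote` on the target is the usual way to quote it.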
+ +## Web Debugger (optional) + +Pinggy can mirror the inbound traffic to `localhost:4300` for inspection. Add a local forward to the SSH command: + +```bash +ssh -p 443 -L4300:localhost:4300 -R0:localhost:8000 free@a.pinggy.io +``` + +Then open `http://localhost:4300` in a browser to see live request/response pairs. + +## Pitfalls + +- **60-minute hard cap on the free tier.** The SSH session terminates at the 60-minute mark; the URL goes dead. For longer shares, either use `PINGGY_TOKEN` (Pro) or auto-restart with a shell loop (note that the URL changes on every restart for free-tier). +- **Free-tier URL is random and changes on restart.** Don't bookmark it, don't paste it into a config file. Re-parse from the log each time. +- **Concurrent free tunnels are limited to one per source IP.** Starting a second tunnel from the same machine usually kills the first. Pro tier lifts this. +- **`+` in usernames must be quoted.** Bare `ssh ... b:admin:secret+free@a.pinggy.io` works in bash but breaks under shells that treat `+` specially or when assembled programmatically. Always wrap in double quotes. +- **Don't tunnel anything sensitive without an access-control flag.** A bare HTTP tunnel is reachable by anyone with the URL. Use `b:`, `k:`, or `w:` for non-public services. +- **`process(action='log')` may miss SSH banner output.** Pinggy prints the URLs and then the SSH session goes interactive. Always redirect to a logfile and `grep` the file directly — same pattern as `cloudflared-quick-tunnel`. +- **Host-key prompt on first run.** Default OpenSSH config asks the user to accept Pinggy's host key. Always pass `-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null` for unattended runs. +- **TCP and TLS tunnels return a `.a.pinggy.online:` pair, not an https URL.** Parse with a different regex (`tcp://` and a port). Don't assume every Pinggy tunnel is HTTP. +- **Pro mode requires the token as the username, not a flag.** Use `"$PINGGY_TOKEN+a.pinggy.io"` (no `free@`). 
With a token you can also add `:persistent` for a stable subdomain — see `pinggy.io/docs/`. + +## Recipes + +Composite patterns combining a local origin with a Pinggy tunnel. Each recipe is self-contained — start the origin, start the tunnel, parse the URL, hand it back to the user. + +### Recipe 1 — Receive a webhook callback + +Use this when an external service (Stripe, GitHub, Discord, AgentMail, etc.) needs to POST to a publicly reachable URL during a local task. + +```bash +# 1. Tiny capturing server: every request gets appended to /tmp/webhook-hits.log +cat >/tmp/webhook-server.py <<'PY' +import http.server, json, datetime, pathlib +LOG = pathlib.Path("/tmp/webhook-hits.log") +class H(http.server.BaseHTTPRequestHandler): + def _capture(self): + n = int(self.headers.get("content-length") or 0) + body = self.rfile.read(n).decode("utf-8", "replace") if n else "" + rec = {"t": datetime.datetime.utcnow().isoformat(), "path": self.path, + "method": self.command, "headers": dict(self.headers), "body": body} + with LOG.open("a") as f: f.write(json.dumps(rec) + "\n") + self.send_response(200); self.send_header("content-type","application/json") + self.end_headers(); self.wfile.write(b'{"ok":true}\n') + def do_GET(self): self._capture() + def do_POST(self): self._capture() + def log_message(self,*a,**k): pass +http.server.HTTPServer(("127.0.0.1", 18080), H).serve_forever() +PY +nohup python3 /tmp/webhook-server.py >/tmp/webhook-server.log 2>&1 & +echo $! >/tmp/webhook-server.pid + +# 2. Tunnel — bearer-token-gate so randos can't pollute the capture log +nohup ssh -p 443 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \ + -o ServerAliveInterval=30 \ + -R0:localhost:18080 "k:$(openssl rand -hex 12)+free@a.pinggy.io" \ + >/tmp/webhook-pinggy.log 2>&1 & +echo $! >/tmp/webhook-pinggy.pid +sleep 5 +URL=$(grep -oE 'https://[a-z0-9-]+\.[a-z]+\.pinggy\.link' /tmp/webhook-pinggy.log | head -1) +echo "Webhook URL: $URL" + +# 3. 
While the agent works, watch hits land +tail -f /tmp/webhook-hits.log +``` + +Hand `$URL` to the service that needs to call you. Teardown: `kill $(cat /tmp/webhook-server.pid) $(cat /tmp/webhook-pinggy.pid)`. + +### Recipe 2 — Expose an MCP server over HTTP/SSE + +Use when a remote MCP client (Claude Desktop on another machine, a teammate's editor, etc.) needs to reach an MCP server running on the local box. Only works for MCP servers that speak HTTP transport — stdio-mode servers can't be tunneled. + +```bash +# 1. Start the MCP server in HTTP mode (example: a FastMCP server on port 8765) +nohup python3 my_mcp_server.py --transport http --port 8765 \ + >/tmp/mcp-server.log 2>&1 & +echo $! >/tmp/mcp-server.pid + +# 2. Tunnel with a bearer token — MCP traffic should not be open to the internet +TOKEN=$(openssl rand -hex 16) +nohup ssh -p 443 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \ + -o ServerAliveInterval=30 \ + -R0:localhost:8765 "k:$TOKEN+free@a.pinggy.io" \ + >/tmp/mcp-pinggy.log 2>&1 & +echo $! >/tmp/mcp-pinggy.pid +sleep 5 +URL=$(grep -oE 'https://[a-z0-9-]+\.[a-z]+\.pinggy\.link' /tmp/mcp-pinggy.log | head -1) +echo "MCP URL: $URL" +echo "Bearer token: $TOKEN" +``` + +The remote client connects to `$URL` with `Authorization: Bearer $TOKEN`. Hermes' own native MCP client config: `{"transport": "http", "url": "", "headers": {"Authorization": "Bearer "}}`. + +### Recipe 3 — Expose a local LLM endpoint (Ollama / vLLM / llama.cpp) + +Share a local model with a remote caller (another agent, a phone, a teammate). Ollama listens on `:11434`, vLLM and llama.cpp typically on `:8000`. + +```bash +# Pre-req: the model server is already running on 127.0.0.1:11434 (Ollama default) +TOKEN=$(openssl rand -hex 16) +nohup ssh -p 443 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \ + -o ServerAliveInterval=30 \ + -R0:localhost:11434 "k:$TOKEN+co+free@a.pinggy.io" \ + >/tmp/llm-pinggy.log 2>&1 & +echo $! 
>/tmp/llm-pinggy.pid +sleep 5 +URL=$(grep -oE 'https://[a-z0-9-]+\.[a-z]+\.pinggy\.link' /tmp/llm-pinggy.log | head -1) +echo "Endpoint: $URL" +echo "Token: $TOKEN" + +# Verify +curl -s "$URL/api/tags" -H "Authorization: Bearer $TOKEN" | head +``` + +`co` enables CORS so a browser caller can hit the endpoint. Drop `co` for backend-only callers. For an OpenAI-compatible vLLM/llama.cpp endpoint, callers use base URL `$URL/v1` with `Authorization: Bearer $TOKEN` — but note Pinggy strips/replaces nothing in the body, so the model server itself sees Pinggy's token; the local server should be configured to ignore auth (it's already on `127.0.0.1`) and let Pinggy do the gating. + +### Recipe 4 — Share a dev server with a one-shot password + +The fastest "let a teammate poke at my running app" pattern. Random password, prints once, dies when you Ctrl-C. + +```bash +PASS=$(openssl rand -base64 12 | tr -d '+/=' | head -c 12) +echo "Dev server password: $PASS" +ssh -p 443 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \ + -o ServerAliveInterval=30 \ + -R0:localhost:3000 "b:dev:$PASS+co+x:https+free@a.pinggy.io" +# URL prints to the terminal. Share URL + password. Ctrl-C to tear down. +``` + +`b:dev:$PASS` gates the URL with HTTP Basic auth. `x:https` forces TLS. `co` adds CORS for SPA frontends. + +## Verification + +```bash +# End-to-end: spin up a trivial origin, tunnel it, hit it, tear down +python3 -m http.server 18000 --bind 127.0.0.1 >/tmp/origin.log 2>&1 & +ORIGIN_PID=$! + +nohup ssh -p 443 \ + -o StrictHostKeyChecking=no \ + -o UserKnownHostsFile=/dev/null \ + -R0:localhost:18000 free@a.pinggy.io >/tmp/pinggy-verify.log 2>&1 & +SSH_PID=$! + +sleep 5 +URL=$(grep -oE 'https://[a-z0-9-]+\.[a-z]+\.pinggy\.link' /tmp/pinggy-verify.log | head -1) +echo "URL: $URL" +curl -sI "$URL/" | head -1 + +kill "$SSH_PID" "$ORIGIN_PID" +``` + +Expected: a `pinggy.link` URL and `HTTP/2 200` on the curl head. 
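The "sleep, then grep the log" step used throughout this skill can also be sketched in Python, polling the logfile instead of waiting a fixed number of seconds. The function names are hypothetical. The pattern is deliberately looser than the `grep` expression in the steps: it accepts any number of dot-separated labels before `pinggy.link`, an assumption made so that it matches the sample hostname in step 3 (`...a.free.pinggy.link`). It is worth checking the `grep` expression against the hostnames you actually see.

```python
import re
import time
from typing import Optional

# Assumption: any run of lowercase letters, digits, dots and hyphens may
# precede ".pinggy.link", so "name.a.free.pinggy.link" also matches.
PINGGY_URL = re.compile(r"https://[a-z0-9.-]+\.pinggy\.link")


def first_pinggy_url(log_text: str) -> Optional[str]:
    """Return the first https Pinggy URL in the text, or None."""
    match = PINGGY_URL.search(log_text)
    return match.group(0) if match else None


def wait_for_url(
    log_path: str, timeout: float = 15.0, interval: float = 0.5
) -> Optional[str]:
    """Poll the logfile until a URL appears or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(log_path, encoding="utf-8", errors="replace") as f:
                url = first_pinggy_url(f.read())
        except FileNotFoundError:
            url = None  # ssh has not created the log yet
        if url or time.monotonic() >= deadline:
            return url
        time.sleep(interval)
```

A call such as `wait_for_url("/tmp/pinggy-8000.log")` returns as soon as the banner lands. This sketch handles HTTP tunnels only; TCP and TLS tunnels print a different kind of address, as noted under Pitfalls.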
diff --git a/website/docs/user-guide/skills/optional/research/research-darwinian-evolver.md b/website/docs/user-guide/skills/optional/research/research-darwinian-evolver.md new file mode 100644 index 00000000000..121b2dde160 --- /dev/null +++ b/website/docs/user-guide/skills/optional/research/research-darwinian-evolver.md @@ -0,0 +1,217 @@ +--- +title: "Darwinian Evolver — Evolve prompts/regex/SQL/code with Imbue's evolution loop" +sidebar_label: "Darwinian Evolver" +description: "Evolve prompts/regex/SQL/code with Imbue's evolution loop" +--- + +{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */} + +# Darwinian Evolver + +Evolve prompts/regex/SQL/code with Imbue's evolution loop. + +## Skill metadata + +| | | +|---|---| +| Source | Optional — install with `hermes skills install official/research/darwinian-evolver` | +| Path | `optional-skills/research/darwinian-evolver` | +| Version | `0.1.0` | +| Author | Bihruze (Asahi0x), Hermes Agent | +| License | MIT | +| Platforms | linux, macos | +| Tags | `evolution`, `optimization`, `prompt-engineering`, `research` | +| Related skills | [`arxiv`](/docs/user-guide/skills/bundled/research/research-arxiv), [`jupyter-live-kernel`](/docs/user-guide/skills/bundled/data-science/data-science-jupyter-live-kernel) | + +## Reference: full SKILL.md + +:::info +The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active. +::: + +# Darwinian Evolver + +Run Imbue's [darwinian_evolver](https://github.com/imbue-ai/darwinian_evolver) — an +LLM-driven evolutionary search loop — to optimize a **prompt, regex, SQL query, +or small code snippet** against a fitness function. + +Status: thin wrapper around the upstream tool. 
The skill installs it, walks the +agent through writing a `Problem` definition (organism + evaluator + mutator), +and drives the loop via the upstream CLI or a small custom Python driver. + +**License:** the upstream tool is **AGPL-3.0**. The skill ONLY ever invokes it +via the upstream CLI or a `subprocess`/`uv run` call (mere aggregation). Do NOT +import upstream classes into Hermes itself. + +## When to Use + +- User says "optimize this prompt", "evolve a regex for X", "auto-improve this + code/SQL", "search for a better instruction". +- You have a scorer (exact match, regex pass-rate, unit test, LLM-judge, runtime + metric) AND a starting candidate (organism). If you don't have a scorer, stop + and define one first — that's the hard part. +- Cost is OK: a typical run is 50–500 LLM calls. On gpt-4o-mini that's pennies; + on Claude Sonnet it can be a few dollars. + +Do **not** use this when: +- The optimization target is differentiable (use gradient descent / DSPy). +- You only need to try 2–3 variants — just write them by hand. +- The fitness signal is purely subjective with no measurable criterion. + +## Prerequisites + +- Python ≥3.11 +- `git`, `uv` (or `pip`) +- One of: `OPENROUTER_API_KEY`, `ANTHROPIC_API_KEY`, or `OPENAI_API_KEY` + +The skill ships a small `parrot_openrouter.py` driver that uses `OPENROUTER_API_KEY` +via the OpenAI SDK, so any model on OpenRouter works. The upstream CLI itself +hardcodes Anthropic and needs `ANTHROPIC_API_KEY`. 
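The prerequisites above can be checked up front with a small preflight sketch. `missing_prerequisites` is a hypothetical helper written for this page, not part of the skill or of the upstream tool. It takes the environment and the lookup function as parameters so it can be exercised without touching the real machine.

```python
import shutil
import sys
from typing import Callable, List, Mapping, Optional, Tuple

API_KEY_VARS = ("OPENROUTER_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def missing_prerequisites(
    env: Mapping[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
    python_version: Tuple[int, int] = sys.version_info[:2],
) -> List[str]:
    """Return a human-readable list of unmet prerequisites (empty means OK)."""
    problems = []
    if tuple(python_version) < (3, 11):
        problems.append("Python >= 3.11")
    if which("git") is None:
        problems.append("git")
    # Either installer is acceptable.
    if which("uv") is None and which("pip") is None:
        problems.append("uv (or pip)")
    # At least one provider key must be set and non-empty.
    if not any(env.get(name) for name in API_KEY_VARS):
        problems.append("one of " + ", ".join(API_KEY_VARS))
    return problems
```

In practice this would be called as `missing_prerequisites(os.environ)`. Note that it treats any one of the three keys as sufficient, while the upstream CLI specifically needs `ANTHROPIC_API_KEY`, as described above.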
+ +## Install (One-Time) + +Run via the `terminal` tool: + +```bash +mkdir -p ~/.hermes/cache/darwinian-evolver && cd ~/.hermes/cache/darwinian-evolver +[ -d darwinian_evolver ] || git clone --depth 1 https://github.com/imbue-ai/darwinian_evolver.git +cd darwinian_evolver && uv sync +``` + +Verify: + +```bash +cd ~/.hermes/cache/darwinian-evolver/darwinian_evolver \ + && uv run darwinian_evolver --help | head -5 +``` + +## Quick Start — The Built-In Parrot Example + +Tiny smoke test (requires `ANTHROPIC_API_KEY`): + +```bash +cd ~/.hermes/cache/darwinian-evolver/darwinian_evolver +uv run darwinian_evolver parrot \ + --num_iterations 2 \ + --num_parents_per_iteration 2 \ + --mutator_concurrency 2 --evaluator_concurrency 2 \ + --output_dir /tmp/parrot_demo +``` + +Outputs: +- `/tmp/parrot_demo/snapshots/iteration_N.pkl` — pickled population per iteration +- `/tmp/parrot_demo/` — per-iteration JSON log (path printed at end) + +Open `~/.hermes/cache/darwinian-evolver/darwinian_evolver/darwinian_evolver/lineage_visualizer.html` +in a browser and load the JSON log to see the evolutionary tree. + +## Quick Start — OpenRouter Driver (No Anthropic Key) + +The skill ships `scripts/parrot_openrouter.py` — same parrot problem, but the +LLM call goes through OpenRouter so any provider works. 
+ +```bash +# From wherever the skill is installed: +SKILL_DIR=~/.hermes/skills/research/darwinian-evolver +DE_DIR=~/.hermes/cache/darwinian-evolver/darwinian_evolver + +cd "$DE_DIR" && \ + EVOLVER_MODEL='openai/gpt-4o-mini' \ + uv run --with openai python "$SKILL_DIR/scripts/parrot_openrouter.py" \ + --num_iterations 3 --num_parents_per_iteration 2 \ + --output_dir /tmp/parrot_or +``` + +Inspect the result with `scripts/show_snapshot.py`: + +```bash +uv run --with openai python "$SKILL_DIR/scripts/show_snapshot.py" \ + /tmp/parrot_or/snapshots/iteration_3.pkl +``` + +Expected output: 7 evolved prompt templates ranked by score, with the best +landing around 0.6–0.8 (the seed `Say {{ phrase }}` scored 0.000). + +## Defining a Custom Problem + +The skill ships `templates/custom_problem_template.py` — copy, edit, run. +Three things you must define: + +1. **`Organism`** — a Pydantic `BaseModel` subclass holding the artifact being + evolved (`prompt_template: str`, `regex_pattern: str`, `sql_query: str`, + `code_block: str`, etc.). Add a `run(*args)` method that exercises it. + +2. **`Evaluator`** — `.evaluate(organism) -> EvaluationResult(score=..., trainable_failure_cases=[...], holdout_failure_cases=[...], is_viable=True)`. + - **`score`** is in `[0, 1]`. Higher is better. + - **`trainable_failure_cases`** — what the mutator sees. Include enough + context (input, expected, actual) for the LLM to diagnose. + - **`holdout_failure_cases`** — kept out of the mutator's view. Use these + to detect overfitting. + - **`is_viable=True`** unless the organism is completely broken (raises, + returns None, etc.). A 0-score viable organism is fine — it just gets + down-weighted in parent selection. + +3. **`Mutator`** — `.mutate(organism, failure_cases, learning_log_entries) -> list[Organism]`. + Typically: build an LLM prompt that includes the current organism + a + failure case + an ask to propose a fix; parse the LLM's response; return + a new `Organism`. 
Return `[]` on parse failure — the loop handles it. + +Then write a driver script that wires `Problem(initial_organism, evaluator, [mutators])` +into `EvolveProblemLoop` and iterates over `loop.run(num_iterations=N)` — the +shipped `scripts/parrot_openrouter.py` is the reference. + +## Hyperparameters That Actually Matter + +| flag | default | when to change | +|---|---|---| +| `--num_iterations` | 5 | bump to 10–20 once you trust the evaluator | +| `--num_parents_per_iteration` | 4 | drop to 2 for cheap exploration | +| `--mutator_concurrency` | 10 | drop to 2–4 to avoid rate limits | +| `--evaluator_concurrency` | 10 | same; evaluator hits the LLM too | +| `--batch_size` | 1 | raise to 3–5 once your mutator handles multiple failures | +| `--verify_mutations` | off | turn on once mutator is wasteful (>10× cost saving on later runs per Imbue) | +| `--midpoint_score` | `p75` | leave alone unless scores cluster | +| `--sharpness` | 10 | leave alone | + +## Pitfalls + +1. **`Initial organism must be viable`** — set `is_viable=True` in your + `EvaluationResult` even on a 0-score seed. The loop refuses non-viable + organisms because they imply the loop has nothing to evolve from. +2. **Provider content filters kill runs.** Azure-backed OpenRouter models + reject phrases like "ignore previous instructions" with HTTP 400. Wrap + the LLM call in `try/except` and return `f""` — the + evolver will just score that organism 0 and move on. +3. **`loop.run()` is a generator** — calling it doesn't run anything until + you iterate. Use `for snap in loop.run(num_iterations=N):`. +4. **Snapshots are nested pickles.** `iteration_N.pkl` contains a dict with + `population_snapshot` (more pickled bytes). To unpickle you must have the + `Organism` class importable under the same dotted path it was pickled at. +5. **Concurrency defaults are aggressive.** 10/10 will hit rate limits on + most providers. Start with 2/2. +6. 
**CLI is hardcoded to Anthropic.** `uv run darwinian_evolver ` + reaches for `ANTHROPIC_API_KEY` and uses Claude Sonnet. To use any other + provider, write a driver like `parrot_openrouter.py`. +7. **AGPL.** Never `from darwinian_evolver import ...` inside Hermes core. + Custom driver scripts under `~/.hermes/skills/...` are user-side and fine. +8. **No PyPI package.** `pip install darwinian-evolver` will pull the wrong + thing. Always install from the GitHub repo. + +## Verification + +After install + a parrot run, exit code 0 from this is sufficient: + +```bash +DE_DIR=~/.hermes/cache/darwinian-evolver/darwinian_evolver +ls "$DE_DIR/darwinian_evolver/lineage_visualizer.html" >/dev/null && \ +cd "$DE_DIR" && uv run darwinian_evolver --help >/dev/null && \ +echo "darwinian-evolver: OK" +``` + +## References + +- [Imbue research post](https://imbue.com/research/2026-02-27-darwinian-evolver/) +- [ARC-AGI-2 results](https://imbue.com/research/2026-02-27-arc-agi-2-evolution/) +- [imbue-ai/darwinian_evolver](https://github.com/imbue-ai/darwinian_evolver) (AGPL-3.0) +- [Darwin Gödel Machines](https://arxiv.org/abs/2505.22954) +- [PromptBreeder](https://arxiv.org/abs/2309.16797) diff --git a/website/docs/user-guide/skills/optional/research/research-osint-investigation.md b/website/docs/user-guide/skills/optional/research/research-osint-investigation.md new file mode 100644 index 00000000000..7428c3022b2 --- /dev/null +++ b/website/docs/user-guide/skills/optional/research/research-osint-investigation.md @@ -0,0 +1,294 @@ +--- +title: "Osint Investigation" +sidebar_label: "Osint Investigation" +description: "Public-records OSINT investigation framework — SEC EDGAR filings, USAspending contracts, Senate lobbying, OFAC sanctions, ICIJ offshore leaks, NYC property r..." +--- + +{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. 
*/} + +# Osint Investigation + +Public-records OSINT investigation framework — SEC EDGAR filings, USAspending contracts, Senate lobbying, OFAC sanctions, ICIJ offshore leaks, NYC property records (ACRIS), OpenCorporates registries, CourtListener court records, Wayback Machine archives, Wikipedia + Wikidata, GDELT news monitoring. Entity resolution across sources, cross-link analysis, timing correlation, evidence chains. Python stdlib only. + +## Skill metadata + +| | | +|---|---| +| Source | Optional — install with `hermes skills install official/research/osint-investigation` | +| Path | `optional-skills/research/osint-investigation` | +| Version | `0.1.0` | +| Author | Hermes Agent (adapted from ShinMegamiBoson/OpenPlanter, MIT) | +| Platforms | linux, macos, windows | +| Tags | `osint`, `investigation`, `public-records`, `sec`, `sanctions`, `corporate-registry`, `property`, `courts`, `due-diligence`, `journalism` | +| Related skills | [`domain-intel`](/docs/user-guide/skills/optional/research/research-domain-intel), [`arxiv`](/docs/user-guide/skills/bundled/research/research-arxiv) | + +## Reference: full SKILL.md + +:::info +The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active. +::: + +# OSINT Investigation — Public Records Cross-Reference + +Investigative framework for public-records OSINT: government contracts, +corporate filings, lobbying, sanctions, offshore leaks, property records, +court records, web archives, knowledge bases, and global news. Resolve +entities across heterogeneous sources, build cross-links with explicit +confidence, run statistical timing tests, and produce structured evidence +chains. + +**Python stdlib only.** Zero install. Works on Linux, macOS, Windows. Most +sources work with no API key (OpenCorporates has an optional free token +that raises rate limits). 
+ +Adapted from the MIT-licensed ShinMegamiBoson/OpenPlanter project; expanded +to cover identity / property / litigation / archives / news sources that +the original didn't address. + +## When to use this skill + +Use when the user asks for: + +- "follow the money" — government contracts, lobbying → legislation, sanctions +- corporate due diligence — who controls company X, where are they + incorporated, who serves on their boards, what filings have they made +- sanctions screening — is entity X on OFAC SDN, ICIJ offshore leaks +- pay-to-play investigation — contractors with offshore ties, lobbying + clients winning awards +- property ownership — find recorded deeds/mortgages by name or address + (NYC; for other counties point users at the relevant recorder) +- litigation history — find federal + state court opinions and PACER dockets +- multi-source entity resolution where naming varies (LLC suffixes, abbreviations) +- evidence-chain construction with explicit confidence levels +- "what's been said about X" — international news (GDELT) + Wikipedia + narrative + Wayback Machine to recover dead URLs + +Do NOT use this skill for: + +- general web research → `web_search` / `web_extract` +- domain/infrastructure OSINT → `domain-intel` skill +- academic literature → `arxiv` skill +- social-media profile discovery → `sherlock` skill (optional) +- US **federal** campaign finance — FEC is intentionally NOT covered here + (the API is unreliable for ad-hoc contributor-name queries on the free + DEMO_KEY tier). For federal donations, point users at + https://www.fec.gov/data/ directly. + +## Workflow + +The agent runs scripts via the `terminal` tool. `SKILL_DIR` is the directory +holding this SKILL.md. + +### 1. 
Identify which sources apply + +Read the data-source wiki entries to plan the investigation: + +```bash +ls SKILL_DIR/references/sources/ + +# Federal financial / regulatory +cat SKILL_DIR/references/sources/sec-edgar.md # corporate filings +cat SKILL_DIR/references/sources/usaspending.md # federal contracts +cat SKILL_DIR/references/sources/senate-ld.md # lobbying +cat SKILL_DIR/references/sources/ofac-sdn.md # sanctions +cat SKILL_DIR/references/sources/icij-offshore.md # offshore leaks + +# Identity / property / litigation / archives / news +cat SKILL_DIR/references/sources/nyc-acris.md # NYC property records +cat SKILL_DIR/references/sources/opencorporates.md # global corporate registry +cat SKILL_DIR/references/sources/courtlistener.md # court records (federal + state) +cat SKILL_DIR/references/sources/wayback.md # Wayback Machine archives +cat SKILL_DIR/references/sources/wikipedia.md # Wikipedia + Wikidata +cat SKILL_DIR/references/sources/gdelt.md # global news monitoring +``` + +Each entry follows a 9-section template: summary, access, schema, coverage, +cross-reference keys, data quality, acquisition, legal, references. + +The **cross-reference keys** section maps join keys between sources; read +that section first in each entry to pick the right pair of sources to join. + +### 2.
Acquire data + +Each source has a stdlib-only fetch script in `SKILL_DIR/scripts/`: + +**Federal financial / regulatory** + +```bash +# SEC EDGAR filings (corporate disclosures) +python3 SKILL_DIR/scripts/fetch_sec_edgar.py --cik 0000320193 \ + --types 10-K,10-Q --out data/edgar_filings.csv + +# USAspending federal contracts +python3 SKILL_DIR/scripts/fetch_usaspending.py --recipient "EXAMPLE CORP" \ + --fy 2024 --out data/contracts.csv + +# Senate LD-1 / LD-2 lobbying disclosures +python3 SKILL_DIR/scripts/fetch_senate_ld.py --client "EXAMPLE CORP" \ + --year 2024 --out data/lobbying.csv + +# OFAC SDN sanctions list (full snapshot) +python3 SKILL_DIR/scripts/fetch_ofac_sdn.py --out data/ofac_sdn.csv + +# ICIJ Offshore Leaks — downloads ~70 MB bulk CSV on first use, +# then searches it locally. Cached for 30 days under +# $HERMES_OSINT_CACHE/icij/ (default: ~/.cache/hermes-osint/icij/). +python3 SKILL_DIR/scripts/fetch_icij_offshore.py --entity "EXAMPLE CORP" \ + --out data/icij.csv +``` + +**Identity / property / litigation / archives / news** + +```bash +# NYC property records (deeds, mortgages, liens) — ACRIS via Socrata +python3 SKILL_DIR/scripts/fetch_nyc_acris.py --name "SMITH, JOHN" \ + --out data/acris.csv +python3 SKILL_DIR/scripts/fetch_nyc_acris.py --address "571 HUDSON" \ + --out data/acris_addr.csv + +# OpenCorporates — 130+ jurisdiction corporate registry +# (free token required; set OPENCORPORATES_API_TOKEN or pass --token) +python3 SKILL_DIR/scripts/fetch_opencorporates.py --query "Example Corp" \ + --jurisdiction us_ny --out data/opencorporates.csv + +# CourtListener — federal + state court opinions, PACER dockets +python3 SKILL_DIR/scripts/fetch_courtlistener.py --query "Smith v. 
Example Corp" \ + --type opinions --out data/courts.csv + +# Wayback Machine — historical web captures +python3 SKILL_DIR/scripts/fetch_wayback.py --url "example.com" \ + --match host --collapse digest --out data/wayback.csv + +# Wikipedia + Wikidata — narrative bio + structured facts +# Set HERMES_OSINT_UA=your-app/1.0 (your@email) to identify yourself +python3 SKILL_DIR/scripts/fetch_wikipedia.py --query "Bill Gates" \ + --out data/wp.csv + +# GDELT — global news in 100+ languages, ~2015→present +python3 SKILL_DIR/scripts/fetch_gdelt.py --query '"Example Corp"' \ + --timespan 1y --out data/gdelt.csv +``` + +All outputs are normalized CSV with a header row. Re-run scripts idempotently. + +When a private individual won't be in a source (e.g. SEC EDGAR for a non-public- +company person, USAspending for someone who isn't a federal contractor, Senate +LDA for someone who isn't a lobbying client), the script returns 0 rows with a +clear warning rather than silently writing an empty CSV. EDGAR specifically +flags when the company-name resolver matched an individual Form 3/4/5 filer +rather than a corporate registrant. + +Rate-limit notes are in each source's wiki entry. Default fetchers sleep +politely between paginated requests. **API keys raise rate limits** for +sources that support them (`SEC_USER_AGENT`, `SENATE_LDA_TOKEN`, +`OPENCORPORATES_API_TOKEN`, `COURTLISTENER_TOKEN`). All scripts surface +429 responses immediately with the upstream's quota message so the user +knows to slow down or supply a key. + +### 3. 
Resolve entities across sources + +Normalize names and find matches between two CSV files: + +```bash +# Match lobbying clients (Senate LDA) against contract recipients (USAspending) +python3 SKILL_DIR/scripts/entity_resolution.py \ + --left data/lobbying.csv --left-name-col client_name \ + --right data/contracts.csv --right-name-col recipient_name \ + --out data/cross_links.csv +``` + +Three matching tiers with explicit confidence: + +| Tier | Method | Confidence | +|------|--------|------------| +| `exact` | Normalized strings equal after suffix/punctuation strip | high | +| `fuzzy` | Sorted-token equality (word-bag match) | medium | +| `token_overlap` | ≥60% token overlap, ≥2 shared tokens, tokens ≥4 chars | low | + +Output `cross_links.csv` columns: `match_type, confidence, left_name, +right_name, left_normalized, right_normalized, left_row, right_row`. + +### 4. Statistical timing correlation (optional) + +Test whether two time series cluster suspiciously close together — e.g. +lobbying filings near contract awards — using a permutation test: + +```bash +python3 SKILL_DIR/scripts/timing_analysis.py \ + --donations data/lobbying.csv --donation-date-col filing_date \ + --donation-amount-col income --donation-donor-col client_name \ + --donation-recipient-col registrant_name \ + --contracts data/contracts.csv --contract-date-col award_date \ + --contract-vendor-col recipient_name \ + --cross-links data/cross_links.csv \ + --permutations 1000 \ + --out data/timing.json +``` + +The script's column flags are intentionally generic — the original tool was +written for donations vs awards, but it works for any (event, payee) time +series joined through cross-links. Null hypothesis: event timing is +independent of award dates. One-tailed p-value = fraction of permutations +with mean nearest-award distance ≤ observed. Minimum 3 events per (payer, +vendor) pair to run the test. + +### 5. 
Build the findings JSON (evidence chain) + +```bash +python3 SKILL_DIR/scripts/build_findings.py \ + --cross-links data/cross_links.csv \ + --timing data/timing.json \ + --out data/findings.json +``` + +Every finding has `id, title, severity, confidence, summary, evidence[], sources[]`. +Each evidence item points back to a specific row in a source CSV. The user (or a +follow-up agent) can verify every claim against its source. + +## Confidence and evidence discipline + +This is the load-bearing rule of the skill. Tell the user: + +- Every claim must trace to a record. No naked assertions. +- Confidence tier travels with the claim. `match_type=fuzzy` is "probable", + not "confirmed." +- Entity resolution produces candidates, NOT conclusions. A `fuzzy` match + between "ACME LLC" and "Acme Holdings Group" is a lead, not a fact. +- Statistical significance ≠ wrongdoing. p < 0.05 means the timing pattern + is unlikely under the null. It does not establish corruption. +- All data sources here are public records. They may still contain + inaccuracies, stale info, or redactions (GDPR, sealed records). + +## Adding a new data source + +Use the template: + +```bash +cp SKILL_DIR/templates/source-template.md \ + SKILL_DIR/references/sources/.md +``` + +Fill in all 9 sections. Write a `fetch_.py` script in `scripts/` that +uses stdlib only and writes a normalized CSV. Update the source list in the +"When to use" section above. + +## Tools and their limits + +- `entity_resolution.py` does NOT use external fuzzy libraries (no rapidfuzz, + no jellyfish). Token-bag matching is the upper bound here. If you need + Levenshtein, transliteration, or phonetic matching, pip-install separately. +- `timing_analysis.py` uses Python's `random` for permutations. For + reproducibility, pass `--seed N`. +- `fetch_*.py` scripts use `urllib.request` and respect `Retry-After`. Heavy + bulk usage may still violate ToS — read each source's legal section first. 
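The three matching tiers that this skill's entity-resolution step describes (`exact`, `fuzzy`, `token_overlap`) can be illustrated with a minimal stdlib-only sketch. This is not the code of `entity_resolution.py`: the suffix list, the function names `normalize` and `match_tier`, and the choice to measure overlap against the smaller token set are all assumptions made for illustration.

```python
import re

# Hypothetical suffix list; the real script's list is not shown in this document.
SUFFIXES = {"llc", "inc", "corp", "corporation", "co", "ltd", "lp", "llp"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def match_tier(left: str, right: str):
    """Return (match_type, confidence) or None, checking the strictest tier first."""
    a, b = normalize(left), normalize(right)
    if not a or not b:
        return None
    if a == b:
        return ("exact", "high")
    if sorted(a.split()) == sorted(b.split()):
        return ("fuzzy", "medium")
    # Only tokens of 4+ characters count toward the overlap tier.
    ta = {t for t in a.split() if len(t) >= 4}
    tb = {t for t in b.split() if len(t) >= 4}
    shared = ta & tb
    # Assumption: the 60% ratio is taken against the smaller token set.
    if len(shared) >= 2 and len(shared) / min(len(ta), len(tb)) >= 0.6:
        return ("token_overlap", "low")
    return None


if __name__ == "__main__":
    print(match_tier("Acme, LLC", "ACME Inc."))
    print(match_tier("Smith John", "John Smith"))
    print(match_tier("Global Widget Holdings", "Global Widget Partners Group"))
```

Checking the tiers from strictest to loosest means each pair is reported once, at the highest confidence it qualifies for, which keeps the confidence label attached to every candidate link.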
+ +## Legal note + +All Phase-1 sources are public records. Bulk acquisition is permitted under +their respective access terms (FOIA, public records law, ICIJ explicit +publication, OFAC public data). However: + +- Some sources rate-limit aggressively. Respect their headers. +- Some redact registrant info (GDPR on WHOIS, sealed filings). +- Cross-referencing public records to identify private individuals can have + ethical implications. The skill produces evidence chains, not accusations. diff --git a/website/sidebars.ts b/website/sidebars.ts index 52ed452d046..1a0aa6fb0bb 100644 --- a/website/sidebars.ts +++ b/website/sidebars.ts @@ -83,6 +83,7 @@ const sidebars: SidebarsConfig = { items: [ 'user-guide/features/voice-mode', 'user-guide/features/web-search', + 'user-guide/features/x-search', 'user-guide/features/browser', 'user-guide/features/computer-use', 'user-guide/features/vision', @@ -423,6 +424,7 @@ const sidebars: SidebarsConfig = { items: [ 'user-guide/skills/optional/devops/devops-cli', 'user-guide/skills/optional/devops/devops-docker-management', + 'user-guide/skills/optional/devops/devops-pinggy-tunnel', 'user-guide/skills/optional/devops/devops-watchers', ], }, @@ -547,10 +549,12 @@ const sidebars: SidebarsConfig = { collapsed: true, items: [ 'user-guide/skills/optional/research/research-bioinformatics', + 'user-guide/skills/optional/research/research-darwinian-evolver', 'user-guide/skills/optional/research/research-domain-intel', 'user-guide/skills/optional/research/research-drug-discovery', 'user-guide/skills/optional/research/research-duckduckgo-search', 'user-guide/skills/optional/research/research-gitnexus-explorer', + 'user-guide/skills/optional/research/research-osint-investigation', 'user-guide/skills/optional/research/research-parallel-cli', 'user-guide/skills/optional/research/research-qmd', 'user-guide/skills/optional/research/research-scrapling', @@ -689,6 +693,7 @@ const sidebars: SidebarsConfig = { 'developer-guide/gateway-internals', 
'developer-guide/session-storage', 'developer-guide/provider-runtime', + 'developer-guide/programmatic-integration', ], }, {