When refresh_xai_oauth_pure raises a terminal error (HTTP 400/401/403,
i.e. revoked or reused refresh token), _refresh_entry's existing race-
recovery path re-syncs from auth.json and returns if another process has
already rotated the tokens. If auth.json still holds the same stale
token pair, the function fell through to _mark_exhausted — leaving the
dead credentials in auth.json. On the next Hermes startup _seed_from_singletons
re-seeded the pool from those stale tokens, causing the same failure loop
on every session.
Fix: after the auth.json re-sync check in the xAI-oauth error handler,
detect terminal errors with the new _is_terminal_xai_oauth_refresh_error
helper and apply a quarantine:
- Clear access_token and refresh_token from providers["xai-oauth"]["tokens"]
in auth.json so they are not re-seeded.
- Write a last_auth_error entry for hermes doctor / auth status diagnostics.
- Remove all loopback_pkce entries from the in-memory pool so the current
session stops retrying with the dead credentials.
Mirrors the identical quarantine already in place for Nous OAuth
(c90556262).
Closes the parity gap introduced when c90556262 added Nous-only terminal
error handling without a corresponding xAI-oauth path.
When xAI's auth backend fails to redirect (e.g. the German "We couldn't reach
your app" fallback shown in #27385), users sometimes navigate manually to the
bare loopback callback URL — `http://127.0.0.1:<port>/callback` with no query
string. The handler used to return 200 "xAI authorization received" for any
GET that hit the expected path, because `parse_qs("")` yields no `code` and no
`error`, leaving `result` untouched while the success page was still served.
The CLI's wait loop, of course, still saw no code and timed out with
`AuthError: xAI authorization timed out waiting for the local callback.`
The user is left looking at a browser tab that claims success and a terminal
that says failure — exactly the contradiction in #27385.
This change makes the empty-callback case return 400 with an explicit
"not received" page and a hint to retry `hermes auth add xai-oauth`. The
wait-loop semantics are unchanged: `result["code"]` and `result["error"]`
both stay None, so the CLI still raises a real timeout rather than treating
the bare hit as a successful callback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When xAI returns a subscription/entitlement error through an SSE
``type=error`` frame, ``_StreamErrorEvent`` is raised with
``status_code=None``. This caused ``_classify_by_status`` (step 2 of
``classify_api_error``) to be skipped entirely, and the Grok-specific
phrases ("do not have an active Grok subscription", "out of available
resources") appeared in none of the message-pattern lists. The error
fell through to ``FailoverReason.unknown (retryable=True)``, burning
``max_retries`` on every affected X Premium+ / SuperGrok user before
the agent stopped — and ``_is_entitlement_failure`` was never called
because it only fires under ``FailoverReason.auth``.
The HTTP 403 path already handled this correctly (``_classify_by_status``
returns ``auth/non-retryable`` for 403). Add an explicit pattern block
at step 1 (highest priority, before the ``status_code`` guard) so both
code paths route to ``FailoverReason.auth, retryable=False,
should_fallback=True`` — matching the 403 path exactly.
Add three regression tests in ``Fix D`` section of
``test_codex_xai_oauth_recovery.py``:
- primary "do not have an active Grok subscription" phrase
- "out of available resources" + "grok" variant
- unrelated ``_StreamErrorEvent`` must not be reclassified
xAI is a first-class provider in hermes-agent with its own credential
pool entry (XAI_API_KEY / xai-oauth). API keys follow the format
xai-<60+ alphanumeric chars> and were absent from _PREFIX_PATTERNS in
agent/redact.py.
When a key appears raw in log output, tool results, or error messages,
it passed through completely unmasked. The ENV-assignment and Bearer
header patterns catch the most common cases, but a raw token in a
stack trace or debug print had no protection.
Verified before fix:
redact_sensitive_text("using key xai-ABCD...rstu to call xAI", force=True)
# "using key xai-ABCD...rstu to call xAI" <- exposed
After fix:
# "using key xai-AB...rstu to call xAI" <- masked
Five unit tests added to TestXaiToken covering bare token masking,
env assignment, short-prefix false positive, company name false
positive, and visible prefix in masked output.
Tirith flags .app domains with a lookalike_tld finding because the TLD
"can be confused with file extensions". This is a false positive for
legitimate production APIs (e.g. api.example.app, lark.app).
Add _is_app_tld_finding() and a post-parse suppression block in
check_command_security(): if the only finding(s) on a warn verdict are
lookalike_tld entries for .app, downgrade the action to allow.
Mixed findings (e.g. .app + shortened_url) and block verdicts are
unaffected. Non-.app lookalike_tld findings (.zip, .exe, etc.) are
preserved.
Add 15 regression tests covering: .app-only suppression, mixed-finding
preservation, non-.app TLD preservation, block-verdict invariance, and
the helper's field-name and case-insensitivity behaviour.
Closes#24461
When auxiliary compression's summary generation returns None (aux model
errored, returned non-JSON, timed out, etc.) the compressor previously
still dropped every middle message between compress_start..compress_end
and replaced them with a static 'Summary generation was unavailable'
placeholder. The session kept going but the user silently lost N turns
of context for nothing.
New behavior: on summary failure, compress() aborts entirely — returns
the input messages unchanged and sets _last_compress_aborted=True. The
existing _summary_failure_cooldown_until gate (30-60s) keeps the aux
model from being burned on every turn. Auto-compress callers detect
the no-op (len(after) == len(before)) and stop looping. The chat is
'frozen' at its current size until the next /compress or /new.
Manual /compress (CLI + gateway) now passes force=True which clears
the cooldown so users can retry immediately after an auto-abort. If
the manual retry also fails, the user gets a visible warning telling
them nothing was dropped and how to retry.
- agent/context_compressor.py: compress() gains force= kwarg; failure
branch sets _last_compress_aborted and returns messages unchanged
instead of inserting placeholder.
- run_agent.py: _compress_context() detects abort, surfaces warning,
skips session-rotation entirely, returns messages unchanged.
- cli.py + gateway/run.py: manual /compress paths pass force=True.
- gateway/run.py: hygiene + /compress handlers detect _last_compress_aborted
and emit the new 'Compression aborted' warning (gateway.compress.aborted)
instead of the old 'N historical messages were removed' message.
- locales/*.yaml: new gateway.compress.aborted key in all 16 locales.
- tests: updated to assert the abort contract (messages preserved,
compression_count not incremented, abort flag set, no placeholder
leaked). New test_force_true_bypasses_failure_cooldown covers the
manual-retry path.
decompose_triage_task inlines SQL INSERTs for atomicity and intentionally
bypasses link_tasks() — which calls _would_cycle() per edge. If the LLM
emits a cyclic parent graph (e.g. A.parents=[1], B.parents=[0]) the DB
write succeeds but every involved child deadlocks in 'todo' forever:
recompute_ready() requires all parents to be done, which is impossible
when A waits for B and B waits for A.
Add a Kahn topological sort over the sibling parent indices in the
pre-validation block, before any DB writes. Mirrors the cycle-safety
guarantee that link_tasks() provides for manually linked tasks.
Cherry-pick of @sharziki's #27022 routed Azure Foundry through
_requires_bearer_auth, which also triggered the MiniMax-specific
beta-strip in _common_betas_for_base_url — dropping the 1M-context
beta from Azure even though Azure needs it for 1M context.
Split the strip predicate: introduce _is_minimax_anthropic_endpoint
so the fine-grained-tool-streaming and context-1m strips only fire
for MiniMax hosts, leaving Azure's bearer-auth header swap intact
without losing 1M context.
Also add a regression test that asserts Azure gets Bearer auth,
the api-version query param, and the context-1m-2025-08-07 beta.
* refactor(bootstrap): consolidate ACP browser bootstrap into install.{sh,ps1}
Delete 687 lines of duplicated browser bootstrap code from
acp_adapter/bootstrap/. All browser installation now routes through
dep_ensure -> install.{sh,ps1} --ensure, using agent-browser install
for Chromium. install.sh gains ensure_browser() with macOS app-bundle
detection and per-distro guidance.
Tracking: #27826
* fix(install.sh): add --ignore-scripts to npm install for camofox
@askjo/camofox-browser has a dependency (impit) whose postinstall
script runs `npx only-allow pnpm`, which fails under npm. Adding
--ignore-scripts avoids the spurious failure without affecting
functionality.
Tracking: #27826
* fix: add explicit return in ensure_browser, narrow exception in entry.py
ensure_browser() now returns 0 explicitly on all success paths.
_run_setup_browser() catches OSError instead of broad Exception,
letting ImportError propagate as a real packaging bug.
* feat(dep_ensure): complete Windows bootstrap — dep_ensure + install.ps1 + detection
dep_ensure.py gains Windows awareness: PowerShell invocation, platform-
specific browser detection, (path, shell) tuple returns.
install.ps1 gains -Ensure/-PostInstall modes using npm -g --prefix
(aligned with install.sh) and agent-browser install for Chromium.
browser_tool.py gains node/ in candidate dirs for Windows .cmd shims.
Both install scripts bundled in pip wheel.
Tracking: #27826
* fix(install.ps1): add --ignore-scripts to npm install for camofox
@askjo/camofox-browser has a dependency (impit) whose postinstall
script runs `npx only-allow pnpm`, which fails under npm. Adding
--ignore-scripts avoids the spurious failure without affecting
functionality.
Tracking: #27826
* fix: remove duplicate install scripts from git
CI already copies scripts/install.{sh,ps1} into hermes_cli/scripts/
during wheel build. No need to commit copies — .gitignore keeps them
out, _find_install_script() falls back to scripts/ for git-clone users.
Tracking: #27826
* fix: address review — remove env_extra, fix ps1 error handling
- Remove unused env_extra parameter from ensure_dependency()
- Invoke-EnsureMode node case now uses Test-Node consistently
- Install-AgentBrowser uses throw instead of exit 1
* feat(config): add install-method stamping + Docker detection
Dockerfile stamps "docker", install.sh stamps "git", and cmd_postinstall
stamps "pip" into ~/.hermes/.install_method. detect_install_method() reads
the stamp first, then falls back to managed-system / container / .git
heuristics. Adds Docker upgrade guidance.
Tracking: #27826
* fix(stamp): move Docker stamp to entrypoint, install.sh stamp after print_success
The Dockerfile stamp was overwritten by the VOLUME overlay at container
start. Moving it to entrypoint.sh ensures it persists. The install.sh
stamp now writes after print_success so it only lands on full success.
The agent can now produce a chart, PDF, spreadsheet, or any other supported
file type and have it land in Slack / Discord / Telegram / WhatsApp / etc.
as a native attachment, just by mentioning the absolute path in its
response. Same primitive works for kanban-worker completions: workers
attach artifacts via kanban_complete(artifacts=[...]) and the gateway
notifier uploads them alongside the completion message.
Changes:
- gateway/platforms/base.py: extract_local_files now covers PDFs, docx,
spreadsheets (xlsx/csv/json/yaml), presentations (pptx), archives
(zip/tar/gz), audio (mp3/wav/...), and html — not just images and video.
Image/video extensions still embed inline; everything else routes to
send_document via the existing dispatch partition in gateway/run.py.
- tools/kanban_tools.py + hermes_cli/kanban_db.py: kanban_complete gains
an explicit ``artifacts`` parameter. The handler stashes it in
metadata.artifacts (for downstream workers) and the kernel promotes
it onto the completed-event payload so the notifier can find it
without a second SQL round-trip.
- gateway/run.py: _kanban_notifier_watcher now calls a new helper
_deliver_kanban_artifacts after sending the completion text. The
helper reads payload.artifacts (preferred), falls back to scanning
the payload summary and task.result with extract_local_files, then
partitions images / videos / documents and uploads each via
send_multiple_images / send_video / send_document.
- website/docs/user-guide/features/deliverable-mode.md + sidebars.ts:
user-facing docs page covering the extension list, the kanban
artifacts pattern, and the MCP-for-connector-breadth recommendation.
Tests:
- tests/gateway/test_extract_local_files.py: 7 new test cases
(documents, spreadsheets, presentations, audio, archives, html,
chart-pdf canonical case). 44 passing, 0 regressions.
- tests/tools/test_kanban_tools.py: 4 new cases covering the artifacts
arg shape (list / string / merge with existing metadata / type
rejection). 17 passing.
- tests/hermes_cli/test_kanban_notify.py: 2 new cases covering full
notifier → artifact-upload path and missing-file silent-skip. 12
passing.
- E2E (real files, real kanban kernel, real BasePlatformAdapter):
worker calls kanban_complete(artifacts=[png,pdf,csv]) → metadata +
event payload land → notifier helper partitions correctly →
send_multiple_images called once with the PNG, send_document called
twice with PDF + CSV.
What's NOT in this PR (deferred to follow-ups):
- Ad-hoc "research this for two hours, ping the thread when done"
slash command — covered today by kanban subscriptions; a dedicated
slash command can ride a follow-up PR if needed.
- Setup-wizard prompt for recommended MCP servers (Notion, GitHub,
Linear, etc.) — docs page lists them; UI is a separate change.
Plan and rationale captured in ~/.hermes/docs/perplexity-computer-parity.pdf
(local doc, not shipped).
Adds a 'triage_aux_unavailable' diagnostic for tasks stuck in triage when
neither the active aux helper slot nor the main-model auto fallback is usable.
Auto-decompose aware:
- kanban.auto_decompose=True (default): primary is auxiliary.kanban_decomposer,
triage_specifier is the fanout=false fallback.
- kanban.auto_decompose=False: primary is auxiliary.triage_specifier (manual
'hermes kanban specify' path).
Default aux slots use 'provider: auto' which falls back to the main model, so
this rule only fires when both the explicit slot config AND the main-model
auto fallback are absent. Quiet by default; informative when there is a real
config gap.
Also adds kd.config_from_runtime_config() that carries kanban + auxiliary +
model keys through to diagnostics, and updates CLI/dashboard call sites to
use it. config_from_kanban_config() is preserved for back-compat.
Reworks the original PR #25640 idea (@qWaitCrypto) to align with the new
auto-decompose dispatcher path landed in #27572. The original PR pointed only
at auxiliary.triage_specifier, which is now the fallback rather than the
primary helper.
Co-authored-by: qWaitCrypto <axmaiqiu@gmail.com>
* feat(session_search): single-shape tool with discovery, scroll, browse — no LLM
Replaces the LLM-summarized session_search with a single-shape tool that
returns actual messages from the DB. Three calling shapes inferred from
args (no mode parameter):
1. Discovery — pass query. FTS5 + anchored ±5 window + bookends per hit,
all in one call. ~20ms on a real DB instead of ~90s for the previous
three aux-LLM calls.
2. Scroll — pass session_id + around_message_id. Returns a window
centered on the anchor. To paginate, re-anchor on the first/last id
of the returned window. Boundary message appears in both windows
as the orientation marker. ~1ms per scroll call.
3. Browse — no args. Recent sessions chronologically.
Bookend_start (first 3 user+assistant msgs) and bookend_end (last 3) give
the agent goal + resolution on every discovery hit, so a single tool call
reconstructs a long session's arc without loading the whole transcript.
The aux-LLM summary path is gone: it cost ~$0.30/call, took ~30s, and
laundered FTS5 hits through a model that could confabulate when the right
session wasn't in the hit list. The merged shape returns byte-for-byte
content from SQLite.
History:
- PR #20238 (JabberELF) seeded the fast/summary dual-mode split.
- PR #26419 (yoniebans) expanded to fast/guided/summary with bookends,
multi-anchor drill-down, default-mode config, and a teaching skill.
This PR collapses that toolkit into one shape with explicit scroll
support, drops the summary path, drops the mode parameter, drops the
config knob, drops the skill. JabberELF's seed work is acknowledged via
the AUTHOR_MAP entry.
Validation:
- 38/38 tool tests pass (tests/tools/test_session_search.py)
- 12/12 get_messages_around tests pass (tests/hermes_state/)
- 11/11 get_anchored_view tests pass (tests/hermes_state/)
- Full tests/tools/ run: 5168 passing, 2 failures pre-exist on main
(test ordering in test_delegate.py, unrelated)
- E2E against live state DB: discovery 20ms, scroll 1ms, browse 280ms;
pagination forward+backward works with boundary-message orientation;
error paths return clean tool_error responses
Co-authored-by: JabberELF <abcdjmm970703@gmail.com>
Co-authored-by: yoniebans <jonny@nousresearch.com>
* chore(session_search): prune dead LLM-summary config and docs
Companion to the single-shape rewrite. The auxiliary.session_search config
block, max_concurrency / extra_body tunables, and matching docs sections
all referenced the removed LLM summarization path. Removing them so users
don't try to tune knobs that nothing reads.
- hermes_cli/config.py: drop dead auxiliary.session_search block from
DEFAULT_CONFIG. Leftover keys in user config.yaml are harmless and
ignored.
- hermes_cli/tips.py: drop two tips referencing the removed
max_concurrency / extra_body knobs.
- website/docs/user-guide/configuration.md: drop 'Session Search Tuning'
section and the auxiliary.session_search block from the example.
- website/docs/user-guide/features/fallback-providers.md: drop session_search
rows from the auxiliary-tasks tables and the dedicated tuning subsection.
- website/docs/reference/tools-reference.md: rewrite the session_search
entry to describe the new three-shape behaviour.
- CONTRIBUTING.md: update the file-tree description.
- tests/tools/test_llm_content_none_guard.py: remove TestSessionSearchContentNone
class and test_session_search_tool_guarded — both guard against an
unguarded .content.strip() call site in _summarize_session() that no
longer exists.
Validation: 97/97 targeted tests still pass (hermes_state + session_search +
llm_content_none_guard). Config tests 55/55.
---------
Co-authored-by: JabberELF <abcdjmm970703@gmail.com>
Co-authored-by: yoniebans <jonny@nousresearch.com>
The system prompt's 'Conversation started:' line carried minute precision
(%I:%M %p), making it byte-unstable across every rebuild path. Within a
CLI session the in-memory cache held, but on the gateway path (fresh
AIAgent per turn → restore from session DB), any silent failure in the
read or write path dropped the cache stem and forced a full re-prefill
on every subsequent turn. Local prefix-caching backends (llama.cpp /
vLLM) saw this as KV-cache invalidation; remote prefix-caching providers
saw it as an Anthropic-style cache miss.
Three changes:
1. Date-only timestamp ('Sunday, May 17, 2026' instead of '... 03:42 PM').
System prompt now byte-stable for the full day. The model can still
query exact time via tools when it actually needs it. Credit:
@iamfoz (PR #20451).
2. Loud logging on session DB write failures. The update_system_prompt
call used to log at DEBUG, hiding disk-full / locked-database / schema
drift behind a silent fall-through that forced fresh rebuilds on
every subsequent turn. Now WARN with the session id and exception so
persistent issues show up in agent.log without verbose mode.
3. Three-way stored-state distinction on read. The previous
'session_row.get("system_prompt") or None' collapsed three states
into one (missing row / null column / empty string). Now we tell them
apart and WARN when a continuing session lands on null/empty (which
means the previous turn's write never persisted — every subsequent
turn rebuilds and the prefix cache misses every time).
The restore block is extracted into _restore_or_build_system_prompt()
so the prefix-cache path can be unit-tested in isolation.
E2E proof: fresh AIAgent constructed for turn 2 across a minute-boundary
sleep restores byte-identical bytes from the session DB. NULL stored
prompt fires the new warning. Date-only timestamp survives the rebuild
path. All on real SessionDB, no mocks.
Tests:
- tests/agent/test_system_prompt_restore.py (10 new tests)
- tests/run_agent/test_run_agent.py::TestBuildSystemPrompt::
test_datetime_is_date_only_not_minute_precision
Closes#20451 (date-only), #18547 (prefix stabilization),
#8689 (stabilize timestamp across compression), #15866 (timestamp
caching question), #8687 (compression timestamp), #27339
(claim #3: live timestamp in cached system prompt).
Co-authored-by: Martyn Forryan <9133432+iamfoz@users.noreply.github.com>
Grok models hit the same failure modes that OPENAI_MODEL_EXECUTION_GUIDANCE
addresses for GPT/Codex: claiming completion without tool calls
('to be honest, I didn't create the file yet'), suggesting workarounds
instead of using existing tools (proposing a folder-based memory system
when the memory tool exists), replying with plans instead of executing.
TOOL_USE_ENFORCEMENT_GUIDANCE was already injected for any model whose
name contains 'grok' (TOOL_USE_ENFORCEMENT_MODELS). This extends the
follow-on family-specific block — OPENAI_MODEL_EXECUTION_GUIDANCE
(tool_persistence / mandatory_tool_use / act_dont_ask / prerequisite_checks
/ verification / missing_context) — to grok-named models too.
The OPENAI_ prefix is retained for backwards compat with imports/tests;
docstring + inline comment now note that the body is family-agnostic and
the prefix reflects origin, not exclusivity.
Tests cover the OpenRouter slug (x-ai/grok-4.3) and the xai-oauth bare
name (grok-4.3), plus a negative control on claude.
E2E verified against a real AIAgent build of the system prompt for both
xai-oauth and openrouter grok models.
7 new tests:
TestAuxiliaryFallbackLayering (3):
- configured_chain succeeds → main agent fallback NOT consulted
- chain returns nothing → main agent fallback runs and succeeds
- both exhausted → user-visible 'all fallbacks exhausted' warning
fires before the original error is re-raised
TestTryMainAgentModelFallback (4):
- returns (None, None, "") when main provider is 'auto'
- returns (None, None, "") when failed provider == main provider
(no point retrying the same backend)
- resolves the main provider's client when configured correctly
- skips when main provider is marked unhealthy
The two TestAuxiliaryClientPoisonedCacheEviction tests were written
when explicit-provider users got no fallback at all on connection
errors — they asserted ConnectionError propagated after eviction
because the fallback gate blocked the auto chain.
After the #26803 fix in the previous commit, capacity errors
(payment/quota/connection) now DO trigger fallback even on explicit
providers. The tests still verify cache eviction (their actual
contract) but now stub _try_payment_fallback so the fallback
machinery does not attempt a real network call.
Closes#26803
Root causes:
1. _is_payment_error() checked for billing keywords (credits, insufficient
funds, billing, payment required) but missed daily token quota exhaustion
phrases used by Bedrock, Vertex AI, and LiteLLM proxies — e.g.
'Too many tokens per day', 'quota exceeded', 'resource exhausted',
'daily limit'. These are functionally identical to credit exhaustion
(provider cannot serve the request) but don't trigger fallback.
2. The call_llm() fallback chain was gated on resolved_provider == 'auto'.
When a task resolves to a specific provider (e.g. 'custom' for a LiteLLM
proxy, or 'openrouter'), capacity failures (payment/quota/connection)
silently raise instead of trying alternatives. This is overly conservative:
capacity errors mean the provider *cannot* serve the request regardless of
user intent, so alternatives should always be tried.
Fixes:
- Add quota-related keywords to _is_payment_error(): quota_exceeded,
too many tokens per day, daily limit, tokens per day, daily quota,
resource exhausted (Vertex AI gRPC code).
- Allow fallback for capacity errors (payment + connection) even when
resolved_provider is not 'auto'. Rate-limit fallback stays gated on
is_auto to honour explicit provider constraints for transient limits.
- Apply both fixes to sync call_llm() and async acall_llm() paths.
- Add 6 targeted tests for the new quota-error detection cases.
Quarantine Nous OAuth state when refresh fails with terminal invalid_grant/invalid_token errors. Clear local and shared refresh material across runtime, managed access-token, proxy, and credential-pool paths so Hermes stops retrying revoked refresh sessions.
* feat(kanban): orchestrator-driven auto-decomposition on triage
Closes the core gap in the kanban system: dropping a one-liner into Triage
now decomposes it into a graph of child tasks routed to specialist
profiles by description, matching teknium's original vision ("main
orchestrator splits/creates actual tasks, doles them out to each agent").
The build
---------
- hermes_cli/profiles.py: new `description` + `description_auto` fields
on ProfileInfo, persisted in <profile_dir>/profile.yaml. Helpers
read_profile_meta / write_profile_meta. `create_profile` accepts
optional description.
- hermes_cli/profile_describer.py: new module — auto-generate a 1-2
sentence description from a profile's skills + model + name via the
auxiliary LLM (`auxiliary.profile_describer`).
- hermes_cli/main.py: new `hermes profile create --description ...`
flag; new `hermes profile describe [name] [--text ... | --auto |
--all --auto]` subcommand.
- hermes_cli/kanban_db.py: new `decompose_triage_task` atomic helper —
creates N child tasks, links the root as a child of every leaf
(root waits for the whole graph), flips root `triage -> todo` with
orchestrator assignee, records an audit comment + `decomposed` event
in a single write_txn.
- hermes_cli/kanban_decompose.py: new module — calls the auxiliary LLM
(`auxiliary.kanban_decomposer`) with the profile roster + descriptions
to produce a JSON task graph, then invokes the DB helper. Rewrites
unknown assignees to the configured `kanban.default_assignee` (or
the active default profile) so a task NEVER lands with assignee=None.
Falls back to specify-style single-task promotion when the LLM
returns `fanout: false`.
- hermes_cli/kanban.py: new `hermes kanban decompose [task_id | --all]`
CLI verb.
- hermes_cli/config.py: new DEFAULT_CONFIG keys —
kanban.orchestrator_profile, kanban.default_assignee,
kanban.auto_decompose (default True), kanban.auto_decompose_per_tick
(default 3), auxiliary.kanban_decomposer, auxiliary.profile_describer.
- gateway/run.py: kanban dispatcher watcher now runs auto-decompose
before each `_tick_once`, capped by `auto_decompose_per_tick` so a
bulk-load of triage tasks doesn't burst-spend the aux LLM.
- plugins/kanban/dashboard/plugin_api.py: new endpoints —
GET /profiles (list roster + descriptions),
PATCH /profiles/<name> (set description, user-authored),
POST /profiles/<name>/describe-auto (LLM-generate),
POST /tasks/<id>/decompose (run decomposer),
GET/PUT /orchestration (orchestrator/default-assignee/auto-decompose
pickers, with resolved fallbacks echoed back).
- plugins/kanban/dashboard/dist/index.js: new OrchestrationPanel
collapsible — dropdowns for orchestrator profile and default
assignee, auto-decompose toggle, per-profile description editor with
Save and Auto-generate buttons. New ⚗ Decompose button next to
✨ Specify on triage-column task drawers.
Behavior
--------
- A task in Triage gets fanned out into a small DAG of child tasks.
Children with no internal parents flip to `ready` immediately
(parallel dispatch). Children with sibling parents wait. The root
stays alive as a parent of every child — when the whole graph
finishes, it promotes to `ready` and the orchestrator profile wakes
back up to judge completion (the "adds more tasks until done" part
of the original vision).
- `kanban.orchestrator_profile` unset -> falls back to the default
profile (whichever `hermes` launches with no -p flag).
- `kanban.default_assignee` unset -> same fallback. Tasks NEVER end
up unassigned.
- `kanban.auto_decompose=true` (default) runs the decomposer
automatically on dispatcher ticks; manual `hermes kanban decompose`
is always available.
Tests
-----
- tests/hermes_cli/test_kanban_decompose_db.py — 7 tests for the
atomic DB helper (status transitions, dep graph, audit trail,
validation errors).
- tests/hermes_cli/test_kanban_decompose.py — 6 tests for the
decomposer module (fanout, no-fanout fallback, unknown-assignee
rewrite, malformed-JSON resilience, no-aux-client path).
- tests/hermes_cli/test_profile_describer.py — 10 tests for
profile.yaml r/w + the LLM auto-describer (yaml corrupt tolerance,
user-vs-auto description protection, --overwrite, fallback parsing).
E2E
---
- CLI end-to-end: created profiles with descriptions, dropped a triage
task, mocked the aux LLM with a 3-task graph -> verified all three
children were created with the right assignees, the dependency
edges matched the LLM's graph, root flipped to todo gated by every
child, audit comment + `decomposed` event recorded.
- Dashboard end-to-end: started the dashboard against an isolated
HERMES_HOME, verified all four new endpoints via curl (profile
listing, PATCH for description, PUT for orchestration settings,
POST for decompose). Opened the UI in the browser, confirmed the
OrchestrationPanel renders with all three pickers + the per-profile
description editor, typed a description, clicked Save, verified
~/.hermes/profile.yaml was written. Clicked Decompose on the triage
card and confirmed the inline error message surfaced as designed
("no auxiliary client configured").
* feat(kanban): surface decompose mode (Auto/Manual) as a one-click pill
The auto/manual toggle already existed as kanban.auto_decompose (default
true), but it was buried inside the collapsed Orchestration settings
panel — users couldn't tell at a glance which mode they were in. This
hoists it to a pill at the top of the kanban page so the state is always
visible and one click flips it.
UX
- New "⚗ Decompose: AUTO|MANUAL" pill in the kanban header. Emerald
styling when Auto is on (the default), muted/gray when Manual.
- Pill is visible both in the collapsed AND expanded Orchestration
settings views so context is preserved when the user opens the panel.
- Tooltip explains both states + what clicking does.
- Renamed the in-panel "Auto-decompose on triage / Enabled" checkbox
to "Decompose mode / Auto (default) | Manual" for language parity
with the pill.
Behavior preserved
- Default remains Auto (kanban.auto_decompose=true).
- Manual mode restores pre-PR behavior: triage tasks stay in triage
until the user clicks ⚗ Decompose on each card (or runs
`hermes kanban decompose <id>`).
Implementation
- plugins/kanban/dashboard/dist/index.js: load /orchestration on mount
(not just on expand) so the collapsed pill reflects real state.
Render mode pill in both collapsed and expanded headers. Reuses the
existing PUT /api/plugins/kanban/orchestration endpoint — no new
backend, no new tests required.
E2E verified
- Pill renders as "⚗ Decompose: AUTO" on page load (default).
- One click flips to "⚗ Decompose: MANUAL" with muted styling.
- config.yaml on disk shows auto_decompose: false after the flip.
- Second click round-trips back to Auto; config.yaml flips to true.
* feat(kanban): rename mode pill to "Orchestration: Auto/Manual"
Per Teknium feedback — "Decompose" was too implementation-specific.
"Orchestration" is the user-facing concept (the whole pitch is the
orchestrator profile routing work), and the pill is the front door to it.
- Pill text: "Orchestration: Auto" / "Orchestration: Manual" (title case,
no ⚗ prefix, no SHOUTY-CAPS for the mode value)
- In-panel checkbox label: "Orchestration mode" (was "Decompose mode")
- Tooltips updated to match
- No behavior change
* docs(kanban): document decompose, profile descriptions, orchestration mode
Brings the docs site up to parity with the PR. English build verified
locally (npx docusaurus build --locale en) — clean, no new broken links
or anchors. Pre-existing broken-link warnings (rl-training, llms.txt,
step-by-step-checklist, fallback-model) untouched.
- website/docs/reference/cli-commands.md
+ `hermes kanban decompose` action row in the action table, with
pointer to the Auto vs Manual orchestration section.
- website/docs/reference/profile-commands.md
+ `--description "<text>"` flag on `hermes profile create`.
+ Full `hermes profile describe` section: read, --text, --auto,
--overwrite, --all flags with examples.
- website/docs/user-guide/features/kanban.md (the big one)
+ Triage column intro rewritten around the Auto-decompose default
behavior, with pointer to the new Auto vs Manual section.
+ Status action row updated to mention both ⚗ Decompose and
✨ Specify on triage cards.
+ New "Auto vs Manual orchestration" section explaining the two
modes, how to flip them (pill, config), how routing-by-description
works, the no-None-assignee guarantee, plus a config knob table
(auto_decompose, auto_decompose_per_tick, orchestrator_profile,
default_assignee) and the two new auxiliary slots
(kanban_decomposer, profile_describer).
+ REST surface table gains 6 new endpoint rows: /tasks/:id/decompose,
/profiles (GET), /profiles/:name (PATCH), /profiles/:name/describe-auto,
/orchestration (GET + PUT).
- website/docs/user-guide/features/kanban-tutorial.md
+ Triage column blurb updated for Auto by default + Manual via the
pill, with cross-link to the Auto vs Manual orchestration section.
- website/docs/user-guide/profiles.md
+ Blank-profile flow now mentions --description and points to the
kanban routing model for context.
- website/docs/user-guide/configuration.md
+ `kanban_decomposer` and `profile_describer` added to the
`hermes model -> Configure auxiliary models` menu listing.
xAI's /responses endpoint rejects pattern and format JSON Schema keywords
in tool schemas with HTTP 400 'Invalid arguments passed to the model'.
The existing strip_pattern_and_format() only walked OpenAI-format tools
({'function': {'parameters': ...}}), missing Responses-format shapes
({'name': ..., 'parameters': ...}) used by codex_responses API mode.
This shows up most often with MCP-derived tools that carry validation
keywords (e.g. domain pattern regex in firecrawl, format: date-time)
through to the wire.
Extends the walk to handle both shapes. Auto-strip wiring is applied
separately in chat_completion_helpers (post-refactor location).
Closes#27197
14 focused tests on the extracted helper
``_xai_oauth_exchange_code_for_tokens`` cover:
Core contract:
* ``code_verifier`` is on the wire (RFC 7636 §4.5).
* ``code_challenge`` + ``code_challenge_method=S256`` are echoed
(the #26990 defense-in-depth that makes xAI's token endpoint
stop rejecting valid exchanges).
* ``grant_type=authorization_code``, ``code``, ``redirect_uri``,
and ``client_id`` are all locked.
* Content-Type is ``application/x-www-form-urlencoded`` (xAI
rejects ``application/json`` on this endpoint).
* The supplied ``token_endpoint`` URL is used verbatim — no
hard-coded constant sneaks in via a future refactor.
* ``timeout_seconds`` is forwarded; floored at 20s.
Sanity guard:
* Empty ``code_verifier`` raises ``xai_pkce_verifier_missing``
with a link to #26990 — and NOTHING is sent. Leaking the auth
code to a server that can't redeem it is the wrong failure mode.
* Empty ``code_challenge`` omits only the defensive echo; the
standards-compliant ``code_verifier`` request still goes out so
RFC-compliant servers keep working.
Error surfacing:
* Non-200 responses include both ``HTTP <status>`` and the body
verbatim — disambiguates 400 (PKCE / bad request) from 403
(tier denied, see #26847).
* Transport errors are wrapped as ``AuthError`` with the
``xai_token_exchange_failed`` code, so the surrounding
``format_auth_error`` UI mapping still fires.
* Non-dict JSON payloads raise ``xai_token_exchange_invalid``.
* 200 happy path returns the parsed payload dict verbatim.
End-to-end wire-format guard:
* A real ``httpx.Client`` with a stub transport captures the bytes
on the wire and asserts every PKCE field round-trips through
``urlencode``. Catches a future refactor that swaps
``data=`` for ``json=`` (which xAI would silently reject).
Addresses reviewer feedback: when resolve_runtime_provider returns a dict
without the 'provider' key, the result must be None regardless of
configured_provider. This guards against malformed runtime responses.
Test: test_runtime_missing_provider_key_returns_none