mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
Merge da25c6e163 into 00c3d848d8
This commit is contained in:
commit
75e62a5af4
11 changed files with 2279 additions and 3 deletions
@@ -0,0 +1,47 @@
---
title: Codex Bridge Async Completion Notification MVP Requirements
date: 2026-04-25
status: accepted
scope: lightweight
---
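
# Codex Bridge Async Completion Notification MVP Requirements

## Background

After Hermes starts a Codex app-server stdio task via `skills/codex-bridge/references/cli.py start`, it writes the task state to the local `codex_bridge.db`. The current implementation has neither a resident subscription nor a completion callback: when a Codex turn finishes, nothing notifies the originating Feishu/platform conversation, and the user has to say "continue" again before Hermes manually queries `status` or `list`.

This is a product smell: the async task has been started, but nobody picks up the result when it finishes.

## Scope Decision

This iteration is a narrow MVP: let async tasks started by Codex Bridge send a completion summary back to the originating conversation or a specified target once they finish. Do not build a multi-tenant scheduling system, do not rewrite the existing low-level Codex Bridge protocol, and do not introduce mailbox/outbox/inbox as the primary communication mechanism.

## Goals

- Optionally record a notification target when starting a task, for example `local`, `feishu:<chat_id>`, or another explicit platform target supported by `send_message`.
- Leave existing API behavior unchanged by default; tasks still start and can be queried normally when no notification target is passed.
- Provide a watcher/one-shot poll entry point that finds tasks that have completed but whose notification has not been handled.
- For tasks with a target, read the final summary, build a concise completion summary, and send it through an injectable notifier.
- Mark completed tasks without a target as `no_target`, so they are not reprocessed after a watcher restart.
- Prevent duplicate notifications by persisting `notification_status` / `notified_at`.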
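
The target strings named in the goals (`local`, `feishu:<chat_id>`) could be interpreted along the following lines. This is only a sketch: `NotifyTarget` and `parse_notify_target` are hypothetical names that do not appear in the commit, and the real bridge may simply pass the raw string on to `send_message`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NotifyTarget:
    """Hypothetical parsed form of a notify_target string."""

    platform: str
    chat_id: Optional[str] = None


def parse_notify_target(raw: Optional[str]) -> Optional[NotifyTarget]:
    """Return None when no target was given, otherwise split 'platform:chat_id'."""
    if raw is None:
        return None  # default path: start behaves exactly as before
    text = raw.strip()
    if not text:
        raise ValueError("notify_target must be non-empty when provided")
    platform, sep, chat_id = text.partition(":")
    if not sep:
        return NotifyTarget(platform=platform)  # bare target such as "local"
    if not platform or not chat_id:
        raise ValueError(f"malformed notify_target: {raw!r}")
    return NotifyTarget(platform=platform, chat_id=chat_id)
```

Keeping the "no target" case as `None` rather than an empty string mirrors the goal that omitting the target leaves existing behavior untouched.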
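
## Non-Goals

- Do not implement a resident multi-tenant scheduler.
- Do not implement real-time two-way interaction for pending approval / `requestUserInput`.
- Do not let tests send messages to real external platforms such as Feishu, WeChat, or Telegram.
- Do not enable `danger-full-access` as a default permission.
- Do not use mailbox/outbox/inbox as the communication mechanism.

## Acceptance Criteria

- `codex_bridge(action="start", notify_target=...)` writes the target into the task state.
- The watcher/notify entry point notifies each task in a terminal state exactly once; restarts or repeated runs do not resend.
- A terminal task with no target is marked `no_target` and the notifier is not called.
- The CLI exposes `--notify-target` and a one-shot `notify-completed` entry point, and supports dry-run.
- Tests cover notification behavior through a mocked/injected notifier.

## Future Extension Notes

Pending approval and `requestUserInput` can later reuse the same notification target field: when a task enters `waiting_for_approval` or `waiting_for_user_input`, the watcher can send an interactive prompt carrying the request id, and the platform-side reply is then mapped to `codex_bridge respond`. This iteration handles terminal completion only, to keep interactive approval design out of the MVP.
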
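@@ -0,0 +1,82 @@
---
title: Codex Bridge Async Completion Notification MVP Implementation Plan
date: 2026-04-25
status: active
origin: docs/brainstorms/2026-04-25-codex-bridge-completion-notification-requirements.md
---

# Codex Bridge Async Completion Notification MVP Implementation Plan

## Problem Frame

Codex Bridge can already start async Codex turns over app-server stdio and write their state to `codex_bridge.db`. The gap is proactive delivery after completion: there is currently no notification target, no notification status, and no watcher entry point to send a terminal task's summary back to the originating conversation.

## Technical Decisions

- Add notification metadata to `codex_bridge_tasks`: `notify_target`, `notification_status`, `notified_at`, `notification_error`.
- `start` accepts an optional `notify_target`; the old behavior is kept when it is omitted.
- Add a one-shot `notify_completed` action: scan tasks that are terminal and whose notification has not been handled, then send to the target or mark `no_target`.
- The default notifier reuses the existing `send_message` tool; tests and the CLI dry-run avoid real outbound sends through injection or dry-run.
- The `local` target is a local consumption target: the task is recorded as notified and the summary is returned, without calling any external platform.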
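
Adding the four metadata columns to an existing database might look roughly like this. It is a sketch under assumptions: the table and column names come from the plan, but the SQL types, the `'pending'` default, and the helper name `migrate_notification_columns` are illustrative and not taken from `tools/codex_bridge_tool.py`.

```python
import sqlite3

# Column names come from the plan; the SQL types and the default are assumptions.
NOTIFICATION_COLUMNS = {
    "notify_target": "TEXT",
    "notification_status": "TEXT DEFAULT 'pending'",
    "notified_at": "REAL",
    "notification_error": "TEXT",
}


def migrate_notification_columns(conn: sqlite3.Connection) -> list:
    """Add any missing notification columns and return the names that were added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(codex_bridge_tasks)")}
    added = []
    for name, ddl in NOTIFICATION_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE codex_bridge_tasks ADD COLUMN {name} {ddl}")
            added.append(name)
    conn.commit()
    return added
```

Because the helper checks the current schema before each `ALTER TABLE`, running it again on an already migrated database is a no-op, which suits a migration that runs on every database initialization.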

## Implementation Units

### U1: Persist notification target and status

Files to modify:
- `tools/codex_bridge_tool.py`
- `tests/tools/test_codex_bridge_tool.py`

Approach:
- Run a compatibility migration against old databases during database initialization.
- `CodexBridgeTask.snapshot()`, `list_tasks()`, and `get_task_snapshot()` expose the notification fields.
- `start_task()` accepts and saves `notify_target`.

Test scenarios:
- When a task is started with `notify_target`, the value is visible in both the status snapshot and the persisted query.

### U2: One-shot completion notification watcher

Files to modify:
- `tools/codex_bridge_tool.py`
- `tests/tools/test_codex_bridge_tool.py`

Approach:
- Add a method that scans for terminal tasks.
- Mark tasks without a target as `no_target` without calling the notifier.
- For tasks with a target, build a concise summary, call the notifier, then mark `sent` and set `notified_at`.
- Tasks already marked `sent` or `no_target` are not processed again.
- Support `dry_run`: return only the tasks that would be processed, without writing notification status or sending.

Test scenarios:
- A completed task is notified only once.
- A completed task without a target is not sent and is marked `no_target`.
- dry-run does not send and does not change notification status.
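
The U2 behavior can be sketched as a single pass over the task table. This is a minimal illustration, not the code in `tools/codex_bridge_tool.py`: the function signature, the table layout, and the summary wording are assumptions, while the terminal statuses and the `sent` / `no_target` / `dry_run` labels follow the values used elsewhere in this commit.

```python
import sqlite3
import time

TERMINAL_STATUSES = ("completed", "failed", "cancelled")


def notify_completed(conn, notifier, dry_run=False, limit=10):
    """One-shot pass over terminal tasks whose notification is still unhandled."""
    rows = conn.execute(
        "SELECT hermes_task_id, status, notify_target, final_summary "
        "FROM codex_bridge_tasks "
        "WHERE status IN (?, ?, ?) "
        "AND (notification_status IS NULL OR notification_status = 'pending') "
        "ORDER BY hermes_task_id LIMIT ?",
        (*TERMINAL_STATUSES, limit),
    ).fetchall()
    results = []
    for task_id, status, target, summary in rows:
        if dry_run:
            # Preview only: nothing is sent and nothing is written.
            results.append({"task_id": task_id, "notification_status": "dry_run"})
            continue
        if not target:
            new_status, notified_at = "no_target", None
        else:
            notifier(target, f"Codex task {task_id} {status}: {summary or '(no summary)'}")
            new_status, notified_at = "sent", time.time()
        # Persisting the outcome is what keeps a later run from picking the task up again.
        conn.execute(
            "UPDATE codex_bridge_tasks SET notification_status = ?, notified_at = ? "
            "WHERE hermes_task_id = ?",
            (new_status, notified_at, task_id),
        )
        results.append({"task_id": task_id, "notification_status": new_status})
    conn.commit()
    return results
```

Idempotency comes entirely from the `WHERE` clause plus the persisted status, so the same function is safe to call from a cron job, a CLI one-shot, or a restarted watcher.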

### U3: Tool schema and CLI entry points

Files to modify:
- `tools/codex_bridge_tool.py`
- `skills/codex-bridge/references/cli.py`
- `skills/codex-bridge/references/validator.py`
- `tests/skills/test_codex_bridge_skill.py`

Approach:
- Add the `notify_completed` action, `notify_target`, and `dry_run` to the schema.
- Add `--notify-target` to CLI `start`/`smoke-test`.
- Add a one-shot `notify-completed` CLI command.
- The validator checks the basic structure of the notify output.

Test scenarios:
- CLI start passes `--notify-target` through to the tool.
- CLI notify-completed dry-run calls the bridge and does not depend on a real platform.

## Verification

- `python -m py_compile tools/codex_bridge_tool.py skills/codex-bridge/references/cli.py skills/codex-bridge/references/validator.py`
- `scripts/run_tests.sh tests/tools/test_codex_bridge_tool.py tests/skills/test_codex_bridge_skill.py`

## Risks

- The default notifier depends on the `send_message` runtime environment; when there is no gateway or the target is unreachable, `notification_error` is recorded and the task stays in a retryable state.
- Only terminal completion is handled for now, not real-time approval/input; later work should extend the same target model.
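
The first risk (a failed send is recorded and stays retryable) could be handled roughly as follows. This is a sketch with assumptions: the task is modeled as a plain dict, `deliver_once` and `is_retryable` are hypothetical helpers, and treating `failed` as eligible for the next pass is one possible reading of "retryable state", not confirmed behavior.

```python
import time


def deliver_once(task: dict, notifier) -> dict:
    """Try one delivery and record the outcome on the task record (a plain dict here)."""
    try:
        notifier(task["notify_target"], task.get("final_summary") or "")
    except Exception as exc:  # e.g. no gateway, unreachable target
        task["notification_status"] = "failed"
        task["notification_error"] = str(exc)
        return task
    task["notification_status"] = "sent"
    task["notified_at"] = time.time()
    task["notification_error"] = None
    return task


def is_retryable(task: dict) -> bool:
    """Assumption: unhandled, 'pending', and 'failed' tasks are picked up by the next pass."""
    return task.get("notification_status") in (None, "pending", "failed")
```

A production version would probably cap the number of retries so that a permanently unreachable target does not get retried forever.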
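
@@ -0,0 +1,85 @@
---
title: Codex Bridge async tasks need persisted completion notification state
date: 2026-04-25
category: docs/solutions/developer-experience/
module: Codex Bridge
problem_type: developer_experience
component: assistant
severity: medium
applies_when:
  - Async agent tasks are started by a local bridge, but completion results need to return to the originating platform conversation
  - Task state is already persisted, but there is no proactive delivery after completion
  - Tests must not send messages to real external platforms
tags: [codex-bridge, async-notification, app-server, send-message, watcher]
---

# Codex Bridge async tasks need persisted completion notification state

## Context

Codex Bridge already starts Codex tasks over app-server stdio and writes their state to `codex_bridge.db`. The experience problem exposed by dogfooding is that nothing proactively notifies the originating Feishu/platform conversation when an async task completes; the user has to trigger Hermes again to query `status` or `list` to learn the result.

This kind of problem does not require building a multi-tenant scheduling system first. The key to the MVP is letting a task optionally record a notification target at start, and letting a one-shot watcher handle terminal tasks reliably.

## Guidance

Add three concepts to the existing task table rather than rewriting the low-level communication protocol:

- `notify_target`: target optionally recorded at start, for example `local` or `feishu:<chat_id>`.
- `notification_status`: records the notification lifecycle, for example `pending`, `sent`, `failed`, `no_target`.
- `notified_at` / `notification_error`: let the watcher avoid duplicates after a restart and keep the failure reason.

The watcher should scan only tasks in a terminal state and process them idempotently:

- With a target: build a concise completion summary, call the injectable notifier, and mark `sent` on success.
- Without a target: mark `no_target` and do not send, so the same task is not picked up on every scan.
- dry-run: return a preview, without sending or writing notification status.

The default notifier can reuse the existing `send_message` capability, but the core manager method must allow a notifier to be injected. Unit tests can then verify behavior with a fake notifier and avoid real platform side effects.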
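
One way to make the injection point explicit is to give the notifier a named callable type and to use a recording test double. The sketch below assumes the `notifier(target, message)` call shape and the `{"ok": True}` result used in this document's own test example; `RecordingNotifier` and `send_summary` are hypothetical names.

```python
from typing import Any, Callable, Dict, List, Tuple

# Assumed shape: notifier(target, message) -> result dict.
Notifier = Callable[[str, str], Dict[str, Any]]


class RecordingNotifier:
    """Test double that records deliveries instead of contacting a platform."""

    def __init__(self) -> None:
        self.deliveries: List[Tuple[str, str]] = []

    def __call__(self, target: str, message: str) -> Dict[str, Any]:
        self.deliveries.append((target, message))
        return {"ok": True}


def send_summary(target: str, summary: str, notifier: Notifier) -> bool:
    """Hypothetical helper: the caller chooses the notifier, so tests stay offline."""
    result = notifier(target, summary)
    return bool(result.get("ok"))
```

In production the same parameter would receive a thin wrapper around `send_message`; nothing else in the calling code needs to change.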

## Why This Matters

The product promise of an async bridge is not "can start a background task" but "the user sees the result in the original context when the task ends". With only a state table and no notification status, the system gets stuck in a gray zone of "completed but nobody picked it up"; without persisted deduplication, a watcher or daemon restart may push the same notification again.

Making notification state part of the task metadata satisfies the MVP without introducing a mailbox/outbox/inbox communication mechanism, and leaves the same target semantics available for later real-time approval / `requestUserInput` extensions.

## When to Apply

- The async task lifecycle is already persisted, but completion needs cross-platform delivery.
- Platform sending capability already exists; the new feature only needs to choose a target and call send.
- The test environment must not trigger real external messages.
- The watcher/daemon must not re-notify after a restart.

## Examples

Record the target at start:

```python
codex_bridge(
    action="start",
    prompt="Investigate the failing tests",
    notify_target="feishu:chat-1",
)
```

One-shot watcher dry-run:

```bash
python skills/codex-bridge/references/cli.py notify-completed --dry-run
```

Inject a notifier in tests:

```python
deliveries = []
manager.notify_completed(
    notifier=lambda target, message: deliveries.append((target, message)) or {"ok": True}
)
```

## Related

- `docs/brainstorms/2026-04-25-codex-bridge-completion-notification-requirements.md`
- `docs/plans/2026-04-25-codex-bridge-completion-notification-plan.md`
- `tools/codex_bridge_tool.py`
- `skills/codex-bridge/references/cli.py`

59
skills/codex-bridge/SKILL.md
Normal file
@@ -0,0 +1,59 @@
---
name: codex-bridge
description: Start and control local Codex tasks through Hermes Codex Bridge app-server integration.
version: 1.0.0
platforms: [linux, macos]
metadata:
  hermes:
    tags: [codex, agent, bridge, app-server]
    category: software-development
---

# Codex Bridge

Use this skill when you need Hermes to start or steer a local Codex task through the Codex app-server protocol.

## CLI

Run the reference CLI from the repository root:

```bash
python skills/codex-bridge/references/cli.py start --prompt "Inspect this repository and summarize the test layout."
python skills/codex-bridge/references/cli.py status <task_id>
python skills/codex-bridge/references/cli.py list
python skills/codex-bridge/references/cli.py steer <task_id> --instruction "Focus only on tests."
python skills/codex-bridge/references/cli.py interrupt <task_id>
python skills/codex-bridge/references/cli.py respond <task_id> --request-id <request_id> --decision decline
python skills/codex-bridge/references/cli.py smoke-test --wait 10 --timeout 60
```

The CLI is a productized wrapper around `tools.codex_bridge_tool.codex_bridge`.
It does not implement the app-server protocol itself and does not use mailbox,
inbox, or outbox files.

## Safety Defaults

- Sandbox is limited to `read-only` or `workspace-write`.
- `danger-full-access` is rejected.
- Approval policy is limited to `untrusted` or `on-request`.
- `approval_policy=never` is rejected.
- `start` requires a non-empty prompt and an existing `cwd`.

## Output

Commands print JSON to stdout. Validation errors return:

```json
{"success": false, "error": "..."}
```
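
A caller that shells out to the CLI might interpret the result along these lines. This is a sketch, not part of the skill: `interpret_cli_result` is a hypothetical helper, and the exit-code mapping (0 for `success: true`, 2 for a `ValidationError`, 1 otherwise) is read off `main()` in `cli.py` from this commit. Note that argparse usage errors also exit with status 2 but print to stderr, so stdout would not contain JSON in that case.

```python
import json


def interpret_cli_result(returncode: int, stdout: str) -> dict:
    """Parse the single JSON object the CLI prints and attach a coarse outcome label."""
    data = json.loads(stdout)
    if returncode == 0 and data.get("success") is True:
        outcome = "ok"
    elif returncode == 2:
        outcome = "validation_error"  # main() returns 2 when ValidationError is raised
    else:
        outcome = "failed"
    return {"outcome": outcome, "error": data.get("error"), "data": data}
```

The function takes the return code and captured stdout as plain values, so it can be paired with `subprocess.run(..., capture_output=True, text=True)` or exercised directly in tests.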

Successful `start` output is validated to ensure:

- `success` is `true`
- `protocol.mailbox` is `false`
- `protocol.transport` includes `app-server`
- task id, Codex thread id, and Codex turn id are present

The smoke test starts an async Codex task, polls `status`, and succeeds only
when the final task status is `completed` and `CODEX_ASYNC_OK` appears in
`recent_events` or `final_summary`.

1
skills/codex-bridge/references/__init__.py
Normal file
@@ -0,0 +1 @@
"""Codex Bridge skill reference utilities."""

252
skills/codex-bridge/references/cli.py
Normal file
@@ -0,0 +1,252 @@
#!/usr/bin/env python3
"""Productized CLI for Hermes Codex Bridge."""

from __future__ import annotations

import argparse
import json
import sys
import time
from pathlib import Path
from typing import Any


REPO_ROOT = Path(__file__).resolve().parents[3]
if str(REPO_ROOT) not in sys.path:
    sys.path.insert(0, str(REPO_ROOT))

try:
    from .validator import (
        SMOKE_SENTINEL,
        TERMINAL_STATUSES,
        ValidationError,
        parse_json_object,
        validate_approval_policy,
        validate_bridge_output,
        validate_interrupt_input,
        validate_respond_input,
        validate_sandbox,
        validate_smoke_test_result,
        validate_start_input,
        validate_status_input,
        validate_steer_input,
        validate_notify_completed_output,
        validate_notify_target,
    )
except ImportError:
    from validator import (  # type: ignore
        SMOKE_SENTINEL,
        TERMINAL_STATUSES,
        ValidationError,
        parse_json_object,
        validate_approval_policy,
        validate_bridge_output,
        validate_interrupt_input,
        validate_respond_input,
        validate_sandbox,
        validate_smoke_test_result,
        validate_start_input,
        validate_status_input,
        validate_steer_input,
        validate_notify_completed_output,
        validate_notify_target,
    )

from tools.codex_bridge_tool import DEFAULT_APPROVAL_POLICY, DEFAULT_SANDBOX, codex_bridge


def emit(data: dict[str, Any]) -> None:
    print(json.dumps(data, ensure_ascii=False, sort_keys=True))


def call_bridge(action: str, **kwargs: Any) -> dict[str, Any]:
    raw = codex_bridge(action=action, **kwargs)
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValidationError(f"codex_bridge returned invalid JSON for {action}: {exc.msg}") from exc
    validate_bridge_output(action, data)
    return data


def _prompt_from_args(args: argparse.Namespace) -> str:
    prompt = args.prompt
    if prompt is None and args.prompt_text:
        prompt = " ".join(args.prompt_text)
    return prompt or ""


def cmd_start(args: argparse.Namespace) -> dict[str, Any]:
    prompt = _prompt_from_args(args)
    validate_start_input(prompt, args.cwd, args.sandbox, args.approval_policy)
    notify_target = validate_notify_target(args.notify_target)
    return call_bridge(
        "start",
        prompt=prompt,
        cwd=args.cwd,
        model=args.model,
        sandbox=args.sandbox,
        approval_policy=args.approval_policy,
        codex_home=args.codex_home,
        notify_target=notify_target,
    )


def cmd_status(args: argparse.Namespace) -> dict[str, Any]:
    validate_status_input(args.task_id)
    return call_bridge("status", task_id=args.task_id)


def cmd_list(args: argparse.Namespace) -> dict[str, Any]:
    return call_bridge("list", limit=args.limit)


def cmd_notify_completed(args: argparse.Namespace) -> dict[str, Any]:
    data = call_bridge("notify_completed", limit=args.limit, dry_run=args.dry_run)
    validate_notify_completed_output(data)
    return data


def cmd_steer(args: argparse.Namespace) -> dict[str, Any]:
    validate_steer_input(args.task_id, args.instruction)
    return call_bridge("steer", task_id=args.task_id, instruction=args.instruction)


def cmd_interrupt(args: argparse.Namespace) -> dict[str, Any]:
    validate_interrupt_input(args.task_id)
    return call_bridge("interrupt", task_id=args.task_id)


def cmd_respond(args: argparse.Namespace) -> dict[str, Any]:
    answers = parse_json_object(args.answers, field_name="answers")
    validate_respond_input(args.task_id, args.request_id, args.decision, answers)
    return call_bridge(
        "respond",
        task_id=args.task_id,
        instruction=args.request_id,
        decision=args.decision,
        answers=answers,
    )


def _smoke_prompt(wait_seconds: int) -> str:
    return (
        f"Wait {wait_seconds} seconds asynchronously, then reply exactly {SMOKE_SENTINEL}. "
        "Do not modify files."
    )


def cmd_smoke_test(args: argparse.Namespace) -> dict[str, Any]:
    validate_start_input(_smoke_prompt(args.wait), args.cwd, args.sandbox, args.approval_policy)
    notify_target = validate_notify_target(args.notify_target)
    started = call_bridge(
        "start",
        prompt=_smoke_prompt(args.wait),
        cwd=args.cwd,
        model=args.model,
        sandbox=args.sandbox,
        approval_policy=args.approval_policy,
        codex_home=args.codex_home,
        notify_target=notify_target,
    )
    task_id = started["task"]["hermes_task_id"]
    deadline = time.monotonic() + args.timeout
    last_status: dict[str, Any] | None = None
    while time.monotonic() < deadline:
        time.sleep(args.poll_interval)
        last_status = call_bridge("status", task_id=task_id)
        task = last_status.get("task") or {}
        if task.get("status") in TERMINAL_STATUSES:
            validate_smoke_test_result(last_status)
            return {
                "success": True,
                "task_id": task_id,
                "status": task.get("status"),
                "start": started,
                "final_status": last_status,
            }
    return {
        "success": False,
        "error": f"smoke-test timed out after {args.timeout} seconds.",
        "task_id": task_id,
        "start": started,
        "last_status": last_status,
    }


def add_common_start_options(parser: argparse.ArgumentParser) -> None:
    parser.add_argument("--cwd", default=str(Path.cwd()), help="Working directory for Codex.")
    parser.add_argument("--model", default=None, help="Optional Codex model override.")
    parser.add_argument("--sandbox", default=DEFAULT_SANDBOX, type=validate_sandbox)
    parser.add_argument("--approval-policy", default=DEFAULT_APPROVAL_POLICY, type=validate_approval_policy)
    parser.add_argument("--codex-home", default=None, help="Optional CODEX_HOME override.")
    parser.add_argument(
        "--notify-target",
        default=None,
        help="Optional completion notification target, e.g. local or feishu:<chat_id>.",
    )


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hermes Codex Bridge skill CLI")
    subparsers = parser.add_subparsers(dest="command", required=True)

    start = subparsers.add_parser("start", help="Start a Codex task.")
    start.add_argument("--prompt", help="Task prompt.")
    start.add_argument("prompt_text", nargs="*", help="Task prompt as positional text.")
    add_common_start_options(start)
    start.set_defaults(func=cmd_start)

    status = subparsers.add_parser("status", help="Show task status.")
    status.add_argument("task_id")
    status.set_defaults(func=cmd_status)

    list_parser = subparsers.add_parser("list", help="List recent Codex Bridge tasks.")
    list_parser.add_argument("--limit", type=int, default=10)
    list_parser.set_defaults(func=cmd_list)

    notify = subparsers.add_parser("notify-completed", help="One-shot poll and notify completed tasks.")
    notify.add_argument("--limit", type=int, default=10)
    notify.add_argument("--dry-run", action="store_true", help="Preview notifications without sending or marking.")
    notify.set_defaults(func=cmd_notify_completed)

    steer = subparsers.add_parser("steer", help="Steer an active Codex turn.")
    steer.add_argument("task_id")
    steer.add_argument("--instruction", required=True)
    steer.set_defaults(func=cmd_steer)

    interrupt = subparsers.add_parser("interrupt", help="Interrupt an active Codex turn.")
    interrupt.add_argument("task_id")
    interrupt.set_defaults(func=cmd_interrupt)

    respond = subparsers.add_parser("respond", help="Respond to a pending Codex request.")
    respond.add_argument("task_id")
    respond.add_argument("--request-id", required=True)
    respond.add_argument("--decision", default="decline")
    respond.add_argument("--answers", default=None, help="JSON object for user-input answers.")
    respond.set_defaults(func=cmd_respond)

    smoke = subparsers.add_parser("smoke-test", help="Run an async Codex Bridge smoke test.")
    smoke.add_argument("--wait", type=int, default=10)
    smoke.add_argument("--timeout", type=int, default=60)
    smoke.add_argument("--poll-interval", type=float, default=2.0)
    add_common_start_options(smoke)
    smoke.set_defaults(func=cmd_smoke_test)

    return parser


def main(argv: list[str] | None = None) -> int:
    parser = build_parser()
    try:
        args = parser.parse_args(argv)
        result = args.func(args)
        emit(result)
        return 0 if result.get("success") is True else 1
    except ValidationError as exc:
        emit({"success": False, "error": str(exc)})
        return 2


if __name__ == "__main__":
    raise SystemExit(main())

182
skills/codex-bridge/references/validator.py
Normal file
@@ -0,0 +1,182 @@
"""Validation helpers for the Codex Bridge skill CLI."""

from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Mapping


ALLOWED_SANDBOXES = {"read-only", "workspace-write"}
ALLOWED_APPROVAL_POLICIES = {"untrusted", "on-request"}
ALLOWED_DECISIONS = {"accept", "acceptForSession", "decline", "cancel"}
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}
NOTIFICATION_STATUSES = {"sent", "failed", "no_target", "dry_run", "pending"}
SMOKE_SENTINEL = "CODEX_ASYNC_OK"


class ValidationError(ValueError):
    """Raised when a CLI input or bridge output fails validation."""


def parse_json_object(value: str | None, *, field_name: str) -> dict[str, Any]:
    if not value:
        return {}
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError as exc:
        raise ValidationError(f"{field_name} must be valid JSON: {exc.msg}") from exc
    if not isinstance(parsed, dict):
        raise ValidationError(f"{field_name} must be a JSON object.")
    return parsed


def validate_sandbox(sandbox: str) -> str:
    if sandbox == "danger-full-access":
        raise ValidationError("danger-full-access is not allowed for Codex Bridge.")
    if sandbox not in ALLOWED_SANDBOXES:
        allowed = ", ".join(sorted(ALLOWED_SANDBOXES))
        raise ValidationError(f"sandbox must be one of: {allowed}.")
    return sandbox


def validate_approval_policy(approval_policy: str) -> str:
    if approval_policy not in ALLOWED_APPROVAL_POLICIES:
        allowed = ", ".join(sorted(ALLOWED_APPROVAL_POLICIES))
        raise ValidationError(f"approval_policy must be one of: {allowed}.")
    return approval_policy


def validate_start_input(prompt: str, cwd: str, sandbox: str, approval_policy: str) -> None:
    if not prompt or not prompt.strip():
        raise ValidationError("start prompt must be non-empty.")
    cwd_path = Path(cwd).expanduser()
    if not cwd_path.exists() or not cwd_path.is_dir():
        raise ValidationError(f"cwd must be an existing directory: {cwd}")
    validate_sandbox(sandbox)
    validate_approval_policy(approval_policy)


def validate_notify_target(target: str | None) -> str | None:
    if target is None:
        return None
    normalized = target.strip()
    if not normalized:
        raise ValidationError("notify_target must be non-empty when provided.")
    return normalized


def validate_task_id(action: str, task_id: str | None) -> None:
    if not task_id or not str(task_id).strip():
        raise ValidationError(f"{action} requires task_id.")


def validate_steer_input(task_id: str | None, instruction: str | None) -> None:
    validate_task_id("steer", task_id)
    if not instruction or not instruction.strip():
        raise ValidationError("steer requires instruction.")


def validate_interrupt_input(task_id: str | None) -> None:
    validate_task_id("interrupt", task_id)


def validate_status_input(task_id: str | None) -> None:
    validate_task_id("status", task_id)


def validate_respond_input(
    task_id: str | None,
    request_id: str | None,
    decision: str,
    answers: Mapping[str, Any] | None,
) -> None:
    validate_task_id("respond", task_id)
    if not request_id or not str(request_id).strip():
        raise ValidationError("respond requires request_id.")
    if decision not in ALLOWED_DECISIONS:
        allowed = ", ".join(sorted(ALLOWED_DECISIONS))
        raise ValidationError(f"decision must be one of: {allowed}.")
    if answers is not None and not isinstance(answers, Mapping):
        raise ValidationError("answers must be a JSON object.")


def validate_start_output(data: Mapping[str, Any]) -> None:
    if data.get("success") is not True:
        raise ValidationError("start output must have success=true.")
    protocol = data.get("protocol")
    if not isinstance(protocol, Mapping):
        raise ValidationError("start output must include protocol.")
    if protocol.get("mailbox") is not False:
        raise ValidationError("start output must have protocol.mailbox=false.")
    transport = str(protocol.get("transport") or "")
    if "app-server" not in transport:
        raise ValidationError("start output protocol.transport must include app-server.")
    task = data.get("task")
    if not isinstance(task, Mapping):
        raise ValidationError("start output must include task.")
    required = {
        "hermes_task_id": "task id",
        "codex_thread_id": "thread id",
        "codex_turn_id": "turn id",
    }
    for key, label in required.items():
        if not task.get(key):
            raise ValidationError(f"start output missing {label}.")


def validate_bridge_output(action: str, data: Mapping[str, Any]) -> None:
    if not isinstance(data, Mapping):
        raise ValidationError("bridge output must be a JSON object.")
    if data.get("success") is not True and data.get("error"):
        raise ValidationError(str(data["error"]))
    if action == "start":
        validate_start_output(data)
        return
    if action == "notify_completed":
        validate_notify_completed_output(data)
        return
    if "success" in data and data.get("success") is not True:
        raise ValidationError(str(data.get("error") or f"{action} failed."))


def validate_notify_completed_output(data: Mapping[str, Any]) -> None:
    if data.get("success") is not True:
        raise ValidationError("notify_completed output must have success=true.")
    notifications = data.get("notifications")
    if not isinstance(notifications, list):
        raise ValidationError("notify_completed output must include notifications list.")
    for item in notifications:
        if not isinstance(item, Mapping):
            raise ValidationError("notify_completed notifications must be objects.")
        if not item.get("task_id"):
            raise ValidationError("notify_completed notification missing task_id.")
        status = item.get("notification_status")
        if status not in NOTIFICATION_STATUSES:
            allowed = ", ".join(sorted(NOTIFICATION_STATUSES))
            raise ValidationError(f"notification_status must be one of: {allowed}.")


def contains_text(value: Any, needle: str) -> bool:
    if isinstance(value, str):
        return needle in value
    if isinstance(value, Mapping):
        return any(contains_text(v, needle) for v in value.values())
    if isinstance(value, list):
        return any(contains_text(v, needle) for v in value)
    return False


def validate_smoke_test_result(status_data: Mapping[str, Any]) -> None:
    task = status_data.get("task")
    if not isinstance(task, Mapping):
        raise ValidationError("smoke-test status output must include task.")
    status = task.get("status")
    if status != "completed":
        raise ValidationError(f"smoke-test final status must be completed, got {status!r}.")
    searchable = {
        "recent_events": task.get("recent_events", []),
        "final_summary": task.get("final_summary"),
    }
    if not contains_text(searchable, SMOKE_SENTINEL):
        raise ValidationError(f"smoke-test output did not include {SMOKE_SENTINEL}.")

284
tests/skills/test_codex_bridge_skill.py
Normal file
@@ -0,0 +1,284 @@
import importlib.util
import json
import sys
from pathlib import Path


SKILL_REFS = Path(__file__).resolve().parents[2] / "skills" / "codex-bridge" / "references"


def load_reference_module(name):
    module_path = SKILL_REFS / f"{name}.py"
    sys.path.insert(0, str(SKILL_REFS))
    try:
        spec = importlib.util.spec_from_file_location(f"codex_bridge_skill_{name}", module_path)
        # Check the spec before using it; module_from_spec(None) would fail with a less clear error.
        assert spec and spec.loader
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module
    finally:
        try:
            sys.path.remove(str(SKILL_REFS))
        except ValueError:
            pass


def test_validator_rejects_unsafe_start_inputs(tmp_path):
    validator = load_reference_module("validator")

    for sandbox in ["danger-full-access", "network-only"]:
        try:
            validator.validate_start_input("hello", str(tmp_path), sandbox, "untrusted")
        except validator.ValidationError as exc:
            assert "sandbox" in str(exc) or "danger-full-access" in str(exc)
        else:
            raise AssertionError(f"expected {sandbox} to be rejected")

    try:
        validator.validate_start_input("hello", str(tmp_path), "read-only", "never")
    except validator.ValidationError as exc:
        assert "approval_policy" in str(exc)
    else:
        raise AssertionError("expected approval_policy=never to be rejected")

    try:
        validator.validate_start_input("", str(tmp_path), "read-only", "untrusted")
    except validator.ValidationError as exc:
        assert "prompt" in str(exc)
    else:
        raise AssertionError("expected empty prompt to be rejected")

    try:
        validator.validate_start_input("hello", str(tmp_path / "missing"), "read-only", "untrusted")
    except validator.ValidationError as exc:
        assert "cwd" in str(exc)
    else:
        raise AssertionError("expected missing cwd to be rejected")


def test_validator_requires_safe_start_output_contract():
    validator = load_reference_module("validator")

    valid = {
        "success": True,
        "protocol": {"mailbox": False, "transport": "app-server stdio"},
        "task": {
            "hermes_task_id": "codex-1",
            "codex_thread_id": "thread-1",
            "codex_turn_id": "turn-1",
        },
    }
    validator.validate_start_output(valid)

    invalid = dict(valid)
    invalid["protocol"] = {"mailbox": True, "transport": "app-server stdio"}
    try:
        validator.validate_start_output(invalid)
    except validator.ValidationError as exc:
        assert "mailbox" in str(exc)
    else:
        raise AssertionError("expected mailbox output to be rejected")

    invalid = dict(valid)
    invalid["protocol"] = {"mailbox": False, "transport": "mailbox"}
    try:
        validator.validate_start_output(invalid)
    except validator.ValidationError as exc:
        assert "app-server" in str(exc)
    else:
        raise AssertionError("expected non app-server transport to be rejected")


def test_cli_start_validates_and_emits_bridge_json(tmp_path, monkeypatch, capsys):
    cli = load_reference_module("cli")
    calls = []

    def fake_codex_bridge(**kwargs):
        calls.append(kwargs)
        return json.dumps(
            {
                "success": True,
                "protocol": {"mailbox": False, "transport": "app-server stdio"},
                "task": {
                    "hermes_task_id": "codex-abc",
                    "codex_thread_id": "thread-abc",
                    "codex_turn_id": "turn-abc",
                },
            }
        )

    monkeypatch.setattr(cli, "codex_bridge", fake_codex_bridge)

    exit_code = cli.main(["start", "--cwd", str(tmp_path), "--prompt", "Analyze tests"])

    assert exit_code == 0
    output = json.loads(capsys.readouterr().out)
    assert output["task"]["hermes_task_id"] == "codex-abc"
    assert calls == [
        {
            "action": "start",
            "prompt": "Analyze tests",
            "cwd": str(tmp_path),
            "model": None,
            "sandbox": "read-only",
            "approval_policy": "untrusted",
            "codex_home": None,
            "notify_target": None,
        }
    ]
|
||||
|
||||
|
||||
def test_cli_start_passes_notify_target(tmp_path, monkeypatch, capsys):
    cli = load_reference_module("cli")
    calls = []

    def fake_codex_bridge(**kwargs):
        calls.append(kwargs)
        return json.dumps(
            {
                "success": True,
                "protocol": {"mailbox": False, "transport": "app-server stdio"},
                "task": {
                    "hermes_task_id": "codex-abc",
                    "codex_thread_id": "thread-abc",
                    "codex_turn_id": "turn-abc",
                    "notify_target": kwargs["notify_target"],
                },
            }
        )

    monkeypatch.setattr(cli, "codex_bridge", fake_codex_bridge)

    exit_code = cli.main(["start", "--cwd", str(tmp_path), "--notify-target", "local", "--prompt", "Analyze tests"])

    assert exit_code == 0
    output = json.loads(capsys.readouterr().out)
    assert output["task"]["notify_target"] == "local"
    assert calls[0]["notify_target"] == "local"

def test_cli_respond_maps_request_id_to_bridge_instruction(monkeypatch, capsys):
    cli = load_reference_module("cli")
    calls = []

    def fake_codex_bridge(**kwargs):
        calls.append(kwargs)
        return json.dumps({"success": True, "response": {"decision": kwargs["decision"]}})

    monkeypatch.setattr(cli, "codex_bridge", fake_codex_bridge)

    exit_code = cli.main(
        [
            "respond",
            "codex-abc",
            "--request-id",
            "approval-1",
            "--decision",
            "decline",
            "--answers",
            '{"q1": {"answers": ["yes"]}}',
        ]
    )

    assert exit_code == 0
    output = json.loads(capsys.readouterr().out)
    assert output["response"] == {"decision": "decline"}
    assert calls == [
        {
            "action": "respond",
            "task_id": "codex-abc",
            "instruction": "approval-1",
            "decision": "decline",
            "answers": {"q1": {"answers": ["yes"]}},
        }
    ]

def test_cli_smoke_test_polls_until_completed_with_sentinel(tmp_path, monkeypatch, capsys):
    cli = load_reference_module("cli")
    calls = []

    def fake_codex_bridge(**kwargs):
        calls.append(kwargs)
        action = kwargs["action"]
        if action == "start":
            return json.dumps(
                {
                    "success": True,
                    "protocol": {"mailbox": False, "transport": "app-server stdio"},
                    "task": {
                        "hermes_task_id": "codex-smoke",
                        "codex_thread_id": "thread-smoke",
                        "codex_turn_id": "turn-smoke",
                    },
                }
            )
        return json.dumps(
            {
                "success": True,
                "task": {
                    "hermes_task_id": "codex-smoke",
                    "status": "completed",
                    "recent_events": [{"payload_summary": "assistant replied CODEX_ASYNC_OK"}],
                    "final_summary": None,
                },
            }
        )

    monkeypatch.setattr(cli, "codex_bridge", fake_codex_bridge)
    monkeypatch.setattr(cli.time, "sleep", lambda _seconds: None)

    exit_code = cli.main(
        [
            "smoke-test",
            "--cwd",
            str(tmp_path),
            "--wait",
            "3",
            "--timeout",
            "10",
            "--poll-interval",
            "0.01",
        ]
    )

    assert exit_code == 0
    output = json.loads(capsys.readouterr().out)
    assert output["success"] is True
    assert output["task_id"] == "codex-smoke"
    assert [call["action"] for call in calls] == ["start", "status"]
    assert "CODEX_ASYNC_OK" in calls[0]["prompt"]
    assert calls[0]["notify_target"] is None

def test_cli_notify_completed_dry_run_uses_bridge_without_real_notifier(monkeypatch, capsys):
    cli = load_reference_module("cli")
    calls = []

    def fake_codex_bridge(**kwargs):
        calls.append(kwargs)
        return json.dumps(
            {
                "success": True,
                "dry_run": True,
                "processed": 1,
                "notifications": [
                    {
                        "task_id": "codex-abc",
                        "target": "local",
                        "notification_status": "dry_run",
                        "sent": False,
                        "message": "preview",
                    }
                ],
            }
        )

    monkeypatch.setattr(cli, "codex_bridge", fake_codex_bridge)

    exit_code = cli.main(["notify-completed", "--limit", "5", "--dry-run"])

    assert exit_code == 0
    output = json.loads(capsys.readouterr().out)
    assert output["notifications"][0]["notification_status"] == "dry_run"
    assert calls == [{"action": "notify_completed", "limit": 5, "dry_run": True}]

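Reviewer note: the CLI tests above pin down how command-line flags map onto `codex_bridge(...)` keyword arguments, but the CLI source is not reproduced in this part of the diff. The following is a minimal sketch of that mapping using `argparse`, not the actual `cli.py`. The function names `build_parser` and `to_bridge_kwargs`, the program name, and the default `--limit` of 20 are assumptions made for illustration; only the flag names and the resulting keys come from the tests.

```python
import argparse


def build_parser():
    """Hypothetical parser covering only the two subcommands exercised above."""
    parser = argparse.ArgumentParser(prog="codex-bridge")
    sub = parser.add_subparsers(dest="command", required=True)

    start = sub.add_parser("start")
    start.add_argument("--cwd", required=True)
    start.add_argument("--prompt", required=True)
    # Optional: omitting it keeps the pre-existing behaviour (no notification).
    start.add_argument("--notify-target", default=None)

    notify = sub.add_parser("notify-completed")
    notify.add_argument("--limit", type=int, default=20)  # default is an assumption
    notify.add_argument("--dry-run", action="store_true")
    return parser


def to_bridge_kwargs(argv):
    """Translate CLI arguments into keyword arguments for codex_bridge()."""
    args = build_parser().parse_args(argv)
    if args.command == "start":
        return {
            "action": "start",
            "prompt": args.prompt,
            "cwd": args.cwd,
            "notify_target": args.notify_target,
        }
    return {"action": "notify_completed", "limit": args.limit, "dry_run": args.dry_run}
```

Keeping the translation in a pure function like this makes it testable without monkeypatching the bridge, although the tests in this commit patch `cli.codex_bridge` directly instead.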
231
tests/tools/test_codex_bridge_tool.py
Normal file
@ -0,0 +1,231 @@
import json

import tools.codex_bridge_tool as bridge
from tools.codex_bridge_tool import CodexBridgeManager, CodexBridgeStore

class FakeCodexClient:
    instances = []

    def __init__(self, task_id, task, manager):
        self.task_id = task_id
        self.task = task
        self.manager = manager
        self.requests = []
        self.responses = []
        self.closed = False
        FakeCodexClient.instances.append(self)

    def start(self, *, codex_home=None):
        self.codex_home = codex_home

    def initialize(self):
        return {"userAgent": "fake-codex", "codexHome": "/tmp/codex"}

    def request(self, method, params=None, timeout=30):
        self.requests.append((method, params, timeout))
        if method == "thread/start":
            return {"thread": {"id": "thread-1"}}
        if method == "turn/start":
            return {"turn": {"id": "turn-1", "status": "inProgress"}}
        if method == "turn/steer":
            return {"ok": True, "steered": params}
        if method == "turn/interrupt":
            return {"ok": True, "interrupted": params}
        raise AssertionError(f"unexpected request: {method}")

    def notify(self, method, params=None):
        self.notifications = getattr(self, "notifications", [])
        self.notifications.append((method, params))

    def respond(self, request_id, result):
        self.responses.append((request_id, result))

    def close(self):
        self.closed = True

def make_manager(tmp_path, monkeypatch):
    FakeCodexClient.instances.clear()
    monkeypatch.setattr(bridge, "CodexJsonRpcClient", FakeCodexClient)
    store = CodexBridgeStore(tmp_path / "codex_bridge.db")
    return CodexBridgeManager(store=store)

def test_start_task_uses_app_server_thread_turn_without_mailbox(tmp_path, monkeypatch):
    manager = make_manager(tmp_path, monkeypatch)

    result = manager.start_task("Investigate the failing test", cwd=str(tmp_path))

    assert result["success"] is True
    assert result["protocol"] == {"transport": "app-server stdio", "mailbox": False}
    task = result["task"]
    assert task["status"] == "working"
    assert task["codex_thread_id"] == "thread-1"
    assert task["codex_turn_id"] == "turn-1"

    client = FakeCodexClient.instances[0]
    methods = [method for method, _params, _timeout in client.requests]
    assert methods == ["thread/start", "turn/start"]
    thread_params = client.requests[0][1]
    assert thread_params["sandbox"] == "read-only"
    assert thread_params["approvalPolicy"] == "untrusted"
    assert "mailbox" not in json.dumps(client.requests).lower()
    assert "outbox" not in json.dumps(client.requests).lower()
    assert "inbox" not in json.dumps(client.requests).lower()

def test_start_task_records_notify_target(tmp_path, monkeypatch):
    manager = make_manager(tmp_path, monkeypatch)

    result = manager.start_task("Analyze tests", cwd=str(tmp_path), notify_target="feishu:chat-1")
    task_id = result["task"]["hermes_task_id"]

    assert result["task"]["notify_target"] == "feishu:chat-1"
    assert result["task"]["notification_status"] == "pending"
    persisted = manager.status(task_id)["task"]
    assert persisted["notify_target"] == "feishu:chat-1"
    assert persisted["notification_status"] == "pending"

def test_server_approval_request_can_be_reported_and_resolved(tmp_path, monkeypatch):
    manager = make_manager(tmp_path, monkeypatch)
    started = manager.start_task("Run a safe command", cwd=str(tmp_path))
    task_id = started["task"]["hermes_task_id"]
    client = FakeCodexClient.instances[0]

    manager.handle_server_request(
        task_id,
        client,
        {
            "id": "approval-1",
            "method": "item/commandExecution/requestApproval",
            "params": {"threadId": "thread-1", "turnId": "turn-1", "command": "pwd"},
        },
    )

    status = manager.status(task_id)
    assert status["task"]["status"] == "waiting_for_approval"
    assert status["task"]["pending_requests"][0]["request_id"] == "approval-1"

    response = manager.respond(task_id, "approval-1", decision="decline")
    assert response["success"] is True
    assert client.responses == [("approval-1", {"decision": "decline"})]
    assert manager.status(task_id)["task"]["pending_requests"] == []

def test_request_user_input_response_uses_answers_payload(tmp_path, monkeypatch):
    manager = make_manager(tmp_path, monkeypatch)
    started = manager.start_task("Ask for missing context", cwd=str(tmp_path))
    task_id = started["task"]["hermes_task_id"]
    client = FakeCodexClient.instances[0]

    manager.handle_server_request(
        task_id,
        client,
        {
            "id": "input-1",
            "method": "item/tool/requestUserInput",
            "params": {
                "threadId": "thread-1",
                "turnId": "turn-1",
                "questions": [{"id": "q1", "question": "Which file?", "options": None}],
            },
        },
    )

    answers = {"q1": {"answers": ["README.md"]}}
    manager.respond(task_id, "input-1", decision="decline", answers=answers)

    assert client.responses == [("input-1", {"answers": answers})]

def test_steer_and_interrupt_call_codex_turn_methods(tmp_path, monkeypatch):
    manager = make_manager(tmp_path, monkeypatch)
    started = manager.start_task("Long running task", cwd=str(tmp_path))
    task_id = started["task"]["hermes_task_id"]
    client = FakeCodexClient.instances[0]

    steer = manager.steer(task_id, "Only analyze; do not edit.")
    interrupt = manager.interrupt(task_id)

    assert steer["success"] is True
    assert interrupt["task"]["status"] == "cancelled"
    assert client.requests[-2][0] == "turn/steer"
    assert client.requests[-2][1]["expectedTurnId"] == "turn-1"
    assert client.requests[-1][0] == "turn/interrupt"

def test_notify_completed_sends_once_for_targeted_completed_task(tmp_path, monkeypatch):
    manager = make_manager(tmp_path, monkeypatch)
    started = manager.start_task("Summarize a bug", cwd=str(tmp_path), notify_target="feishu:chat-1")
    task_id = started["task"]["hermes_task_id"]
    deliveries = []

    manager.record_event(
        task_id,
        "turn/completed",
        {"turn": {"id": "turn-1", "status": "completed"}, "message": "Done fixing it."},
    )

    first = manager.notify_completed(notifier=lambda target, message: deliveries.append((target, message)) or {"ok": True})
    second = manager.notify_completed(notifier=lambda target, message: deliveries.append((target, message)) or {"ok": True})

    assert first["processed"] == 1
    assert first["notifications"][0]["notification_status"] == "sent"
    assert first["notifications"][0]["sent"] is True
    assert second["processed"] == 0
    assert len(deliveries) == 1
    assert deliveries[0][0] == "feishu:chat-1"
    assert task_id in deliveries[0][1]
    assert manager.status(task_id)["task"]["notification_status"] == "sent"

def test_notify_completed_marks_no_target_without_sending(tmp_path, monkeypatch):
    manager = make_manager(tmp_path, monkeypatch)
    started = manager.start_task("No callback needed", cwd=str(tmp_path))
    task_id = started["task"]["hermes_task_id"]

    manager.record_event(
        task_id,
        "turn/completed",
        {"turn": {"id": "turn-1", "status": "completed"}, "message": "Done."},
    )

    result = manager.notify_completed(notifier=lambda _target, _message: (_ for _ in ()).throw(AssertionError("sent")))

    assert result["processed"] == 1
    assert result["notifications"][0]["notification_status"] == "no_target"
    assert result["notifications"][0]["sent"] is False
    assert manager.status(task_id)["task"]["notification_status"] == "no_target"

def test_notify_completed_dry_run_does_not_send_or_mark(tmp_path, monkeypatch):
    manager = make_manager(tmp_path, monkeypatch)
    started = manager.start_task("Preview callback", cwd=str(tmp_path), notify_target="local")
    task_id = started["task"]["hermes_task_id"]

    manager.record_event(
        task_id,
        "turn/completed",
        {"turn": {"id": "turn-1", "status": "completed"}, "message": "Done."},
    )

    result = manager.notify_completed(
        dry_run=True,
        notifier=lambda _target, _message: (_ for _ in ()).throw(AssertionError("sent")),
    )

    assert result["processed"] == 1
    assert result["notifications"][0]["notification_status"] == "dry_run"
    assert result["notifications"][0]["sent"] is False
    assert manager.status(task_id)["task"]["notification_status"] == "pending"

def test_tool_schema_refuses_danger_full_access():
    props = bridge.CODEX_BRIDGE_SCHEMA["parameters"]["properties"]

    assert "danger-full-access" not in props["sandbox"]["enum"]
    assert "never" not in props["approval_policy"]["enum"]
    assert "notify_completed" in props["action"]["enum"]
    assert "notify_target" in props

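Reviewer note: the last test constrains the shape of `CODEX_BRIDGE_SCHEMA` without showing it. A schema fragment that would satisfy those assertions might look like the sketch below. Everything beyond what the tests check is an assumption: the exact enum members (only `read-only`, `untrusted`, and the action names used elsewhere in these tests are confirmed), the descriptions, and the `required` list are illustrative, not copied from `tools/codex_bridge_tool.py`.

```python
# Hypothetical schema fragment; only the constraints asserted in the test are confirmed.
CODEX_BRIDGE_SCHEMA = {
    "name": "codex_bridge",
    "parameters": {
        "type": "object",
        "properties": {
            "action": {
                "type": "string",
                # Action names taken from the calls seen in the tests.
                "enum": ["start", "status", "list", "respond", "steer", "interrupt", "notify_completed"],
            },
            "sandbox": {
                "type": "string",
                # "danger-full-access" is deliberately absent so a caller cannot request it.
                "enum": ["read-only", "workspace-write"],  # second member is an assumption
                "default": "read-only",
            },
            "approval_policy": {
                "type": "string",
                # "never" is deliberately absent so approvals cannot be switched off.
                "enum": ["untrusted", "on-request"],  # second member is an assumption
                "default": "untrusted",
            },
            "notify_target": {
                "type": "string",
                "description": "Optional target such as 'local' or 'feishu:<chat_id>'.",
            },
        },
        "required": ["action"],
    },
}
```

Restricting the enums in the schema keeps the unsafe options out of what the schema advertises to a caller; whether the tool additionally rejects them at runtime is not visible in this diff.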
1047
tools/codex_bridge_tool.py
Normal file
File diff suppressed because it is too large
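Reviewer note: since the diff for `tools/codex_bridge_tool.py` is suppressed on this page, the sketch below illustrates the notify-once behavior that the tests in `tests/tools/test_codex_bridge_tool.py` require. It is not the actual implementation. It uses a plain dict in place of the persisted `codex_bridge.db` store, and the set of terminal statuses, the message format, the `limit` default, and the function signature are all assumptions.

```python
from datetime import datetime, timezone

# Assumed terminal states; the tests only show "completed" and "cancelled".
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def notify_completed(tasks, notifier, *, dry_run=False, limit=20):
    """Notify each terminal, still-pending task at most once.

    `tasks` is a hypothetical in-memory stand-in for the persisted store:
    {task_id: {"status", "notify_target", "notification_status", "final_summary"}}.
    """
    notifications = []
    for task_id, task in tasks.items():
        if len(notifications) >= limit:
            break
        if task["status"] not in TERMINAL_STATUSES:
            continue
        if task.get("notification_status", "pending") != "pending":
            continue  # already handled by an earlier run

        target = task.get("notify_target")
        summary = task.get("final_summary") or "(no summary)"
        message = f"Codex task {task_id} {task['status']}: {summary}"

        if dry_run:
            status, sent = "dry_run", False
        elif not target:
            status, sent = "no_target", False
        else:
            notifier(target, message)
            status, sent = "sent", True

        if not dry_run:
            # Recording the outcome is what makes a second run a no-op.
            task["notification_status"] = status
            task["notified_at"] = datetime.now(timezone.utc).isoformat()

        notifications.append(
            {"task_id": task_id, "target": target, "notification_status": status, "sent": sent, "message": message}
        )
    return {"success": True, "dry_run": dry_run, "processed": len(notifications), "notifications": notifications}
```

A dry run reports what would be sent but leaves `notification_status` at `pending`, so a later real run still delivers the message. A real implementation would also need to decide what to record when the notifier raises; the tests shown here do not cover that case.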
12
toolsets.py
|
|
@ -53,7 +53,7 @@ _HERMES_CORE_TOOLS = [
     # Clarifying questions
     "clarify",
     # Code execution + delegation
-    "execute_code", "delegate_task",
+    "execute_code", "delegate_task", "codex_bridge",
     # Cronjob management
     "cronjob",
     # Cross-platform messaging (gated on gateway running via check_fn)
@ -193,6 +193,12 @@ TOOLSETS = {
         "includes": []
     },

+    "codex_bridge": {
+        "description": "Run local Codex tasks through Codex app-server JSON-RPC without mailbox files",
+        "tools": ["codex_bridge"],
+        "includes": []
+    },
+
     # "honcho" toolset removed — Honcho is now a memory provider plugin.
     # Tools are injected via MemoryManager, not the toolset system.

@ -262,7 +268,7 @@ TOOLSETS = {
             "browser_vision", "browser_console", "browser_cdp", "browser_dialog",
             "todo", "memory",
             "session_search",
-            "execute_code", "delegate_task",
+            "execute_code", "delegate_task", "codex_bridge",
         ],
         "includes": []
     },
@ -290,7 +296,7 @@ TOOLSETS = {
             # Session history search
             "session_search",
             # Code execution + delegation
-            "execute_code", "delegate_task",
+            "execute_code", "delegate_task", "codex_bridge",
             # Cronjob management
             "cronjob",
             # Home Assistant smart home control (gated on HASS_TOKEN via check_fn)