王韧竹(Bamboo Wang/雪落) 2026-04-25 06:33:41 +08:00 committed by GitHub
commit 75e62a5af4
GPG key ID: B5690EEEBB952194
11 changed files with 2279 additions and 3 deletions

@ -0,0 +1,47 @@
---
title: Codex Bridge Async Completion Notification MVP Requirements
date: 2026-04-25
status: accepted
scope: lightweight
---
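# Codex Bridge Async Completion Notification MVP Requirements
## Background
After Hermes starts a Codex app-server stdio task via `skills/codex-bridge/references/cli.py start`, it writes the task state to the local `codex_bridge.db`. The current implementation is neither a resident subscription nor a completion-callback model: when a Codex turn completes, nothing notifies the original Feishu/platform conversation, so the user has to say "continue" again before Hermes manually queries `status` or `list`.
This creates a product smell: an async task has been started, but nobody proactively collects the result once it finishes.
## Scope Decision
This iteration is a narrow MVP: an async task started by Codex Bridge can, on completion, send a completion summary back to the original conversation or to a specified target. It does not build a multi-tenant scheduling system, does not rewrite the existing low-level Codex Bridge protocol, and does not introduce mailbox/outbox/inbox as the primary communication mechanism.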
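## Goals
- Optionally record a notification target when a task starts, for example `local`, `feishu:<chat_id>`, or another explicit platform target supported by `send_message`.
- Leave existing API behavior unchanged by default; a task started without a notification target still starts and can be queried normally.
- Provide a watcher/one-shot poll entry point that finds tasks that have completed but whose notification has not been handled.
- For tasks with a target, read the final summary, build a concise completion summary, and send it through an injectable notifier.
- Mark completed tasks without a target as `no_target` so they are not reprocessed after a watcher restart.
- Prevent duplicate notifications by persisting `notification_status` / `notified_at`.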
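## Non-Goals
- Do not implement a resident multi-tenant scheduler.
- Do not implement real-time two-way interaction for pending approval / `requestUserInput`.
- Do not let tests send messages to real external platforms such as Feishu, WeChat, or Telegram.
- Do not enable `danger-full-access` as a default permission.
- Do not use mailbox/outbox/inbox as the communication mechanism.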
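## Acceptance Criteria
- `codex_bridge(action="start", notify_target=...)` writes the target into the task state.
- The watcher/notify entry point notifies each terminal-status task exactly once; restarting or rerunning it does not send again.
- A terminal task without a target is marked `no_target`, and the notifier is not called.
- The CLI exposes `--notify-target` and a one-shot `notify-completed` entry point, with dry-run support.
- Tests cover notification behavior through a mocked/injected notifier.
## Notes on Future Extensions
Pending approval and `requestUserInput` can later reuse the same notification target field: when a task enters `waiting_for_approval` or `waiting_for_user_input`, the watcher can send an interactive prompt carrying the request id, and the platform-side reply is then mapped to `codex_bridge respond`. This iteration handles terminal completion only, to keep interactive approval design out of the MVP.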

@ -0,0 +1,82 @@
---
title: Codex Bridge Async Completion Notification MVP Implementation Plan
date: 2026-04-25
status: active
origin: docs/brainstorms/2026-04-25-codex-bridge-completion-notification-requirements.md
---
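# Codex Bridge Async Completion Notification MVP Implementation Plan
## Problem Framing
Codex Bridge can already start an async Codex turn over app-server stdio and write its state to `codex_bridge.db`. The gap is proactive delivery after completion: there is currently no notification target, no notification status, and no watcher entry point that sends a terminal task's summary back to the original conversation.
## Technical Decisions
- Add notification metadata to `codex_bridge_tasks`: `notify_target`, `notification_status`, `notified_at`, and `notification_error`.
- `start` accepts an optional `notify_target`; omitting it preserves the old behavior.
- Add a one-shot `notify_completed` action that scans terminal tasks whose notification has not been handled yet, then either sends to the target or marks the task `no_target`.
- The default notifier reuses the existing `send_message` tool; tests and the CLI dry-run avoid real outbound messages through injection or dry-run mode.
- The `local` target is a locally consumed target: the task is recorded as notified and the summary is returned, without calling any external platform.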
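The target routing described in the last two decisions could look roughly like the sketch below. It is an illustration only, not the code in `tools/codex_bridge_tool.py`: `parse_notify_target`, `route_notification`, and the `send` callable are hypothetical names, and the real `send_message` signature is not shown in this commit, so a `(target, message) -> dict` shape is assumed.

```python
from typing import Callable, Optional, Tuple

# Assumed sender shape: (target, message) -> result dict. Hypothetical.
Sender = Callable[[str, str], dict]


def parse_notify_target(target: str) -> Tuple[str, Optional[str]]:
    """Split 'platform:id' into its parts; a bare 'local' has no id."""
    platform, sep, ident = target.strip().partition(":")
    if not platform:
        raise ValueError("notify target must name a platform")
    return platform, (ident if sep else None)


def route_notification(target: str, message: str, send: Sender) -> dict:
    """Consume 'local' targets in-process; hand everything else to `send`."""
    platform, _ident = parse_notify_target(target)
    if platform == "local":
        # Local targets count as delivered without any external call.
        return {"ok": True, "target": target, "message": message, "external": False}
    result = send(target, message)
    return {"ok": bool(result.get("ok")), "target": target, "message": message, "external": True}
```

Keeping the sender as a parameter is what lets tests and dry-runs substitute a fake without touching a real platform.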
## Implementation Units
### U1: Persist notification target and status
Files to modify:
- `tools/codex_bridge_tool.py`
- `tests/tools/test_codex_bridge_tool.py`
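Approach:
- Run a backward-compatible migration on existing databases during database initialization.
- Expose the notification fields from `CodexBridgeTask.snapshot()`, `list_tasks()`, and `get_task_snapshot()`.
- `start_task()` accepts `notify_target` and saves it.
Test scenarios:
- When a task is started with `notify_target`, both the status snapshot and the persisted query show that value.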
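A backward-compatible migration of the kind U1 calls for is often written as "add the column only if it is missing". The sketch below assumes the store is SQLite (suggested by the `codex_bridge.db` filename, but not confirmed here); the column types and the helper name `migrate_notification_columns` are assumptions for illustration.

```python
from __future__ import annotations

import sqlite3

# Column names come from the plan; the SQL types are assumptions.
NOTIFICATION_COLUMNS = {
    "notify_target": "TEXT",
    "notification_status": "TEXT",
    "notified_at": "REAL",
    "notification_error": "TEXT",
}


def migrate_notification_columns(
    conn: sqlite3.Connection, table: str = "codex_bridge_tasks"
) -> list[str]:
    """Add any missing notification columns; return the names that were added."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) rows.
    # The table name is interpolated, so it must come from trusted code.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, sql_type in NOTIFICATION_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added
```

Running the helper a second time is a no-op, which is the property that makes it safe to call on every database initialization.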
### U2: One-shot completion notification watcher
Files to modify:
- `tools/codex_bridge_tool.py`
- `tests/tools/test_codex_bridge_tool.py`
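Approach:
- Add a method that scans for terminal tasks.
- Mark tasks without a target as `no_target` and do not call the notifier.
- For tasks with a target, build a concise summary, call the notifier, then mark the task `sent` and set `notified_at`.
- Do not reprocess tasks already marked `sent` or `no_target`.
- Support `dry_run`: return only the tasks that would be processed, without writing notification status or sending.
Test scenarios:
- A completed task is notified only once.
- A completed task without a target is not sent and is marked `no_target`.
- Dry-run does not send and does not change the notification status.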
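The U2 behavior can be sketched end to end against a throwaway SQLite table. This is a simplified illustration under stated assumptions, not the manager method from this commit: the table name `tasks`, its columns, the message format, and the free function `notify_completed` are all hypothetical.

```python
from __future__ import annotations

import sqlite3
import time

TERMINAL = ("completed", "failed", "cancelled")


def notify_completed(conn, notifier, *, dry_run=False, limit=10):
    """Process terminal tasks whose notification has not been handled yet."""
    rows = conn.execute(
        "SELECT task_id, status, notify_target, final_summary FROM tasks "
        "WHERE status IN (?, ?, ?) "
        "AND (notification_status IS NULL OR notification_status = 'pending') "
        "ORDER BY task_id LIMIT ?",
        (*TERMINAL, limit),
    ).fetchall()
    results = []
    for task_id, status, target, summary in rows:
        message = f"Codex task {task_id} {status}: {summary or '(no summary)'}"
        if dry_run:
            outcome = "dry_run"  # preview only: nothing is sent or persisted
        elif not target:
            outcome = "no_target"  # persisted so later scans skip this task
        else:
            notifier(target, message)
            outcome = "sent"
        if not dry_run:
            notified_at = time.time() if outcome == "sent" else None
            conn.execute(
                "UPDATE tasks SET notification_status = ?, notified_at = ? WHERE task_id = ?",
                (outcome, notified_at, task_id),
            )
        results.append({"task_id": task_id, "target": target, "notification_status": outcome})
    conn.commit()
    return results
```

Because the selection filter only matches unhandled rows, a second run after a restart returns nothing for tasks already marked `sent` or `no_target`.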
### U3: Tool schema and CLI entry points
Files to modify:
- `tools/codex_bridge_tool.py`
- `skills/codex-bridge/references/cli.py`
- `skills/codex-bridge/references/validator.py`
- `tests/skills/test_codex_bridge_skill.py`
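Approach:
- Add the `notify_completed` action, `notify_target`, and `dry_run` to the schema.
- Add `--notify-target` to the CLI `start`/`smoke-test` commands.
- Add a one-shot `notify-completed` CLI command.
- Have the validator check the basic structure of the notify output.
Test scenarios:
- CLI `start` passes `--notify-target` through to the tool.
- CLI `notify-completed` dry-run calls the bridge and does not depend on a real platform.
## Verification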
- `python -m py_compile tools/codex_bridge_tool.py skills/codex-bridge/references/cli.py skills/codex-bridge/references/validator.py`
- `scripts/run_tests.sh tests/tools/test_codex_bridge_tool.py tests/skills/test_codex_bridge_skill.py`
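## Risks
- The default notifier depends on the runtime environment of `send_message`; when there is no gateway or the target is unreachable, `notification_error` is recorded and the task stays in a retryable state.
- Only terminal completion is handled for now, not real-time approval/input; that should later be built on the same target model.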
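The first risk implies that a failed delivery must be recorded without closing the task off from later attempts. One way to express that is sketched below; it assumes `failed` is the retryable status (the plan only says the task "stays in a retryable state"), and `deliver_once` and `needs_notification` are hypothetical helper names.

```python
from __future__ import annotations

import time
from typing import Callable


def deliver_once(task: dict, notifier: Callable[[str, str], object], message: str) -> dict:
    """Attempt one delivery and return the notification fields to persist."""
    try:
        notifier(task["notify_target"], message)
    except Exception as exc:  # broad on purpose: any delivery failure is recorded
        return {
            "notification_status": "failed",
            "notified_at": None,
            "notification_error": f"{type(exc).__name__}: {exc}",
        }
    return {"notification_status": "sent", "notified_at": time.time(), "notification_error": None}


def needs_notification(task: dict) -> bool:
    """Assumption: 'failed' stays eligible for retry; 'sent' and 'no_target' do not."""
    return task.get("notification_status") in (None, "pending", "failed")
```

A later one-shot run can then pick the task up again, and a successful retry clears the stored error.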

@ -0,0 +1,85 @@
---
title: Codex Bridge async tasks need persisted completion-notification status
date: 2026-04-25
category: docs/solutions/developer-experience/
module: Codex Bridge
problem_type: developer_experience
component: assistant
severity: medium
applies_when:
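- An async agent task is started by a local bridge, but the completion result needs to return to the original platform conversation
- Task state is already persisted, but there is no proactive delivery after completion
- Tests must not send messages to real external platforms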
tags: [codex-bridge, async-notification, app-server, send-message, watcher]
---
# Codex Bridge async tasks need persisted completion-notification status
## Context
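Codex Bridge already starts Codex tasks over app-server stdio and writes their state to `codex_bridge.db`. Dogfooding exposed a usability problem: after an async task completes, nothing notifies the original Feishu/platform conversation, so the user has to trigger Hermes again to query `status` or `list` before learning the result.
This kind of problem does not require building a multi-tenant scheduling system first. The key to the MVP is to let a task optionally record a notification target at start time, and to let a one-shot watcher reliably process terminal tasks.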
## Guidance
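Add three concepts to the existing task table instead of rewriting the low-level communication protocol:
- `notify_target`: optionally record the target at start time, for example `local` or `feishu:<chat_id>`.
- `notification_status`: record the notification lifecycle, for example `pending`, `sent`, `failed`, or `no_target`.
- `notified_at` / `notification_error`: let the watcher avoid duplicates after a restart and keep the failure reason.
The watcher should scan only terminal-status tasks and process them idempotently:
- With a target: build a concise completion summary, call the injectable notifier, and mark the task `sent` on success.
- Without a target: mark the task `no_target` and send nothing, so the same task is not picked up again on every scan.
- Dry-run: return a preview without sending or writing notification status.
The default notifier can reuse the existing `send_message` capability, but the core manager method must allow a notifier to be injected. Unit tests can then verify behavior with a fake notifier and avoid side effects on real platforms.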
## Why This Matters
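The product promise of an async bridge is not "background tasks can be started" but "the user sees the result in the original context when the task ends". With a status table but no notification status, the system gets stuck in a gray zone of "completed but nobody collected it"; without persisted deduplication, a watcher or daemon restart may push the same notification again.
Making notification status part of the task metadata satisfies the MVP without introducing a mailbox/outbox/inbox communication mechanism, and leaves the same target semantics available for later real-time approval / `requestUserInput` extensions.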
## When to Apply
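- The async task lifecycle is already persisted, but completion needs to be delivered across platforms.
- A platform sending capability already exists, and the new feature only needs to choose a target and call it.
- The test environment must not trigger real external messages.
- The watcher/daemon must not send duplicate notifications after a restart.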
## Examples
Record the target at start time:
```python
codex_bridge(
    action="start",
    prompt="Investigate the failing tests",
    notify_target="feishu:chat-1",
)
```
One-shot watcher dry-run:
```bash
python skills/codex-bridge/references/cli.py notify-completed --dry-run
```
Inject a notifier in tests:
```python
deliveries = []
manager.notify_completed(
    notifier=lambda target, message: deliveries.append((target, message)) or {"ok": True}
)
```
## Related
- `docs/brainstorms/2026-04-25-codex-bridge-completion-notification-requirements.md`
- `docs/plans/2026-04-25-codex-bridge-completion-notification-plan.md`
- `tools/codex_bridge_tool.py`
- `skills/codex-bridge/references/cli.py`

@ -0,0 +1,59 @@
---
name: codex-bridge
description: Start and control local Codex tasks through Hermes Codex Bridge app-server integration.
version: 1.0.0
platforms: [linux, macos]
metadata:
  hermes:
    tags: [codex, agent, bridge, app-server]
    category: software-development
---
# Codex Bridge
Use this skill when you need Hermes to start or steer a local Codex task through the Codex app-server protocol.
## CLI
Run the reference CLI from the repository root:
```bash
python skills/codex-bridge/references/cli.py start --prompt "Inspect this repository and summarize the test layout."
python skills/codex-bridge/references/cli.py status <task_id>
python skills/codex-bridge/references/cli.py list
python skills/codex-bridge/references/cli.py steer <task_id> --instruction "Focus only on tests."
python skills/codex-bridge/references/cli.py interrupt <task_id>
python skills/codex-bridge/references/cli.py respond <task_id> --request-id <request_id> --decision decline
python skills/codex-bridge/references/cli.py smoke-test --wait 10 --timeout 60
```
The CLI is a productized wrapper around `tools.codex_bridge_tool.codex_bridge`.
It does not implement the app-server protocol itself and does not use mailbox,
inbox, or outbox files.
## Safety Defaults
- Sandbox is limited to `read-only` or `workspace-write`.
- `danger-full-access` is rejected.
- Approval policy is limited to `untrusted` or `on-request`.
- `approval_policy=never` is rejected.
- `start` requires a non-empty prompt and an existing `cwd`.
## Output
Commands print JSON to stdout. Validation errors return:
```json
{"success": false, "error": "..."}
```
Successful `start` output is validated to ensure:
- `success` is `true`
- `protocol.mailbox` is `false`
- `protocol.transport` includes `app-server`
- task id, Codex thread id, and Codex turn id are present
The smoke test starts an async Codex task, polls `status`, and succeeds only
when the final task status is `completed` and `CODEX_ASYNC_OK` appears in
`recent_events` or `final_summary`.
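A caller that shells out to the CLI might interpret its result along these lines. The exit codes follow `main()` in `references/cli.py` (0 when `success` is true, 1 otherwise, 2 for a `ValidationError`); the helper name `interpret_cli_result` and the outcome labels are hypothetical, and the empty-stdout branch assumes argparse usage errors, which also exit with 2 but print to stderr rather than emitting JSON.

```python
import json


def interpret_cli_result(returncode: int, stdout: str) -> dict:
    """Parse the CLI's JSON line (if any) and attach a coarse outcome label."""
    lines = stdout.strip().splitlines()
    # Usage errors from argparse leave stdout empty, so tolerate that case.
    payload = json.loads(lines[-1]) if lines else {}
    if returncode == 0 and payload.get("success") is True:
        outcome = "ok"
    elif returncode == 2:
        outcome = "validation_error"
    else:
        outcome = "failed"
    return {"outcome": outcome, "payload": payload}
```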

@ -0,0 +1 @@
"""Codex Bridge skill reference utilities."""

@ -0,0 +1,252 @@
#!/usr/bin/env python3
"""Productized CLI for Hermes Codex Bridge."""

from __future__ import annotations

import argparse
import json
import sys
import time
from pathlib import Path
from typing import Any

REPO_ROOT = Path(__file__).resolve().parents[3]
if str(REPO_ROOT) not in sys.path:
    sys.path.insert(0, str(REPO_ROOT))

try:
    from .validator import (
        SMOKE_SENTINEL,
        TERMINAL_STATUSES,
        ValidationError,
        parse_json_object,
        validate_approval_policy,
        validate_bridge_output,
        validate_interrupt_input,
        validate_respond_input,
        validate_sandbox,
        validate_smoke_test_result,
        validate_start_input,
        validate_status_input,
        validate_steer_input,
        validate_notify_completed_output,
        validate_notify_target,
    )
except ImportError:
    from validator import (  # type: ignore
        SMOKE_SENTINEL,
        TERMINAL_STATUSES,
        ValidationError,
        parse_json_object,
        validate_approval_policy,
        validate_bridge_output,
        validate_interrupt_input,
        validate_respond_input,
        validate_sandbox,
        validate_smoke_test_result,
        validate_start_input,
        validate_status_input,
        validate_steer_input,
        validate_notify_completed_output,
        validate_notify_target,
    )

from tools.codex_bridge_tool import DEFAULT_APPROVAL_POLICY, DEFAULT_SANDBOX, codex_bridge


def emit(data: dict[str, Any]) -> None:
    print(json.dumps(data, ensure_ascii=False, sort_keys=True))


def call_bridge(action: str, **kwargs: Any) -> dict[str, Any]:
    raw = codex_bridge(action=action, **kwargs)
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValidationError(f"codex_bridge returned invalid JSON for {action}: {exc.msg}") from exc
    validate_bridge_output(action, data)
    return data


def _prompt_from_args(args: argparse.Namespace) -> str:
    prompt = args.prompt
    if prompt is None and args.prompt_text:
        prompt = " ".join(args.prompt_text)
    return prompt or ""


def cmd_start(args: argparse.Namespace) -> dict[str, Any]:
    prompt = _prompt_from_args(args)
    validate_start_input(prompt, args.cwd, args.sandbox, args.approval_policy)
    notify_target = validate_notify_target(args.notify_target)
    return call_bridge(
        "start",
        prompt=prompt,
        cwd=args.cwd,
        model=args.model,
        sandbox=args.sandbox,
        approval_policy=args.approval_policy,
        codex_home=args.codex_home,
        notify_target=notify_target,
    )


def cmd_status(args: argparse.Namespace) -> dict[str, Any]:
    validate_status_input(args.task_id)
    return call_bridge("status", task_id=args.task_id)


def cmd_list(args: argparse.Namespace) -> dict[str, Any]:
    return call_bridge("list", limit=args.limit)


def cmd_notify_completed(args: argparse.Namespace) -> dict[str, Any]:
    data = call_bridge("notify_completed", limit=args.limit, dry_run=args.dry_run)
    validate_notify_completed_output(data)
    return data


def cmd_steer(args: argparse.Namespace) -> dict[str, Any]:
    validate_steer_input(args.task_id, args.instruction)
    return call_bridge("steer", task_id=args.task_id, instruction=args.instruction)


def cmd_interrupt(args: argparse.Namespace) -> dict[str, Any]:
    validate_interrupt_input(args.task_id)
    return call_bridge("interrupt", task_id=args.task_id)


def cmd_respond(args: argparse.Namespace) -> dict[str, Any]:
    answers = parse_json_object(args.answers, field_name="answers")
    validate_respond_input(args.task_id, args.request_id, args.decision, answers)
    return call_bridge(
        "respond",
        task_id=args.task_id,
        instruction=args.request_id,
        decision=args.decision,
        answers=answers,
    )


def _smoke_prompt(wait_seconds: int) -> str:
    return (
        f"Wait {wait_seconds} seconds asynchronously, then reply exactly {SMOKE_SENTINEL}. "
        "Do not modify files."
    )


def cmd_smoke_test(args: argparse.Namespace) -> dict[str, Any]:
    validate_start_input(_smoke_prompt(args.wait), args.cwd, args.sandbox, args.approval_policy)
    notify_target = validate_notify_target(args.notify_target)
    started = call_bridge(
        "start",
        prompt=_smoke_prompt(args.wait),
        cwd=args.cwd,
        model=args.model,
        sandbox=args.sandbox,
        approval_policy=args.approval_policy,
        codex_home=args.codex_home,
        notify_target=notify_target,
    )
    task_id = started["task"]["hermes_task_id"]
    deadline = time.monotonic() + args.timeout
    last_status: dict[str, Any] | None = None
    while time.monotonic() < deadline:
        time.sleep(args.poll_interval)
        last_status = call_bridge("status", task_id=task_id)
        task = last_status.get("task") or {}
        if task.get("status") in TERMINAL_STATUSES:
            validate_smoke_test_result(last_status)
            return {
                "success": True,
                "task_id": task_id,
                "status": task.get("status"),
                "start": started,
                "final_status": last_status,
            }
    return {
        "success": False,
        "error": f"smoke-test timed out after {args.timeout} seconds.",
        "task_id": task_id,
        "start": started,
        "last_status": last_status,
    }


def add_common_start_options(parser: argparse.ArgumentParser) -> None:
    parser.add_argument("--cwd", default=str(Path.cwd()), help="Working directory for Codex.")
    parser.add_argument("--model", default=None, help="Optional Codex model override.")
    parser.add_argument("--sandbox", default=DEFAULT_SANDBOX, type=validate_sandbox)
    parser.add_argument("--approval-policy", default=DEFAULT_APPROVAL_POLICY, type=validate_approval_policy)
    parser.add_argument("--codex-home", default=None, help="Optional CODEX_HOME override.")
    parser.add_argument(
        "--notify-target",
        default=None,
        help="Optional completion notification target, e.g. local or feishu:<chat_id>.",
    )


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hermes Codex Bridge skill CLI")
    subparsers = parser.add_subparsers(dest="command", required=True)

    start = subparsers.add_parser("start", help="Start a Codex task.")
    start.add_argument("--prompt", help="Task prompt.")
    start.add_argument("prompt_text", nargs="*", help="Task prompt as positional text.")
    add_common_start_options(start)
    start.set_defaults(func=cmd_start)

    status = subparsers.add_parser("status", help="Show task status.")
    status.add_argument("task_id")
    status.set_defaults(func=cmd_status)

    list_parser = subparsers.add_parser("list", help="List recent Codex Bridge tasks.")
    list_parser.add_argument("--limit", type=int, default=10)
    list_parser.set_defaults(func=cmd_list)

    notify = subparsers.add_parser("notify-completed", help="One-shot poll and notify completed tasks.")
    notify.add_argument("--limit", type=int, default=10)
    notify.add_argument("--dry-run", action="store_true", help="Preview notifications without sending or marking.")
    notify.set_defaults(func=cmd_notify_completed)

    steer = subparsers.add_parser("steer", help="Steer an active Codex turn.")
    steer.add_argument("task_id")
    steer.add_argument("--instruction", required=True)
    steer.set_defaults(func=cmd_steer)

    interrupt = subparsers.add_parser("interrupt", help="Interrupt an active Codex turn.")
    interrupt.add_argument("task_id")
    interrupt.set_defaults(func=cmd_interrupt)

    respond = subparsers.add_parser("respond", help="Respond to a pending Codex request.")
    respond.add_argument("task_id")
    respond.add_argument("--request-id", required=True)
    respond.add_argument("--decision", default="decline")
    respond.add_argument("--answers", default=None, help="JSON object for user-input answers.")
    respond.set_defaults(func=cmd_respond)

    smoke = subparsers.add_parser("smoke-test", help="Run an async Codex Bridge smoke test.")
    smoke.add_argument("--wait", type=int, default=10)
    smoke.add_argument("--timeout", type=int, default=60)
    smoke.add_argument("--poll-interval", type=float, default=2.0)
    add_common_start_options(smoke)
    smoke.set_defaults(func=cmd_smoke_test)

    return parser


def main(argv: list[str] | None = None) -> int:
    parser = build_parser()
    try:
        args = parser.parse_args(argv)
        result = args.func(args)
        emit(result)
        return 0 if result.get("success") is True else 1
    except ValidationError as exc:
        emit({"success": False, "error": str(exc)})
        return 2


if __name__ == "__main__":
    raise SystemExit(main())

@ -0,0 +1,182 @@
"""Validation helpers for the Codex Bridge skill CLI."""
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Mapping

ALLOWED_SANDBOXES = {"read-only", "workspace-write"}
ALLOWED_APPROVAL_POLICIES = {"untrusted", "on-request"}
ALLOWED_DECISIONS = {"accept", "acceptForSession", "decline", "cancel"}
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}
NOTIFICATION_STATUSES = {"sent", "failed", "no_target", "dry_run", "pending"}
SMOKE_SENTINEL = "CODEX_ASYNC_OK"


class ValidationError(ValueError):
    """Raised when a CLI input or bridge output fails validation."""


def parse_json_object(value: str | None, *, field_name: str) -> dict[str, Any]:
    if not value:
        return {}
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError as exc:
        raise ValidationError(f"{field_name} must be valid JSON: {exc.msg}") from exc
    if not isinstance(parsed, dict):
        raise ValidationError(f"{field_name} must be a JSON object.")
    return parsed


def validate_sandbox(sandbox: str) -> str:
    if sandbox == "danger-full-access":
        raise ValidationError("danger-full-access is not allowed for Codex Bridge.")
    if sandbox not in ALLOWED_SANDBOXES:
        allowed = ", ".join(sorted(ALLOWED_SANDBOXES))
        raise ValidationError(f"sandbox must be one of: {allowed}.")
    return sandbox


def validate_approval_policy(approval_policy: str) -> str:
    if approval_policy not in ALLOWED_APPROVAL_POLICIES:
        allowed = ", ".join(sorted(ALLOWED_APPROVAL_POLICIES))
        raise ValidationError(f"approval_policy must be one of: {allowed}.")
    return approval_policy


def validate_start_input(prompt: str, cwd: str, sandbox: str, approval_policy: str) -> None:
    if not prompt or not prompt.strip():
        raise ValidationError("start prompt must be non-empty.")
    cwd_path = Path(cwd).expanduser()
    if not cwd_path.exists() or not cwd_path.is_dir():
        raise ValidationError(f"cwd must be an existing directory: {cwd}")
    validate_sandbox(sandbox)
    validate_approval_policy(approval_policy)


def validate_notify_target(target: str | None) -> str | None:
    if target is None:
        return None
    normalized = target.strip()
    if not normalized:
        raise ValidationError("notify_target must be non-empty when provided.")
    return normalized


def validate_task_id(action: str, task_id: str | None) -> None:
    if not task_id or not str(task_id).strip():
        raise ValidationError(f"{action} requires task_id.")


def validate_steer_input(task_id: str | None, instruction: str | None) -> None:
    validate_task_id("steer", task_id)
    if not instruction or not instruction.strip():
        raise ValidationError("steer requires instruction.")


def validate_interrupt_input(task_id: str | None) -> None:
    validate_task_id("interrupt", task_id)


def validate_status_input(task_id: str | None) -> None:
    validate_task_id("status", task_id)


def validate_respond_input(
    task_id: str | None,
    request_id: str | None,
    decision: str,
    answers: Mapping[str, Any] | None,
) -> None:
    validate_task_id("respond", task_id)
    if not request_id or not str(request_id).strip():
        raise ValidationError("respond requires request_id.")
    if decision not in ALLOWED_DECISIONS:
        allowed = ", ".join(sorted(ALLOWED_DECISIONS))
        raise ValidationError(f"decision must be one of: {allowed}.")
    if answers is not None and not isinstance(answers, Mapping):
        raise ValidationError("answers must be a JSON object.")


def validate_start_output(data: Mapping[str, Any]) -> None:
    if data.get("success") is not True:
        raise ValidationError("start output must have success=true.")
    protocol = data.get("protocol")
    if not isinstance(protocol, Mapping):
        raise ValidationError("start output must include protocol.")
    if protocol.get("mailbox") is not False:
        raise ValidationError("start output must have protocol.mailbox=false.")
    transport = str(protocol.get("transport") or "")
    if "app-server" not in transport:
        raise ValidationError("start output protocol.transport must include app-server.")
    task = data.get("task")
    if not isinstance(task, Mapping):
        raise ValidationError("start output must include task.")
    required = {
        "hermes_task_id": "task id",
        "codex_thread_id": "thread id",
        "codex_turn_id": "turn id",
    }
    for key, label in required.items():
        if not task.get(key):
            raise ValidationError(f"start output missing {label}.")


def validate_bridge_output(action: str, data: Mapping[str, Any]) -> None:
    if not isinstance(data, Mapping):
        raise ValidationError("bridge output must be a JSON object.")
    if data.get("success") is not True and data.get("error"):
        raise ValidationError(str(data["error"]))
    if action == "start":
        validate_start_output(data)
        return
    if action == "notify_completed":
        validate_notify_completed_output(data)
        return
    if "success" in data and data.get("success") is not True:
        raise ValidationError(str(data.get("error") or f"{action} failed."))


def validate_notify_completed_output(data: Mapping[str, Any]) -> None:
    if data.get("success") is not True:
        raise ValidationError("notify_completed output must have success=true.")
    notifications = data.get("notifications")
    if not isinstance(notifications, list):
        raise ValidationError("notify_completed output must include notifications list.")
    for item in notifications:
        if not isinstance(item, Mapping):
            raise ValidationError("notify_completed notifications must be objects.")
        if not item.get("task_id"):
            raise ValidationError("notify_completed notification missing task_id.")
        status = item.get("notification_status")
        if status not in NOTIFICATION_STATUSES:
            allowed = ", ".join(sorted(NOTIFICATION_STATUSES))
            raise ValidationError(f"notification_status must be one of: {allowed}.")


def contains_text(value: Any, needle: str) -> bool:
    if isinstance(value, str):
        return needle in value
    if isinstance(value, Mapping):
        return any(contains_text(v, needle) for v in value.values())
    if isinstance(value, list):
        return any(contains_text(v, needle) for v in value)
    return False


def validate_smoke_test_result(status_data: Mapping[str, Any]) -> None:
    task = status_data.get("task")
    if not isinstance(task, Mapping):
        raise ValidationError("smoke-test status output must include task.")
    status = task.get("status")
    if status != "completed":
        raise ValidationError(f"smoke-test final status must be completed, got {status!r}.")
    searchable = {
        "recent_events": task.get("recent_events", []),
        "final_summary": task.get("final_summary"),
    }
    if not contains_text(searchable, SMOKE_SENTINEL):
        raise ValidationError(f"smoke-test output did not include {SMOKE_SENTINEL}.")

@ -0,0 +1,284 @@
import importlib.util
import json
import sys
from pathlib import Path

SKILL_REFS = Path(__file__).resolve().parents[2] / "skills" / "codex-bridge" / "references"


def load_reference_module(name):
    module_path = SKILL_REFS / f"{name}.py"
    sys.path.insert(0, str(SKILL_REFS))
    try:
        spec = importlib.util.spec_from_file_location(f"codex_bridge_skill_{name}", module_path)
        # Check the spec before using it; spec_from_file_location may return None.
        assert spec and spec.loader
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module
    finally:
        try:
            sys.path.remove(str(SKILL_REFS))
        except ValueError:
            pass
def test_validator_rejects_unsafe_start_inputs(tmp_path):
    validator = load_reference_module("validator")

    for sandbox in ["danger-full-access", "network-only"]:
        try:
            validator.validate_start_input("hello", str(tmp_path), sandbox, "untrusted")
        except validator.ValidationError as exc:
            assert "sandbox" in str(exc) or "danger-full-access" in str(exc)
        else:
            raise AssertionError(f"expected {sandbox} to be rejected")

    try:
        validator.validate_start_input("hello", str(tmp_path), "read-only", "never")
    except validator.ValidationError as exc:
        assert "approval_policy" in str(exc)
    else:
        raise AssertionError("expected approval_policy=never to be rejected")

    try:
        validator.validate_start_input("", str(tmp_path), "read-only", "untrusted")
    except validator.ValidationError as exc:
        assert "prompt" in str(exc)
    else:
        raise AssertionError("expected empty prompt to be rejected")

    try:
        validator.validate_start_input("hello", str(tmp_path / "missing"), "read-only", "untrusted")
    except validator.ValidationError as exc:
        assert "cwd" in str(exc)
    else:
        raise AssertionError("expected missing cwd to be rejected")


def test_validator_requires_safe_start_output_contract():
    validator = load_reference_module("validator")
    valid = {
        "success": True,
        "protocol": {"mailbox": False, "transport": "app-server stdio"},
        "task": {
            "hermes_task_id": "codex-1",
            "codex_thread_id": "thread-1",
            "codex_turn_id": "turn-1",
        },
    }
    validator.validate_start_output(valid)

    invalid = dict(valid)
    invalid["protocol"] = {"mailbox": True, "transport": "app-server stdio"}
    try:
        validator.validate_start_output(invalid)
    except validator.ValidationError as exc:
        assert "mailbox" in str(exc)
    else:
        raise AssertionError("expected mailbox output to be rejected")

    invalid = dict(valid)
    invalid["protocol"] = {"mailbox": False, "transport": "mailbox"}
    try:
        validator.validate_start_output(invalid)
    except validator.ValidationError as exc:
        assert "app-server" in str(exc)
    else:
        raise AssertionError("expected non app-server transport to be rejected")


def test_cli_start_validates_and_emits_bridge_json(tmp_path, monkeypatch, capsys):
    cli = load_reference_module("cli")
    calls = []

    def fake_codex_bridge(**kwargs):
        calls.append(kwargs)
        return json.dumps(
            {
                "success": True,
                "protocol": {"mailbox": False, "transport": "app-server stdio"},
                "task": {
                    "hermes_task_id": "codex-abc",
                    "codex_thread_id": "thread-abc",
                    "codex_turn_id": "turn-abc",
                },
            }
        )

    monkeypatch.setattr(cli, "codex_bridge", fake_codex_bridge)
    exit_code = cli.main(["start", "--cwd", str(tmp_path), "--prompt", "Analyze tests"])
    assert exit_code == 0
    output = json.loads(capsys.readouterr().out)
    assert output["task"]["hermes_task_id"] == "codex-abc"
    assert calls == [
        {
            "action": "start",
            "prompt": "Analyze tests",
            "cwd": str(tmp_path),
            "model": None,
            "sandbox": "read-only",
            "approval_policy": "untrusted",
            "codex_home": None,
            "notify_target": None,
        }
    ]


def test_cli_start_passes_notify_target(tmp_path, monkeypatch, capsys):
    cli = load_reference_module("cli")
    calls = []

    def fake_codex_bridge(**kwargs):
        calls.append(kwargs)
        return json.dumps(
            {
                "success": True,
                "protocol": {"mailbox": False, "transport": "app-server stdio"},
                "task": {
                    "hermes_task_id": "codex-abc",
                    "codex_thread_id": "thread-abc",
                    "codex_turn_id": "turn-abc",
                    "notify_target": kwargs["notify_target"],
                },
            }
        )

    monkeypatch.setattr(cli, "codex_bridge", fake_codex_bridge)
    exit_code = cli.main(["start", "--cwd", str(tmp_path), "--notify-target", "local", "--prompt", "Analyze tests"])
    assert exit_code == 0
    output = json.loads(capsys.readouterr().out)
    assert output["task"]["notify_target"] == "local"
    assert calls[0]["notify_target"] == "local"


def test_cli_respond_maps_request_id_to_bridge_instruction(monkeypatch, capsys):
    cli = load_reference_module("cli")
    calls = []

    def fake_codex_bridge(**kwargs):
        calls.append(kwargs)
        return json.dumps({"success": True, "response": {"decision": kwargs["decision"]}})

    monkeypatch.setattr(cli, "codex_bridge", fake_codex_bridge)
    exit_code = cli.main(
        [
            "respond",
            "codex-abc",
            "--request-id",
            "approval-1",
            "--decision",
            "decline",
            "--answers",
            '{"q1": {"answers": ["yes"]}}',
        ]
    )
    assert exit_code == 0
    output = json.loads(capsys.readouterr().out)
    assert output["response"] == {"decision": "decline"}
    assert calls == [
        {
            "action": "respond",
            "task_id": "codex-abc",
            "instruction": "approval-1",
            "decision": "decline",
            "answers": {"q1": {"answers": ["yes"]}},
        }
    ]


def test_cli_smoke_test_polls_until_completed_with_sentinel(tmp_path, monkeypatch, capsys):
    cli = load_reference_module("cli")
    calls = []

    def fake_codex_bridge(**kwargs):
        calls.append(kwargs)
        action = kwargs["action"]
        if action == "start":
            return json.dumps(
                {
                    "success": True,
                    "protocol": {"mailbox": False, "transport": "app-server stdio"},
                    "task": {
                        "hermes_task_id": "codex-smoke",
                        "codex_thread_id": "thread-smoke",
                        "codex_turn_id": "turn-smoke",
                    },
                }
            )
        return json.dumps(
            {
                "success": True,
                "task": {
                    "hermes_task_id": "codex-smoke",
                    "status": "completed",
                    "recent_events": [{"payload_summary": "assistant replied CODEX_ASYNC_OK"}],
                    "final_summary": None,
                },
            }
        )

    monkeypatch.setattr(cli, "codex_bridge", fake_codex_bridge)
    monkeypatch.setattr(cli.time, "sleep", lambda _seconds: None)
    exit_code = cli.main(
        [
            "smoke-test",
            "--cwd",
            str(tmp_path),
            "--wait",
            "3",
            "--timeout",
            "10",
            "--poll-interval",
            "0.01",
        ]
    )
    assert exit_code == 0
    output = json.loads(capsys.readouterr().out)
    assert output["success"] is True
    assert output["task_id"] == "codex-smoke"
    assert [call["action"] for call in calls] == ["start", "status"]
    assert "CODEX_ASYNC_OK" in calls[0]["prompt"]
    assert calls[0]["notify_target"] is None


def test_cli_notify_completed_dry_run_uses_bridge_without_real_notifier(monkeypatch, capsys):
    cli = load_reference_module("cli")
    calls = []

    def fake_codex_bridge(**kwargs):
        calls.append(kwargs)
        return json.dumps(
            {
                "success": True,
                "dry_run": True,
                "processed": 1,
                "notifications": [
                    {
                        "task_id": "codex-abc",
                        "target": "local",
                        "notification_status": "dry_run",
                        "sent": False,
                        "message": "preview",
                    }
                ],
            }
        )

    monkeypatch.setattr(cli, "codex_bridge", fake_codex_bridge)
    exit_code = cli.main(["notify-completed", "--limit", "5", "--dry-run"])
    assert exit_code == 0
    output = json.loads(capsys.readouterr().out)
    assert output["notifications"][0]["notification_status"] == "dry_run"
    assert calls == [{"action": "notify_completed", "limit": 5, "dry_run": True}]

@ -0,0 +1,231 @@
import json
import tools.codex_bridge_tool as bridge
from tools.codex_bridge_tool import CodexBridgeManager, CodexBridgeStore
class FakeCodexClient:
instances = []
def __init__(self, task_id, task, manager):
self.task_id = task_id
self.task = task
self.manager = manager
self.requests = []
self.responses = []
self.closed = False
FakeCodexClient.instances.append(self)
def start(self, *, codex_home=None):
self.codex_home = codex_home
def initialize(self):
return {"userAgent": "fake-codex", "codexHome": "/tmp/codex"}
def request(self, method, params=None, timeout=30):
self.requests.append((method, params, timeout))
if method == "thread/start":
return {"thread": {"id": "thread-1"}}
if method == "turn/start":
return {"turn": {"id": "turn-1", "status": "inProgress"}}
if method == "turn/steer":
return {"ok": True, "steered": params}
if method == "turn/interrupt":
return {"ok": True, "interrupted": params}
raise AssertionError(f"unexpected request: {method}")
def notify(self, method, params=None):
self.notifications = getattr(self, "notifications", [])
self.notifications.append((method, params))
def respond(self, request_id, result):
self.responses.append((request_id, result))
def close(self):
self.closed = True
def make_manager(tmp_path, monkeypatch):
FakeCodexClient.instances.clear()
monkeypatch.setattr(bridge, "CodexJsonRpcClient", FakeCodexClient)
store = CodexBridgeStore(tmp_path / "codex_bridge.db")
return CodexBridgeManager(store=store)


def test_start_task_uses_app_server_thread_turn_without_mailbox(tmp_path, monkeypatch):
    manager = make_manager(tmp_path, monkeypatch)
    result = manager.start_task("Investigate the failing test", cwd=str(tmp_path))
    assert result["success"] is True
    assert result["protocol"] == {"transport": "app-server stdio", "mailbox": False}
    task = result["task"]
    assert task["status"] == "working"
    assert task["codex_thread_id"] == "thread-1"
    assert task["codex_turn_id"] == "turn-1"
    client = FakeCodexClient.instances[0]
    methods = [method for method, _params, _timeout in client.requests]
    assert methods == ["thread/start", "turn/start"]
    thread_params = client.requests[0][1]
    assert thread_params["sandbox"] == "read-only"
    assert thread_params["approvalPolicy"] == "untrusted"
    assert "mailbox" not in json.dumps(client.requests).lower()
    assert "outbox" not in json.dumps(client.requests).lower()
    assert "inbox" not in json.dumps(client.requests).lower()


def test_start_task_records_notify_target(tmp_path, monkeypatch):
    manager = make_manager(tmp_path, monkeypatch)
    result = manager.start_task("Analyze tests", cwd=str(tmp_path), notify_target="feishu:chat-1")
    task_id = result["task"]["hermes_task_id"]
    assert result["task"]["notify_target"] == "feishu:chat-1"
    assert result["task"]["notification_status"] == "pending"
    persisted = manager.status(task_id)["task"]
    assert persisted["notify_target"] == "feishu:chat-1"
    assert persisted["notification_status"] == "pending"


def test_server_approval_request_can_be_reported_and_resolved(tmp_path, monkeypatch):
    manager = make_manager(tmp_path, monkeypatch)
    started = manager.start_task("Run a safe command", cwd=str(tmp_path))
    task_id = started["task"]["hermes_task_id"]
    client = FakeCodexClient.instances[0]
    manager.handle_server_request(
        task_id,
        client,
        {
            "id": "approval-1",
            "method": "item/commandExecution/requestApproval",
            "params": {"threadId": "thread-1", "turnId": "turn-1", "command": "pwd"},
        },
    )
    status = manager.status(task_id)
    assert status["task"]["status"] == "waiting_for_approval"
    assert status["task"]["pending_requests"][0]["request_id"] == "approval-1"
    response = manager.respond(task_id, "approval-1", decision="decline")
    assert response["success"] is True
    assert client.responses == [("approval-1", {"decision": "decline"})]
    assert manager.status(task_id)["task"]["pending_requests"] == []


def test_request_user_input_response_uses_answers_payload(tmp_path, monkeypatch):
    manager = make_manager(tmp_path, monkeypatch)
    started = manager.start_task("Ask for missing context", cwd=str(tmp_path))
    task_id = started["task"]["hermes_task_id"]
    client = FakeCodexClient.instances[0]
    manager.handle_server_request(
        task_id,
        client,
        {
            "id": "input-1",
            "method": "item/tool/requestUserInput",
            "params": {
                "threadId": "thread-1",
                "turnId": "turn-1",
                "questions": [{"id": "q1", "question": "Which file?", "options": None}],
            },
        },
    )
    answers = {"q1": {"answers": ["README.md"]}}
    manager.respond(task_id, "input-1", decision="decline", answers=answers)
    assert client.responses == [("input-1", {"answers": answers})]


def test_steer_and_interrupt_call_codex_turn_methods(tmp_path, monkeypatch):
    manager = make_manager(tmp_path, monkeypatch)
    started = manager.start_task("Long running task", cwd=str(tmp_path))
    task_id = started["task"]["hermes_task_id"]
    client = FakeCodexClient.instances[0]
    steer = manager.steer(task_id, "Only analyze; do not edit.")
    interrupt = manager.interrupt(task_id)
    assert steer["success"] is True
    assert interrupt["task"]["status"] == "cancelled"
    assert client.requests[-2][0] == "turn/steer"
    assert client.requests[-2][1]["expectedTurnId"] == "turn-1"
    assert client.requests[-1][0] == "turn/interrupt"


def test_notify_completed_sends_once_for_targeted_completed_task(tmp_path, monkeypatch):
    manager = make_manager(tmp_path, monkeypatch)
    started = manager.start_task("Summarize a bug", cwd=str(tmp_path), notify_target="feishu:chat-1")
    task_id = started["task"]["hermes_task_id"]
    deliveries = []
    manager.record_event(
        task_id,
        "turn/completed",
        {"turn": {"id": "turn-1", "status": "completed"}, "message": "Done fixing it."},
    )
    first = manager.notify_completed(notifier=lambda target, message: deliveries.append((target, message)) or {"ok": True})
    second = manager.notify_completed(notifier=lambda target, message: deliveries.append((target, message)) or {"ok": True})
    assert first["processed"] == 1
    assert first["notifications"][0]["notification_status"] == "sent"
    assert first["notifications"][0]["sent"] is True
    assert second["processed"] == 0
    assert len(deliveries) == 1
    assert deliveries[0][0] == "feishu:chat-1"
    assert task_id in deliveries[0][1]
    assert manager.status(task_id)["task"]["notification_status"] == "sent"


def test_notify_completed_marks_no_target_without_sending(tmp_path, monkeypatch):
    manager = make_manager(tmp_path, monkeypatch)
    started = manager.start_task("No callback needed", cwd=str(tmp_path))
    task_id = started["task"]["hermes_task_id"]
    manager.record_event(
        task_id,
        "turn/completed",
        {"turn": {"id": "turn-1", "status": "completed"}, "message": "Done."},
    )
    result = manager.notify_completed(notifier=lambda _target, _message: (_ for _ in ()).throw(AssertionError("sent")))
    assert result["processed"] == 1
    assert result["notifications"][0]["notification_status"] == "no_target"
    assert result["notifications"][0]["sent"] is False
    assert manager.status(task_id)["task"]["notification_status"] == "no_target"


def test_notify_completed_dry_run_does_not_send_or_mark(tmp_path, monkeypatch):
    manager = make_manager(tmp_path, monkeypatch)
    started = manager.start_task("Preview callback", cwd=str(tmp_path), notify_target="local")
    task_id = started["task"]["hermes_task_id"]
    manager.record_event(
        task_id,
        "turn/completed",
        {"turn": {"id": "turn-1", "status": "completed"}, "message": "Done."},
    )
    result = manager.notify_completed(
        dry_run=True,
        notifier=lambda _target, _message: (_ for _ in ()).throw(AssertionError("sent")),
    )
    assert result["processed"] == 1
    assert result["notifications"][0]["notification_status"] == "dry_run"
    assert result["notifications"][0]["sent"] is False
    assert manager.status(task_id)["task"]["notification_status"] == "pending"


def test_tool_schema_refuses_danger_full_access():
    props = bridge.CODEX_BRIDGE_SCHEMA["parameters"]["properties"]
    assert "danger-full-access" not in props["sandbox"]["enum"]
    assert "never" not in props["approval_policy"]["enum"]
    assert "notify_completed" in props["action"]["enum"]
    assert "notify_target" in props
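
Reviewer note: the three `notify_completed` tests pin down a send-once contract. Targeted terminal tasks are sent once, untargeted ones are marked `no_target`, and `dry_run` neither sends nor persists. Since the `tools/codex_bridge_tool.py` diff is suppressed on this page, the following is only a minimal sketch of that contract, not the committed implementation. The table layout, the `open_store` helper, the message format, and the set of terminal statuses are all assumptions made for illustration.

```python
import sqlite3
from typing import Callable

# Hypothetical terminal statuses; the real bridge may use a different set.
TERMINAL = ("completed", "failed", "cancelled")


def open_store() -> sqlite3.Connection:
    """Create an in-memory stand-in for codex_bridge.db (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT, summary TEXT,"
        " notify_target TEXT, notification_status TEXT DEFAULT 'pending')"
    )
    return conn


def notify_completed(conn, notifier: Callable[[str, str], object], dry_run=False):
    """Process every terminal task whose notification is still pending."""
    placeholders = ",".join("?" for _ in TERMINAL)
    rows = conn.execute(
        "SELECT id, summary, notify_target FROM tasks"
        f" WHERE notification_status = 'pending' AND status IN ({placeholders})",
        TERMINAL,
    ).fetchall()
    results = []
    for task_id, summary, target in rows:
        if dry_run:
            outcome, sent = "dry_run", False  # preview only, nothing persisted
        elif not target:
            outcome, sent = "no_target", False  # marked so later runs skip it
        else:
            notifier(target, f"Task {task_id} finished: {summary}")
            outcome, sent = "sent", True
        if not dry_run:
            # Persisting the outcome is what prevents a repeat on the next run.
            conn.execute(
                "UPDATE tasks SET notification_status = ? WHERE id = ?",
                (outcome, task_id),
            )
        results.append({"task_id": task_id, "notification_status": outcome, "sent": sent})
    conn.commit()
    return {"processed": len(results), "notifications": results}
```

In this sketch a notifier that raises leaves the task `pending`, so the next run retries it. Whether the real watcher retries or records a failure status is not visible from the tests.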

1047
tools/codex_bridge_tool.py Normal file

File diff suppressed because it is too large

View file

@ -53,7 +53,7 @@ _HERMES_CORE_TOOLS = [
# Clarifying questions
"clarify",
# Code execution + delegation
- "execute_code", "delegate_task",
+ "execute_code", "delegate_task", "codex_bridge",
# Cronjob management
"cronjob",
# Cross-platform messaging (gated on gateway running via check_fn)
@ -193,6 +193,12 @@ TOOLSETS = {
"includes": []
},
+ "codex_bridge": {
+ "description": "Run local Codex tasks through Codex app-server JSON-RPC without mailbox files",
+ "tools": ["codex_bridge"],
+ "includes": []
+ },
# "honcho" toolset removed — Honcho is now a memory provider plugin.
# Tools are injected via MemoryManager, not the toolset system.
@ -262,7 +268,7 @@ TOOLSETS = {
"browser_vision", "browser_console", "browser_cdp", "browser_dialog",
"todo", "memory",
"session_search",
- "execute_code", "delegate_task",
+ "execute_code", "delegate_task", "codex_bridge",
],
"includes": []
},
@ -290,7 +296,7 @@ TOOLSETS = {
# Session history search
"session_search",
# Code execution + delegation
- "execute_code", "delegate_task",
+ "execute_code", "delegate_task", "codex_bridge",
# Cronjob management
"cronjob",
# Home Assistant smart home control (gated on HASS_TOKEN via check_fn)