feat: add Codex fast mode toggle (/fast command)

Add a /fast slash command to toggle the OpenAI Codex service_tier between
normal and priority ('fast') inference. The command is only exposed for
models registered in _FAST_MODE_BACKEND_CONFIG (currently gpt-5.4).

- Registry-based backend config for extensibility
- Dynamic command visibility (hidden from help/autocomplete for
  non-supported models) via command_filter on SlashCommandCompleter
- service_tier flows through request_overrides from route resolution
- Omit max_output_tokens for Codex backend (rejects it)
- Persists to config.yaml under agent.service_tier
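The registry and visibility mechanics above can be sketched as follows. This is a minimal illustration, not the actual implementation: the registry values and the filter signature are assumptions, and only the names _FAST_MODE_BACKEND_CONFIG, command_filter, and SlashCommandCompleter come from the commit itself.

```python
# Hypothetical sketch of the registry-based backend config. The value
# shape (tier names per model) is an assumption for illustration.
_FAST_MODE_BACKEND_CONFIG = {
    "gpt-5.4": {"normal": "default", "fast": "priority"},
}


def fast_mode_supported(model: str) -> bool:
    """Eligibility check: fast mode exists only for registered models."""
    return model in _FAST_MODE_BACKEND_CONFIG


def command_filter(command: str, model: str) -> bool:
    """Predicate passed to SlashCommandCompleter: hide /fast from
    help/autocomplete when the active model does not support it."""
    if command == "/fast":
        return fast_mode_supported(model)
    return True
```

With a filter like this, the completer simply skips commands for which the predicate returns False, so unsupported models never see /fast at all.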

Salvage cleanup: removed the simple_term_menu/input() menu (banned);
a bare /fast now shows the current status, like /reasoning. Removed the
redundant override resolution in _build_api_kwargs, so request_overrides
from the route is the single source of truth.
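The resulting override flow can be sketched as a single merge step. This is a hypothetical illustration of the described behavior (route overrides win, and max_output_tokens is omitted for the Codex backend, which rejects it); the function name and signature are assumptions, not the commit's actual code.

```python
def build_api_kwargs(base: dict, request_overrides: dict, backend: str) -> dict:
    """Merge base kwargs with the route's request_overrides.

    request_overrides (e.g. {"service_tier": "priority"} from route
    resolution) take precedence, giving a single source of truth.
    """
    kwargs = {**base, **request_overrides}
    if backend == "codex":
        # The Codex backend rejects max_output_tokens, so drop it here.
        kwargs.pop("max_output_tokens", None)
    return kwargs
```

Resolving overrides only at this point avoids the earlier bug class where the same setting was computed in two places and could disagree.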

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
Author: g-guthrie, 2026-04-09 18:10:57 -07:00 (committed by Teknium)
Parent: 4caa635803
Commit: d416a69288
9 changed files with 473 additions and 5 deletions


@@ -648,6 +648,15 @@ def test_preflight_codex_api_kwargs_allows_reasoning_and_temperature(monkeypatch
    assert result["max_output_tokens"] == 4096

def test_preflight_codex_api_kwargs_allows_service_tier(monkeypatch):
    agent = _build_agent(monkeypatch)
    kwargs = _codex_request_kwargs()
    kwargs["service_tier"] = "priority"
    result = agent._preflight_codex_api_kwargs(kwargs)
    assert result["service_tier"] == "priority"

def test_run_conversation_codex_replay_payload_keeps_call_id(monkeypatch):
    agent = _build_agent(monkeypatch)
    responses = [_codex_tool_call_response(), _codex_message_response("done")]