hermes-agent/hermes_cli/nous_billing.py
Siddharth Balyan 73cd8622f9
feat(billing): /billing terminal billing — interactive TUI + CLI client (#45449)
* feat(billing): nous_billing http client + BillingState core (phase 2b)

Phase 2b terminal-billing client foundation:
- hermes_cli/nous_billing.py: typed client for the 4 /api/billing/* endpoints
  (state/charge/poll/auto-top-up). Raises typed errors (BillingScopeRequired,
  BillingRateLimited, BillingAuthError) mapped from the live-verified contract;
  fail-open is the caller's job. Idempotency-Key enforced client-side.
- agent/billing_view.py: surface-agnostic BillingState core + Decimal money
  parsing (server emits decimal strings, not 2dp), fail-open builder,
  idempotency-key gen, custom-amount validation.
- 51 unit tests (decimal parse/format, payload tiering, error->exception
  matrix, fail-open, amount validation).
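
The Decimal money handling described above can be sketched roughly as follows. This is an illustration only: `parse_money` and `format_usd` are hypothetical names rather than the helpers in agent/billing_view.py, and rounding to cents for display is an assumption.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP
from typing import Optional


def parse_money(raw: object) -> Optional[Decimal]:
    """Parse a server decimal string such as "142.5"; None if unusable."""
    if not isinstance(raw, str) or not raw.strip():
        return None
    try:
        value = Decimal(raw.strip())
    except InvalidOperation:
        return None
    # Decimal accepts "NaN" and "Infinity"; neither is a valid money value.
    return value if value.is_finite() else None


def format_usd(value: Decimal) -> str:
    """Render for display, rounding to cents only at the presentation edge."""
    cents = value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return f"${cents:,.2f}"
```

Keeping the parsed value at full precision and rounding only when formatting avoids losing digits the server sent (the strings are not fixed at two decimal places).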

Plan: docs/plans/2026-06-13-001-phase-2b-terminal-billing-tui-plan.md

* feat(billing): billing:manage scope + lazy step-up re-auth (phase 2b)

- NOUS_BILLING_MANAGE_SCOPE constant.
- nous_token_has_billing_scope(): split-based scope check (no false-positive
  substring match).
- step_up_nous_billing_scope(): re-runs the device flow requesting
  billing:manage, reusing the held credential's portal/inference URLs + client_id
  (so a preview stays a preview), persists like _login_nous but WITHOUT the model
  picker. Returns True iff the minted token carries the scope (False when NAS
  silently downscopes a non-admin / unticked grant).

Lazy step-up (plan D-A): normal login path unchanged; 403 insufficient_scope
from a billing call triggers this. 7 unit tests.
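
A minimal sketch of why the scope check splits rather than substring-matches. It assumes the scope claim is a space-delimited string, as is conventional for OAuth 2.0; `scope_claim_includes` is a hypothetical name, and how the real nous_token_has_billing_scope() obtains the claim from the token is not shown here.

```python
def scope_claim_includes(scope_claim: object, required: str) -> bool:
    """True only when `required` appears as a whole scope token."""
    if not isinstance(scope_claim, str):
        return False
    # split() with no argument splits on any whitespace run, so a longer
    # scope that merely contains the required text does not match.
    return required in scope_claim.split()
```

A plain `required in scope_claim` test would accept a made-up scope such as `billing:manage-readonly`; the split-based check rejects it.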

* feat(billing): billing JSON-RPC methods for the TUI (phase 2b)

billing.state / charge / charge_status / auto_reload / step_up in
tui_gateway/server.py. Return STRUCTURED success envelopes (result.ok +
result.error=<code>) rather than JSON-RPC-level errors, so the Ink rpc() promise
always resolves and the TUI branches on the typed billing error code
(insufficient_scope, rate_limited, no_payment_method, …) to render the right
affordance. Money serialized as decimal STRINGS + display strings. charge mints
+ echoes an idempotency_key for retry reuse. 16 unit tests.
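
The structured-envelope idea can be sketched as below. Only `ok` and `error` come from the description above; the `message` field, the `to_envelope` helper, and the `SketchBillingError` class are assumptions made for illustration and are not the code in tui_gateway/server.py.

```python
from typing import Any, Callable


class SketchBillingError(Exception):
    """Stand-in for a typed billing error carrying a server error code."""

    def __init__(self, message: str, *, error: str = "unknown") -> None:
        super().__init__(message)
        self.error = error


def to_envelope(call: Callable[[], dict[str, Any]]) -> dict[str, Any]:
    """Run a billing call and always return a success-shaped result."""
    try:
        return {"ok": True, **call()}
    except SketchBillingError as exc:
        # The failure travels inside the result, not as an RPC-level error,
        # so the caller can branch on the typed code.
        return {"ok": False, "error": exc.error, "message": str(exc)}
```

Because the handler never raises for an expected billing failure, the client-side promise resolves in both cases and the UI decides what to render from `error`.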

* feat(billing): /billing CLI handler + command registry (phase 2b)

- CommandDef("billing", subcommands=buy|auto-reload|limit), added to
  _SLACK_VIA_HERMES_ONLY so it routes via /hermes on Slack (keeps the 50-cap
  parity test green, same as /credits).
- cli.py::_show_billing + screen helpers: all 5 screens (overview, buy→confirm→
  poll, auto-reload, monthly-limit read-only). Reuses _prompt_text_input_modal /
  _prompt_text_input (D-C). Non-interactive (_app is None) renders text + portal
  deep-link, never prompts (R7). Decimal money end-to-end. 2s/5-min cancellable
  poll loop; 429/503 = retry not failure; settled = ledger truth. Lazy step-up on
  403 insufficient_scope. no_payment_method treated as mainline funnel-to-portal.
- 6 CLI tests; 156 command tests (incl. Slack/Telegram parity) green.
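
The poll behaviour described above (2s interval, 5-minute cap, cancellable, 429/503 treated as "keep waiting") might look roughly like this. It is a sketch under assumptions: `poll_charge`, `RetryLater`, and the injected `fetch_status`, `sleep`, and `should_cancel` callables are hypothetical, and elapsed time is approximated by summing the sleep intervals rather than reading a clock.

```python
import time
from typing import Any, Callable


class RetryLater(Exception):
    """Stand-in for a 429/503 outcome: back off, do not treat as failure."""


def poll_charge(
    fetch_status: Callable[[], dict[str, Any]],
    *,
    interval: float = 2.0,
    max_wait: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    should_cancel: Callable[[], bool] = lambda: False,
) -> str:
    """Return "settled", "failed", "cancelled", or "timeout"."""
    waited = 0.0
    while waited < max_wait:
        if should_cancel():
            return "cancelled"
        try:
            status = fetch_status().get("status")
        except RetryLater:
            status = "pending"  # rate-limited tick: keep waiting
        if status in ("settled", "failed"):
            return status
        sleep(interval)
        waited += interval
    return "timeout"
```

Injecting `sleep` keeps the loop testable without real delays; a `pending` result that outlasts the cap is reported as a timeout rather than a failure.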

* feat(billing): /billing Ink TUI screens + tests (phase 2b)

- ui-tui/src/app/slash/commands/billing.ts: /billing TUI command covering all 5
  screens — overview (text), buy <amt> → ConfirmReq → charge → non-blocking 2s/
  5-min poll loop → settled/failed/timeout branches, auto-reload <below> <to> →
  ConfirmReq → PATCH, limit (read-only). Reuses the existing ConfirmReq overlay
  (D-C) — no bespoke component. Typed-error envelope branching: insufficient_scope
  arms the lazy step-up confirm; no_payment_method/rate_limited/cap funnel to
  portal. Client-side amount validation mirrors the server (bounds + 2dp).
- gatewayTypes.ts: Billing* response interfaces.
- registry.ts: register billingCommands.
- billingCommand.test.ts: 12 vitest cases (overview/gating/buy-confirm-poll-
  settled/no_payment_method/step-up/limit/auto-reload/validation).
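
The "bounds + 2dp" validation can be illustrated in Python as follows (the TUI implements it in TypeScript). The bounds used here are placeholders invented for the example, not the server's real limits, and `validate_amount` is a hypothetical name.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional

# Placeholder bounds for illustration only; the real limits are the server's.
ASSUMED_MIN_USD = Decimal("5")
ASSUMED_MAX_USD = Decimal("1000")


def validate_amount(
    raw: str,
    *,
    minimum: Decimal = ASSUMED_MIN_USD,
    maximum: Decimal = ASSUMED_MAX_USD,
) -> Optional[str]:
    """Return an error message, or None when the amount is acceptable."""
    try:
        amount = Decimal(raw.strip().lstrip("$"))
    except InvalidOperation:
        return "Enter a dollar amount, e.g. 25 or 25.50."
    if not amount.is_finite():
        return "Enter a dollar amount, e.g. 25 or 25.50."
    # An exponent below -2 means more than two digits after the point.
    if amount.as_tuple().exponent < -2:
        return "Use at most two decimal places."
    if amount < minimum or amount > maximum:
        return f"Amount must be between ${minimum} and ${maximum}."
    return None
```

Checking the Decimal exponent avoids the float rounding surprises that a `round(x, 2) == x` test can produce.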

TUI build green; 12/12 vitest pass; slash tests pass once @hermes/ink is built.

* docs(billing): scrub private cross-repo references

NAS is a private repo — remove all references to it from the public PR:
- drop the cross-repo planning doc (planning scaffolding, not a deliverable;
  the PR description documents the design)
- replace 'NAS' / 'PR #412 preview' mentions in code + test comments with
  generic 'the server' / 'a preview deployment'

* docs(billing): scrub final NAS reference in step-up docstring

* docs(billing): drop dangling plan-doc refs

The phase-2b plan doc was removed in the cross-repo scrub (300afcc0b)
but two module docstrings still pointed at it. Drop the dead refs.

* feat(billing): interactive /billing overlay + step-up UX, portal-URL & token fixes

Adds the interactive /billing TUI overlay and hardens the terminal-billing
client across CLI and TUI.

- TUI: full /billing overlay state machine (overview to buy to confirm,
  auto-reload, read-only monthly limit) reusing the existing confirm overlay.
- Step-up: surface the verification link in-transcript and open the browser
  via the TUI's own opener (the device flow runs in the headless gateway, so a
  printed URL was being dropped); run the step-up handler off the main loop and
  emit the link as an out-of-band event so the gateway stays responsive.
- Step-up copy is scope-accurate ("Billing permission granted") and re-checks
  /state so it never claims "enabled" when the org kill-switch is still off.
- Portal deep-links resolve to absolute URLs against the active portal base
  (the server emits them relative) - fixes a bare "/billing?topup=open" link.
- Billing calls refresh an expired access token via the stored refresh token
  instead of reporting a false "not logged in".
- Optimistic funnel: advise "set up a saved card on the portal" up front when
  no card is on file (advisory, not a hard gate).
- Token resolution is cached briefly so the 2s charge poll loop stops
  re-locking + re-reading the auth store on every tick; 401 re-resolves fresh.
- Remove the temporary demo-mode shims.
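
The portal deep-link fix relies on standard `urllib.parse.urljoin` behaviour, sketched here. `absolutize` is a simplified stand-in that takes the base explicitly, and the preview host is a made-up example.

```python
from urllib.parse import urljoin


def absolutize(portal_url: str, base: str) -> str:
    """Resolve a possibly relative portal link against the portal base."""
    # A trailing slash makes urljoin treat the base as a directory; a link
    # that already has its own scheme and host is returned unchanged.
    return urljoin(base.rstrip("/") + "/", portal_url)
```

For example, `absolutize("/billing?topup=open", "https://preview.example.test")` yields `https://preview.example.test/billing?topup=open`, while an already-absolute link passes through untouched.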

Validation: 87 Python billing tests, 88 TS tests (billing command + gateway
event handler), tsc clean, ink + ui-tui builds green.

* docs(billing): add /billing TUI screenshots for PR

* fix(cli): guard _last_invalidate on bare instances; update stale prompt-fallback test

The UI-invalidate throttle read self._last_invalidate unconditionally, which
raised AttributeError on HermesCLI instances built without __init__ (the
thread-safety test's object.__new__ shell). Guard the read with getattr.

The off-main-thread branch of _prompt_text_input was changed (#23185) to cancel
cleanly to None instead of falling back to a bare input() that would hang on the
slash-worker thread; the test still asserted the old direct-input fallback.
Update it to assert the current intended behavior: returns None, calls neither
run_in_terminal nor input(), and does not hang.
2026-06-19 01:53:32 +05:30

406 lines
15 KiB
Python

"""Nous Portal terminal-billing HTTP client (Phase 2b).

Thin, fail-loud client for the four ``/api/billing/*`` endpoints the terminal
billing screens drive. Companion to ``hermes_cli/nous_account.py`` (which owns
read-only entitlement/balance) — this module owns the *write* side: buy credits,
poll a charge, configure auto-reload.

Design rules:

- **Money is decimal, never float.** The server emits decimal STRINGS
  (``"142.5"`` — not fixed 2dp). We parse with :class:`decimal.Decimal` and never
  round-trip through float.
- **This client raises typed exceptions; it does NOT fail open.** Fail-open is the
  *caller's* job (the ``agent/billing_view.py`` builders) so each surface can
  decide how to degrade. A raw network/HTTP error here surfaces as
  :class:`BillingError` (or a subclass) carrying the parsed server ``error`` code,
  HTTP status, ``portalUrl`` deep-link, and ``retry_after``.
- **Auth** = the OAuth bearer JWT Hermes already holds for inference
  (``get_provider_auth_state("nous")["access_token"]``). No API-key auth on these.
- **Portal base URL** resolves with the same precedence as the device-flow login
  (``auth.py``): ``HERMES_PORTAL_BASE_URL`` → ``NOUS_PORTAL_BASE_URL`` → the
  stored auth-state ``portal_base_url`` → the registry default. This is how the
  E2E run points the client at a preview deployment with zero code change.
"""

from __future__ import annotations

import json
import os
import urllib.error
import urllib.parse
import urllib.request
from typing import Any, Optional

DEFAULT_PORTAL_BASE_URL = "https://portal.nousresearch.com"

# Default HTTP timeout (seconds). Charge/poll calls are quick; keep this tight so
# a hung portal doesn't freeze the TUI.
DEFAULT_TIMEOUT = 15.0

# Scope the privileged billing endpoints require. Mirrored from
# hermes_cli.auth.NOUS_BILLING_MANAGE_SCOPE (kept here too so this module has no
# import-time dependency on the much heavier auth module).
BILLING_MANAGE_SCOPE = "billing:manage"


# =============================================================================
# Typed errors
# =============================================================================


class BillingError(Exception):
    """A billing HTTP call failed.

    Carries everything a surface needs to render the right message + affordance:
    the server ``error`` code, HTTP ``status``, an optional human ``message``, the
    ``portalUrl`` deep-link (present on every gate denial), and ``retry_after``
    seconds (429/503). ``payload`` is the full parsed JSON body when available.
    """

    def __init__(
        self,
        message: str,
        *,
        status: Optional[int] = None,
        error: Optional[str] = None,
        portal_url: Optional[str] = None,
        retry_after: Optional[int] = None,
        payload: Optional[dict[str, Any]] = None,
    ) -> None:
        super().__init__(message)
        self.status = status
        self.error = error
        self.portal_url = portal_url
        self.retry_after = retry_after
        self.payload = payload or {}


class BillingScopeRequired(BillingError):
    """``403 insufficient_scope`` — the held token lacks ``billing:manage``.

    The lazy step-up trigger: catching this kicks off a fresh device-connect that
    requests ``billing:manage`` (and tells the user an ADMIN must tick "Allow
    terminal billing"). Also fires mid-session if the scope is stripped on refresh
    after the user loses ADMIN.
    """


class BillingRateLimited(BillingError):
    """``429 rate_limited`` or ``503 temporarily_unavailable``.

    NOT a payment failure. Carries ``retry_after`` (seconds) — back off and tell
    the user "try again in N min"; never auto-retry-spam (the limiter is
    5/org/hr + 5/token/hr and easy to dig deeper into).
    """


class BillingAuthError(BillingError):
    """``401`` — missing/invalid bearer token (not logged in / expired)."""


# =============================================================================
# Base-URL + auth resolution
# =============================================================================


def resolve_portal_base_url(state: Optional[dict[str, Any]] = None) -> str:
    """Resolve the portal base URL with login-time precedence.

    ``HERMES_PORTAL_BASE_URL`` → ``NOUS_PORTAL_BASE_URL`` → stored auth-state
    ``portal_base_url`` → registry default. Trailing slash stripped.
    """
    env = os.getenv("HERMES_PORTAL_BASE_URL") or os.getenv("NOUS_PORTAL_BASE_URL")
    if env and env.strip():
        return env.strip().rstrip("/")
    if state:
        stored = state.get("portal_base_url")
        if isinstance(stored, str) and stored.strip():
            return stored.strip().rstrip("/")
    return DEFAULT_PORTAL_BASE_URL


def _absolutize_portal_url(portal_url: Optional[str]) -> Optional[str]:
    """Resolve a (possibly relative) server portalUrl to an absolute URL.

    The server emits ``portalUrl`` relative by design (e.g. ``/billing?topup=open``)
    — it doesn't know which deployment the client points at. Resolve it against the
    client's portal base (preview / staging / prod) so deep-links are clickable.
    Idempotent: an already-absolute URL is returned unchanged (urljoin keeps it).
    """
    if not (isinstance(portal_url, str) and portal_url.strip()):
        return portal_url
    base = resolve_portal_base_url()
    # urljoin needs a trailing slash on the base to treat it as a directory and
    # join an absolute path like "/billing?..." against the host. An already-
    # absolute portal_url (with its own scheme/host) is returned as-is.
    return urllib.parse.urljoin(base.rstrip("/") + "/", portal_url)


# Short-lived cache for the resolved (token, base). `resolve_nous_access_token`
# acquires two cross-process file locks + reads two files on every call (even on
# its fast path), which is wasteful when the 2s/5-min charge poll loop calls a
# billing endpoint ~150x per purchase. Cache the result briefly: the resolver
# only ever returns a token with >=120s of life (its refresh skew), so a 30s
# cache can never hand back an about-to-expire token. A 401 still surfaces
# normally (the cache holds a valid token, not the HTTP outcome).
_TOKEN_CACHE_TTL_SECONDS = 30.0
_token_cache: tuple[float, str, str] | None = None  # (cached_at, token, base)


def _billing_not_logged_in(exc: Optional[BaseException] = None) -> "BillingAuthError":
    """Build the canonical 'not logged in' BillingAuthError (single source)."""
    err = BillingAuthError(
        "Not logged into Nous Portal — run `hermes portal` to log in.",
        status=401,
        error="invalid_token",
    )
    if exc is not None:
        err.__cause__ = exc
    return err


def _resolve_token_and_base(*, use_cache: bool = True) -> tuple[str, str]:
    """Return ``(access_token, portal_base_url)`` for billing calls.

    Uses the same refresh-aware resolver the inference path uses
    (``resolve_nous_access_token``), so a short-lived (~15 min) access token that
    has expired is transparently refreshed via the stored ``refresh_token``
    instead of failing as "not logged in". Raises :class:`BillingAuthError` only
    when there is no usable Nous session at all.

    The result is cached for ``_TOKEN_CACHE_TTL_SECONDS`` to keep the charge poll
    loop from re-locking + re-reading the auth store on every 2s tick. Pass
    ``use_cache=False`` to force a fresh resolution (e.g. after a 401).
    """
    global _token_cache
    import time as _time

    if use_cache and _token_cache is not None:
        cached_at, token, base = _token_cache
        if (_time.time() - cached_at) < _TOKEN_CACHE_TTL_SECONDS:
            return token, base

    try:
        from hermes_cli.auth import get_provider_auth_state

        state = get_provider_auth_state("nous") or {}
    except Exception:
        state = {}
    base = resolve_portal_base_url(state)

    try:
        from hermes_cli.auth import AuthError, resolve_nous_access_token
    except ImportError:
        # auth module unavailable — fall back to the raw stored token.
        token = state.get("access_token")
        if isinstance(token, str) and token.strip():
            resolved = (token.strip(), base)
            _token_cache = (_time.time(), *resolved)
            return resolved
        raise _billing_not_logged_in()

    try:
        token = resolve_nous_access_token()
    except AuthError as exc:
        raise _billing_not_logged_in(exc) from exc

    resolved = (token.strip(), base)
    _token_cache = (_time.time(), *resolved)
    return resolved


# =============================================================================
# HTTP plumbing
# =============================================================================


def _retry_after_seconds(headers: Any) -> Optional[int]:
    """Parse a ``Retry-After`` header (integer seconds) — None if absent/bad."""
    if headers is None:
        return None
    try:
        raw = headers.get("Retry-After")
    except Exception:
        raw = None
    if raw is None:
        return None
    try:
        return int(str(raw).strip())
    except (TypeError, ValueError):
        return None


def _raise_for_error(
    status: int, payload: dict[str, Any], headers: Any = None
) -> None:
    """Map an HTTP error response to the right typed :class:`BillingError`."""
    error = payload.get("error") if isinstance(payload, dict) else None
    message = payload.get("message") if isinstance(payload, dict) else None
    portal_url = _absolutize_portal_url(
        payload.get("portalUrl") if isinstance(payload, dict) else None
    )
    retry_after = _retry_after_seconds(headers)
    common = {
        "status": status,
        "error": error,
        "portal_url": portal_url,
        "retry_after": retry_after,
        "payload": payload if isinstance(payload, dict) else None,
    }
    if status == 401:
        raise BillingAuthError(message or "Authentication required.", **common)
    if status == 403 and error == "insufficient_scope":
        raise BillingScopeRequired(
            message or "This action needs the billing:manage scope.", **common
        )
    if status in (429, 503):
        raise BillingRateLimited(
            message or "Rate limited — try again shortly.", **common
        )
    raise BillingError(message or error or f"Billing request failed ({status}).", **common)


def _request(
    method: str,
    path: str,
    *,
    body: Optional[dict[str, Any]] = None,
    extra_headers: Optional[dict[str, str]] = None,
    timeout: float = DEFAULT_TIMEOUT,
    _retried_auth: bool = False,
) -> dict[str, Any]:
    """Make an authenticated billing request; return the parsed JSON dict.

    Raises a typed :class:`BillingError` on any non-2xx response (or transport
    failure). 2xx with an empty body returns ``{}``. A 401 triggers exactly one
    retry with a freshly-resolved token (bypassing the short token cache) so a
    cached-but-just-expired token self-heals instead of failing the call.
    """
    token, base = _resolve_token_and_base(use_cache=not _retried_auth)
    url = f"{base}{path}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }
    if body is not None:
        headers["Content-Type"] = "application/json"
    if extra_headers:
        headers.update(extra_headers)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            raw = resp.read().decode("utf-8")
            return json.loads(raw) if raw.strip() else {}
    except urllib.error.HTTPError as exc:
        # A 401 on a cached token → drop the cache and retry once with a fresh
        # (refresh-aware) resolve before surfacing the auth error.
        if exc.code == 401 and not _retried_auth:
            global _token_cache
            _token_cache = None
            return _request(
                method,
                path,
                body=body,
                extra_headers=extra_headers,
                timeout=timeout,
                _retried_auth=True,
            )
        raw = ""
        try:
            raw = exc.read().decode("utf-8")
        except Exception:
            raw = ""
        try:
            payload = json.loads(raw) if raw.strip() else {}
        except json.JSONDecodeError:
            payload = {}
        _raise_for_error(exc.code, payload, getattr(exc, "headers", None))
        raise  # unreachable; _raise_for_error always raises
    except urllib.error.URLError as exc:
        raise BillingError(
            f"Could not reach Nous Portal: {exc.reason}", error="network_error"
        ) from exc


# =============================================================================
# The four endpoints
# =============================================================================


def get_billing_state(*, timeout: float = DEFAULT_TIMEOUT) -> dict[str, Any]:
    """``GET /api/billing/state`` — role-tiered overview (no scope required)."""
    return _request("GET", "/api/billing/state", timeout=timeout)


def patch_auto_top_up(
    *,
    enabled: bool,
    threshold: float | str,
    top_up_amount: float | str,
    timeout: float = DEFAULT_TIMEOUT,
) -> dict[str, Any]:
    """``PATCH /api/billing/auto-top-up`` — configure auto-reload (scope required).

    Body is strict server-side: extra keys (``maxMonthlySpend``, a payment method)
    are rejected with 400. Numbers are sent as JSON numbers per the contract.
    """
    return _request(
        "PATCH",
        "/api/billing/auto-top-up",
        body={
            "enabled": bool(enabled),
            "threshold": float(threshold),
            "topUpAmount": float(top_up_amount),
        },
        timeout=timeout,
    )


def post_charge(
    *,
    amount_usd: float | str,
    idempotency_key: str,
    timeout: float = DEFAULT_TIMEOUT,
) -> dict[str, Any]:
    """``POST /api/billing/charge`` — buy credits (scope required).

    ``Idempotency-Key`` header is MANDATORY (a missing header is a server 400, not
    a default): generate a UUID per user-confirmed purchase and reuse it on retry.
    Returns ``202 {chargeId}`` — money is NOT confirmed yet; poll with
    :func:`get_charge_status`.
    """
    if not (isinstance(idempotency_key, str) and idempotency_key.strip()):
        raise BillingError(
            "Idempotency-Key is required for a charge.",
            error="idempotency_key_required",
        )
    return _request(
        "POST",
        "/api/billing/charge",
        body={"amountUsd": float(amount_usd)},
        extra_headers={"Idempotency-Key": idempotency_key.strip()},
        timeout=timeout,
    )


def get_charge_status(
    charge_id: str, *, timeout: float = DEFAULT_TIMEOUT
) -> dict[str, Any]:
    """``GET /api/billing/charge/{id}`` — poll a charge (scope required).

    Returns ``{status: "pending"|"settled"|"failed", ...}``. An unknown or foreign
    id returns ``{status:"pending"}`` (never 404, never another org's data) — so a
    ``pending`` that never resolves past the 5-min cap is a *timeout*, not an error.
    """
    if not (isinstance(charge_id, str) and charge_id.strip()):
        raise BillingError("A charge id is required.", error="invalid_charge_id")
    # urllib does not need manual quoting for the opaque ids the server mints, but
    # guard against a stray slash that would change the path shape.
    safe_id = urllib.parse.quote(charge_id.strip(), safe="")
    return _request("GET", f"/api/billing/charge/{safe_id}", timeout=timeout)