mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-09 08:21:50 +00:00
Parse x-ratelimit-* headers from inference API responses (Nous Portal,
OpenRouter, OpenAI-compatible) and display them in the /usage command.
- New agent/rate_limit_tracker.py: parse 12 rate limit headers (RPM/RPH/
TPM/TPH limits, remaining, reset timers), format as progress bars (CLI)
or compact one-liner (gateway)
- Hook into streaming path in run_agent.py: stream.response.headers is
available on the OpenAI SDK Stream object before chunks are consumed
- CLI /usage: appends rate limit section with progress bars + warnings
when any bucket exceeds 80%
- Gateway /usage: appends compact rate limit summary
- 24 unit tests covering parsing, formatting, edge cases
Headers captured per response:
x-ratelimit-{limit,remaining,reset}-{requests,tokens}{,-1h}
Example CLI display:
Nous Rate Limits (captured just now):
Requests/min [░░░░░░░░░░░░░░░░░░░░] 0.1% 1/800 used (799 left, resets in 59s)
Tokens/hr [░░░░░░░░░░░░░░░░░░░░] 0.0% 49/336.0M (336.0M left, resets in 52m)
|
||
|---|---|---|
| .. | ||
| __init__.py | ||
| anthropic_adapter.py | ||
| auxiliary_client.py | ||
| builtin_memory_provider.py | ||
| context_compressor.py | ||
| context_references.py | ||
| copilot_acp_client.py | ||
| credential_pool.py | ||
| display.py | ||
| insights.py | ||
| memory_manager.py | ||
| memory_provider.py | ||
| model_metadata.py | ||
| models_dev.py | ||
| prompt_builder.py | ||
| prompt_caching.py | ||
| rate_limit_tracker.py | ||
| redact.py | ||
| retry_utils.py | ||
| skill_commands.py | ||
| skill_utils.py | ||
| smart_model_routing.py | ||
| subdirectory_hints.py | ||
| title_generator.py | ||
| trajectory.py | ||
| usage_pricing.py | ||