Commit graph

28 commits

Author SHA1 Message Date
Teknium
febc4cfec0
remove Vercel AI Gateway and Vercel Sandbox (#33067)
* remove Vercel AI Gateway provider and Vercel Sandbox terminal backend

Both Vercel-hosted integrations are removed end-to-end. Users on the AI
Gateway should switch to OpenRouter or one of the other aggregators
(Nous Portal, Kilo Code). Users on the Vercel Sandbox backend should
switch to Docker, Modal, Daytona, or SSH.

What's removed:
- `plugins/model-providers/ai-gateway/` provider plugin
- `hermes_cli/vercel_auth.py` Vercel-Sandbox auth helper
- `tools/environments/vercel_sandbox.py` terminal backend
- `ai-gateway` provider wiring across auth, doctor, setup, models,
  config, status, providers, main, web_server, model_normalize, dump
- `vercel_sandbox` backend wiring across terminal_tool, file_tools,
  code_execution_tool, file_operations, approval, skills_tool,
  environments/local, credential_files, lazy_deps, prompt_builder,
  cli, gateway/run
- `AI_GATEWAY_BASE_URL` constant, `_AI_GATEWAY_HEADERS` auxiliary-client
  header set, run_agent base-URL header/reasoning special-cases
- `[vercel]` pyproject extra and `vercel`/`vercel-workers` from uv.lock
- env vars: `AI_GATEWAY_API_KEY`, `AI_GATEWAY_BASE_URL`, `VERCEL_TOKEN`,
  `VERCEL_PROJECT_ID`, `VERCEL_TEAM_ID`, `VERCEL_OIDC_TOKEN`,
  `TERMINAL_VERCEL_RUNTIME`
- Tests: deletes test_ai_gateway_models.py and
  test_vercel_sandbox_environment.py; scrubs references across 23
  surviving test files (no entire tests deleted unless they were
  dedicated to AI Gateway / Sandbox)
- Docs: provider tables, env-var reference, setup guides, security
  notes, tool config, terminal-backend tables — English plus zh-Hans
  i18n parity
- `hermes-agent` skill: provider table entry and remote-backend list

What stays (intentional):
- `popular-web-designs/templates/vercel.md` — CSS design reference,
  unrelated to Vercel-the-AI-product
- `x-vercel-id` in `stream_diag.py` headers — generic Vercel CDN
  response header, useful diag signal on any Vercel-hosted endpoint
- `vercel-labs/agent-browser` URL in browser config — lightpanda
  browser project, different OSS effort
- `userStories.json` historical contributor entry mentioning Vercel
  Sandbox — archive, not active docs

Validation:
- 1153 tests in the 22 targeted files pass (`scripts/run_tests.sh`)
- Full repo `py_compile` clean
- Live import of every touched module + invariant check (no
  `ai-gateway` in `PROVIDER_REGISTRY`, no `_AI_GATEWAY_HEADERS`, no
  `vercel_sandbox` in `_REMOTE_TERMINAL_BACKENDS`)

* test: convert profile-count check from change-detector to invariant

The hardcoded "== 34" assertion broke when ai-gateway was removed.
Per AGENTS.md change-detector-test guidance, assert the relationship
(registry count >= number of plugin dirs) instead of a literal count.
Counts shift when providers are added/removed; that's expected.
2026-05-27 00:43:32 -07:00
Julien Talbot
09afafb87e fix(xai): resolve Grok Build context for OAuth 2026-05-22 13:05:36 -07:00
Alex-wuhu
c76e879574 feat: add NovitaAI as LLM provider
Add NovitaAI as a first-class provider with dedicated model selection
flow, live pricing, and authoritative context length resolution.

- Register provider in PROVIDER_REGISTRY, HERMES_OVERLAYS, and all
  alias/label maps (ID: novita, aliases: novita-ai, novitaai)
- Add dedicated _model_flow_novita() with 3-tier model list fallback:
  Novita API → models.dev → static curated list
- Fetch live pricing from /v1/models with correct unit conversion
  (input_token_price_per_m is 0.0001 USD per Mtok)
- Add Novita-specific context length resolution (step 4b) in
  get_model_context_length(), prioritized over models.dev/OpenRouter
- Register api.novita.ai in _URL_TO_PROVIDER to prevent early return
  from the custom-endpoint code path
- Add models.dev mapping (novita → novita-ai)
- Add default auxiliary model (deepseek/deepseek-v3-0324)
- Add NOVITA_API_KEY to test isolation (conftest.py)
- Update docs: providers page, env vars reference, CLI reference,
  .env.example, README, and landing page
2026-05-13 23:51:15 -07:00
nicoechaniz
e2b713cced fix(model-metadata): skip OpenRouter for known providers, add kimi/moonshot to PROVIDER_TO_MODELS_DEV
Based on PR #23950 by @nicoechaniz.

- Add "kimi" and "moonshot" to PROVIDER_TO_MODELS_DEV → kimi-for-coding
- Gate OpenRouter metadata step behind "if not effective_provider":
  known providers should not be overridden by community-maintained OR data
- Keep the targeted Kimi-family 32k guard as a secondary safety net
  inside the OR gate (for unknown providers with Kimi models)

Co-authored-by: nicoechaniz <nicoechaniz@altermundi.net>
2026-05-11 13:16:07 -07:00
kshitijk4poor
91eef6255e fix: correct context-length resolution for kimi-k2.6 on Ollama Cloud and Kimi Coding
Kimi-k2.6 (which supports 262K context) was incorrectly resolved as 32K,
tripping the 64K minimum-context guard and preventing use of the model on
Ollama Cloud and Kimi Coding / Moonshot providers.

Three fixes in the context-length resolution chain:

1. Ollama Cloud native /api/show query: new _query_ollama_api_show()
   queries the Ollama native API for authoritative GGUF model_info
   context_length.  For hosted Ollama, prefers model_info over num_ctx
   since users can't set their own num_ctx on Cloud.  Added at step 5e
   in get_model_context_length(), before the models.dev fallback.

2. models.dev :cloud/-cloud suffix fallback: lookup_models_dev_context()
   now also tries appending :cloud and -cloud suffixes when the bare
   model name doesn't match.  models.dev stores 'kimi-k2.6:cloud' but
   users and the live API use bare 'kimi-k2.6'.

3. Kimi-family 32K guard: after the OpenRouter metadata step, reject
   exactly 32768 for Kimi-named models (kimi-*, moonshot*) and fall
   through to hardcoded defaults ('kimi': 262144).  OpenRouter reports
   32768 for moonshotai/kimi-k2.6 but the model actually supports 262K.
   Narrow filter — only 32768, only Kimi-family — becomes dead code
   when OpenRouter updates its metadata.

---
2026-05-11 13:16:07 -07:00
Teknium
775c0e22cf
perf(models_dev): cache-first lookup, skip network when disk cache is fresh (#22808)
`fetch_models_dev()` is on the hot path of every `AIAgent.__init__`
(via `context_compressor → get_model_context_length`). The previous
policy was "always try network first, only fall back to disk if
network fails," so every fresh `hermes chat` / `hermes gateway` /
batch / cron process paid 250-500 ms re-fetching a 2 MB JSON registry
that was already on disk from earlier runs.

Add a stage 2 between in-mem and network: if
`models_dev_cache.json` exists and its mtime is younger than the
existing `_MODELS_DEV_CACHE_TTL` (1 hour, same TTL the in-mem cache
already uses), load from disk and skip the network call.

The in-mem TTL is anchored to the disk file's age, so a 50-min-old
cache stays in-memory for only 10 more minutes — no surprise
extension of staleness window.

Invariants preserved:
- `force_refresh=True` still always hits the network and only falls
  back to disk on failure (`hermes config refresh` semantics).
- Missing disk cache → fall through to network (first-ever run).
- Stale disk cache (mtime > TTL) → fall through to network.
- Negative file age (clock skew) → fall through to network.
- Network failure → existing stage-4 stale-disk fallback unchanged.

Measured impact (3-run medians, 9950X3D, fresh process per run):
  fetch_models_dev cold:  256 → 17 ms  (-93%)
  hermes chat -q wall:   4.00 → 3.73 s (-7% median)
                         3.99 → 3.60 s (-10% min)

The chat-end-to-end win is bounded below by API latency variance, but
the fetch_models_dev microbenchmark is the cleanest signal: 239 ms
shaved off every fresh-process agent construction.

Win compounds with the previous perf PRs:
  #22681 google_chat lazy-load
  #22766 doctor parallel + IMDS off
  #22790 gateway.platforms PEP 562

Tests: all 30 `tests/agent/test_models_dev.py` pass (added 4 new ones
covering the new disk-cache-first path, force_refresh override, stale
disk fallback, and missing-disk-cache fall-through). Full `tests/agent/`
suite: 2560 passed, 0 failed.
2026-05-09 13:32:38 -07:00
LeonSGP43
14f38822fa fix(models): prefer image modalities for vision routing 2026-05-07 05:54:12 -07:00
teknium1
40a98fb0fa feat(minimax-oauth): full integration with peer OAuth providers
Close integration gaps discovered by auditing qwen-oauth's file coverage.
These are surfaces the original salvage missed — they all existed on
main and were added in the 747 commits since PR #15203 was opened.

Coverage added:
- agent/credential_pool.py: seed pool from auth.json providers.minimax-oauth
  so `hermes auth list` reflects logged-in state and
  `hermes auth remove minimax-oauth <N>` works through the standard flow.
- agent/credential_sources.py: register RemovalStep for minimax-oauth
  with suppression-aware `_clear_auth_store_provider`.
- agent/models_dev.py: PROVIDER_TO_MODELS_DEV mapping (-> 'minimax' family).
- hermes_cli/providers.py: HermesOverlay entry (anthropic_messages transport,
  oauth_external auth_type, api.minimax.io/anthropic base).
- hermes_cli/model_normalize.py: add to _MATCHING_PREFIX_STRIP_PROVIDERS so
  `minimax-oauth/MiniMax-M2.7` in config.yaml gets correctly repaired.
- hermes_cli/status.py: render MiniMax OAuth block in `hermes doctor`
  (logged-in / region / expires_at / error).
- hermes_cli/web_server.py: register in OAUTH_PROVIDER_REGISTRY + dispatch
  branch in _resolve_provider_status so the dashboard auth page shows it.
- website/docs/integrations/providers.md: full 'MiniMax (OAuth)' section.
- website/docs/reference/cli-commands.md: --provider enum.
- website/docs/user-guide/features/fallback-providers.md: fallback table row.
- scripts/release.py AUTHOR_MAP: amanning3390 mapping (CI gate).
2026-04-29 09:53:42 -07:00
zhzouxiaoya12
3d90292eda fix: normalize provider in list_provider_models to support aliases 2026-04-23 01:59:20 -07:00
hengm3467
c6b1ef4e58 feat: add Step Plan provider support (salvage #6005)
Adds a first-class 'stepfun' API-key provider surfaced as Step Plan:

- Support Step Plan setup for both International and China regions
- Discover Step Plan models live from /step_plan/v1/models, with a
  small coding-focused fallback catalog when discovery is unavailable
- Thread StepFun through provider metadata, setup persistence, status
  and doctor output, auxiliary routing, and model normalization
- Add tests for provider resolution, model validation, metadata
  mapping, and StepFun region/model persistence

Based on #6005 by @hengm3467.

Co-authored-by: hengm3467 <100685635+hengm3467@users.noreply.github.com>
2026-04-22 02:59:58 -07:00
helix4u
a7dd6a3449 fix(gemini): hide stale and low-TPM Google models 2026-04-18 12:52:01 -07:00
helix4u
2eab7ee15f fix(gemini): hide low-TPM Gemma models from exposed lists 2026-04-18 12:52:01 -07:00
kshitijk4poor
1b61ec470b feat: add Ollama Cloud as built-in provider
Add ollama-cloud as a first-class provider with full parity to existing
API-key providers (gemini, zai, minimax, etc.):

- PROVIDER_REGISTRY entry with OLLAMA_API_KEY env var
- Provider aliases: ollama -> custom (local), ollama_cloud -> ollama-cloud
- models.dev integration for accurate context lengths
- URL-to-provider mapping (ollama.com -> ollama-cloud)
- Passthrough model normalization (preserves Ollama model:tag format)
- Default auxiliary model (nemotron-3-nano:30b)
- HermesOverlay in providers.py
- CLI --provider choices, CANONICAL_PROVIDERS entry
- Dynamic model discovery with disk caching (1hr TTL)
- 37 provider-specific tests

Cherry-picked from PR #6038 by kshitijk4poor. Closes #3926
2026-04-16 02:22:09 -07:00
Teknium
8d023e43ed
refactor: remove dead code — 1,784 lines across 77 files (#9180)
Deep scan with vulture, pyflakes, and manual cross-referencing identified:
- 41 dead functions/methods (zero callers in production)
- 7 production-dead functions (only test callers, tests deleted)
- 5 dead constants/variables
- ~35 unused imports across agent/, hermes_cli/, tools/, gateway/

Categories of dead code removed:
- Refactoring leftovers: _set_default_model, _setup_copilot_reasoning_selection,
  rebuild_lookups, clear_session_context, get_logs_dir, clear_session
- Unused API surface: search_models_dev, get_pricing, skills_categories,
  get_read_files_summary, clear_read_tracker, menu_labels, get_spinner_list
- Dead compatibility wrappers: schedule_cronjob, list_cronjobs, remove_cronjob
- Stale debug helpers: get_debug_session_info copies in 4 tool files
  (centralized version in debug_helpers.py already exists)
- Dead gateway methods: send_emote, send_notice (matrix), send_reaction
  (bluebubbles), _normalize_inbound_text (feishu), fetch_room_history
  (matrix), _start_typing_indicator (signal), parse_feishu_post_content
- Dead constants: NOUS_API_BASE_URL, SKILLS_TOOL_DESCRIPTION,
  FILE_TOOLS, VALID_ASPECT_RATIOS, MEMORY_DIR
- Unused UI code: _interactive_provider_selection,
  _interactive_model_selection (superseded by prompt_toolkit picker)

Test suite verified: 609 tests covering affected files all pass.
Tests for removed functions deleted. Tests using removed utilities
(clear_read_tracker, MEMORY_DIR) updated to use internal APIs directly.
2026-04-13 16:32:04 -07:00
Teknium
0e60a9dc25 fix: add kimi-coding-cn to remaining provider touchpoints
Follow-up for salvaged PR #7637. Adds kimi-coding-cn to:
- model_normalize.py (prefix strip)
- providers.py (models.dev mapping)
- runtime_provider.py (credential resolution)
- setup.py (model list + setup label)
- doctor.py (health check)
- trajectory_compressor.py (URL detection)
- models_dev.py (registry mapping)
- integrations/providers.md (docs)
2026-04-13 11:20:37 -07:00
Teknium
078dba015d
fix: three provider-related bugs (#8161, #8181, #8147) (#8243)
- Add openai/openai-codex -> openai mapping to PROVIDER_TO_MODELS_DEV
  so context-length lookups use models.dev data instead of 128k fallback.
  Fixes #8161.

- Set api_mode from custom_providers entry when switching via hermes model,
  and clear stale api_mode when the entry has none. Also extract api_mode
  in _named_custom_provider_map(). Fixes #8181.

- Convert OpenAI image_url content blocks to Anthropic image blocks when
  the endpoint is Anthropic-compatible (MiniMax, MiniMax-CN, or any URL
  containing /anthropic). Fixes #8147.
2026-04-12 01:44:18 -07:00
kshitijk4poor
6693e2a497 feat(xiaomi): add Xiaomi MiMo as first-class provider
Cherry-picked from PR #7702 by kshitijk4poor.

Adds Xiaomi MiMo as a direct provider (XIAOMI_API_KEY) with models:
- mimo-v2-pro (1M context), mimo-v2-omni (256K, multimodal), mimo-v2-flash (256K, cheapest)

Standard OpenAI-compatible provider checklist: auth.py, config.py, models.py,
main.py, providers.py, doctor.py, model_normalize.py, model_metadata.py,
models_dev.py, auxiliary_client.py, .env.example, cli-config.yaml.example.

Follow-up: vision tasks use mimo-v2-omni (multimodal) instead of the user's
main model. Non-vision aux uses the user's selected model. Added
_PROVIDER_VISION_MODELS dict for provider-specific vision model overrides.
On failure, falls back to aggregators (gemini flash) via existing fallback chain.

Corrects pre-existing context lengths: mimo-v2-pro 1048576→1000000,
mimo-v2-omni 1048576→256000, adds mimo-v2-flash 256000.

36 tests covering registry, aliases, auto-detect, credentials, models.dev,
normalization, URL mapping, providers module, doctor, aux client, vision
model override, and agent init.
2026-04-11 11:17:52 -07:00
kshitijk4poor
50bb4fe010 fix(vision): auto-resize oversized images, increase default timeout, fix vision capability detection
Cherry-picked from PR #7749 by kshitijk4poor with modifications:

- Raise hard image limit from 5 MB to 20 MB (matches most restrictive provider)
- Send images at full resolution first; only auto-resize to 5 MB on API failure
- Add _is_image_size_error() helper to detect size-related API rejections
- Auto-resize uses Pillow (soft dep) with progressive downscale + JPEG quality reduction
- Fix get_model_capabilities() to check modalities.input for vision support
- Increase default vision timeout from 30s to 120s (matches hardcoded fallback intent)
- Applied retry-with-resize to both vision_analyze_tool and browser_vision

Closes #7740
2026-04-11 11:12:50 -07:00
alt-glitch
96c060018a fix: remove 115 verified dead code symbols across 46 production files
Automated dead code audit using vulture + coverage.py + ast-grep intersection,
confirmed by Opus deep verification pass. Every symbol verified to have zero
production callers (test imports excluded from reachability analysis).

Removes ~1,534 lines of dead production code across 46 files and ~1,382 lines
of stale test code. 3 entire files deleted (agent/builtin_memory_provider.py,
hermes_cli/checklist.py, tests/hermes_cli/test_setup_model_selection.py).

Co-authored-by: alt-glitch <balyan.sid@gmail.com>
2026-04-10 03:44:43 -07:00
kshitijk4poor
3377017eb4 feat(qwen): add Qwen OAuth provider with portal request support
Based on #6079 by @tunamitom with critical fixes and comprehensive tests.

Changes from #6079:
- Fix: sanitization overwrite bug — Qwen message prep now runs AFTER codex
  field sanitization, not before (was silently discarding Qwen transforms)
- Fix: missing try/except AuthError in runtime_provider.py — stale Qwen
  credentials now fall through to next provider on auto-detect
- Fix: 'qwen' alias conflict — bare 'qwen' stays mapped to 'alibaba'
  (DashScope); use 'qwen-portal' or 'qwen-cli' for the OAuth provider
- Fix: hardcoded ['coder-model'] replaced with live API fetch + curated
  fallback list (qwen3-coder-plus, qwen3-coder)
- Fix: extract _is_qwen_portal() helper + _qwen_portal_headers() to replace
  5 inline 'portal.qwen.ai' string checks and share headers between init
  and credential swap
- Fix: add Qwen branch to _apply_client_headers_for_base_url for mid-session
  credential swaps
- Fix: remove suspicious TypeError catch blocks around _prompt_provider_choice
- Fix: handle bare string items in content lists (were silently dropped)
- Fix: remove redundant dict() copies after deepcopy in message prep
- Revert: unrelated ai-gateway test mock removal and model_switch.py comment deletion

New tests (30 test functions):
- _qwen_cli_auth_path, _read_qwen_cli_tokens (success + 3 error paths)
- _save_qwen_cli_tokens (roundtrip, parent creation, permissions)
- _qwen_access_token_is_expiring (5 edge cases: fresh, expired, within skew,
  None, non-numeric)
- _refresh_qwen_cli_tokens (success, preserve old refresh, 4 error paths,
  default expires_in, disk persistence)
- resolve_qwen_runtime_credentials (fresh, auto-refresh, force-refresh,
  missing token, env override)
- get_qwen_auth_status (logged in, not logged in)
- Runtime provider resolution (direct, pool entry, alias)
- _build_api_kwargs (metadata, vl_high_resolution_images, message formatting,
  max_tokens suppression)
2026-04-08 13:46:30 -07:00
Teknium
187e90e425
refactor: replace inline HERMES_HOME re-implementations with get_hermes_home()
16 callsites across 14 files were re-deriving the hermes home path
via os.environ.get('HERMES_HOME', ...) instead of using the canonical
get_hermes_home() from hermes_constants. This breaks profiles — each
profile has its own HERMES_HOME, and the inline fallback defaults to
~/.hermes regardless.

Fixed by importing and calling get_hermes_home() at each site. For
files already inside the hermes process (agent/, hermes_cli/, tools/,
gateway/, plugins/), this is always safe. Files that run outside the
process context (mcp_serve.py, mcp_oauth.py) already had correct
try/except ImportError fallbacks and were left alone.

Skipped: hermes_constants.py (IS the implementation), env_loader.py
(bootstrap), profiles.py (intentionally manipulates the env var),
standalone scripts (optional-skills/, skills/), and tests.
2026-04-07 10:40:34 -07:00
Teknium
d0ffb111c2
refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821)
Comprehensive cleanup across 80 files based on automated (ruff, pyflakes, vulture)
and manual analysis of the entire codebase.

Changes by category:

Unused imports removed (~95 across 55 files):
- Removed genuinely unused imports from all major subsystems
- agent/, hermes_cli/, tools/, gateway/, plugins/, cron/
- Includes imports in try/except blocks that were truly unused
  (vs availability checks which were left alone)

Unused variables removed (~25):
- Removed dead variables: connected, inner, channels, last_exc,
  source, new_server_names, verify, pconfig, default_terminal,
  result, pending_handled, temperature, loop
- Dropped unused argparse subparser assignments in hermes_cli/main.py
  (12 instances of add_parser() where result was never used)

Dead code removed:
- run_agent.py: Removed dead ternary (None if False else None) and
  surrounding unreachable branch in identity fallback
- run_agent.py: Removed write-only attribute _last_reported_tool
- hermes_cli/providers.py: Removed dead @property decorator on
  module-level function (decorator has no effect outside a class)
- gateway/run.py: Removed unused MCP config load before reconnect
- gateway/platforms/slack.py: Removed dead SessionSource construction

Undefined name bugs fixed (would cause NameError at runtime):
- batch_runner.py: Added missing logger = logging.getLogger(__name__)
- tools/environments/daytona.py: Added missing Dict and Path imports

Unnecessary global statements removed (14):
- tools/terminal_tool.py: 5 functions declared global for dicts
  they only mutated via .pop()/[key]=value (no rebinding)
- tools/browser_tool.py: cleanup thread loop only reads flag
- tools/rl_training_tool.py: 4 functions only do dict mutations
- tools/mcp_oauth.py: only reads the global
- hermes_time.py: only reads cached values

Inefficient patterns fixed:
- startswith/endswith tuple form: 15 instances of
  x.startswith('a') or x.startswith('b') consolidated to
  x.startswith(('a', 'b'))
- len(x)==0 / len(x)>0: 13 instances replaced with pythonic
  truthiness checks (not x / bool(x))
- in dict.keys(): 5 instances simplified to in dict
- Redefined unused name: removed duplicate _strip_mdv2 import in
  send_message_tool.py

Other fixes:
- hermes_cli/doctor.py: Replaced undefined logger.debug() with pass
- hermes_cli/config.py: Consolidated chained .endswith() calls

Test results: 3934 passed, 17 failed (all pre-existing on main),
19 skipped. Zero regressions.
2026-04-07 10:25:31 -07:00
Teknium
cc7136b1ac fix: update Gemini model catalog + wire models.dev as live model source
Follow-up for salvaged PR #5494:
- Update model catalog to Gemini 3.x + Gemma 4 (drop deprecated 2.0)
- Add list_agentic_models() to models_dev.py with noise filter
- Wire models.dev into _model_flow_api_key_provider as primary source
  (static curated list serves as offline fallback)
- Add gemini -> google mapping in PROVIDER_TO_MODELS_DEV
- Fix Gemma 4 context lengths to 256K (models.dev values)
- Update auxiliary model to gemini-3-flash-preview
- Expand tests: 3.x catalog, context lengths, models.dev integration
2026-04-06 10:28:03 -07:00
Teknium
4976a8b066
feat: /model command — models.dev primary database + --provider flag (#5181)
Full overhaul of the model/provider system.

## What changed
- models.dev (109 providers, 4000+ models) as primary database for provider identity AND model metadata
- --provider flag replaces colon syntax for explicit provider switching
- Full ModelInfo/ProviderInfo dataclasses with context, cost, capabilities, modalities
- HermesOverlay system merges models.dev + Hermes-specific transport/auth/aggregator flags
- User-defined endpoints via config.yaml providers: section
- /model (no args) lists authenticated providers with curated model catalog
- Rich metadata display: context window, max output, cost/M tokens, capabilities
- Config migration: custom_providers list → providers dict (v11→v12)
- AIAgent.switch_model() for in-place model swap preserving conversation

## Files
agent/models_dev.py, hermes_cli/providers.py, hermes_cli/model_switch.py,
hermes_cli/model_normalize.py, cli.py, gateway/run.py, run_agent.py,
hermes_cli/config.py, hermes_cli/commands.py
2026-04-05 01:04:44 -07:00
Teknium
3a68ec3172
feat: add Fireworks context length detection support (#4158)
- Add api.fireworks.ai to _URL_TO_PROVIDER for automatic provider detection
- Add fireworks to PROVIDER_TO_MODELS_DEV mapped to 'fireworks-ai' (the
  correct models.dev provider key — original PR used 'fireworks' which
  would silently fail the lookup)


Cherry-picked from PR #3989 with models.dev key fix.

Co-authored-by: sroecker <sroecker@users.noreply.github.com>
2026-03-30 20:37:08 -07:00
Teknium
2dd286c162
fix: write models.dev disk cache atomically (#3588)
Use atomic_json_write() from utils.py instead of plain open()/json.dump()
for the models.dev disk cache. Prevents corrupted cache if the process is
killed mid-write — _load_disk_cache() silently returns {} on corrupt JSON,
losing all model metadata until the next successful API fetch.

Co-authored-by: memosr <memosr@users.noreply.github.com>
2026-03-28 14:20:30 -07:00
Test
55ce601502 fix: 6 bugs in model metadata, reasoning detection, and delegate tool
Cherry-picked from PR #2169 by @0xbyt4.

1. _strip_provider_prefix: skip Ollama model:tag names (qwen:0.5b)
2. Fuzzy match: remove reverse direction that made claude-sonnet-4
   resolve to 1M instead of 200K
3. _has_content_after_think_block: reuse _strip_think_blocks() to
   handle all tag variants (thinking, reasoning, REASONING_SCRATCHPAD)
4. models.dev lookup: elif→if so nous provider also queries models.dev
5. Disk cache fallback: use 5-min TTL instead of full hour so network
   is retried soon
6. Delegate build: wrap child construction in try/finally so
   _last_resolved_tool_names is always restored on exception
2026-03-20 08:52:37 -07:00
Teknium
88643a1ba9
feat: overhaul context length detection with models.dev and provider-aware resolution (#2158)
Replace the fragile hardcoded context length system with a multi-source
resolution chain that correctly identifies context windows per provider.

Key changes:

- New agent/models_dev.py: Fetches and caches the models.dev registry
  (3800+ models across 100+ providers with per-provider context windows).
  In-memory cache (1hr TTL) + disk cache for cold starts.

- Rewritten get_model_context_length() resolution chain:
  0. Config override (model.context_length)
  1. Custom providers per-model context_length
  2. Persistent disk cache
  3. Endpoint /models (local servers)
  4. Anthropic /v1/models API (max_input_tokens, API-key only)
  5. OpenRouter live API (existing, unchanged)
  6. Nous suffix-match via OpenRouter (dot/dash normalization)
  7. models.dev registry lookup (provider-aware)
  8. Thin hardcoded defaults (broad family patterns)
  9. 128K fallback (was 2M)

- Provider-aware context: same model now correctly resolves to different
  context windows per provider (e.g. claude-opus-4.6: 1M on Anthropic,
  128K on GitHub Copilot). Provider name flows through ContextCompressor.

- DEFAULT_CONTEXT_LENGTHS shrunk from 80+ entries to ~16 broad patterns.
  models.dev replaces the per-model hardcoding.

- CONTEXT_PROBE_TIERS changed from [2M, 1M, 512K, 200K, 128K, 64K, 32K]
  to [128K, 64K, 32K, 16K, 8K]. Unknown models no longer start at 2M.

- hermes model: prompts for context_length when configuring custom
  endpoints. Supports shorthand (32k, 128K). Saved to custom_providers
  per-model config.

- custom_providers schema extended with optional models dict for
  per-model context_length (backward compatible).

- Nous Portal: suffix-matches bare IDs (claude-opus-4-6) against
  OpenRouter's prefixed IDs (anthropic/claude-opus-4.6) with dot/dash
  normalization. Handles all 15 current Nous models.

- Anthropic direct: queries /v1/models for max_input_tokens. Only works
  with regular API keys (sk-ant-api*), not OAuth tokens. Falls through
  to models.dev for OAuth users.

Tests: 5574 passed (18 new tests for models_dev + updated probe tiers)
Docs: Updated configuration.md context length section, AGENTS.md

Co-authored-by: Test <test@test.com>
2026-03-20 06:04:33 -07:00