hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-03 12:23:08 +00:00

History

Teknium 3800972dd0 feat(vision): vision_analyze returns pixels to vision-capable models, not aux text (#22955 ) When the active main model has native vision and the provider supports multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini 3, OpenRouter, Nous), vision_analyze loads the image bytes and returns them to the model as a multimodal tool-result envelope. The model then sees the pixels directly on its next turn instead of receiving a lossy text description from an auxiliary LLM. Falls back to the legacy aux-LLM text path for non-vision models and unverified providers. Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and Cline. All four converge on the same pattern: tool results carry image content blocks for vision-capable provider/model combinations. Changes - tools/vision_tools.py: _vision_analyze_native fast path + provider capability table (_supports_media_in_tool_results). Schema description updated to reflect new behaviour. - agent/codex_responses_adapter.py: function_call_output.output now accepts the array form for multimodal tool results (was string-only). Preflight validates input_text/input_image parts. - agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so tools see the live CLI/gateway override, not the stale config.yaml default. set_runtime_main()/clear_runtime_main() helpers. - run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn start so vision_analyze's fast-path check sees the actual runtime. - tests/conftest.py: clear runtime-main override between tests. Tests - tests/tools/test_vision_native_fast_path.py: provider capability table, envelope shape, fast-path gating (vision-capable model uses fast path; non-vision model falls through to aux). - tests/run_agent/test_codex_multimodal_tool_result.py: list tool content becomes function_call_output.output array; preflight preserves arrays and drops unknown part types. Live verified - Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a typed filepath, gets pixels back, reads exact text from images that no aux description could capture (font color irony, multi-line fruit-count list, etc.). PR replaces the closed prior efforts (#16506 shipped the inbound user- attached path; this PR closes the gap for tool-discovered images).		2026-05-09 21:06:19 -07:00
..
transports	feat(openrouter): wire Pareto Code router with min_coding_score knob (#22838 )	2026-05-09 14:47:00 -07:00
__init__.py	Refactor Terminal and AIAgent cleanup	2026-02-21 22:31:43 -08:00
account_usage.py	feat(account-usage): add per-provider account limits module	2026-04-21 01:56:35 -07:00
anthropic_adapter.py	feat(computer-use): cua-driver backend, universal any-model schema	2026-05-08 11:07:38 -07:00
auxiliary_client.py	feat(vision): vision_analyze returns pixels to vision-capable models, not aux text (#22955 )	2026-05-09 21:06:19 -07:00
bedrock_adapter.py	fix(bedrock): preserve reasoningContent across converse normalization	2026-05-07 05:17:16 -07:00
codex_responses_adapter.py	feat(vision): vision_analyze returns pixels to vision-capable models, not aux text (#22955 )	2026-05-09 21:06:19 -07:00
context_compressor.py	fix(context_compressor): treat streaming premature-close as transient error	2026-05-09 17:52:51 -07:00
context_engine.py	fix(compress): don't reach into ContextCompressor privates from /compress (#15039 )	2026-04-24 02:55:43 -07:00
context_references.py	fix(agent): fall back when rg is blocked for @folder references	2026-04-20 01:56:41 -07:00
copilot_acp_client.py	feat(cross-platform): psutil for PID/process management + Windows footgun checker	2026-05-08 14:27:40 -07:00
credential_pool.py	fix(auth): shorten credential 401 cooldown	2026-05-07 06:15:33 -07:00
credential_sources.py	feat(minimax-oauth): full integration with peer OAuth providers	2026-04-29 09:53:42 -07:00
curator.py	feat(curator): show rename map in user-visible summary (#22910 )	2026-05-09 18:43:40 -07:00
curator_backup.py	fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671 ) (#18731 )	2026-05-02 01:29:57 -07:00
display.py	feat(computer-use): background focus-safe backend — set_value, structured windows, MIME detection	2026-05-08 11:07:38 -07:00
error_classifier.py	fix(error_classifier): classify generic-typed timeout messages as transient (carve-out of #22664 )	2026-05-09 17:54:07 -07:00
file_safety.py	fix(security): apply file safety to copilot acp fs	2026-04-21 01:31:58 -07:00
gemini_cloudcode_adapter.py	chore: remove unused imports and dead locals (ruff F401, F841) (#17010 )	2026-04-28 06:46:45 -07:00
gemini_native_adapter.py	fix(gemini): extract usageMetadata from streaming chunks for token tracking	2026-05-04 02:33:30 -07:00
gemini_schema.py	chore: remove unused imports and dead locals (ruff F401, F841) (#17010 )	2026-04-28 06:46:45 -07:00
google_code_assist.py	chore: remove unused imports and dead locals (ruff F401, F841) (#17010 )	2026-04-28 06:46:45 -07:00
google_oauth.py	fix(google_oauth): close TOCTOU window when saving credentials	2026-05-04 03:16:19 -07:00
i18n.py	feat(i18n): add Turkish (tr) locale	2026-05-05 17:29:12 -07:00
image_gen_provider.py	feat(plugins): pluggable image_gen backends + OpenAI provider (#13799 )	2026-04-21 21:30:10 -07:00
image_gen_registry.py	feat(plugins): pluggable image_gen backends + OpenAI provider (#13799 )	2026-04-21 21:30:10 -07:00
image_routing.py	fix(image-routing): sniff magic bytes for image MIME, ignore misleading suffix	2026-05-07 05:58:11 -07:00
insights.py	Merge branch 'main' into feat/dashboard-skill-analytics	2026-04-20 05:25:49 -07:00
lmstudio_reasoning.py	feat(agent): add lmstudio integration	2026-04-28 12:27:36 -07:00
manual_compression_feedback.py	fix(compression): include system prompt + tool schemas in token estimates (#18265 )	2026-04-30 23:03:54 -07:00
memory_manager.py	fix: salvage batch — compaction guidance, memory authority, cache eviction after compression	2026-05-05 22:33:45 -07:00
memory_provider.py	docs(agent): remove stale BuiltinMemoryProvider references from memory module docstrings	2026-05-05 13:33:49 -07:00
model_metadata.py	fix(model-metadata): align hy3-preview static fallback + delete change-detector test (#22805 )	2026-05-09 13:37:19 -07:00
models_dev.py	perf(models_dev): cache-first lookup, skip network when disk cache is fresh (#22808 )	2026-05-09 13:32:38 -07:00
moonshot_schema.py	fix(moonshot): also strip nullable/enum after anyOf collapse	2026-04-30 23:14:31 -07:00
nous_rate_guard.py	codebase: add encoding='utf-8' to all bare open() calls (PLW1514)	2026-05-08 14:27:40 -07:00
onboarding.py	docs(onboarding): lead OpenClaw residue banner with migrate, warn that cleanup breaks OpenClaw (#17507 )	2026-04-29 08:08:36 -07:00
prompt_builder.py	fix(memory): tighten MEMORY_GUIDANCE against ephemeral PR/issue/SHA notes (#22781 )	2026-05-09 12:48:25 -07:00
prompt_caching.py	fix(prompt-caching): skip top-level cache_control on role:tool for OpenRouter	2026-03-21 16:54:43 -07:00
rate_limit_tracker.py	refactor: remove dead code — 1,784 lines across 77 files (#9180 )	2026-04-13 16:32:04 -07:00
redact.py	feat(security): enable secret redaction by default (#17691 , #20785 ) (#21193 )	2026-05-07 05:10:33 -07:00
retry_utils.py	feat(agent): add jittered retry backoff	2026-04-08 00:41:36 -07:00
shell_hooks.py	codebase: add encoding='utf-8' to all bare open() calls (PLW1514)	2026-05-08 14:27:40 -07:00
skill_commands.py	fix(skills): rescan skill_commands cache when platform scope changes (#18739 )	2026-05-02 01:36:53 -07:00
skill_preprocessing.py	fix(skills): apply inline shell in skill_view	2026-04-24 15:15:07 -07:00
skill_utils.py	perf(cli): cut ~19s from 'hermes' cold start (skills cache + lazy Feishu + no Nous HTTP) (#22138 )	2026-05-08 16:39:32 -07:00
subdirectory_hints.py	fix(agent): catch PermissionError in subdirectory hint discovery	2026-04-09 03:10:30 -07:00
think_scrubber.py	fix(agent): stateful streaming scrubber for reasoning-block leaks (#17924 ) (#20184 )	2026-05-05 04:33:38 -07:00
title_generator.py	fix: improve telegram topic mode setup	2026-05-04 12:07:17 -07:00
tool_guardrails.py	fix(guardrails): preserve display _detect_tool_failure semantics	2026-04-30 20:43:15 -07:00
trajectory.py	Refactor Terminal and AIAgent cleanup	2026-02-21 22:31:43 -08:00
usage_pricing.py	fix(analytics): prevent silent token loss and add Claude 4.5–4.7 pricing (#21455 )	2026-05-07 13:24:31 -07:00