hermes-agent/tests/integration
Teknium ee8cbfdc03
feat(web_extract): truncate-and-store instead of LLM summarization (#54843)
* feat(web_extract): truncate-and-store instead of LLM summarization

web_extract no longer runs an auxiliary LLM over scraped pages. The extract
backends (Firecrawl/Tavily/Exa/Parallel) already return clean, boilerplate-
stripped markdown, so we return it directly: pages within a char budget
(default 15000, web.extract_char_limit) come back whole; larger pages get a
head+tail window plus an explicit footer giving the stored full-text path and
the read_file call to page through the omitted middle. The full clean text is
written to cache/web (mounted read-only into remote backends like the other
cache dirs), so nothing is lost.

Inline base64 images are converted to [IMAGE: alt] placeholders (token bombs
dropped) while real http(s) image URLs are preserved as links so the agent can
still web_extract/vision_analyze them.

Removes process_content_with_llm + the chunked summarizer + check_auxiliary_model
+ _resolve_web_extract_auxiliary. context_references._default_url_fetcher is
updated to the truncate path and its stale data.documents shape read is fixed
to results (it was silently returning empty).

Live before/after eval (firecrawl, 4 URLs): 11.7x faster overall (176.6s ->
15.1s); 10-60x on large pages. Quality identical; findability 4/4 (answer
recoverable from stored full text on every truncated page). web_search is
unchanged.

No own scraper added; no changes to web_search.

* fix(web_extract): add char_limit to execute_code web_extract stub

The new web_extract char_limit param must appear in the code_execution_tool
_TOOL_STUBS signature (and doc line) or test_stubs_cover_all_schema_params
fails — the stub schema must cover every real schema param.
2026-06-29 10:00:49 -07:00
..
__init__.py test: reorganize test structure and add missing unit tests 2026-02-26 03:20:08 +03:00
test_batch_runner.py test: reorganize test structure and add missing unit tests 2026-02-26 03:20:08 +03:00
test_checkpoint_resumption.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_daytona_terminal.py test(daytona): add unit and integration tests for Daytona backend 2026-03-05 10:26:22 -08:00
test_ha_integration.py refactor(gateway): migrate Home Assistant adapter to bundled plugin 2026-06-06 11:46:24 -07:00
test_modal_terminal.py refactor: remove dead code — 1,784 lines across 77 files (#9180) 2026-04-13 16:32:04 -07:00
test_voice_channel_flow.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_web_tools.py feat(web_extract): truncate-and-store instead of LLM summarization (#54843) 2026-06-29 10:00:49 -07:00