hermes-agent

mirrors/hermes-agent

Fork 0

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-01 01:51:44 +00:00

Commit graph

Author	SHA1	Message	Date
0xbyt4	3b43f7267a	fix: count actual tool calls instead of tool-related messages tool_call_count was inaccurate in two ways: 1. Under-counting: an assistant message with N parallel tool calls (e.g. "kill the light and shut off the fan" = 2 ha_call_service) only incremented tool_call_count by 1 instead of N. 2. Over-counting: tool response messages (role=tool) also incremented tool_call_count, double-counting every tool interaction. Combined: 2 parallel tool calls produced tool_call_count=3 (1 from assistant + 2 from tool responses) instead of the correct value of 2. Fix: only count from assistant messages with tool_calls, incrementing by len(tool_calls) to handle parallel calls correctly. Tool response messages no longer affect tool_call_count. This impacts /insights and /usage accuracy for sessions with tool use.	2026-03-07 04:07:52 +03:00
teknium1	a44e041acf	test: strengthen assertions across 7 test files (batch 1) Replaced weak 'is not None' / '> 0' / 'len >= 1' assertions with concrete value checks across the most flagged test files: gateway/test_pairing.py (11 weak → 0): - Code assertions verify isinstance + len == CODE_LENGTH - Approval results verify dict structure + specific user_id/user_name - Added code2 != code1 check in rate_limit_expires test_hermes_state.py (6 weak → 0): - ended_at verified as float timestamp - Search result counts exact (== 2, not >= 1) - Context verified as non-empty list - Export verified as dict, session ID verified test_cli_init.py (4 weak → 0): - max_turns asserts exact value (60) - model asserts string with provider/name format gateway/test_hooks.py (2 zero-assert tests → fixed): - test_no_handlers_for_event: verifies no handler registered - test_handler_error_does_not_propagate: verifies handler count + return gateway/test_platform_base.py (9 weak image tests → fixed): - extract_images tests now verify actual URL and alt_text - truncate_message verifies content preservation after splitting cron/test_scheduler.py (1 weak → 0): - resolve_origin verifies dict equality, not just existence cron/test_jobs.py (2 weak → 0 + 4 new tests): - Schedule parsing verifies ISO timestamp type - Cron expression verifies result is valid datetime string - NEW: 4 tests for update_job() (was completely untested)	2026-03-05 18:39:37 -08:00
0xbyt4	0ac3af8776	test: add unit tests for 8 untested modules Add comprehensive test coverage for: - cron/jobs.py: schedule parsing, job CRUD, due-job detection (34 tests) - tools/memory_tool.py: security scanning, MemoryStore ops, dispatcher (32 tests) - toolsets.py: resolution, validation, composition, cycle detection (19 tests) - tools/file_operations.py: write deny list, result dataclasses, helpers (37 tests) - agent/prompt_builder.py: context scanning, truncation, skills index (24 tests) - agent/model_metadata.py: token estimation, context lengths (16 tests) - hermes_state.py: SessionDB SQLite CRUD, FTS5 search, export, prune (28 tests) Total: 210 new tests, all passing (380 total suite).	2026-02-26 13:27:58 +03:00

Author

SHA1

Message

Date

0xbyt4

3b43f7267a

fix: count actual tool calls instead of tool-related messages

tool_call_count was inaccurate in two ways:

1. Under-counting: an assistant message with N parallel tool calls
   (e.g. "kill the light and shut off the fan" = 2 ha_call_service)
   only incremented tool_call_count by 1 instead of N.

2. Over-counting: tool response messages (role=tool) also incremented
   tool_call_count, double-counting every tool interaction.

Combined: 2 parallel tool calls produced tool_call_count=3 (1 from
assistant + 2 from tool responses) instead of the correct value of 2.

Fix: only count from assistant messages with tool_calls, incrementing
by len(tool_calls) to handle parallel calls correctly. Tool response
messages no longer affect tool_call_count.

This impacts /insights and /usage accuracy for sessions with tool use.

2026-03-07 04:07:52 +03:00

teknium1

a44e041acf

test: strengthen assertions across 7 test files (batch 1)

Replaced weak 'is not None' / '> 0' / 'len >= 1' assertions with
concrete value checks across the most flagged test files:

gateway/test_pairing.py (11 weak → 0):
  - Code assertions verify isinstance + len == CODE_LENGTH
  - Approval results verify dict structure + specific user_id/user_name
  - Added code2 != code1 check in rate_limit_expires

test_hermes_state.py (6 weak → 0):
  - ended_at verified as float timestamp
  - Search result counts exact (== 2, not >= 1)
  - Context verified as non-empty list
  - Export verified as dict, session ID verified

test_cli_init.py (4 weak → 0):
  - max_turns asserts exact value (60)
  - model asserts string with provider/name format

gateway/test_hooks.py (2 zero-assert tests → fixed):
  - test_no_handlers_for_event: verifies no handler registered
  - test_handler_error_does_not_propagate: verifies handler count + return

gateway/test_platform_base.py (9 weak image tests → fixed):
  - extract_images tests now verify actual URL and alt_text
  - truncate_message verifies content preservation after splitting

cron/test_scheduler.py (1 weak → 0):
  - resolve_origin verifies dict equality, not just existence

cron/test_jobs.py (2 weak → 0 + 4 new tests):
  - Schedule parsing verifies ISO timestamp type
  - Cron expression verifies result is valid datetime string
  - NEW: 4 tests for update_job() (was completely untested)

2026-03-05 18:39:37 -08:00

0xbyt4

0ac3af8776

test: add unit tests for 8 untested modules

Add comprehensive test coverage for:
- cron/jobs.py: schedule parsing, job CRUD, due-job detection (34 tests)
- tools/memory_tool.py: security scanning, MemoryStore ops, dispatcher (32 tests)
- toolsets.py: resolution, validation, composition, cycle detection (19 tests)
- tools/file_operations.py: write deny list, result dataclasses, helpers (37 tests)
- agent/prompt_builder.py: context scanning, truncation, skills index (24 tests)
- agent/model_metadata.py: token estimation, context lengths (16 tests)
- hermes_state.py: SessionDB SQLite CRUD, FTS5 search, export, prune (28 tests)

Total: 210 new tests, all passing (380 total suite).

2026-02-26 13:27:58 +03:00

3 commits