hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-25 00:51:20 +00:00


Teknium 9f5c13f874 design: compression eval harness — add three scrubbed fixtures + scrubber

Adds scripts/compression_eval/ with a design doc, README, a placeholder run_eval.py, and three checked-in scrubbed session fixtures. There is no working eval yet; the PR is for design review before implementation.

Motivation: we edit agent/context_compressor.py prompts and _template_sections by hand and ship without any automated check that compression still preserves file paths, error codes, or the active task. Factory.ai's Dec 2025 write-up (https://factory.ai/news/evaluating-compression) documents a probe-based eval scored on six dimensions. We adopt the methodology; we do not publish scores.

Contents:
- DESIGN.md: fixture format, probe format (recall / artifact / continuation / decision), six grading dimensions, report format, cost expectations, scrubber pipeline, open questions, and staged follow-up PR plan.
- README.md: short 'what this is / when to run it' page.
- run_eval.py: placeholder that prints 'not implemented, see DESIGN.md' and exits 1.
- scrub_fixtures.py: reproducible pipeline that converts real sessions from ~/.hermes/sessions/*.jsonl into public-safe JSON fixtures. Applies: redact_sensitive_text, username path normalization, personal handle scrubbing, email and git-author normalization, reasoning scratchpad / <think> stripping, platform user-mention scrubbing, first-user paraphrase, system-prompt placeholder, orphan-message pruning, and tool-output size truncation for fixture readability.
- fixtures/feature-impl-context-priority.json: 75 msgs / ~17k tokens. Investigate → patch → test → PR → merge.
- fixtures/debug-session-feishu-id-model.json: 59 msgs / ~13k tokens. PR triage + upstream docs + decision.
- fixtures/config-build-competitive-scouts.json: 61 msgs / ~23k tokens. Iterative config accumulation (11 cron jobs across 7 weekdays).
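
The scrub_fixtures.py passes are only named in the commit message, not specified. As a rough illustration of what a few of them could look like (username path normalization, email normalization, <think> stripping, orphan-message pruning, and tool-output truncation), here is a minimal Python sketch. Every function name, placeholder value, the truncation budget, and the OpenAI-style message shape (`role`, `content`, `tool_calls`, `tool_call_id`) are assumptions for illustration, not the script's actual code; `redact_sensitive_text` is deliberately not reproduced.

```python
"""Hypothetical sketch of a few scrub_fixtures.py-style passes.

None of these names come from the hermes-agent codebase; the real
script also calls redact_sensitive_text, which is not reproduced here.
"""
import re

# Assumed placeholder values; the real pipeline's choices may differ.
PLACEHOLDER_EMAIL = "contributor@example.com"
PLACEHOLDER_USER = "user"
MAX_TOOL_OUTPUT_CHARS = 2000  # assumed truncation budget

_HOME_PATH = re.compile(r"(/home/|/Users/)[^/\s]+")
_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def scrub_text(text: str) -> str:
    """Strip <think> blocks, then normalize home-dir usernames and emails."""
    text = _THINK_BLOCK.sub("", text)
    text = _HOME_PATH.sub(lambda m: m.group(1) + PLACEHOLDER_USER, text)
    text = _EMAIL.sub(PLACEHOLDER_EMAIL, text)
    return text


def truncate_tool_output(text: str, limit: int = MAX_TOOL_OUTPUT_CHARS) -> str:
    """Keep the head of long tool output and note how much was dropped."""
    if len(text) <= limit:
        return text
    dropped = len(text) - limit
    return text[:limit] + f"\n[... truncated {dropped} chars ...]"


def scrub_message(msg: dict) -> dict:
    """Apply the text passes to one message; truncate only tool results."""
    content = scrub_text(msg.get("content") or "")
    if msg.get("role") == "tool":
        content = truncate_tool_output(content)
    return {**msg, "content": content}


def prune_orphan_tool_messages(messages: list[dict]) -> list[dict]:
    """Drop tool results whose tool_call_id no assistant message issued.

    Assumes OpenAI-style messages: assistant turns carry
    tool_calls=[{"id": ...}], tool turns carry tool_call_id.
    """
    issued = {
        call["id"]
        for m in messages
        if m.get("role") == "assistant"
        for call in (m.get("tool_calls") or [])
    }
    return [
        m for m in messages
        if m.get("role") != "tool" or m.get("tool_call_id") in issued
    ]
```

For example, `scrub_message({"role": "tool", "content": "Saved /home/alice/notes.txt for alice@corp.example"})` yields the content `Saved /home/user/notes.txt for contributor@example.com`. Because `scrub_text` maps placeholders to themselves, it is idempotent, which is one way to keep a scrubber "reproducible" when it is rerun over already-scrubbed fixtures.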

PII audit: zero matches across the three fixtures for the maintainer's handle (all case variants), personal email domains, and known contributor emails. Only the 'contributor@example.com' placeholder remains.

Why scripts/: the harness requires API credentials, costs ~$1 per run, is LLM-graded (non-deterministic), and must not run in CI. scripts/sample_and_compress.py is the existing precedent for offline credentialed tooling.

2026-04-24 07:40:42 -07:00
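
The PII audit is described only by its result. A check in the same spirit could be scripted as follows. This is a hypothetical sketch: `audit_text` and `audit_fixture_dir` are invented names, and the forbidden-term list is supplied by the caller because the commit does not publish the actual handles, domains, or contributor emails it searched for.

```python
"""Hypothetical PII audit over JSON fixtures (not the project's code).

The forbidden-term list is caller-supplied; the real audit's terms are
not published in the commit message.
"""
import json
import re
from pathlib import Path

ALLOWED_EMAILS = {"contributor@example.com"}
_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def audit_text(text: str, forbidden: list[str]) -> list[str]:
    """Return human-readable findings for one blob of text."""
    findings = []
    lowered = text.lower()
    for term in forbidden:
        # Case-insensitive substring match covers "all case variants".
        if term.lower() in lowered:
            findings.append(f"forbidden term: {term}")
    # Any email other than the allowed placeholder is reported.
    for email in sorted(set(_EMAIL.findall(text))):
        if email.lower() not in ALLOWED_EMAILS:
            findings.append(f"unexpected email: {email}")
    return findings


def audit_fixture_dir(directory: Path, forbidden: list[str]) -> dict[str, list[str]]:
    """Audit every *.json fixture; map file name -> findings (empty = clean)."""
    report = {}
    for path in sorted(directory.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Re-serialize with ensure_ascii=False so \uXXXX escapes are decoded
        # and non-ASCII text is scanned as written.
        report[path.name] = audit_text(json.dumps(data, ensure_ascii=False), forbidden)
    return report
```

Requiring every list in `audit_fixture_dir(Path("scripts/compression_eval/fixtures"), terms)` to be empty would express the "zero matches" criterion. Plain substring matching is a blunt choice that can over-report (a short handle inside an unrelated word), which is usually acceptable for an audit a human reviews before publishing.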
compression_eval	design: compression eval harness — add three scrubbed fixtures + scrubber	2026-04-24 07:40:42 -07:00
lib	feat: lazy bootstrap node	2026-04-16 10:47:37 -05:00
whatsapp-bridge	Update whatsapp-bridge package-lock.json	2026-04-22 18:16:08 -07:00
build_skills_index.py	feat(skills): centralized skills index — eliminate GitHub API calls for search/install	2026-04-12 16:39:04 -07:00
contributor_audit.py	feat(ci): add contributor attribution check on PRs (#9376)	2026-04-13 21:13:08 -07:00
discord-voice-doctor.py	fix(scripts): read gateway_voice_mode.json as UTF-8	2026-04-22 17:34:29 -07:00
hermes-gateway	fix: prevent systemd restart storm on gateway connection failure	2026-03-21 09:26:39 -07:00
install.cmd	feat: Windows native support via Git Bash	2026-03-02 22:03:29 -08:00
install.ps1	chore: defer WhatsApp bridge install to first use (#12992)	2026-04-20 04:55:33 -07:00
install.sh	fix(install): quote PYTHON_PATH and UV_CMD for paths with spaces on macOS (#10009)	2026-04-20 05:03:14 -07:00
kill_modal.sh	refactor: replace swe-rex with native Modal SDK for Modal backend (#3538)	2026-03-28 11:21:44 -07:00
release.py	chore(release): map Group B contributors in AUTHOR_MAP	2026-04-24 07:14:00 -07:00
run_tests.sh	test: make test env hermetic; enforce CI parity via scripts/run_tests.sh (#11577)	2026-04-17 06:09:09 -07:00
sample_and_compress.py	refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821)	2026-04-07 10:25:31 -07:00