hermes-agent/.gitignore
Teknium 9f5c13f874
design: compression eval harness — add three scrubbed fixtures + scrubber
Adds scripts/compression_eval/ with a design doc, a README, a placeholder
run_eval.py, the scrub_fixtures.py scrubber, and three checked-in scrubbed
session fixtures. No working eval yet; this PR is for design review before
implementation.

Motivation: we edit agent/context_compressor.py prompts and
_template_sections by hand and ship without any automated check that
compression still preserves file paths, error codes, or the active task.
Factory.ai's Dec 2025 write-up
(https://factory.ai/news/evaluating-compression) documents a probe-based
eval scored on six dimensions. We adopt the methodology; we do not publish
scores.

Contents:
- DESIGN.md — fixture format, probe format (recall / artifact /
  continuation / decision), six grading dimensions, report format,
  cost expectations, scrubber pipeline, open questions, and staged
  follow-up PR plan.
- README.md — short 'what this is / when to run it' page.
- run_eval.py — placeholder that prints 'not implemented, see
  DESIGN.md' and exits 1.
- scrub_fixtures.py — reproducible pipeline that converts real sessions
  from ~/.hermes/sessions/*.jsonl into public-safe JSON fixtures.
  Applies: redact_sensitive_text, username path normalization, personal
  handle scrubbing, email and git-author normalization, reasoning
  scratchpad / <think> stripping, platform user-mention scrubbing,
  first-user paraphrase, system-prompt placeholder, orphan-message
  pruning, and tool-output size truncation for fixture readability.
- fixtures/feature-impl-context-priority.json — 75 msgs / ~17k tokens.
  Investigate → patch → test → PR → merge.
- fixtures/debug-session-feishu-id-model.json — 59 msgs / ~13k tokens.
  PR triage + upstream docs + decision.
- fixtures/config-build-competitive-scouts.json — 61 msgs / ~23k tokens.
  Iterative config accumulation (11 cron jobs across 7 weekdays).

PII audit: zero matches across the three fixtures for the maintainer's
handle (all case variants), personal email domains, and known contributor
emails. Only the 'contributor@example.com' placeholder remains.

Why scripts/: the harness requires API credentials, costs ~$1 per run, is
LLM-graded (non-deterministic), and must not run in CI.
scripts/sample_and_compress.py is the existing precedent for offline
credentialed tooling.
2026-04-24 07:40:42 -07:00

.DS_Store
/venv/
/__pycache__/
*.pyc*
__pycache__/
.venv/
.vscode/
.env
.env.local
.env.development.local
.env.test.local
.env.production.local
.env.development
.env.test
export*
__pycache__/model_tools.cpython-310.pyc
__pycache__/web_tools.cpython-310.pyc
logs/
data/
.pytest_cache/
tmp/
temp_vision_images/
hermes-*/*
examples/
tests/quick_test_dataset.jsonl
tests/sample_dataset.jsonl
run_datagen_kimik2-thinking.sh
run_datagen_megascience_glm4-6.sh
run_datagen_sonnet.sh
source-data/*
data/*
node_modules/
browser-use/
agent-browser/
# Private keys
*.ppk
*.pem
privvy*
images/
hermes_agent.egg-info/
wandb/
testlogs
# CLI config (may contain sensitive SSH paths)
cli-config.yaml
# Skills Hub state (lives in ~/.hermes/skills/.hub/ at runtime, but just in case)
skills/.hub/
ignored/
.worktrees/
environments/benchmarks/evals/
# Compression eval run outputs (harness lives in scripts/compression_eval/)
scripts/compression_eval/results/*
!scripts/compression_eval/results/.gitkeep
# Web UI build output
hermes_cli/web_dist/
# Web UI assets — synced from @nous-research/ui at build time via
# `npm run sync-assets` (see web/package.json).
web/public/fonts/
web/public/ds-assets/
# Release script temp files
.release_notes.md
mini-swe-agent/
# Nix
.direnv/
.nix-stamps/
result
website/static/api/skills-index.json