## What does this PR do?
The trajectory compressor could corrupt training trajectories by cutting a
conversation in the middle of a tool-call/tool-response pair. In the from/value
trajectory format a `tool` turn (carrying `<tool_response>` markers) is always
emitted immediately after the `gpt` turn whose `<tool_call>` it answers, so the
two turns must stay together. The compressible region's end boundary, however,
was chosen purely by token accumulation: the loop stopped at the first turn where
the accumulated tokens met the savings target, with no regard for turn roles. For
any over-budget trajectory whose savings boundary happened to land between a `gpt`
turn and its `tool` turn, the `gpt` (with its `<tool_call>`) was summarised away
into the replacement `human` message while the now-orphaned `tool` turn (with its
`<tool_response>`) was kept verbatim in the tail — producing an unmatched marker
and silently corrupting the training signal. The head boundary had the mirror
problem when the first tool turn was not protected.
This change snaps both compression boundaries to a clean turn boundary before the
region is extracted and replaced, so the summary always covers whole gpt+tool
blocks and a `tool` turn is never separated from the `gpt` turn that precedes it.
The boundary is moved forward when possible (folding an orphaned tool turn into
the region that already holds its gpt) and falls back to moving backward when no
clean boundary exists ahead, such as when the protected tail itself begins on a
tool turn.
## Related Issue
N/A
## Type of Change
- [x] 🐛 Bug fix (non-breaking change that fixes an issue)
## Changes Made
- `trajectory_compressor.py`: added `_is_boundary_clean()` and `_snap_boundary()`
helpers on `TrajectoryCompressor`, and applied them to both the head and tail
compression boundaries in `compress_trajectory()` and
`compress_trajectory_async()`. When snapping collapses the region to nothing
safe to compress, the trajectory is returned unchanged and flagged as still
over the limit rather than being corrupted.
- `tests/test_trajectory_compressor.py`: added `TestCompressionToolPairIntegrity`
covering the sync and async paths plus direct unit tests for the boundary
snapping (forward skip and backward fallback).
## How to Test
1. Run the focused tests: `pytest tests/test_trajectory_compressor.py -q`.
2. The new sync/async cases build a trajectory of gpt/tool pairs with an oversized
middle gpt turn and choose a token target that forces the accumulation
boundary to stop between a `<tool_call>` and its `<tool_response>`. They assert
that `<tool_call>` and `<tool_response>` markers stay balanced after
compression and that every kept `tool` turn is immediately preceded by a `gpt`
turn (never the inserted summary or another tool turn).
## Checklist
### Code
- [x] I've read the [Contributing Guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md)
- [x] My commit messages follow [Conventional Commits](https://www.conventionalcommits.org/) (`fix(scope):`, `feat(scope):`, etc.)
- [x] I searched for [existing PRs](https://github.com/NousResearch/hermes-agent/pulls) to make sure this isn't a duplicate
- [x] My PR contains **only** changes related to this fix/feature (no unrelated commits)
- [x] I've run `pytest tests/ -q` and all tests pass
- [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features)
- [x] I've tested on my platform: macOS 15 (Darwin 25.5)
### Documentation & Housekeeping
- [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A
- [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A
- [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A
- [x] I've considered cross-platform impact (Windows, macOS) per the [compatibility guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#cross-platform-compatibility) — or N/A
- [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A
Remove unused imports (F401) and duplicate/shadowed import
redefinitions (F811) across the codebase using ruff's safe
autofixes. No behavioral changes -- imports only.
- ~1400 safe autofixes applied across 644 files (net -1072 lines)
- __init__.py re-exports preserved (excluded from F401 removal so
public re-export surfaces stay intact)
- Re-exports that are imported or monkeypatched by tests but look
unused in their defining module are kept with explicit # noqa:
F401 (gateway/run.py load_dotenv; run_agent re-exports from
agent.message_sanitization, agent.context_compressor,
agent.retry_utils, agent.prompt_builder, agent.process_bootstrap,
agent.codex_responses_adapter)
- Unsafe F841 (unused-variable) fixes deliberately skipped -- those
can change behavior when the RHS has side effects
- ruff lints remain disabled in pyproject.toml (only PLW1514 is
selected); this is a one-time cleanup, not a config change
Verification:
- python -m compileall: clean
- pytest --collect-only: all 27161 tests collect (zero import errors)
- core entry points import clean (run_agent, model_tools, cli,
toolsets, hermes_state, batch_runner, gateway)
- static scan: every name any test imports directly from an edited
module still resolves
Kimi's gateway selects the correct temperature server-side based on the
active mode (thinking -> 1.0, non-thinking -> 0.6). Sending any
temperature value — even the previously "correct" one — conflicts with
gateway-managed defaults.
Replaces the old approach of forcing specific temperature values (0.6
for non-thinking, 1.0 for thinking) with an OMIT_TEMPERATURE sentinel
that tells all call sites to strip the temperature key from API kwargs
entirely.
Changes:
- agent/auxiliary_client.py: OMIT_TEMPERATURE sentinel, _is_kimi_model()
prefix check (covers all kimi-* models), _fixed_temperature_for_model()
returns sentinel for kimi models. _build_call_kwargs() strips temp.
- run_agent.py: _build_api_kwargs, flush_memories, and summary generation
paths all handle the sentinel by popping/omitting temperature.
- trajectory_compressor.py: _effective_temperature_for_model returns None
for kimi (sentinel mapped), direct client calls use kwargs dict to
conditionally include temperature.
- mini_swe_runner.py: same sentinel handling via wrapper function.
- 6 test files updated: all 'forces temperature X' assertions replaced
with 'temperature not in kwargs' assertions.
Net: -76 lines (171 added, 247 removed).
Inspired by PR #13137 (@kshitijk4poor).
Follow up salvaged PR #12668 by threading base_url through the
remaining direct-call sites so kimi-k2.5 uses temperature=1.0 on
api.moonshot.ai and keeps 0.6 on api.kimi.com/coding. Add focused
regression tests for run_agent, trajectory_compressor, and
mini_swe_runner.
Normalize summary-model content before stripping so empty or non-string
responses do not trigger retry/fallback paths. Adds sync and async
regression tests for None content.