mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-25 00:51:20 +00:00
Add context compression feature for long conversations
- Implemented automatic context compression to manage long conversations that approach the model's context limit. - Configured the feature to summarize middle turns while protecting the first three and last four turns, ensuring important context is retained. - Added configuration options in `cli-config.yaml` and environment variables for enabling/disabling compression and setting thresholds. - Updated documentation in `README.md`, `cli.md`, and `.env.example` to explain the context compression functionality and its configuration. - Enhanced the `cli.py` to load compression settings into environment variables, ensuring seamless integration with the CLI. - Completed the implementation of context compression as outlined in the TODO list, marking it as a significant enhancement to conversation management.
This commit is contained in:
parent
bbeed5b5d1
commit
9b4d9452ba
7 changed files with 614 additions and 12 deletions
|
|
@ -112,6 +112,33 @@ browser:
|
|||
# after this period of no activity between agent loops (default: 120 = 2 minutes)
|
||||
inactivity_timeout: 120
|
||||
|
||||
# =============================================================================
|
||||
# Context Compression (Auto-shrinks long conversations)
|
||||
# =============================================================================
|
||||
# When conversation approaches model's context limit, middle turns are
|
||||
# automatically summarized to free up space while preserving important context.
|
||||
#
|
||||
# HOW IT WORKS:
|
||||
# 1. Tracks actual token usage from API responses (not estimates)
|
||||
# 2. When prompt_tokens >= threshold% of model's context_length, triggers compression
|
||||
# 3. Protects first 3 turns (system prompt, initial request, first response)
|
||||
# 4. Protects last 4 turns (recent context is most relevant)
|
||||
# 5. Summarizes middle turns using a fast/cheap model
|
||||
# 6. Inserts summary as a user message, continues conversation seamlessly
|
||||
#
|
||||
compression:
|
||||
# Enable automatic context compression (default: true)
|
||||
# Set to false if you prefer to manage context manually or want errors on overflow
|
||||
enabled: true
|
||||
|
||||
# Trigger compression at this % of model's context limit (default: 0.85 = 85%)
|
||||
# Lower values = more aggressive compression, higher values = compress later
|
||||
threshold: 0.85
|
||||
|
||||
# Model to use for generating summaries (fast/cheap recommended)
|
||||
# This model compresses the middle turns into a concise summary
|
||||
summary_model: "google/gemini-2.0-flash-001"
|
||||
|
||||
# =============================================================================
|
||||
# Agent Behavior
|
||||
# =============================================================================
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue