hermes-agent/plugins/memory/hindsight
Teknium 4e89c53082
fix(async): close unscheduled coroutines in all threadsafe bridges (#26584)
Wraps every sync->async coroutine-scheduling site in the codebase with a
new agent.async_utils.safe_schedule_threadsafe() helper that closes the
coroutine on scheduling failure (closed loop, shutdown race, etc.)
instead of leaking it as 'coroutine was never awaited' RuntimeWarnings
plus reference leaks.

22 production call sites migrated across the codebase:
- acp_adapter/events.py, acp_adapter/permissions.py
- agent/lsp/manager.py
- cron/scheduler.py (media + text delivery paths)
- gateway/platforms/feishu.py (5 sites, via existing _submit_on_loop helper
  which now delegates to safe_schedule_threadsafe)
- gateway/run.py (10 sites: telegram rename, agent:step hook, status
  callback, interim+bg-review, clarify send, exec-approval button+text,
  temp-bubble cleanup, channel-directory refresh)
- plugins/memory/hindsight, plugins/platforms/google_chat
- tools/browser_supervisor.py (3), browser_cdp_tool.py,
  computer_use/cua_backend.py, slash_confirm.py
- tools/environments/modal.py (_AsyncWorker)
- tools/mcp_tool.py (2 + 8 _run_on_mcp_loop callers converted to
  factory-style so the coroutine is never constructed on a dead loop)
- tui_gateway/ws.py

Tests: new tests/agent/test_async_utils.py covers helper behavior under
live loop, dead loop, None loop, and scheduling exceptions. Regression
tests added at three PR-original sites (acp events, acp permissions,
mcp loop runner) mirroring contributor's intent.

Live-tested end-to-end:
- Helper stress test: 1500 schedules across live/dead/race scenarios,
  zero leaked coroutines
- Race exercised: 5000 schedules with loop killed mid-flight, 100 ok /
  4900 None returns, zero leaks
- hermes chat -q with terminal tool call (exercises step_callback bridge)
- MCP probe against failing subprocess servers + factory path
- Real gateway daemon boot + SIGINT shutdown across multiple platform
  adapter inits
- WSTransport 100 live + 50 dead-loop writes
- Cron delivery path live + dead loop

Salvages PR #2657 — adopts contributor's intent over a much wider site
list and a single centralized helper instead of inline try/except at
each site. 3 of the original PR's 6 sites no longer exist on main
(environments/patches.py deleted, DingTalk refactored to native async);
the equivalent fix lives in tools/environments/modal.py instead.

Co-authored-by: JithendraNara <jithendranaidunara@gmail.com>
2026-05-15 14:00:01 -07:00
__init__.py fix(async): close unscheduled coroutines in all threadsafe bridges (#26584) 2026-05-15 14:00:01 -07:00
plugin.yaml feat(hindsight): feature parity, setup wizard, and config improvements 2026-04-08 23:54:15 -07:00
README.md feat(hindsight): optional bank_id_template for per-agent / per-user banks 2026-04-24 03:38:17 -07:00

Hindsight Memory Provider

Long-term memory with knowledge graph, entity resolution, and multi-strategy retrieval. Supports cloud, local embedded, and local external modes.

Requirements

  • Cloud: API key from ui.hindsight.vectorize.io
  • Local Embedded: API key for a supported LLM provider (OpenAI, Anthropic, Gemini, Groq, OpenRouter, MiniMax, Ollama, or any OpenAI-compatible endpoint). Embeddings and reranking run locally — no additional API keys needed.
  • Local External: A running Hindsight instance (Docker or self-hosted) reachable over HTTP.

Setup

hermes memory setup    # select "hindsight"

The setup wizard installs dependencies automatically via uv and walks you through configuration.

To configure manually instead (cloud mode with defaults):

hermes config set memory.provider hindsight
echo "HINDSIGHT_API_KEY=your-key" >> ~/.hermes/.env

Cloud

Connects to the Hindsight Cloud API. Requires an API key from ui.hindsight.vectorize.io.

Local Embedded

Hermes spins up a local Hindsight daemon with built-in PostgreSQL. Requires an LLM API key for memory extraction and synthesis. The daemon starts automatically in the background on first use and stops after 5 minutes of inactivity.

Supports any OpenAI-compatible LLM endpoint (llama.cpp, vLLM, LM Studio, etc.) — pick openai_compatible as the provider and enter the base URL.

  • Daemon startup logs: ~/.hermes/logs/hindsight-embed.log
  • Daemon runtime logs: ~/.hindsight/profiles/<profile>.log

To open the Hindsight web UI (local embedded mode only):

hindsight-embed -p hermes ui start

Local External

Points the plugin at an existing Hindsight instance you're already running (Docker, self-hosted, etc.). No daemon management — just a URL and an optional API key.

Config

Config file: ~/.hermes/hindsight/config.json

Connection

| Key | Default | Description |
| --- | --- | --- |
| mode | cloud | cloud, local_embedded, or local_external |
| api_url | https://api.hindsight.vectorize.io | API URL (cloud and local_external modes) |

Memory Bank

| Key | Default | Description |
| --- | --- | --- |
| bank_id | hermes | Memory bank name (static fallback used when bank_id_template is unset or resolves empty) |
| bank_id_template | | Optional template to derive the bank name dynamically. Placeholders: {profile}, {workspace}, {platform}, {user}, {session}. Example: hermes-{profile} isolates memory per active Hermes profile. Empty placeholders collapse cleanly (e.g. hermes-{user} with no user becomes hermes). |
| bank_mission | | Reflect mission (identity/framing for reflect reasoning). Applied via Banks API. |
| bank_retain_mission | | Retain mission (steers what gets extracted). Applied via Banks API. |
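
The collapsing behaviour described for bank_id_template can be sketched as follows. This is an illustration only: the function name `resolve_bank_id`, the set of separator characters, and the exact collapsing rule are assumptions, not the plugin's actual implementation.

```python
import re

# Placeholder names listed in the table above.
PLACEHOLDERS = ("profile", "workspace", "platform", "user", "session")


def resolve_bank_id(template, fallback="hermes", **values):
    """Fill a bank_id_template, falling back to the static bank_id.

    Hypothetical helper: empty placeholders are dropped and any separator
    they leave behind is collapsed or trimmed.
    """
    if not template:
        return fallback
    filled = template
    for name in PLACEHOLDERS:
        value = (values.get(name) or "").strip()
        filled = filled.replace("{" + name + "}", value)
    # Collapse runs of separators left by empty placeholders, trim the edges.
    filled = re.sub(r"([-_.])[-_.]+", r"\1", filled).strip("-_.")
    return filled or fallback


print(resolve_bank_id("hermes-{profile}", profile="work"))  # hermes-work
print(resolve_bank_id("hermes-{user}"))                     # hermes
```

A template that resolves to nothing at all (for example `{user}` with no user) returns the fallback, matching the "static fallback" note on bank_id.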

Recall

| Key | Default | Description |
| --- | --- | --- |
| recall_budget | mid | Recall thoroughness: low / mid / high |
| recall_prefetch_method | recall | Auto-recall method: recall (raw facts) or reflect (LLM synthesis) |
| recall_max_tokens | 4096 | Maximum tokens for recall results |
| recall_max_input_chars | 800 | Maximum input query length for auto-recall |
| recall_prompt_preamble | | Custom preamble for recalled memories in context |
| recall_tags | | Tags to filter when searching memories |
| recall_tags_match | any | Tag matching mode: any / all / any_strict / all_strict |
| auto_recall | true | Automatically recall memories before each turn |

Retain

| Key | Default | Description |
| --- | --- | --- |
| auto_retain | true | Automatically retain conversation turns |
| retain_async | true | Process retain asynchronously on the Hindsight server |
| retain_every_n_turns | 1 | Retain every N turns (1 = every turn) |
| retain_context | conversation between Hermes Agent and the User | Context label for retained memories |
| retain_tags | | Default tags applied to retained memories; merged with per-call tool tags |
| retain_source | | Optional metadata.source attached to retained memories |
| retain_user_prefix | User | Label used before user turns in auto-retained transcripts |
| retain_assistant_prefix | Assistant | Label used before assistant turns in auto-retained transcripts |

Integration

| Key | Default | Description |
| --- | --- | --- |
| memory_mode | hybrid | How memories are integrated into the agent |

memory_mode:

  • hybrid — automatic context injection + tools available to the LLM
  • context — automatic injection only, no tools exposed
  • tools — tools only, no automatic injection
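
To make the documented defaults and allowed values concrete, the sketch below merges user overrides onto those defaults and rejects values outside the documented choices. The helper names `build_config` and `render_config`, and the example URL in the usage note, are hypothetical; the plugin's own loading and validation logic may differ.

```python
import json

# Defaults taken from the tables above (keys without a default are omitted).
DEFAULTS = {
    "mode": "cloud",
    "api_url": "https://api.hindsight.vectorize.io",
    "bank_id": "hermes",
    "recall_budget": "mid",
    "recall_prefetch_method": "recall",
    "recall_max_tokens": 4096,
    "recall_max_input_chars": 800,
    "recall_tags_match": "any",
    "auto_recall": True,
    "auto_retain": True,
    "retain_async": True,
    "retain_every_n_turns": 1,
    "memory_mode": "hybrid",
}

# Enumerated options as documented above.
ALLOWED = {
    "mode": {"cloud", "local_embedded", "local_external"},
    "recall_budget": {"low", "mid", "high"},
    "recall_prefetch_method": {"recall", "reflect"},
    "recall_tags_match": {"any", "all", "any_strict", "all_strict"},
    "memory_mode": {"hybrid", "context", "tools"},
}


def build_config(overrides=None):
    """Merge overrides onto the defaults and validate enumerated keys."""
    config = {**DEFAULTS, **(overrides or {})}
    for key, allowed in ALLOWED.items():
        if config[key] not in allowed:
            raise ValueError(f"{key}={config[key]!r} not in {sorted(allowed)}")
    return config


def render_config(overrides=None):
    """Return the merged config as JSON text suitable for config.json."""
    return json.dumps(build_config(overrides), indent=2, sort_keys=True)
```

For example, `render_config({"mode": "local_external", "api_url": "http://localhost:8888"})` produces JSON that points at a self-hosted instance; the URL here is a placeholder, not a documented default.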

Local Embedded LLM

| Key | Default | Description |
| --- | --- | --- |
| llm_provider | openai | openai, anthropic, gemini, groq, openrouter, minimax, ollama, lmstudio, openai_compatible |
| llm_model | per-provider | Model name (e.g. gpt-4o-mini, qwen/qwen3.5-9b) |
| llm_base_url | | Endpoint URL for openai_compatible (e.g. http://192.168.1.10:8080/v1) |

The LLM API key is stored in ~/.hermes/.env as HINDSIGHT_LLM_API_KEY.

Tools

Available in hybrid and tools memory modes:

| Tool | Description |
| --- | --- |
| hindsight_retain | Store information with auto entity extraction; supports optional per-call tags |
| hindsight_recall | Multi-strategy search (semantic + entity graph) |
| hindsight_reflect | Cross-memory synthesis (LLM-powered) |

Environment Variables

| Variable | Description |
| --- | --- |
| HINDSIGHT_API_KEY | API key for Hindsight Cloud |
| HINDSIGHT_LLM_API_KEY | LLM API key for local mode |
| HINDSIGHT_API_LLM_BASE_URL | LLM Base URL for local mode (e.g. OpenRouter) |
| HINDSIGHT_API_URL | Override API endpoint |
| HINDSIGHT_BANK_ID | Override bank name |
| HINDSIGHT_BUDGET | Override recall budget |
| HINDSIGHT_MODE | Override mode (cloud, local_embedded, local_external) |
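
The "Override" variables suggest that environment values take precedence over config.json. A rough sketch of that precedence follows. The mapping from variable to config key is inferred from the descriptions above and is an assumption, as is the helper name `apply_env_overrides`.

```python
import os

# Assumed mapping from override variables to config.json keys.
ENV_TO_KEY = {
    "HINDSIGHT_API_URL": "api_url",
    "HINDSIGHT_BANK_ID": "bank_id",
    "HINDSIGHT_BUDGET": "recall_budget",
    "HINDSIGHT_MODE": "mode",
}


def apply_env_overrides(config, environ=None):
    """Return a copy of config with non-empty environment values applied."""
    environ = os.environ if environ is None else environ
    merged = dict(config)
    for var, key in ENV_TO_KEY.items():
        value = environ.get(var, "").strip()
        if value:  # unset or blank variables leave the file value alone
            merged[key] = value
    return merged
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.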

Client Version

Requires hindsight-client >= 0.4.22. The plugin auto-upgrades on session start if an older version is detected.
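
A version gate like the one described can be sketched with the standard library. The helper names below are hypothetical and the parsing is deliberately simple (numeric components only, so a pre-release such as 0.4.22rc1 is treated like 0.4.22). How the plugin actually detects and upgrades the client is not shown here.

```python
from importlib import metadata

MIN_CLIENT_VERSION = "0.4.22"  # minimum stated above


def _parse(version):
    """Turn '0.4.22' into (0, 4, 22), ignoring any non-numeric suffix."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def installed_client_version(dist="hindsight-client"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def needs_upgrade(installed, minimum=MIN_CLIENT_VERSION):
    """True when the client is missing or older than the minimum."""
    if installed is None:
        return True
    return _parse(installed) < _parse(minimum)
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where "0.10.0" sorts before "0.4.22" lexically.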