From 25757d631b493381c22efe45984655b06ae97651 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Nicol=C3=B2=20Boschi?= Date: Thu, 9 Apr 2026 07:27:31 +0200 Subject: [PATCH] feat(hindsight): feature parity, setup wizard, and config improvements MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Port missing features from the hindsight-hermes external integration package into the native plugin. Only touches plugin files — no core changes. Features: - Tags on retain/recall (tags, recall_tags, recall_tags_match) - Recall config (recall_max_tokens, recall_max_input_chars, recall_types, recall_prompt_preamble) - Retain controls (retain_every_n_turns, auto_retain, auto_recall, retain_async via aretain_batch, retain_context) - Bank config via Banks API (bank_mission, bank_retain_mission) - Structured JSON retain with per-message timestamps - Full session accumulation with document_id for dedup - Custom post_setup() wizard with curses picker - Mode-aware dep install (hindsight-client for cloud, hindsight-all for local) - local_external mode and openai_compatible LLM provider - OpenRouter support with auto base URL - Auto-upgrade of hindsight-client to >=0.4.22 on session start - Comprehensive debug logging across all operations - 46 unit tests - Updated README and website docs --- plugins/memory/hindsight/README.md | 74 ++- plugins/memory/hindsight/__init__.py | 449 +++++++++++-- plugins/memory/hindsight/plugin.yaml | 6 +- .../plugins/memory/test_hindsight_provider.py | 598 ++++++++++++++++++ .../user-guide/features/memory-providers.md | 18 +- 5 files changed, 1072 insertions(+), 73 deletions(-) create mode 100644 tests/plugins/memory/test_hindsight_provider.py diff --git a/plugins/memory/hindsight/README.md b/plugins/memory/hindsight/README.md index 3a1df59e4..024a99303 100644 --- a/plugins/memory/hindsight/README.md +++ b/plugins/memory/hindsight/README.md @@ -1,11 +1,12 @@ # Hindsight Memory Provider -Long-term memory with knowledge graph, 
entity resolution, and multi-strategy retrieval. Supports cloud and local (embedded) modes. +Long-term memory with knowledge graph, entity resolution, and multi-strategy retrieval. Supports cloud, local embedded, and local external modes. ## Requirements - **Cloud:** API key from [ui.hindsight.vectorize.io](https://ui.hindsight.vectorize.io) -- **Local:** API key for a supported LLM provider (OpenAI, Anthropic, Gemini, Groq, MiniMax, or Ollama). Embeddings and reranking run locally — no additional API keys needed. +- **Local Embedded:** API key for a supported LLM provider (OpenAI, Anthropic, Gemini, Groq, OpenRouter, MiniMax, Ollama, or any OpenAI-compatible endpoint). Embeddings and reranking run locally — no additional API keys needed. +- **Local External:** A running Hindsight instance (Docker or self-hosted) reachable over HTTP. ## Setup @@ -21,17 +22,28 @@ hermes config set memory.provider hindsight echo "HINDSIGHT_API_KEY=your-key" >> ~/.hermes/.env ``` -### Cloud Mode +### Cloud Connects to the Hindsight Cloud API. Requires an API key from [ui.hindsight.vectorize.io](https://ui.hindsight.vectorize.io). -### Local Mode +### Local Embedded -Runs an embedded Hindsight server with built-in PostgreSQL. Requires an LLM API key (e.g. Groq, OpenAI, Anthropic) for memory extraction and synthesis. The daemon starts automatically in the background on first use and stops after 5 minutes of inactivity. +Hermes spins up a local Hindsight daemon with built-in PostgreSQL. Requires an LLM API key for memory extraction and synthesis. The daemon starts automatically in the background on first use and stops after 5 minutes of inactivity. + +Supports any OpenAI-compatible LLM endpoint (llama.cpp, vLLM, LM Studio, etc.) — pick `openai_compatible` as the provider and enter the base URL. 
Daemon startup logs: `~/.hermes/logs/hindsight-embed.log` Daemon runtime logs: `~/.hindsight/profiles/.log` +To open the Hindsight web UI (local embedded mode only): +```bash +hindsight-embed -p hermes ui start +``` + +### Local External + +Points the plugin at an existing Hindsight instance you're already running (Docker, self-hosted, etc.). No daemon management — just a URL and an optional API key. + ## Config Config file: `~/.hermes/hindsight/config.json` @@ -40,40 +52,58 @@ Config file: `~/.hermes/hindsight/config.json` | Key | Default | Description | |-----|---------|-------------| -| `mode` | `cloud` | `cloud` or `local` | -| `api_url` | `https://api.hindsight.vectorize.io` | API URL (cloud mode) | -| `api_url` | `http://localhost:8888` | API URL (local mode, unused — daemon manages its own port) | +| `mode` | `cloud` | `cloud`, `local_embedded`, or `local_external` | +| `api_url` | `https://api.hindsight.vectorize.io` | API URL (cloud and local_external modes) | -### Memory +### Memory Bank | Key | Default | Description | |-----|---------|-------------| | `bank_id` | `hermes` | Memory bank name | -| `budget` | `mid` | Recall thoroughness: `low` / `mid` / `high` | +| `bank_mission` | — | Reflect mission (identity/framing for reflect reasoning). Applied via Banks API. | +| `bank_retain_mission` | — | Retain mission (steers what gets extracted). Applied via Banks API. 
| + +### Recall + +| Key | Default | Description | +|-----|---------|-------------| +| `recall_budget` | `mid` | Recall thoroughness: `low` / `mid` / `high` | +| `recall_prefetch_method` | `recall` | Auto-recall method: `recall` (raw facts) or `reflect` (LLM synthesis) | +| `recall_max_tokens` | `4096` | Maximum tokens for recall results | +| `recall_max_input_chars` | `800` | Maximum input query length for auto-recall | +| `recall_prompt_preamble` | — | Custom preamble for recalled memories in context | +| `recall_tags` | — | Tags to filter when searching memories | +| `recall_tags_match` | `any` | Tag matching mode: `any` / `all` / `any_strict` / `all_strict` | +| `auto_recall` | `true` | Automatically recall memories before each turn | + +### Retain + +| Key | Default | Description | +|-----|---------|-------------| +| `auto_retain` | `true` | Automatically retain conversation turns | +| `retain_async` | `true` | Process retain asynchronously on the Hindsight server | +| `retain_every_n_turns` | `1` | Retain every N turns (1 = every turn) | +| `retain_context` | `conversation between Hermes Agent and the User` | Context label for retained memories | +| `tags` | — | Tags applied when storing memories | ### Integration | Key | Default | Description | |-----|---------|-------------| | `memory_mode` | `hybrid` | How memories are integrated into the agent | -| `prefetch_method` | `recall` | Method for automatic context injection | **memory_mode:** - `hybrid` — automatic context injection + tools available to the LLM - `context` — automatic injection only, no tools exposed - `tools` — tools only, no automatic injection -**prefetch_method:** -- `recall` — injects raw memory facts (fast) -- `reflect` — injects LLM-synthesized summary (slower, more coherent) - -### Local Mode LLM +### Local Embedded LLM | Key | Default | Description | |-----|---------|-------------| -| `llm_provider` | `openai` | LLM provider: `openai`, `anthropic`, `gemini`, `groq`, `minimax`, `ollama` 
| -| `llm_model` | per-provider | Model name (e.g. `gpt-4o-mini`, `openai/gpt-oss-120b`) | -| `llm_base_url` | — | LLM Base URL override (e.g. `https://openrouter.ai/api/v1`) | +| `llm_provider` | `openai` | `openai`, `anthropic`, `gemini`, `groq`, `openrouter`, `minimax`, `ollama`, `lmstudio`, `openai_compatible` | +| `llm_model` | per-provider | Model name (e.g. `gpt-4o-mini`, `qwen/qwen3.5-9b`) | +| `llm_base_url` | — | Endpoint URL for `openai_compatible` (e.g. `http://192.168.1.10:8080/v1`) | The LLM API key is stored in `~/.hermes/.env` as `HINDSIGHT_LLM_API_KEY`. @@ -97,4 +127,8 @@ Available in `hybrid` and `tools` memory modes: | `HINDSIGHT_API_URL` | Override API endpoint | | `HINDSIGHT_BANK_ID` | Override bank name | | `HINDSIGHT_BUDGET` | Override recall budget | -| `HINDSIGHT_MODE` | Override mode (`cloud` / `local`) | +| `HINDSIGHT_MODE` | Override mode (`cloud`, `local_embedded`, `local_external`) | + +## Client Version + +Requires `hindsight-client >= 0.4.22`. The plugin auto-upgrades on session start if an older version is detected. 
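The config tables in the README above can be combined into a single file. A sketch of `~/.hermes/hindsight/config.json` for local embedded mode against an OpenAI-compatible endpoint — every key name is taken from the tables above, but the endpoint URL is an illustrative placeholder and `llm_model` uses the documented placeholder default:

```json
{
  "mode": "local_embedded",
  "bank_id": "hermes",
  "recall_budget": "mid",
  "memory_mode": "hybrid",
  "recall_prefetch_method": "recall",
  "llm_provider": "openai_compatible",
  "llm_base_url": "http://192.168.1.10:8080/v1",
  "llm_model": "your-model-name",
  "auto_recall": true,
  "auto_retain": true,
  "retain_every_n_turns": 1,
  "recall_max_tokens": 4096
}
```

Note the LLM API key is not stored in this file; per the README it lives in `~/.hermes/.env` as `HINDSIGHT_LLM_API_KEY`.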
diff --git a/plugins/memory/hindsight/__init__.py b/plugins/memory/hindsight/__init__.py index c87497745..c39679b73 100644 --- a/plugins/memory/hindsight/__init__.py +++ b/plugins/memory/hindsight/__init__.py @@ -28,21 +28,25 @@ from hermes_constants import get_hermes_home from typing import Any, Dict, List from agent.memory_provider import MemoryProvider +from hermes_constants import get_hermes_home from tools.registry import tool_error logger = logging.getLogger(__name__) _DEFAULT_API_URL = "https://api.hindsight.vectorize.io" _DEFAULT_LOCAL_URL = "http://localhost:8888" +_MIN_CLIENT_VERSION = "0.4.22" _VALID_BUDGETS = {"low", "mid", "high"} _PROVIDER_DEFAULT_MODELS = { "openai": "gpt-4o-mini", "anthropic": "claude-haiku-4-5", "gemini": "gemini-2.5-flash", "groq": "openai/gpt-oss-120b", + "openrouter": "qwen/qwen3.5-9b", "minimax": "MiniMax-M2.7", "ollama": "gemma3:12b", "lmstudio": "local-model", + "openai_compatible": "your-model-name", } @@ -188,6 +192,7 @@ class HindsightMemoryProvider(MemoryProvider): self._bank_id = "hermes" self._budget = "mid" self._mode = "cloud" + self._llm_base_url = "" self._memory_mode = "hybrid" # "context", "tools", or "hybrid" self._prefetch_method = "recall" # "recall" or "reflect" self._client = None @@ -195,6 +200,31 @@ class HindsightMemoryProvider(MemoryProvider): self._prefetch_lock = threading.Lock() self._prefetch_thread = None self._sync_thread = None + self._session_id = "" + + # Tags + self._tags: list[str] | None = None + self._recall_tags: list[str] | None = None + self._recall_tags_match = "any" + + # Retain controls + self._auto_retain = True + self._retain_every_n_turns = 1 + self._retain_context = "conversation between Hermes Agent and the User" + self._turn_counter = 0 + self._session_turns: list[str] = [] # accumulates ALL turns for the session + + # Recall controls + self._auto_recall = True + self._recall_max_tokens = 4096 + self._recall_types: list[str] | None = None + self._recall_prompt_preamble = "" + 
self._recall_max_input_chars = 800 + + # Bank + self._bank_mission = "" + self._bank_retain_mission: str | None = None + self._retain_async = True @property def name(self) -> str: @@ -204,7 +234,7 @@ class HindsightMemoryProvider(MemoryProvider): try: cfg = _load_config() mode = cfg.get("mode", "cloud") - if mode == "local": + if mode in ("local", "local_embedded", "local_external"): return True has_key = bool(cfg.get("apiKey") or os.environ.get("HINDSIGHT_API_KEY", "")) has_url = bool(cfg.get("api_url") or os.environ.get("HINDSIGHT_API_URL", "")) @@ -228,73 +258,306 @@ class HindsightMemoryProvider(MemoryProvider): existing.update(values) config_path.write_text(json.dumps(existing, indent=2)) + def post_setup(self, hermes_home: str, config: dict) -> None: + """Custom setup wizard — installs only the deps needed for the selected mode.""" + import getpass + import subprocess + import shutil + import sys + from pathlib import Path + + from hermes_cli.config import save_config + + from hermes_cli.memory_setup import _curses_select + + print("\n Configuring Hindsight memory:\n") + + # Step 1: Mode selection + mode_items = [ + ("Cloud", "Hindsight Cloud API (lightweight, just needs an API key)"), + ("Local Embedded", "Run Hindsight locally (downloads ~200MB, needs LLM key)"), + ("Local External", "Connect to an existing Hindsight instance"), + ] + mode_idx = _curses_select(" Select mode", mode_items, default=0) + mode = ["cloud", "local_embedded", "local_external"][mode_idx] + + provider_config: dict = {"mode": mode} + env_writes: dict = {} + + # Step 2: Install/upgrade deps for selected mode + _MIN_CLIENT_VERSION = "0.4.22" + cloud_dep = f"hindsight-client>={_MIN_CLIENT_VERSION}" + local_dep = "hindsight-all" + if mode == "local_embedded": + deps_to_install = [local_dep] + elif mode == "local_external": + deps_to_install = [cloud_dep] + else: + deps_to_install = [cloud_dep] + + print(f"\n Checking dependencies...") + uv_path = shutil.which("uv") + if not uv_path: + 
print(" ⚠ uv not found — install it: curl -LsSf https://astral.sh/uv/install.sh | sh") + print(f" Then run manually: uv pip install --python {sys.executable} {' '.join(deps_to_install)}") + else: + try: + subprocess.run( + [uv_path, "pip", "install", "--python", sys.executable, "--quiet", "--upgrade"] + deps_to_install, + check=True, timeout=120, capture_output=True, + ) + print(f" ✓ Dependencies up to date") + except Exception as e: + print(f" ⚠ Install failed: {e}") + print(f" Run manually: uv pip install --python {sys.executable} {' '.join(deps_to_install)}") + + # Step 3: Mode-specific config + if mode == "cloud": + print(f"\n Get your API key at https://ui.hindsight.vectorize.io\n") + existing_key = os.environ.get("HINDSIGHT_API_KEY", "") + if existing_key: + masked = f"...{existing_key[-4:]}" if len(existing_key) > 4 else "set" + sys.stdout.write(f" API key (current: {masked}, blank to keep): ") + sys.stdout.flush() + api_key = getpass.getpass(prompt="") if sys.stdin.isatty() else sys.stdin.readline().strip() + else: + sys.stdout.write(" API key: ") + sys.stdout.flush() + api_key = getpass.getpass(prompt="") if sys.stdin.isatty() else sys.stdin.readline().strip() + if api_key: + env_writes["HINDSIGHT_API_KEY"] = api_key + + val = input(f" API URL [{_DEFAULT_API_URL}]: ").strip() + if val: + provider_config["api_url"] = val + + elif mode == "local_external": + val = input(f" Hindsight API URL [{_DEFAULT_LOCAL_URL}]: ").strip() + provider_config["api_url"] = val or _DEFAULT_LOCAL_URL + + sys.stdout.write(" API key (optional, blank to skip): ") + sys.stdout.flush() + api_key = getpass.getpass(prompt="") if sys.stdin.isatty() else sys.stdin.readline().strip() + if api_key: + env_writes["HINDSIGHT_API_KEY"] = api_key + + else: # local_embedded + providers_list = list(_PROVIDER_DEFAULT_MODELS.keys()) + llm_items = [ + (p, f"default model: {_PROVIDER_DEFAULT_MODELS[p]}") + for p in providers_list + ] + llm_idx = _curses_select(" Select LLM provider", llm_items, 
default=0) + llm_provider = providers_list[llm_idx] + + provider_config["llm_provider"] = llm_provider + + if llm_provider == "openai_compatible": + val = input(" LLM endpoint URL (e.g. http://192.168.1.10:8080/v1): ").strip() + if val: + provider_config["llm_base_url"] = val + elif llm_provider == "openrouter": + provider_config["llm_base_url"] = "https://openrouter.ai/api/v1" + + default_model = _PROVIDER_DEFAULT_MODELS.get(llm_provider, "gpt-4o-mini") + val = input(f" LLM model [{default_model}]: ").strip() + provider_config["llm_model"] = val or default_model + + sys.stdout.write(" LLM API key: ") + sys.stdout.flush() + llm_key = getpass.getpass(prompt="") if sys.stdin.isatty() else sys.stdin.readline().strip() + if llm_key: + env_writes["HINDSIGHT_LLM_API_KEY"] = llm_key + + # Step 4: Save everything + provider_config["bank_id"] = "hermes" + provider_config["recall_budget"] = "mid" + bank_id = "hermes" + config["memory"]["provider"] = "hindsight" + save_config(config) + + self.save_config(provider_config, hermes_home) + + if env_writes: + env_path = Path(hermes_home) / ".env" + env_path.parent.mkdir(parents=True, exist_ok=True) + existing_lines = [] + if env_path.exists(): + existing_lines = env_path.read_text().splitlines() + updated_keys = set() + new_lines = [] + for line in existing_lines: + key_match = line.split("=", 1)[0].strip() if "=" in line and not line.startswith("#") else None + if key_match and key_match in env_writes: + new_lines.append(f"{key_match}={env_writes[key_match]}") + updated_keys.add(key_match) + else: + new_lines.append(line) + for k, v in env_writes.items(): + if k not in updated_keys: + new_lines.append(f"{k}={v}") + env_path.write_text("\n".join(new_lines) + "\n") + + print(f"\n ✓ Hindsight memory configured ({mode} mode)") + if env_writes: + print(f" API keys saved to .env") + print(f"\n Start a new session to activate.\n") + def get_config_schema(self): return [ - {"key": "mode", "description": "Cloud API or local embedded 
mode", "default": "cloud", "choices": ["cloud", "local"]}, - {"key": "api_url", "description": "Hindsight API URL", "default": _DEFAULT_API_URL, "when": {"mode": "cloud"}}, + {"key": "mode", "description": "Connection mode", "default": "cloud", "choices": ["cloud", "local_embedded", "local_external"]}, + # Cloud mode + {"key": "api_url", "description": "Hindsight Cloud API URL", "default": _DEFAULT_API_URL, "when": {"mode": "cloud"}}, {"key": "api_key", "description": "Hindsight Cloud API key", "secret": True, "env_var": "HINDSIGHT_API_KEY", "url": "https://ui.hindsight.vectorize.io", "when": {"mode": "cloud"}}, - {"key": "llm_provider", "description": "LLM provider for local mode", "default": "openai", "choices": ["openai", "anthropic", "gemini", "groq", "minimax", "ollama"], "when": {"mode": "local"}}, - {"key": "llm_api_key", "description": "LLM API key for local Hindsight", "secret": True, "env_var": "HINDSIGHT_LLM_API_KEY", "when": {"mode": "local"}}, - {"key": "llm_base_url", "description": "LLM Base URL (e.g. for OpenRouter)", "default": "", "env_var": "HINDSIGHT_API_LLM_BASE_URL", "when": {"mode": "local"}}, - {"key": "llm_model", "description": "LLM model for local mode", "default": "gpt-4o-mini", "default_from": {"field": "llm_provider", "map": _PROVIDER_DEFAULT_MODELS}, "when": {"mode": "local"}}, + # Local external mode + {"key": "api_url", "description": "Hindsight API URL", "default": _DEFAULT_LOCAL_URL, "when": {"mode": "local_external"}}, + {"key": "api_key", "description": "API key (optional)", "secret": True, "env_var": "HINDSIGHT_API_KEY", "when": {"mode": "local_external"}}, + # Local embedded mode + {"key": "llm_provider", "description": "LLM provider", "default": "openai", "choices": ["openai", "anthropic", "gemini", "groq", "openrouter", "minimax", "ollama", "lmstudio", "openai_compatible"], "when": {"mode": "local_embedded"}}, + {"key": "llm_base_url", "description": "Endpoint URL (e.g. 
http://192.168.1.10:8080/v1)", "default": "", "when": {"mode": "local_embedded", "llm_provider": "openai_compatible"}}, + {"key": "llm_api_key", "description": "LLM API key (optional for openai_compatible)", "secret": True, "env_var": "HINDSIGHT_LLM_API_KEY", "when": {"mode": "local_embedded"}}, + {"key": "llm_model", "description": "LLM model", "default": "gpt-4o-mini", "default_from": {"field": "llm_provider", "map": _PROVIDER_DEFAULT_MODELS}, "when": {"mode": "local_embedded"}}, {"key": "bank_id", "description": "Memory bank name", "default": "hermes"}, - {"key": "budget", "description": "Recall thoroughness", "default": "mid", "choices": ["low", "mid", "high"]}, + {"key": "bank_mission", "description": "Mission/purpose description for the memory bank"}, + {"key": "bank_retain_mission", "description": "Custom extraction prompt for memory retention"}, + {"key": "recall_budget", "description": "Recall thoroughness", "default": "mid", "choices": ["low", "mid", "high"]}, {"key": "memory_mode", "description": "Memory integration mode", "default": "hybrid", "choices": ["hybrid", "context", "tools"]}, - {"key": "prefetch_method", "description": "Auto-recall method", "default": "recall", "choices": ["recall", "reflect"]}, + {"key": "recall_prefetch_method", "description": "Auto-recall method", "default": "recall", "choices": ["recall", "reflect"]}, + {"key": "tags", "description": "Tags applied when storing memories (comma-separated)", "default": ""}, + {"key": "recall_tags", "description": "Tags to filter when searching memories (comma-separated)", "default": ""}, + {"key": "recall_tags_match", "description": "Tag matching mode for recall", "default": "any", "choices": ["any", "all", "any_strict", "all_strict"]}, + {"key": "auto_recall", "description": "Automatically recall memories before each turn", "default": True}, + {"key": "auto_retain", "description": "Automatically retain conversation turns", "default": True}, + {"key": "retain_every_n_turns", "description": 
"Retain every N turns (1 = every turn)", "default": 1}, + {"key": "retain_async","description": "Process retain asynchronously on the Hindsight server", "default": True}, + {"key": "retain_context", "description": "Context label for retained memories", "default": "conversation between Hermes Agent and the User"}, + {"key": "recall_max_tokens", "description": "Maximum tokens for recall results", "default": 4096}, + {"key": "recall_max_input_chars", "description": "Maximum input query length for auto-recall", "default": 800}, + {"key": "recall_prompt_preamble", "description": "Custom preamble for recalled memories in context"}, ] def _get_client(self): """Return the cached Hindsight client (created once, reused).""" if self._client is None: - if self._mode == "local": + if self._mode == "local_embedded": from hindsight import HindsightEmbedded - # Disable __del__ on the class to prevent "attached to a - # different loop" errors during GC — we handle cleanup in - # shutdown() instead. HindsightEmbedded.__del__ = lambda self: None + llm_provider = self._config.get("llm_provider", "") + if llm_provider in ("openai_compatible", "openrouter"): + llm_provider = "openai" + logger.debug("Creating HindsightEmbedded client (profile=%s, provider=%s)", + self._config.get("profile", "hermes"), llm_provider) kwargs = dict( profile=self._config.get("profile", "hermes"), - llm_provider=self._config.get("llm_provider", ""), - llm_api_key=self._config.get("llm_api_key") or os.environ.get("HINDSIGHT_LLM_API_KEY", ""), + llm_provider=llm_provider, + llm_api_key=self._config.get("llmApiKey") or self._config.get("llm_api_key") or os.environ.get("HINDSIGHT_LLM_API_KEY", ""), llm_model=self._config.get("llm_model", ""), ) - base_url = self._config.get("llm_base_url") or os.environ.get("HINDSIGHT_API_LLM_BASE_URL", "") - if base_url: - kwargs["llm_base_url"] = base_url + if self._llm_base_url: + kwargs["llm_base_url"] = self._llm_base_url self._client = HindsightEmbedded(**kwargs) else: from 
hindsight_client import Hindsight kwargs = {"base_url": self._api_url, "timeout": 30.0} if self._api_key: kwargs["api_key"] = self._api_key + logger.debug("Creating Hindsight cloud client (url=%s, has_key=%s)", + self._api_url, bool(self._api_key)) self._client = Hindsight(**kwargs) return self._client def initialize(self, session_id: str, **kwargs) -> None: + self._session_id = session_id + + # Check client version and auto-upgrade if needed + try: + from importlib.metadata import version as pkg_version + from packaging.version import Version + installed = pkg_version("hindsight-client") + if Version(installed) < Version(_MIN_CLIENT_VERSION): + logger.warning("hindsight-client %s is outdated (need >=%s), attempting upgrade...", + installed, _MIN_CLIENT_VERSION) + import shutil, subprocess, sys + uv_path = shutil.which("uv") + if uv_path: + try: + subprocess.run( + [uv_path, "pip", "install", "--python", sys.executable, + "--quiet", "--upgrade", f"hindsight-client>={_MIN_CLIENT_VERSION}"], + check=True, timeout=120, capture_output=True, + ) + logger.info("hindsight-client upgraded to >=%s", _MIN_CLIENT_VERSION) + except Exception as e: + logger.warning("Auto-upgrade failed: %s. Run: uv pip install 'hindsight-client>=%s'", + e, _MIN_CLIENT_VERSION) + else: + logger.warning("uv not found. 
Run: pip install 'hindsight-client>=%s'", _MIN_CLIENT_VERSION) + except Exception: + pass # packaging not available or other issue — proceed anyway + self._config = _load_config() self._mode = self._config.get("mode", "cloud") - self._api_key = self._config.get("apiKey") or os.environ.get("HINDSIGHT_API_KEY", "") - default_url = _DEFAULT_LOCAL_URL if self._mode == "local" else _DEFAULT_API_URL + # "local" is a legacy alias for "local_embedded" + if self._mode == "local": + self._mode = "local_embedded" + self._api_key = self._config.get("apiKey") or self._config.get("api_key") or os.environ.get("HINDSIGHT_API_KEY", "") + default_url = _DEFAULT_LOCAL_URL if self._mode in ("local_embedded", "local_external") else _DEFAULT_API_URL self._api_url = self._config.get("api_url") or os.environ.get("HINDSIGHT_API_URL", default_url) + self._llm_base_url = self._config.get("llm_base_url", "") banks = self._config.get("banks", {}).get("hermes", {}) self._bank_id = self._config.get("bank_id") or banks.get("bankId", "hermes") - budget = self._config.get("budget") or banks.get("budget", "mid") + budget = self._config.get("recall_budget") or self._config.get("budget") or banks.get("budget", "mid") self._budget = budget if budget in _VALID_BUDGETS else "mid" memory_mode = self._config.get("memory_mode", "hybrid") self._memory_mode = memory_mode if memory_mode in ("context", "tools", "hybrid") else "hybrid" - prefetch_method = self._config.get("prefetch_method", "recall") + prefetch_method = self._config.get("recall_prefetch_method", "recall") self._prefetch_method = prefetch_method if prefetch_method in ("recall", "reflect") else "recall" - logger.info("Hindsight initialized: mode=%s, api_url=%s, bank=%s, budget=%s, memory_mode=%s, prefetch_method=%s", - self._mode, self._api_url, self._bank_id, self._budget, self._memory_mode, self._prefetch_method) + # Bank options + self._bank_mission = self._config.get("bank_mission", "") + self._bank_retain_mission = 
self._config.get("bank_retain_mission") or None + + # Tags + self._tags = self._config.get("tags") or None + self._recall_tags = self._config.get("recall_tags") or None + self._recall_tags_match = self._config.get("recall_tags_match", "any") + + # Retain controls + self._auto_retain = self._config.get("auto_retain", True) + self._retain_every_n_turns = max(1, int(self._config.get("retain_every_n_turns", 1))) + self._retain_context = self._config.get("retain_context", "conversation between Hermes Agent and the User") + + # Recall controls + self._auto_recall = self._config.get("auto_recall", True) + self._recall_max_tokens = int(self._config.get("recall_max_tokens", 4096)) + self._recall_types = self._config.get("recall_types") or None + self._recall_prompt_preamble = self._config.get("recall_prompt_preamble", "") + self._recall_max_input_chars = int(self._config.get("recall_max_input_chars", 800)) + self._retain_async = self._config.get("retain_async", True) + + _client_version = "unknown" + try: + from importlib.metadata import version as pkg_version + _client_version = pkg_version("hindsight-client") + except Exception: + pass + logger.info("Hindsight initialized: mode=%s, api_url=%s, bank=%s, budget=%s, memory_mode=%s, prefetch_method=%s, client=%s", + self._mode, self._api_url, self._bank_id, self._budget, self._memory_mode, self._prefetch_method, _client_version) + logger.debug("Hindsight config: auto_retain=%s, auto_recall=%s, retain_every_n=%d, " + "retain_async=%s, retain_context=%s, " + "recall_max_tokens=%d, recall_max_input_chars=%d, tags=%s, recall_tags=%s", + self._auto_retain, self._auto_recall, self._retain_every_n_turns, + self._retain_async, self._retain_context, + self._recall_max_tokens, self._recall_max_input_chars, + self._tags, self._recall_tags) # For local mode, start the embedded daemon in the background so it # doesn't block the chat. Redirect stdout/stderr to a log file to # prevent rich startup output from spamming the terminal. 
- if self._mode == "local": + if self._mode == "local_embedded": def _start_daemon(): import traceback log_dir = get_hermes_home() / "logs" @@ -320,6 +583,8 @@ class HindsightMemoryProvider(MemoryProvider): current_provider = self._config.get("llm_provider", "") current_model = self._config.get("llm_model", "") current_base_url = self._config.get("llm_base_url") or os.environ.get("HINDSIGHT_API_LLM_BASE_URL", "") + # Map openai_compatible/openrouter → openai for the daemon (OpenAI wire format) + daemon_provider = "openai" if current_provider in ("openai_compatible", "openrouter") else current_provider # Read saved profile config saved = {} @@ -330,7 +595,7 @@ class HindsightMemoryProvider(MemoryProvider): saved[k.strip()] = v.strip() config_changed = ( - saved.get("HINDSIGHT_API_LLM_PROVIDER") != current_provider or + saved.get("HINDSIGHT_API_LLM_PROVIDER") != daemon_provider or saved.get("HINDSIGHT_API_LLM_MODEL") != current_model or saved.get("HINDSIGHT_API_LLM_API_KEY") != current_key or saved.get("HINDSIGHT_API_LLM_BASE_URL", "") != current_base_url @@ -340,7 +605,7 @@ class HindsightMemoryProvider(MemoryProvider): # Write updated profile .env profile_env.parent.mkdir(parents=True, exist_ok=True) env_lines = ( - f"HINDSIGHT_API_LLM_PROVIDER={current_provider}\n" + f"HINDSIGHT_API_LLM_PROVIDER={daemon_provider}\n" f"HINDSIGHT_API_LLM_API_KEY={current_key}\n" f"HINDSIGHT_API_LLM_MODEL={current_model}\n" f"HINDSIGHT_API_LOG_LEVEL=info\n" @@ -388,47 +653,118 @@ class HindsightMemoryProvider(MemoryProvider): def prefetch(self, query: str, *, session_id: str = "") -> str: if self._prefetch_thread and self._prefetch_thread.is_alive(): + logger.debug("Prefetch: waiting for background thread to complete") self._prefetch_thread.join(timeout=3.0) with self._prefetch_lock: result = self._prefetch_result self._prefetch_result = "" if not result: + logger.debug("Prefetch: no results available") return "" - return f"## Hindsight Memory\n{result}" + logger.debug("Prefetch: 
returning %d chars of context", len(result)) + header = self._recall_prompt_preamble or ( + "# Hindsight Memory (persistent cross-session context)\n" + "Use this to answer questions about the user and prior sessions. " + "Do not call tools to look up information that is already present here." + ) + return f"{header}\n\n{result}" def queue_prefetch(self, query: str, *, session_id: str = "") -> None: if self._memory_mode == "tools": + logger.debug("Prefetch: skipped (tools-only mode)") return + if not self._auto_recall: + logger.debug("Prefetch: skipped (auto_recall disabled)") + return + # Truncate query to max chars + if self._recall_max_input_chars and len(query) > self._recall_max_input_chars: + query = query[:self._recall_max_input_chars] + def _run(): try: client = self._get_client() if self._prefetch_method == "reflect": + logger.debug("Prefetch: calling reflect (bank=%s, query_len=%d)", self._bank_id, len(query)) resp = _run_sync(client.areflect(bank_id=self._bank_id, query=query, budget=self._budget)) text = resp.text or "" else: - resp = _run_sync(client.arecall(bank_id=self._bank_id, query=query, budget=self._budget)) - text = "\n".join(r.text for r in resp.results if r.text) if resp.results else "" + recall_kwargs: dict = { + "bank_id": self._bank_id, "query": query, + "budget": self._budget, "max_tokens": self._recall_max_tokens, + } + if self._recall_tags: + recall_kwargs["tags"] = self._recall_tags + recall_kwargs["tags_match"] = self._recall_tags_match + if self._recall_types: + recall_kwargs["types"] = self._recall_types + logger.debug("Prefetch: calling recall (bank=%s, query_len=%d, budget=%s)", + self._bank_id, len(query), self._budget) + resp = _run_sync(client.arecall(**recall_kwargs)) + num_results = len(resp.results) if resp.results else 0 + logger.debug("Prefetch: recall returned %d results", num_results) + text = "\n".join(f"- {r.text}" for r in resp.results if r.text) if resp.results else "" if text: with self._prefetch_lock: 
self._prefetch_result = text except Exception as e: - logger.debug("Hindsight prefetch failed: %s", e) + logger.debug("Hindsight prefetch failed: %s", e, exc_info=True) self._prefetch_thread = threading.Thread(target=_run, daemon=True, name="hindsight-prefetch") self._prefetch_thread.start() def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None: - """Retain conversation turn in background (non-blocking).""" - combined = f"User: {user_content}\nAssistant: {assistant_content}" + """Retain conversation turn in background (non-blocking). + + Respects retain_every_n_turns for batching. + """ + if not self._auto_retain: + logger.debug("sync_turn: skipped (auto_retain disabled)") + return + + from datetime import datetime, timezone + now = datetime.now(timezone.utc).isoformat() + + messages = [ + {"role": "user", "content": user_content, "timestamp": now}, + {"role": "assistant", "content": assistant_content, "timestamp": now}, + ] + + turn = json.dumps(messages) + self._session_turns.append(turn) + self._turn_counter += 1 + + # Only retain every N turns + if self._turn_counter % self._retain_every_n_turns != 0: + logger.debug("sync_turn: buffered turn %d (will retain at turn %d)", + self._turn_counter, self._turn_counter + (self._retain_every_n_turns - self._turn_counter % self._retain_every_n_turns)) + return + + logger.debug("sync_turn: retaining %d turns, total session content %d chars", + len(self._session_turns), sum(len(t) for t in self._session_turns)) + # Send the ENTIRE session as a single JSON array (document_id deduplicates). + # Each element in _session_turns is a JSON string of that turn's messages. 
+ content = "[" + ",".join(self._session_turns) + "]" def _sync(): try: client = self._get_client() - _run_sync(client.aretain( - bank_id=self._bank_id, content=combined, context="conversation" + item: dict = { + "content": content, + "context": self._retain_context, + } + if self._tags: + item["tags"] = self._tags + logger.debug("Hindsight retain: bank=%s, doc=%s, async=%s, content_len=%d, num_turns=%d", + self._bank_id, self._session_id, self._retain_async, len(content), len(self._session_turns)) + _run_sync(client.aretain_batch( + bank_id=self._bank_id, + items=[item], + document_id=self._session_id, + retain_async=self._retain_async, )) + logger.debug("Hindsight retain succeeded") except Exception as e: - logger.warning("Hindsight sync failed: %s", e) + logger.warning("Hindsight sync failed: %s", e, exc_info=True) if self._sync_thread and self._sync_thread.is_alive(): self._sync_thread.join(timeout=5.0) @@ -453,12 +789,18 @@ class HindsightMemoryProvider(MemoryProvider): return tool_error("Missing required parameter: content") context = args.get("context") try: - _run_sync(client.aretain( - bank_id=self._bank_id, content=content, context=context - )) + retain_kwargs: dict = { + "bank_id": self._bank_id, "content": content, "context": context, + } + if self._tags: + retain_kwargs["tags"] = self._tags + logger.debug("Tool hindsight_retain: bank=%s, content_len=%d, context=%s", + self._bank_id, len(content), context) + _run_sync(client.aretain(**retain_kwargs)) + logger.debug("Tool hindsight_retain: success") return json.dumps({"result": "Memory stored successfully."}) except Exception as e: - logger.warning("hindsight_retain failed: %s", e) + logger.warning("hindsight_retain failed: %s", e, exc_info=True) return tool_error(f"Failed to store memory: {e}") elif tool_name == "hindsight_recall": @@ -466,15 +808,26 @@ class HindsightMemoryProvider(MemoryProvider): if not query: return tool_error("Missing required parameter: query") try: - resp = 
_run_sync(client.arecall( - bank_id=self._bank_id, query=query, budget=self._budget - )) + recall_kwargs: dict = { + "bank_id": self._bank_id, "query": query, "budget": self._budget, + "max_tokens": self._recall_max_tokens, + } + if self._recall_tags: + recall_kwargs["tags"] = self._recall_tags + recall_kwargs["tags_match"] = self._recall_tags_match + if self._recall_types: + recall_kwargs["types"] = self._recall_types + logger.debug("Tool hindsight_recall: bank=%s, query_len=%d, budget=%s", + self._bank_id, len(query), self._budget) + resp = _run_sync(client.arecall(**recall_kwargs)) + num_results = len(resp.results) if resp.results else 0 + logger.debug("Tool hindsight_recall: %d results", num_results) if not resp.results: return json.dumps({"result": "No relevant memories found."}) lines = [f"{i}. {r.text}" for i, r in enumerate(resp.results, 1)] return json.dumps({"result": "\n".join(lines)}) except Exception as e: - logger.warning("hindsight_recall failed: %s", e) + logger.warning("hindsight_recall failed: %s", e, exc_info=True) return tool_error(f"Failed to search memory: {e}") elif tool_name == "hindsight_reflect": @@ -482,24 +835,28 @@ class HindsightMemoryProvider(MemoryProvider): if not query: return tool_error("Missing required parameter: query") try: + logger.debug("Tool hindsight_reflect: bank=%s, query_len=%d, budget=%s", + self._bank_id, len(query), self._budget) resp = _run_sync(client.areflect( bank_id=self._bank_id, query=query, budget=self._budget )) + logger.debug("Tool hindsight_reflect: response_len=%d", len(resp.text or "")) return json.dumps({"result": resp.text or "No relevant memories found."}) except Exception as e: - logger.warning("hindsight_reflect failed: %s", e) + logger.warning("hindsight_reflect failed: %s", e, exc_info=True) return tool_error(f"Failed to reflect: {e}") return tool_error(f"Unknown tool: {tool_name}") def shutdown(self) -> None: + logger.debug("Hindsight shutdown: waiting for background threads") global _loop, 
_loop_thread for t in (self._prefetch_thread, self._sync_thread): if t and t.is_alive(): t.join(timeout=5.0) if self._client is not None: try: - if self._mode == "local": + if self._mode == "local_embedded": # Use the public close() API. The RuntimeError from # aiohttp's "attached to a different loop" is expected # and harmless — the daemon keeps running independently. diff --git a/plugins/memory/hindsight/plugin.yaml b/plugins/memory/hindsight/plugin.yaml index 798518992..b12c09142 100644 --- a/plugins/memory/hindsight/plugin.yaml +++ b/plugins/memory/hindsight/plugin.yaml @@ -2,9 +2,7 @@ name: hindsight version: 1.0.0 description: "Hindsight — long-term memory with knowledge graph, entity resolution, and multi-strategy retrieval." pip_dependencies: - - hindsight-client - - hindsight-all -requires_env: - - HINDSIGHT_API_KEY + - "hindsight-client>=0.4.22" +requires_env: [] hooks: - on_session_end diff --git a/tests/plugins/memory/test_hindsight_provider.py b/tests/plugins/memory/test_hindsight_provider.py new file mode 100644 index 000000000..5548a29ad --- /dev/null +++ b/tests/plugins/memory/test_hindsight_provider.py @@ -0,0 +1,598 @@ +"""Tests for the Hindsight memory provider plugin. + +Tests cover config loading, tool handlers (tags, max_tokens, types), +prefetch (auto_recall, preamble, query truncation), sync_turn (auto_retain, +turn counting, tags), and schema completeness. 
+""" + +import json +import threading +from types import SimpleNamespace +from unittest.mock import AsyncMock, MagicMock, patch + +import pytest + +from plugins.memory.hindsight import ( + HindsightMemoryProvider, + RECALL_SCHEMA, + REFLECT_SCHEMA, + RETAIN_SCHEMA, + _load_config, +) + + +# --------------------------------------------------------------------------- +# Fixtures +# --------------------------------------------------------------------------- + + +@pytest.fixture(autouse=True) +def _clean_env(monkeypatch): + """Ensure no stale env vars leak between tests.""" + for key in ( + "HINDSIGHT_API_KEY", "HINDSIGHT_API_URL", "HINDSIGHT_BANK_ID", + "HINDSIGHT_BUDGET", "HINDSIGHT_MODE", "HINDSIGHT_LLM_API_KEY", + ): + monkeypatch.delenv(key, raising=False) + + +def _make_mock_client(): + """Create a mock Hindsight client with async methods.""" + client = MagicMock() + client.aretain = AsyncMock() + client.arecall = AsyncMock( + return_value=SimpleNamespace( + results=[ + SimpleNamespace(text="Memory 1"), + SimpleNamespace(text="Memory 2"), + ] + ) + ) + client.areflect = AsyncMock( + return_value=SimpleNamespace(text="Synthesized answer") + ) + client.aretain_batch = AsyncMock() + client.aclose = AsyncMock() + return client + + +@pytest.fixture() +def provider(tmp_path, monkeypatch): + """Create an initialized HindsightMemoryProvider with a mock client.""" + config = { + "mode": "cloud", + "apiKey": "test-key", + "api_url": "http://localhost:9999", + "bank_id": "test-bank", + "budget": "mid", + "memory_mode": "hybrid", + } + config_path = tmp_path / "hindsight" / "config.json" + config_path.parent.mkdir(parents=True, exist_ok=True) + config_path.write_text(json.dumps(config)) + + monkeypatch.setattr( + "plugins.memory.hindsight.get_hermes_home", lambda: tmp_path + ) + + p = HindsightMemoryProvider() + p.initialize(session_id="test-session", hermes_home=str(tmp_path), platform="cli") + p._client = _make_mock_client() + return p + + +@pytest.fixture() +def 
provider_with_config(tmp_path, monkeypatch): + """Create a provider factory that accepts custom config overrides.""" + def _make(**overrides): + config = { + "mode": "cloud", + "apiKey": "test-key", + "api_url": "http://localhost:9999", + "bank_id": "test-bank", + "budget": "mid", + "memory_mode": "hybrid", + } + config.update(overrides) + config_path = tmp_path / "hindsight" / "config.json" + config_path.parent.mkdir(parents=True, exist_ok=True) + config_path.write_text(json.dumps(config)) + + monkeypatch.setattr( + "plugins.memory.hindsight.get_hermes_home", lambda: tmp_path + ) + + p = HindsightMemoryProvider() + p.initialize(session_id="test-session", hermes_home=str(tmp_path), platform="cli") + p._client = _make_mock_client() + return p + return _make + + +# --------------------------------------------------------------------------- +# Schema tests +# --------------------------------------------------------------------------- + + +class TestSchemas: + def test_retain_schema_has_content(self): + assert RETAIN_SCHEMA["name"] == "hindsight_retain" + assert "content" in RETAIN_SCHEMA["parameters"]["properties"] + assert "content" in RETAIN_SCHEMA["parameters"]["required"] + + def test_recall_schema_has_query(self): + assert RECALL_SCHEMA["name"] == "hindsight_recall" + assert "query" in RECALL_SCHEMA["parameters"]["properties"] + assert "query" in RECALL_SCHEMA["parameters"]["required"] + + def test_reflect_schema_has_query(self): + assert REFLECT_SCHEMA["name"] == "hindsight_reflect" + assert "query" in REFLECT_SCHEMA["parameters"]["properties"] + + def test_get_tool_schemas_returns_three(self, provider): + schemas = provider.get_tool_schemas() + assert len(schemas) == 3 + names = {s["name"] for s in schemas} + assert names == {"hindsight_retain", "hindsight_recall", "hindsight_reflect"} + + def test_context_mode_returns_no_tools(self, provider_with_config): + p = provider_with_config(memory_mode="context") + assert p.get_tool_schemas() == [] + + +# 
--------------------------------------------------------------------------- +# Config tests +# --------------------------------------------------------------------------- + + +class TestConfig: + def test_default_values(self, provider): + assert provider._auto_retain is True + assert provider._auto_recall is True + assert provider._retain_every_n_turns == 1 + assert provider._recall_max_tokens == 4096 + assert provider._recall_max_input_chars == 800 + assert provider._tags is None + assert provider._recall_tags is None + assert provider._bank_mission == "" + assert provider._bank_retain_mission is None + assert provider._retain_context == "conversation between Hermes Agent and the User" + + def test_custom_config_values(self, provider_with_config): + p = provider_with_config( + tags=["tag1", "tag2"], + recall_tags=["recall-tag"], + recall_tags_match="all", + auto_retain=False, + auto_recall=False, + retain_every_n_turns=3, + retain_context="custom-ctx", + bank_retain_mission="Extract key facts", + recall_max_tokens=2048, + recall_types=["world", "experience"], + recall_prompt_preamble="Custom preamble:", + recall_max_input_chars=500, + bank_mission="Test agent mission", + ) + assert p._tags == ["tag1", "tag2"] + assert p._recall_tags == ["recall-tag"] + assert p._recall_tags_match == "all" + assert p._auto_retain is False + assert p._auto_recall is False + assert p._retain_every_n_turns == 3 + assert p._retain_context == "custom-ctx" + assert p._bank_retain_mission == "Extract key facts" + assert p._recall_max_tokens == 2048 + assert p._recall_types == ["world", "experience"] + assert p._recall_prompt_preamble == "Custom preamble:" + assert p._recall_max_input_chars == 500 + assert p._bank_mission == "Test agent mission" + + def test_config_from_env_fallback(self, tmp_path, monkeypatch): + """When no config file exists, falls back to env vars.""" + monkeypatch.setattr( + "plugins.memory.hindsight.get_hermes_home", + lambda: tmp_path / "nonexistent", + ) + 
monkeypatch.setenv("HINDSIGHT_MODE", "cloud") + monkeypatch.setenv("HINDSIGHT_API_KEY", "env-key") + monkeypatch.setenv("HINDSIGHT_BANK_ID", "env-bank") + monkeypatch.setenv("HINDSIGHT_BUDGET", "high") + + cfg = _load_config() + assert cfg["apiKey"] == "env-key" + assert cfg["banks"]["hermes"]["bankId"] == "env-bank" + assert cfg["banks"]["hermes"]["budget"] == "high" + + +# --------------------------------------------------------------------------- +# Tool handler tests +# --------------------------------------------------------------------------- + + +class TestToolHandlers: + def test_retain_success(self, provider): + result = json.loads(provider.handle_tool_call( + "hindsight_retain", {"content": "user likes dark mode"} + )) + assert result["result"] == "Memory stored successfully." + provider._client.aretain.assert_called_once() + call_kwargs = provider._client.aretain.call_args.kwargs + assert call_kwargs["bank_id"] == "test-bank" + assert call_kwargs["content"] == "user likes dark mode" + + def test_retain_with_tags(self, provider_with_config): + p = provider_with_config(tags=["pref", "ui"]) + p.handle_tool_call("hindsight_retain", {"content": "likes dark mode"}) + call_kwargs = p._client.aretain.call_args.kwargs + assert call_kwargs["tags"] == ["pref", "ui"] + + def test_retain_without_tags(self, provider): + provider.handle_tool_call("hindsight_retain", {"content": "hello"}) + call_kwargs = provider._client.aretain.call_args.kwargs + assert "tags" not in call_kwargs + + def test_retain_missing_content(self, provider): + result = json.loads(provider.handle_tool_call( + "hindsight_retain", {} + )) + assert "error" in result + + def test_recall_success(self, provider): + result = json.loads(provider.handle_tool_call( + "hindsight_recall", {"query": "dark mode"} + )) + assert "Memory 1" in result["result"] + assert "Memory 2" in result["result"] + + def test_recall_passes_max_tokens(self, provider_with_config): + p = 
provider_with_config(recall_max_tokens=2048) + p.handle_tool_call("hindsight_recall", {"query": "test"}) + call_kwargs = p._client.arecall.call_args.kwargs + assert call_kwargs["max_tokens"] == 2048 + + def test_recall_passes_tags(self, provider_with_config): + p = provider_with_config(recall_tags=["tag1"], recall_tags_match="all") + p.handle_tool_call("hindsight_recall", {"query": "test"}) + call_kwargs = p._client.arecall.call_args.kwargs + assert call_kwargs["tags"] == ["tag1"] + assert call_kwargs["tags_match"] == "all" + + def test_recall_passes_types(self, provider_with_config): + p = provider_with_config(recall_types=["world", "experience"]) + p.handle_tool_call("hindsight_recall", {"query": "test"}) + call_kwargs = p._client.arecall.call_args.kwargs + assert call_kwargs["types"] == ["world", "experience"] + + def test_recall_no_results(self, provider): + provider._client.arecall.return_value = SimpleNamespace(results=[]) + result = json.loads(provider.handle_tool_call( + "hindsight_recall", {"query": "test"} + )) + assert result["result"] == "No relevant memories found." 
+ + def test_recall_missing_query(self, provider): + result = json.loads(provider.handle_tool_call( + "hindsight_recall", {} + )) + assert "error" in result + + def test_reflect_success(self, provider): + result = json.loads(provider.handle_tool_call( + "hindsight_reflect", {"query": "summarize"} + )) + assert result["result"] == "Synthesized answer" + + def test_reflect_missing_query(self, provider): + result = json.loads(provider.handle_tool_call( + "hindsight_reflect", {} + )) + assert "error" in result + + def test_unknown_tool(self, provider): + result = json.loads(provider.handle_tool_call( + "hindsight_unknown", {} + )) + assert "error" in result + + def test_retain_error_handling(self, provider): + provider._client.aretain.side_effect = RuntimeError("connection failed") + result = json.loads(provider.handle_tool_call( + "hindsight_retain", {"content": "test"} + )) + assert "error" in result + assert "connection failed" in result["error"] + + def test_recall_error_handling(self, provider): + provider._client.arecall.side_effect = RuntimeError("timeout") + result = json.loads(provider.handle_tool_call( + "hindsight_recall", {"query": "test"} + )) + assert "error" in result + + +# --------------------------------------------------------------------------- +# Prefetch tests +# --------------------------------------------------------------------------- + + +class TestPrefetch: + def test_prefetch_returns_empty_when_no_result(self, provider): + assert provider.prefetch("test") == "" + + def test_prefetch_default_preamble(self, provider): + provider._prefetch_result = "- some memory" + result = provider.prefetch("test") + assert "Hindsight Memory" in result + assert "- some memory" in result + + def test_prefetch_custom_preamble(self, provider_with_config): + p = provider_with_config(recall_prompt_preamble="Custom header:") + p._prefetch_result = "- memory line" + result = p.prefetch("test") + assert result.startswith("Custom header:") + assert "- memory line" in 
result + + def test_queue_prefetch_skipped_in_tools_mode(self, provider_with_config): + p = provider_with_config(memory_mode="tools") + p.queue_prefetch("test") + # Should not start a thread + assert p._prefetch_thread is None + + def test_queue_prefetch_skipped_when_auto_recall_off(self, provider_with_config): + p = provider_with_config(auto_recall=False) + p.queue_prefetch("test") + assert p._prefetch_thread is None + + def test_queue_prefetch_truncates_query(self, provider_with_config): + p = provider_with_config(recall_max_input_chars=10) + # Mock _run_sync to capture the query + original_query = None + + def _capture_recall(**kwargs): + nonlocal original_query + original_query = kwargs.get("query", "") + return SimpleNamespace(results=[]) + + p._client.arecall = AsyncMock(side_effect=_capture_recall) + + long_query = "a" * 100 + p.queue_prefetch(long_query) + if p._prefetch_thread: + p._prefetch_thread.join(timeout=5.0) + + # The query passed to arecall should be truncated + if original_query is not None: + assert len(original_query) <= 10 + + def test_queue_prefetch_passes_recall_params(self, provider_with_config): + p = provider_with_config( + recall_tags=["t1"], + recall_tags_match="all", + recall_max_tokens=1024, + recall_types=["world"], + ) + p.queue_prefetch("test query") + if p._prefetch_thread: + p._prefetch_thread.join(timeout=5.0) + + call_kwargs = p._client.arecall.call_args.kwargs + assert call_kwargs["max_tokens"] == 1024 + assert call_kwargs["tags"] == ["t1"] + assert call_kwargs["tags_match"] == "all" + assert call_kwargs["types"] == ["world"] + + +# --------------------------------------------------------------------------- +# sync_turn tests +# --------------------------------------------------------------------------- + + +class TestSyncTurn: + def _get_retain_kwargs(self, provider): + """Helper to get the kwargs from the aretain_batch call.""" + return provider._client.aretain_batch.call_args.kwargs + + def _get_retain_content(self, 
provider): + """Helper to get the raw content string from the first item.""" + kwargs = self._get_retain_kwargs(provider) + return kwargs["items"][0]["content"] + + def _get_retain_messages(self, provider): + """Helper to parse the first turn's messages from retained content. + + Content is a JSON array of turns: [[msgs...], [msgs...], ...] + For single-turn tests, returns the first turn's messages. + """ + content = self._get_retain_content(provider) + turns = json.loads(content) + return turns[0] if len(turns) == 1 else turns + + def test_sync_turn_retains(self, provider): + provider.sync_turn("hello", "hi there") + if provider._sync_thread: + provider._sync_thread.join(timeout=5.0) + provider._client.aretain_batch.assert_called_once() + messages = self._get_retain_messages(provider) + assert len(messages) == 2 + assert messages[0]["role"] == "user" + assert messages[0]["content"] == "hello" + assert "timestamp" in messages[0] + assert messages[1]["role"] == "assistant" + assert messages[1]["content"] == "hi there" + assert "timestamp" in messages[1] + + def test_sync_turn_skipped_when_auto_retain_off(self, provider_with_config): + p = provider_with_config(auto_retain=False) + p.sync_turn("hello", "hi") + assert p._sync_thread is None + p._client.aretain_batch.assert_not_called() + + def test_sync_turn_with_tags(self, provider_with_config): + p = provider_with_config(tags=["conv", "session1"]) + p.sync_turn("hello", "hi") + if p._sync_thread: + p._sync_thread.join(timeout=5.0) + item = p._client.aretain_batch.call_args.kwargs["items"][0] + assert item["tags"] == ["conv", "session1"] + + def test_sync_turn_uses_aretain_batch(self, provider): + """sync_turn should use aretain_batch with retain_async.""" + provider.sync_turn("hello", "hi") + if provider._sync_thread: + provider._sync_thread.join(timeout=5.0) + provider._client.aretain_batch.assert_called_once() + call_kwargs = provider._client.aretain_batch.call_args.kwargs + assert call_kwargs["document_id"] == 
"test-session" + assert call_kwargs["retain_async"] is True + assert len(call_kwargs["items"]) == 1 + assert call_kwargs["items"][0]["context"] == "conversation between Hermes Agent and the User" + + def test_sync_turn_custom_context(self, provider_with_config): + p = provider_with_config(retain_context="my-agent") + p.sync_turn("hello", "hi") + if p._sync_thread: + p._sync_thread.join(timeout=5.0) + item = p._client.aretain_batch.call_args.kwargs["items"][0] + assert item["context"] == "my-agent" + + def test_sync_turn_every_n_turns(self, provider_with_config): + """With retain_every_n_turns=3, only retains on every 3rd turn.""" + p = provider_with_config(retain_every_n_turns=3) + + p.sync_turn("turn1-user", "turn1-asst") + assert p._sync_thread is None # not retained yet + + p.sync_turn("turn2-user", "turn2-asst") + assert p._sync_thread is None # not retained yet + + p.sync_turn("turn3-user", "turn3-asst") + assert p._sync_thread is not None # retained! + p._sync_thread.join(timeout=5.0) + + p._client.aretain_batch.assert_called_once() + content = p._client.aretain_batch.call_args.kwargs["items"][0]["content"] + # Should contain all 3 turns + assert "turn1-user" in content + assert "turn2-user" in content + assert "turn3-user" in content + + def test_sync_turn_accumulates_full_session(self, provider_with_config): + """Each retain sends the ENTIRE session, not just the latest batch.""" + p = provider_with_config(retain_every_n_turns=2) + + p.sync_turn("turn1-user", "turn1-asst") + p.sync_turn("turn2-user", "turn2-asst") + if p._sync_thread: + p._sync_thread.join(timeout=5.0) + + p._client.aretain_batch.reset_mock() + + p.sync_turn("turn3-user", "turn3-asst") + p.sync_turn("turn4-user", "turn4-asst") + if p._sync_thread: + p._sync_thread.join(timeout=5.0) + + content = p._client.aretain_batch.call_args.kwargs["items"][0]["content"] + # Should contain ALL turns from the session + assert "turn1-user" in content + assert "turn2-user" in content + assert "turn3-user" 
in content + assert "turn4-user" in content + + def test_sync_turn_passes_document_id(self, provider): + """sync_turn should pass session_id as document_id for dedup.""" + provider.sync_turn("hello", "hi") + if provider._sync_thread: + provider._sync_thread.join(timeout=5.0) + call_kwargs = provider._client.aretain_batch.call_args.kwargs + assert call_kwargs["document_id"] == "test-session" + + def test_sync_turn_error_does_not_raise(self, provider): + """Errors in sync_turn should be swallowed (non-blocking).""" + provider._client.aretain_batch.side_effect = RuntimeError("network error") + provider.sync_turn("hello", "hi") + if provider._sync_thread: + provider._sync_thread.join(timeout=5.0) + # Should not raise + + +# --------------------------------------------------------------------------- +# System prompt tests +# --------------------------------------------------------------------------- + + +class TestSystemPrompt: + def test_hybrid_mode_prompt(self, provider): + block = provider.system_prompt_block() + assert "Hindsight Memory" in block + assert "hindsight_recall" in block + assert "automatically injected" in block + + def test_context_mode_prompt(self, provider_with_config): + p = provider_with_config(memory_mode="context") + block = p.system_prompt_block() + assert "context mode" in block + assert "hindsight_recall" not in block + + def test_tools_mode_prompt(self, provider_with_config): + p = provider_with_config(memory_mode="tools") + block = p.system_prompt_block() + assert "tools mode" in block + assert "hindsight_recall" in block + + +# --------------------------------------------------------------------------- +# Config schema tests +# --------------------------------------------------------------------------- + + +class TestConfigSchema: + def test_schema_has_all_new_fields(self, provider): + schema = provider.get_config_schema() + keys = {f["key"] for f in schema} + expected_keys = { + "mode", "api_url", "api_key", "llm_provider", "llm_api_key", 
+ "llm_model", "bank_id", "bank_mission", "bank_retain_mission", + "recall_budget", "memory_mode", "recall_prefetch_method", + "tags", "recall_tags", "recall_tags_match", + "auto_recall", "auto_retain", + "retain_every_n_turns", "retain_async", + "retain_context", + "recall_max_tokens", "recall_max_input_chars", + "recall_prompt_preamble", + } + assert expected_keys.issubset(keys), f"Missing: {expected_keys - keys}" + + +# --------------------------------------------------------------------------- +# Availability tests +# --------------------------------------------------------------------------- + + +class TestAvailability: + def test_available_with_api_key(self, tmp_path, monkeypatch): + monkeypatch.setattr( + "plugins.memory.hindsight.get_hermes_home", + lambda: tmp_path / "nonexistent", + ) + monkeypatch.setenv("HINDSIGHT_API_KEY", "test-key") + p = HindsightMemoryProvider() + assert p.is_available() + + def test_not_available_without_config(self, tmp_path, monkeypatch): + monkeypatch.setattr( + "plugins.memory.hindsight.get_hermes_home", + lambda: tmp_path / "nonexistent", + ) + p = HindsightMemoryProvider() + assert not p.is_available() + + def test_available_in_local_mode(self, tmp_path, monkeypatch): + monkeypatch.setattr( + "plugins.memory.hindsight.get_hermes_home", + lambda: tmp_path / "nonexistent", + ) + monkeypatch.setenv("HINDSIGHT_MODE", "local") + p = HindsightMemoryProvider() + assert p.is_available() diff --git a/website/docs/user-guide/features/memory-providers.md b/website/docs/user-guide/features/memory-providers.md index ad0a17ae4..e76a05414 100644 --- a/website/docs/user-guide/features/memory-providers.md +++ b/website/docs/user-guide/features/memory-providers.md @@ -263,12 +263,12 @@ echo "MEM0_API_KEY=your-key" >> ~/.hermes/.env ### Hindsight -Long-term memory with knowledge graph, entity resolution, and multi-strategy retrieval. The `hindsight_reflect` tool provides cross-memory synthesis that no other provider offers. 
+Long-term memory with knowledge graph, entity resolution, and multi-strategy retrieval. The `hindsight_reflect` tool provides cross-memory synthesis that no other provider offers. Automatically retains full conversation turns (including tool calls) with session-level document tracking.
 
 | | |
 |---|---|
 | **Best for** | Knowledge graph-based recall with entity relationships |
-| **Requires** | Cloud: `pip install hindsight-client` + API key. Local: `pip install hindsight` + LLM key |
+| **Requires** | Cloud: API key from [ui.hindsight.vectorize.io](https://ui.hindsight.vectorize.io). Local: LLM API key (OpenAI, Groq, OpenRouter, etc.) |
 | **Data storage** | Hindsight Cloud or local embedded PostgreSQL |
 | **Cost** | Hindsight pricing (cloud) or free (local) |
 
@@ -282,13 +282,25 @@ hermes config set memory.provider hindsight
 echo "HINDSIGHT_API_KEY=your-key" >> ~/.hermes/.env
 ```
 
+The setup wizard installs only the dependencies needed for the selected mode (`hindsight-client` for cloud, `hindsight-all` for local). The plugin requires `hindsight-client >= 0.4.22` and auto-upgrades it on session start if outdated.
+ +**Local mode UI:** `hindsight-embed -p hermes ui start` + **Config:** `$HERMES_HOME/hindsight/config.json` | Key | Default | Description | |-----|---------|-------------| | `mode` | `cloud` | `cloud` or `local` | | `bank_id` | `hermes` | Memory bank identifier | -| `budget` | `mid` | Recall thoroughness: `low` / `mid` / `high` | +| `recall_budget` | `mid` | Recall thoroughness: `low` / `mid` / `high` | +| `memory_mode` | `hybrid` | `hybrid` (context + tools), `context` (auto-inject only), `tools` (tools only) | +| `auto_retain` | `true` | Automatically retain conversation turns | +| `auto_recall` | `true` | Automatically recall memories before each turn | +| `retain_async` | `true` | Process retain asynchronously on the server | +| `tags` | — | Tags applied when storing memories | +| `recall_tags` | — | Tags to filter on recall | + +See [plugin README](https://github.com/NousResearch/hermes-agent/blob/main/plugins/memory/hindsight/README.md) for the full configuration reference. ---