feat(moa): expose MoA presets as selectable virtual models (#46081)

* feat(moa): expose MoA presets as selectable virtual models

  Reconstructed onto current main (PR #46081's base had diverged with no common ancestor, marking the PR dirty so CI never dispatched).

  MoA is now a virtual provider: each named preset is a selectable model under provider 'moa', and the preset's aggregator is the acting model that answers and calls tools.

  Reference models fan out in parallel via a bounded ThreadPoolExecutor (the same batch pattern delegate_task uses): all references are dispatched at once, collected when every one finishes, then handed to the aggregator. Output order is preserved, and failures and the MoA-recursion guard stay isolated per reference.

  - Removed the old mixture_of_agents model tool and moa toolset.
  - Added moa as a virtual provider in the provider/model inventory.
  - /moa is shortcut behavior over model selection (default preset / named preset / one-shot prompt).
  - Dashboard + Desktop manage named presets; presets appear in model pickers.
  - Parallel reference fan-out in agent/moa_loop.py with regression test.

* fix(moa): thread moa_config through _run_agent to _run_agent_inner

  The reconstructed gateway MoA wiring declared moa_config on _run_agent (the profile-scoping wrapper) and used it inside _run_agent_inner, but the wrapper never forwarded it. _run_agent_inner had no such parameter, so the runtime hit NameError: name 'moa_config' is not defined on the compression-failure session sync path.

  Add moa_config to _run_agent_inner's signature and forward it from both wrapper call sites (multiplex and non-multiplex). Caught by tests/gateway/test_compression_failure_session_sync.py on CI shard test(4).

* fix(moa): classify moa as a virtual provider in the catalog

  The moa virtual provider has no PROVIDER_REGISTRY/ProviderProfile entry, so provider_catalog() fell through to the default auth_type="api_key" with no env vars, tripping two catalog invariants:

  - test_provider_catalog: api_key providers must expose a credential env var
  - test_provider_parity: every hermes-model provider must be desktop-configurable

  moa already declares auth_type="virtual" in HERMES_OVERLAYS; consult that overlay as an auth_type fallback so the catalog reports moa as virtual (no real credential, no network endpoint). Exempt virtual providers from the desktop parity union check the same way 'custom' is exempt. The exemption is derived from the catalog, not a hardcoded slug, so future virtual providers are covered too.
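The overlay fallback described in the last fix can be pictured roughly as follows. This is a minimal sketch, not the code from the commit: the names PROVIDER_REGISTRY and HERMES_OVERLAYS come from the message above, but their dict shapes, the `example-api` entry, and both helper functions are assumptions made for illustration.

```python
# Hypothetical shapes: the real registry and overlay structures may differ.
PROVIDER_REGISTRY = {
    "example-api": {"auth_type": "api_key", "env_vars": ["EXAMPLE_API_KEY"]},
}
HERMES_OVERLAYS = {
    "moa": {"auth_type": "virtual"},
}


def resolve_auth_type(slug: str) -> str:
    """Prefer the registry entry, then the overlay, then the api_key default."""
    profile = PROVIDER_REGISTRY.get(slug) or {}
    if profile.get("auth_type"):
        return profile["auth_type"]
    overlay = HERMES_OVERLAYS.get(slug) or {}
    return overlay.get("auth_type") or "api_key"


def virtual_providers(catalog: dict[str, str]) -> set[str]:
    """Derive the parity exemption from the catalog instead of a fixed slug."""
    return {slug for slug, auth in catalog.items() if auth == "virtual"}


catalog = {slug: resolve_auth_type(slug) for slug in ("example-api", "moa", "unknown")}
exempt = virtual_providers(catalog)
```

Because the exemption set is computed from `auth_type`, a second virtual provider added to the overlay later would be picked up without touching the parity test.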
2026-06-26 11:12:03 +00:00 · 2026-06-25 13:52:06 -07:00 · 2026-06-25 13:52:06 -07:00 · c6575df927
commit c6575df927
parent f284d85efa
58 changed files with 2264 additions and 765 deletions
--- a/acp_adapter/tools.py
+++ b/acp_adapter/tools.py
@@ -74,7 +74,7 @@ _POLISHED_TOOLS = {
    "kanban_create", "kanban_show", "kanban_comment", "kanban_complete",
    "kanban_block", "kanban_link", "kanban_heartbeat",
    "yb_query_group_info", "yb_query_group_members", "yb_search_sticker",
-    "yb_send_dm", "yb_send_sticker", "mixture_of_agents",
+    "yb_send_dm", "yb_send_sticker",
 }


--- a/agent/agent_init.py
+++ b/agent/agent_init.py
@@ -719,6 +719,15 @@ def init_agent(
                    print("🔑 Using credentials: Microsoft Entra ID")
                elif isinstance(effective_key, str) and len(effective_key) > 12:
                    print(f"🔑 Using token: {effective_key[:8]}...{effective_key[-4:]}")
+    elif agent.provider == "moa":
+        from agent.moa_loop import MoAClient
+        agent.api_mode = "chat_completions"
+        agent.client = MoAClient(agent.model or "default")
+        agent._client_kwargs = {}
+        agent.api_key = api_key or "moa-virtual-provider"
+        agent.base_url = base_url or "moa://local"
+        if not agent.quiet_mode:
+            print(f"🤖 AI Agent initialized with MoA preset: {agent.model}")
    elif agent.api_mode == "bedrock_converse":
        # AWS Bedrock — uses boto3 directly, no OpenAI client needed.
        # Region is extracted from the base_url or defaults to us-east-1.
--- a/agent/conversation_loop.py
+++ b/agent/conversation_loop.py
@@ -502,6 +502,7 @@ def run_conversation(
    stream_callback: Optional[callable] = None,
    persist_user_message: Optional[str] = None,
    persist_user_timestamp: Optional[float] = None,
+    moa_config: Optional[dict[str, Any]] = None,
 ) -> Dict[str, Any]:
    """
    Run a complete conversation with tool calling until completion.
@@ -524,6 +525,19 @@ def run_conversation(
    Returns:
        Dict: Complete conversation result with final response and message history
    """
+    if moa_config is None:
+        try:
+            from hermes_cli.moa_config import decode_moa_turn
+
+            _decoded_message, _decoded_moa_config = decode_moa_turn(user_message)
+            if _decoded_moa_config is not None:
+                user_message = _decoded_message
+                moa_config = _decoded_moa_config
+                if persist_user_message is None:
+                    persist_user_message = _decoded_message
+        except Exception:
+            pass
+
    # ── Per-turn setup (the prologue) ──
    # All once-per-turn setup — stdio guarding, retry-counter resets, user
    # message sanitization, todo/nudge hydration, system-prompt restore-or-
@@ -802,6 +816,29 @@ def run_conversation(
        if effective_system:
            api_messages = [{"role": "system", "content": effective_system}] + api_messages

+        if moa_config:
+            try:
+                from agent.moa_loop import aggregate_moa_context
+
+                _moa_context = aggregate_moa_context(
+                    user_prompt=original_user_message if isinstance(original_user_message, str) else str(original_user_message),
+                    api_messages=api_messages,
+                    reference_models=moa_config.get("reference_models") or [],
+                    aggregator=moa_config.get("aggregator") or {},
+                    temperature=float(moa_config.get("reference_temperature", 0.6) or 0.6),
+                    aggregator_temperature=float(moa_config.get("aggregator_temperature", 0.4) or 0.4),
+                    max_tokens=int(moa_config.get("max_tokens", 4096) or 4096),
+                )
+                if _moa_context:
+                    for _msg in reversed(api_messages):
+                        if _msg.get("role") == "user":
+                            _base = _msg.get("content", "")
+                            if isinstance(_base, str):
+                                _msg["content"] = _base + "\n\n" + _moa_context
+                            break
+            except Exception as _moa_exc:
+                logger.warning("MoA context aggregation failed: %s", _moa_exc)
+
        # Inject ephemeral prefill messages right after the system prompt
        # but before conversation history. Same API-call-time-only pattern.
        if agent.prefill_messages:
@@ -1123,7 +1160,7 @@ def run_conversation(
                # stream.  Mirror the ACP exclusion used for Responses
                # API upgrade (lines ~1083-1085).
                elif (
-                    agent.provider == "copilot-acp"
+                    agent.provider in {"copilot-acp", "moa"}
                    or str(agent.base_url or "").lower().startswith("acp://copilot")
                    or str(agent.base_url or "").lower().startswith("acp+tcp://")
                ):
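The injection step in the hunk above (appending the MoA context to the most recent user message) can be sketched in isolation. This is a minimal standalone illustration under stated assumptions: `append_to_last_user_message` is a hypothetical helper name, and the returned flag is added here only to make the outcome observable.

```python
from typing import Any


def append_to_last_user_message(messages: list[dict[str, Any]], extra: str) -> bool:
    """Append ``extra`` to the newest user turn; return True if it was injected.

    Mirrors the loop in the hunk: the search stops at the first user message
    found from the end, even when its content is not a plain string.
    """
    for msg in reversed(messages):
        if msg.get("role") == "user":
            base = msg.get("content", "")
            if isinstance(base, str):
                msg["content"] = base + "\n\n" + extra
                return True
            return False  # multimodal content: nothing is injected
    return False  # no user turn at all


text_turns = [
    {"role": "system", "content": "system prompt"},
    {"role": "user", "content": "first question"},
    {"role": "assistant", "content": "first answer"},
    {"role": "user", "content": "second question"},
]
injected = append_to_last_user_message(text_turns, "[MoA context]")

multimodal_turns = [{"role": "user", "content": [{"type": "text", "text": "hi"}]}]
skipped = append_to_last_user_message(multimodal_turns, "[MoA context]")
```

One consequence worth noting: when the newest user turn carries list-style (multimodal) content, the loop exits without falling back to an earlier text turn, so the context is silently dropped for that call.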
--- a/agent/display.py
+++ b/agent/display.py
@@ -368,7 +368,7 @@ def build_tool_preview(tool_name: str, args: dict, max_len: int | None = None) -
        "search_files": "pattern", "browser_navigate": "url",
        "browser_click": "ref", "browser_type": "text",
        "image_generate": "prompt", "text_to_speech": "text",
-        "vision_analyze": "question", "mixture_of_agents": "user_prompt",
+        "vision_analyze": "question",
        "skill_view": "name", "skills_list": "category",
        "cronjob": "action",
        "execute_code": "code", "delegate_task": "goal",
@@ -1216,8 +1216,6 @@ def get_cute_tool_message(
        return _wrap(f"┊ 🔊 speak     {_trunc(args.get('text', ''), 30)}  {dur}")
    if tool_name == "vision_analyze":
        return _wrap(f"┊ 👁️  vision    {_trunc(args.get('question', ''), 30)}  {dur}")
-    if tool_name == "mixture_of_agents":
-        return _wrap(f"┊ 🧠 reason    {_trunc(args.get('user_prompt', ''), 30)}  {dur}")
    if tool_name == "send_message":
        return _wrap(f"┊ 📨 send      {args.get('target', '?')}: \"{_trunc(args.get('message', ''), 25)}\"  {dur}")
    if tool_name == "cronjob":
--- a/agent/moa_loop.py
+++ b/agent/moa_loop.py
@@ -0,0 +1,306 @@
+"""Mixture-of-Agents runtime helpers for /moa turns.
+
+The slash command is deliberately not a model tool. It marks one user turn as
+MoA-enabled; the normal Hermes agent loop still owns tool calling and turn
+termination, while this module gathers reference-model context before each model
+iteration.
+"""
+
+from __future__ import annotations
+
+import logging
+from concurrent.futures import ThreadPoolExecutor
+from typing import Any
+
+from agent.auxiliary_client import call_llm
+from agent.transports import get_transport
+
+logger = logging.getLogger(__name__)
+
+# Upper bound on concurrent reference-model calls. References are independent
+# advisory calls (no tools, no inter-dependence), so we fan them out the same
+# way delegate_task runs a batch: all in flight at once, results collected when
+# every reference finishes. Presets rarely list more than a handful of
+# references; this cap just protects against a pathologically large preset
+# opening dozens of sockets at once.
+_MAX_REFERENCE_WORKERS = 8
+
+
+def _slot_label(slot: dict[str, str]) -> str:
+    return f"{slot.get('provider', '').strip()}:{slot.get('model', '').strip()}"
+
+
+def _run_reference(
+    slot: dict[str, str],
+    ref_messages: list[dict[str, Any]],
+    *,
+    temperature: float,
+    max_tokens: int,
+) -> tuple[str, str]:
+    """Call one reference model and return ``(label, text)``.
+
+    Never raises: a failed reference becomes a labelled note so the aggregator
+    can still act with partial context. Designed to run inside a thread pool —
+    ``call_llm`` is synchronous/blocking, so threads (not asyncio) are the right
+    concurrency primitive, mirroring ``delegate_task``'s batch fan-out.
+    """
+    label = _slot_label(slot)
+    try:
+        response = call_llm(
+            task="moa_reference",
+            provider=slot["provider"],
+            model=slot["model"],
+            messages=ref_messages,
+            temperature=temperature,
+            max_tokens=max_tokens,
+        )
+        return label, _extract_text(response) or "(empty response)"
+    except Exception as exc:
+        logger.warning("MoA reference model %s failed: %s", label, exc)
+        return label, f"[failed: {exc}]"
+
+
+def _run_references_parallel(
+    reference_models: list[dict[str, str]],
+    ref_messages: list[dict[str, Any]],
+    *,
+    temperature: float,
+    max_tokens: int,
+) -> list[tuple[str, str]]:
+    """Fan out all reference models in parallel, returning outputs in order.
+
+    Like ``delegate_task``'s batch mode, every reference is dispatched at once
+    and we block until all of them finish before handing the joined results to
+    the aggregator. Output order matches ``reference_models`` so the
+    ``Reference {idx}`` labelling stays stable. MoA presets that reference
+    another MoA preset are skipped here (recursion guard) with a labelled note.
+    """
+    if not reference_models:
+        return []
+
+    results: list[tuple[str, str] | None] = [None] * len(reference_models)
+    futures = {}
+    workers = min(_MAX_REFERENCE_WORKERS, len(reference_models))
+    with ThreadPoolExecutor(max_workers=workers) as executor:
+        for idx, slot in enumerate(reference_models):
+            if slot.get("provider") == "moa":
+                results[idx] = (
+                    _slot_label(slot),
+                    "[skipped: MoA presets cannot recursively reference MoA]",
+                )
+                continue
+            futures[
+                executor.submit(
+                    _run_reference,
+                    slot,
+                    ref_messages,
+                    temperature=temperature,
+                    max_tokens=max_tokens,
+                )
+            ] = idx
+        # Collect every reference before returning — the aggregator needs the
+        # complete set, so there is no early-exit / first-completed path here.
+        for future, idx in futures.items():
+            results[idx] = future.result()
+
+    return [r for r in results if r is not None]
+
+
+def _reference_messages(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
+    """Build an advisory-safe view of the conversation for reference models.
+
+    Reference calls are advisory: they never call tools and never emit the
+    ``tool_calls`` the main model did. Replaying the full transcript verbatim
+    (a) re-bills the ~8K-token Hermes system prompt per reference per
+    iteration and (b) risks 400s from strict providers (Mistral, Fireworks)
+    that reject orphan ``tool`` messages or ``tool_calls`` the reference never
+    produced. We keep only the user/assistant *text* turns, dropping the
+    system prompt, any ``tool``-role messages, and any ``tool_calls`` payloads.
+    """
+    trimmed: list[dict[str, Any]] = []
+    for msg in messages:
+        role = msg.get("role")
+        if role not in ("user", "assistant"):
+            # Drop system prompt and tool-result messages.
+            continue
+        content = msg.get("content")
+        if not isinstance(content, str):
+            # Skip non-text (multimodal/tool-call-only) assistant turns.
+            if not content:
+                continue
+        text = content if isinstance(content, str) else ""
+        if role == "assistant" and not text.strip():
+            # Assistant turn that was purely tool calls — nothing advisory.
+            continue
+        trimmed.append({"role": role, "content": text})
+    if not trimmed:
+        # Degenerate case (e.g. first turn was stripped): fall back to a
+        # minimal user turn so the reference still has something to answer.
+        for msg in reversed(messages):
+            if msg.get("role") == "user" and isinstance(msg.get("content"), str):
+                return [{"role": "user", "content": msg["content"]}]
+    return trimmed
+
+
+
+def _extract_text(response: Any) -> str:
+    try:
+        transport = get_transport("chat_completions")
+        if transport is None:
+            raise RuntimeError("chat_completions transport unavailable")
+        normalized = transport.normalize_response(response)
+        text = (normalized.content or "").strip()
+        if text:
+            return text
+    except Exception:
+        pass
+    try:
+        content = response.choices[0].message.content
+        return (content or "").strip()
+    except Exception:
+        return ""
+
+
+def aggregate_moa_context(
+    *,
+    user_prompt: str,
+    api_messages: list[dict[str, Any]],
+    reference_models: list[dict[str, str]],
+    aggregator: dict[str, str],
+    temperature: float = 0.6,
+    aggregator_temperature: float = 0.4,
+    max_tokens: int = 4096,
+) -> str:
+    """Run configured reference models and synthesize their advice.
+
+    Failures are returned as model-specific notes instead of aborting the normal
+    agent loop; the main model can still act with partial context.
+    """
+    reference_outputs: list[tuple[str, str]] = []
+    ref_messages = _reference_messages(api_messages)
+    reference_outputs = _run_references_parallel(
+        reference_models,
+        ref_messages,
+        temperature=temperature,
+        max_tokens=max_tokens,
+    )
+
+    joined = "\n\n".join(
+        f"Reference {idx} — {label}:\n{text}"
+        for idx, (label, text) in enumerate(reference_outputs, start=1)
+    )
+    synth_prompt = (
+        "You are the aggregator in a Mixture of Agents process. Synthesize the "
+        "reference responses into concise, actionable guidance for the main "
+        "Hermes agent. Focus on next steps, tool-use strategy, risks, and any "
+        "disagreements. Do not answer the user directly unless that is all that "
+        "is needed; produce context the main agent should use in its normal loop.\n\n"
+        f"Original user prompt:\n{user_prompt}\n\n"
+        f"Reference responses:\n{joined}"
+    )
+
+    agg_label = _slot_label(aggregator)
+    try:
+        response = call_llm(
+            task="moa_aggregator",
+            provider=aggregator["provider"],
+            model=aggregator["model"],
+            messages=[{"role": "user", "content": synth_prompt}],
+            temperature=aggregator_temperature,
+            max_tokens=max_tokens,
+        )
+        synthesis = _extract_text(response)
+    except Exception as exc:
+        logger.warning("MoA aggregator model %s failed: %s", agg_label, exc)
+        synthesis = ""
+
+    if not synthesis:
+        synthesis = joined
+
+    return (
+        "[Mixture of Agents context — use this as private guidance for the "
+        "normal Hermes agent loop. You may call tools, continue reasoning, or "
+        "finish normally.]\n"
+        f"Aggregator: {agg_label}\n"
+        f"References: {', '.join(_slot_label(slot) for slot in reference_models)}\n\n"
+        f"{synthesis.strip()}"
+    )
+
+
+class MoAChatCompletions:
+    """OpenAI-chat-compatible facade where the aggregator is the acting model."""
+
+    def __init__(self, preset_name: str):
+        self.preset_name = preset_name or "default"
+
+    def create(self, **api_kwargs: Any) -> Any:
+        from hermes_cli.config import load_config
+        from hermes_cli.moa_config import resolve_moa_preset
+
+        preset = resolve_moa_preset(load_config().get("moa") or {}, self.preset_name)
+        messages = list(api_kwargs.get("messages") or [])
+        reference_models = preset.get("reference_models") or []
+        aggregator = preset.get("aggregator") or {}
+        max_tokens = int(preset.get("max_tokens", api_kwargs.get("max_tokens") or 4096) or 4096)
+        temperature = float(preset.get("reference_temperature", 0.6) or 0.6)
+        aggregator_temperature = float(preset.get("aggregator_temperature", api_kwargs.get("temperature") or 0.4) or 0.4)
+
+        # When the preset is disabled, skip the reference fan-out and let the
+        # configured aggregator act alone — it is the preset's acting model, so
+        # a disabled MoA preset is simply "use the aggregator directly."
+        if not preset.get("enabled", True):
+            reference_models = []
+
+        reference_outputs: list[tuple[str, str]] = []
+        ref_messages = _reference_messages(messages)
+        reference_outputs = _run_references_parallel(
+            reference_models,
+            ref_messages,
+            temperature=temperature,
+            max_tokens=max_tokens,
+        )
+
+        agg_messages = [dict(m) for m in messages]
+        if reference_outputs:
+            joined = "\n\n".join(
+                f"Reference {idx} — {label}:\n{text}"
+                for idx, (label, text) in enumerate(reference_outputs, start=1)
+            )
+            guidance = (
+                "[Mixture of Agents reference context]\n"
+                f"Preset: {self.preset_name}\n"
+                f"Aggregator/acting model: {_slot_label(aggregator)}\n"
+                f"References: {', '.join(label for label, _ in reference_outputs)}\n\n"
+                "Use the reference responses below as private context. You are the aggregator and acting model: "
+                "answer the user directly or call tools as needed.\n\n"
+                f"{joined}"
+            )
+            for msg in reversed(agg_messages):
+                if msg.get("role") == "user" and isinstance(msg.get("content"), str):
+                    msg["content"] = msg["content"] + "\n\n" + guidance
+                    break
+            else:
+                agg_messages.append({"role": "user", "content": guidance})
+
+        if aggregator.get("provider") == "moa":
+            raise RuntimeError("MoA aggregator cannot be another MoA preset")
+        agg_kwargs = dict(api_kwargs)
+        agg_kwargs["messages"] = agg_messages
+        agg_kwargs["model"] = aggregator.get("model")
+        agg_kwargs["temperature"] = aggregator_temperature
+        return call_llm(
+            task="moa_aggregator",
+            provider=aggregator.get("provider"),
+            model=aggregator.get("model"),
+            messages=agg_messages,
+            temperature=aggregator_temperature,
+            max_tokens=agg_kwargs.get("max_tokens"),
+            tools=agg_kwargs.get("tools"),
+            extra_body=agg_kwargs.get("extra_body"),
+        )
+
+
+class MoAClient:
+    def __init__(self, preset_name: str):
+        self.chat = type("_MoAChat", (), {})()
+        self.chat.completions = MoAChatCompletions(preset_name)
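The fan-out pattern used by `_run_references_parallel` above can be exercised without any network access. The following is a simplified sketch, not the module itself: `fan_out`, `label_of`, and `fake_call` are hypothetical names, the provider and model strings are placeholders, and `fake_call` stands in for the blocking `call_llm` request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

MAX_WORKERS = 8  # same cap as _MAX_REFERENCE_WORKERS in the diff

Slot = dict[str, str]
Result = tuple[str, str]


def label_of(slot: Slot) -> str:
    return f"{slot.get('provider', '')}:{slot.get('model', '')}"


def fan_out(slots: list[Slot], call: Callable[[Slot], str]) -> list[Result]:
    """Run ``call`` once per slot in parallel; return results in input order."""
    if not slots:
        return []

    def run_one(slot: Slot) -> Result:
        # A failing reference becomes a labelled note instead of an exception.
        try:
            return label_of(slot), call(slot) or "(empty response)"
        except Exception as exc:
            return label_of(slot), f"[failed: {exc}]"

    results: list[Optional[Result]] = [None] * len(slots)
    futures = {}
    with ThreadPoolExecutor(max_workers=min(MAX_WORKERS, len(slots))) as pool:
        for idx, slot in enumerate(slots):
            if slot.get("provider") == "moa":
                # Recursion guard: never dispatch a nested MoA preset.
                results[idx] = (label_of(slot), "[skipped: nested MoA preset]")
                continue
            futures[pool.submit(run_one, slot)] = idx
        # Block until every reference is done; writing by index keeps order.
        for future, idx in futures.items():
            results[idx] = future.result()
    return [r for r in results if r is not None]


def fake_call(slot: Slot) -> str:
    """Hypothetical stand-in for a blocking LLM request."""
    if slot["model"] == "broken":
        raise RuntimeError("boom")
    return f"answer from {slot['model']}"


slots = [
    {"provider": "provider-a", "model": "fast"},
    {"provider": "moa", "model": "default"},
    {"provider": "provider-b", "model": "broken"},
]
outputs = fan_out(slots, fake_call)
```

Storing each result at its submission index, rather than appending in completion order, is what keeps the `Reference {idx}` numbering stable regardless of which reference finishes first.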
--- a/apps/desktop/src/app/settings/model-settings.tsx
+++ b/apps/desktop/src/app/settings/model-settings.tsx
@@ -8,13 +8,15 @@ import {
  getAuxiliaryModels,
  getGlobalModelInfo,
  getGlobalModelOptions,
-  getHermesConfigRecord,
+  getMoaModels,
  getRecommendedDefaultModel,
+  saveMoaModels,
+  getHermesConfigRecord,
  saveHermesConfig,
  setEnvVar,
  setModelAssignment
 } from '@/hermes'
-import type { AuxiliaryModelsResponse, ModelOptionProvider, StaleAuxAssignment } from '@/hermes'
+import type { AuxiliaryModelsResponse, MoaConfigResponse, MoaModelSlot, ModelOptionProvider, StaleAuxAssignment } from '@/hermes'
 import { useI18n } from '@/i18n'
 import { AlertTriangle, Cpu, Loader2 } from '@/lib/icons'
 import { cn } from '@/lib/utils'
@@ -115,6 +117,9 @@ export function ModelSettings({ onMainModelChanged }: ModelSettingsProps) {
  const [selectedProvider, setSelectedProvider] = useState('')
  const [selectedModel, setSelectedModel] = useState('')
  const [auxiliary, setAuxiliary] = useState<AuxiliaryModelsResponse | null>(null)
+  const [moa, setMoa] = useState<MoaConfigResponse | null>(null)
+  const [selectedMoaPreset, setSelectedMoaPreset] = useState('')
+  const [newMoaPresetName, setNewMoaPresetName] = useState('')
  // Full profile config, kept so the reasoning/speed defaults round-trip
  // (read agent.* → write back the whole record) like the generic config page.
  const [config, setConfig] = useState<HermesConfigRecord | null>(null)
@@ -134,10 +139,11 @@ export function ModelSettings({ onMainModelChanged }: ModelSettingsProps) {
    setError('')

    try {
-      const [modelInfo, modelOptions, auxiliaryModels, cfg] = await Promise.all([
+      const [modelInfo, modelOptions, auxiliaryModels, moaModels, cfg] = await Promise.all([
        getGlobalModelInfo(),
        getGlobalModelOptions(),
        getAuxiliaryModels(),
+        getMoaModels().catch(() => null),
        getHermesConfigRecord()
      ])

@@ -146,6 +152,11 @@ export function ModelSettings({ onMainModelChanged }: ModelSettingsProps) {
      setSelectedProvider(prev => prev || modelInfo.provider)
      setSelectedModel(prev => prev || modelInfo.model)
      setAuxiliary(auxiliaryModels)
+      setMoa(moaModels)
+
+      if (moaModels) {
+        setSelectedMoaPreset(prev => prev && moaModels.presets[prev] ? prev : moaModels.default_preset)
+      }
      setConfig(cfg)
    } catch (err) {
      setError(err instanceof Error ? err.message : String(err))
@@ -183,6 +194,62 @@ export function ModelSettings({ onMainModelChanged }: ModelSettingsProps) {
    [auxDraft.provider, providers]
  )

+  const modelsForProvider = useCallback(
+    (provider: string) => providers.find(row => row.slug === provider)?.models ?? [],
+    [providers]
+  )
+
+  const currentMoaPreset = useMemo(() => {
+    if (!moa) {
+      return null
+    }
+
+    return moa.presets[selectedMoaPreset] || moa.presets[moa.default_preset] || Object.values(moa.presets)[0] || null
+  }, [moa, selectedMoaPreset])
+
+  const updateMoaPreset = useCallback(
+    (updater: (preset: NonNullable<typeof currentMoaPreset>) => NonNullable<typeof currentMoaPreset>) => {
+      setMoa(prev => {
+        if (!prev || !selectedMoaPreset || !prev.presets[selectedMoaPreset]) {
+          return prev
+        }
+
+        return {
+          ...prev,
+          presets: {
+            ...prev.presets,
+            [selectedMoaPreset]: updater(prev.presets[selectedMoaPreset])
+          }
+        }
+      })
+    },
+    [selectedMoaPreset]
+  )
+
+  const updateMoaSlot = useCallback((slot: MoaModelSlot, patch: Partial<MoaModelSlot>): MoaModelSlot => {
+    const next = { ...slot, ...patch }
+
+    if (patch.provider) {
+      next.model = ''
+    }
+
+    return next
+  }, [])
+
+  const saveMoa = useCallback(async (next: MoaConfigResponse) => {
+    setApplying(true)
+    setError('')
+
+    try {
+      const saved = await saveMoaModels(next)
+      setMoa(saved)
+    } catch (err) {
+      setError(err instanceof Error ? err.message : String(err))
+    } finally {
+      setApplying(false)
+    }
+  }, [])
+
  const auxiliaryTaskLabel = useCallback((key: string) => m.tasks[key]?.label ?? key, [m.tasks])

  // Persistent mismatch: any aux slot pinned to a provider different from the
@@ -658,6 +725,115 @@ export function ModelSettings({ onMainModelChanged }: ModelSettingsProps) {
          })}
        </div>
      </section>
+      {moa && currentMoaPreset && (
+        <section>
+          <div className="mb-2.5 flex items-center justify-between">
+            <SectionHeading icon={Cpu} title="Mixture of Agents" />
+            <Button disabled={applying} onClick={() => void saveMoa(moa)} size="sm" variant="textStrong">
+              {applying ? m.applying : t.common.save}
+            </Button>
+          </div>
+          <p className="mb-2 text-xs text-muted-foreground">
+            Configure named presets that appear as models under the Mixture of Agents provider. The aggregator is the acting model.
+          </p>
+          <div className="mb-2 flex flex-wrap items-center gap-2">
+            <Select onValueChange={setSelectedMoaPreset} value={selectedMoaPreset || moa.default_preset}>
+              <SelectTrigger className={cn('min-w-40', CONTROL_TEXT)}><SelectValue placeholder="Preset" /></SelectTrigger>
+              <SelectContent>{Object.keys(moa.presets).map(name => <SelectItem key={name} value={name}>{name}</SelectItem>)}</SelectContent>
+            </Select>
+            <Button disabled={applying} onClick={() => setMoa(prev => prev && ({ ...prev, default_preset: selectedMoaPreset || prev.default_preset }))} size="sm" variant="text">
+              Set default
+            </Button>
+            <Button
+              disabled={Object.keys(moa.presets).length <= 1 || applying}
+              onClick={() => {
+                setMoa(prev => {
+                  if (!prev || Object.keys(prev.presets).length <= 1) {
+                    return prev
+                  }
+
+                  const next = { ...prev.presets }
+                  delete next[selectedMoaPreset]
+                  const fallback = Object.keys(next)[0]
+
+                  return {
+                    ...prev,
+                    presets: next,
+                    default_preset: prev.default_preset === selectedMoaPreset ? fallback : prev.default_preset,
+                    active_preset: prev.active_preset === selectedMoaPreset ? '' : prev.active_preset
+                  }
+                })
+                setSelectedMoaPreset(Object.keys(moa.presets).find(name => name !== selectedMoaPreset) || '')
+              }}
+              size="sm"
+              variant="ghost"
+            >
+              Delete
+            </Button>
+            <Input className={cn('w-40', CONTROL_TEXT)} onChange={event => setNewMoaPresetName(event.target.value)} placeholder="new preset" value={newMoaPresetName} />
+            <Button
+              disabled={!newMoaPresetName.trim() || !!moa.presets[newMoaPresetName.trim()] || applying}
+              onClick={() => {
+                const name = newMoaPresetName.trim()
+                setMoa(prev => prev && ({
+                  ...prev,
+                  presets: { ...prev.presets, [name]: { ...currentMoaPreset, reference_models: [...currentMoaPreset.reference_models] } }
+                }))
+                setSelectedMoaPreset(name)
+                setNewMoaPresetName('')
+              }}
+              size="sm"
+              variant="textStrong"
+            >
+              Add preset
+            </Button>
+          </div>
+          <div className="mb-2 text-xs text-muted-foreground">Default: <span className="font-mono">{moa.default_preset}</span></div>
+          <div className="grid gap-1">
+            {currentMoaPreset.reference_models.map((slot, index) => (
+              <ListRow
+                below={
+                  <div className="mt-2 flex flex-wrap items-center gap-2 pt-1">
+                    <Select onValueChange={value => updateMoaPreset(prev => ({ ...prev, reference_models: prev.reference_models.map((s, i) => i === index ? updateMoaSlot(s, { provider: value }) : s) }))} value={slot.provider}>
+                      <SelectTrigger className={cn('min-w-32', CONTROL_TEXT)}><SelectValue placeholder={m.provider} /></SelectTrigger>
+                      <SelectContent>{providerOptions.map(provider => <SelectItem key={provider.slug || 'none'} value={provider.slug || 'none'}>{provider.name}</SelectItem>)}</SelectContent>
+                    </Select>
+                    <Select onValueChange={value => updateMoaPreset(prev => ({ ...prev, reference_models: prev.reference_models.map((s, i) => i === index ? updateMoaSlot(s, { model: value }) : s) }))} value={slot.model}>
+                      <SelectTrigger className={cn('min-w-48', CONTROL_TEXT)}><SelectValue placeholder={m.model} /></SelectTrigger>
+                      <SelectContent>{modelsForProvider(slot.provider).map(model => <SelectItem key={model} value={model}>{model}</SelectItem>)}</SelectContent>
+                    </Select>
+                    <Button disabled={currentMoaPreset.reference_models.length <= 1 || applying} onClick={() => updateMoaPreset(prev => ({ ...prev, reference_models: prev.reference_models.filter((_, i) => i !== index) }))} size="sm" variant="ghost">
+                      Remove
+                    </Button>
+                  </div>
+                }
+                description={<span className="font-mono text-[0.68rem]">{slot.provider} · {slot.model}</span>}
+                key={`${selectedMoaPreset}-${slot.provider}-${slot.model}-${index}`}
+                title={`Reference ${index + 1}`}
+              />
+            ))}
+            <Button disabled={applying} onClick={() => updateMoaPreset(prev => ({ ...prev, reference_models: [...prev.reference_models, prev.aggregator] }))} size="sm" variant="textStrong">
+              Add reference model
+            </Button>
+            <ListRow
+              below={
+                <div className="mt-2 flex flex-wrap items-center gap-2 pt-1">
+                  <Select onValueChange={value => updateMoaPreset(prev => ({ ...prev, aggregator: updateMoaSlot(prev.aggregator, { provider: value }) }))} value={currentMoaPreset.aggregator.provider}>
+                    <SelectTrigger className={cn('min-w-32', CONTROL_TEXT)}><SelectValue placeholder={m.provider} /></SelectTrigger>
+                    <SelectContent>{providerOptions.map(provider => <SelectItem key={provider.slug || 'none'} value={provider.slug || 'none'}>{provider.name}</SelectItem>)}</SelectContent>
+                  </Select>
+                  <Select onValueChange={value => updateMoaPreset(prev => ({ ...prev, aggregator: updateMoaSlot(prev.aggregator, { model: value }) }))} value={currentMoaPreset.aggregator.model}>
+                    <SelectTrigger className={cn('min-w-48', CONTROL_TEXT)}><SelectValue placeholder={m.model} /></SelectTrigger>
+                    <SelectContent>{modelsForProvider(currentMoaPreset.aggregator.provider).map(model => <SelectItem key={model} value={model}>{model}</SelectItem>)}</SelectContent>
+                  </Select>
+                </div>
+              }
+              description={<span className="font-mono text-[0.68rem]">{currentMoaPreset.aggregator.provider} · {currentMoaPreset.aggregator.model}</span>}
+              title="Aggregator"
+            />
+          </div>
+        </section>
+      )}
    </div>
  )
 }
--- a/apps/desktop/src/app/shell/model-menu-panel.tsx
+++ b/apps/desktop/src/app/shell/model-menu-panel.tsx
@ -16,7 +16,7 @@ import {
 } from '@/components/ui/dropdown-menu'
 import { Skeleton } from '@/components/ui/skeleton'
 import type { HermesGateway } from '@/hermes'
-import { getGlobalModelOptions } from '@/hermes'
+import { getGlobalModelOptions, getMoaModels } from '@/hermes'
 import { useI18n } from '@/i18n'
 import { currentPickerSelection, displayModelName, modelDisplayParts, reasoningEffortLabel } from '@/lib/model-status-label'
 import { cn } from '@/lib/utils'
@ -37,7 +37,7 @@ import {
  $currentProvider,
  $currentReasoningEffort
 } from '@/store/session'
-import type { ModelOptionProvider, ModelOptionsResponse } from '@/types/hermes'
+import type { MoaConfigResponse, ModelOptionProvider, ModelOptionsResponse } from '@/types/hermes'

 import { ModelEditSubmenu, resolveFastControl } from './model-edit-submenu'

@ -64,6 +64,7 @@ export function ModelMenuPanel({ gateway, onSelectModel, requestGateway }: Model
  const [search, setSearch] = useState('')
  const [refreshing, setRefreshing] = useState(false)
  const queryClient = useQueryClient()
+  const [activeMoaPreset, setActiveMoaPreset] = useState('')
  // Reactive session state is read from the stores here (not drilled in), so
  // toggling effort/fast/model re-renders this panel in place without forcing
  // the parent to rebuild the menu content (which would close the dropdown).
@ -86,6 +87,11 @@ export function ModelMenuPanel({ gateway, onSelectModel, requestGateway }: Model
    }
  })

+  const moaOptions = useQuery({
+    queryKey: ['moa-presets'],
+    queryFn: (): Promise<MoaConfigResponse> => getMoaModels()
+  })
+
  const { model: optionsModel, provider: optionsProvider } = currentPickerSelection(
    !!activeSessionId,
    { model: currentModel, provider: currentProvider },
@ -169,6 +175,15 @@ export function ModelMenuPanel({ gateway, onSelectModel, requestGateway }: Model
    )
  }

+  const toggleMoaPreset = async (preset: string) => {
+    if (!activeSessionId) {
+      return
+    }
+
+    await requestGateway('command.dispatch', { name: 'moa', arg: preset, session_id: activeSessionId })
+    setActiveMoaPreset(preset)
+  }
+
  const groups = useMemo(
    () => groupModels(providers ?? [], search, { model: optionsModel, provider: optionsProvider }, effectiveVisibleModels),
    [providers, search, optionsModel, optionsProvider, effectiveVisibleModels]
@ -302,6 +317,27 @@ export function ModelMenuPanel({ gateway, onSelectModel, requestGateway }: Model

      <DropdownMenuSeparator className="mx-0" />

+      {moaOptions.data && Object.keys(moaOptions.data.presets ?? {}).length > 0 ? (
+        <>
+          <DropdownMenuLabel className={dropdownMenuSectionLabel}>MoA presets</DropdownMenuLabel>
+          {Object.keys(moaOptions.data.presets).map(preset => (
+            <DropdownMenuItem
+              className={dropdownMenuRow}
+              disabled={!activeSessionId}
+              key={`moa:${preset}`}
+              onSelect={event => {
+                event.preventDefault()
+                void toggleMoaPreset(preset)
+              }}
+            >
+              <span className="min-w-0 flex-1 truncate">MoA: {preset}</span>
+              {activeMoaPreset === preset ? <Codicon className="ml-auto text-foreground" name="check" size="0.75rem" /> : null}
+            </DropdownMenuItem>
+          ))}
+          <DropdownMenuSeparator className="mx-0" />
+        </>
+      ) : null}
+
      <DropdownMenuItem
        className={cn(dropdownMenuRow, 'text-(--ui-text-tertiary)')}
        disabled={refreshing}
--- a/apps/desktop/src/hermes.ts
+++ b/apps/desktop/src/hermes.ts
@ -23,6 +23,7 @@ import type {
  MessagingPlatformsResponse,
  MessagingPlatformTestResponse,
  MessagingPlatformUpdate,
+  MoaConfigResponse,
  ModelAssignmentRequest,
  ModelAssignmentResponse,
  ModelInfoResponse,
@ -85,6 +86,8 @@ export type {
  MessagingPlatformsResponse,
  MessagingPlatformTestResponse,
  MessagingPlatformUpdate,
+  MoaConfigResponse,
+  MoaModelSlot,
  ModelAssignmentRequest,
  ModelAssignmentResponse,
  ModelInfoResponse,
@ -746,6 +749,22 @@ export function getAuxiliaryModels(): Promise<AuxiliaryModelsResponse> {
  })
 }

+export function getMoaModels(): Promise<MoaConfigResponse> {
+  return window.hermesDesktop.api<MoaConfigResponse>({
+    ...profileScoped(),
+    path: '/api/model/moa'
+  })
+}
+
+export function saveMoaModels(body: MoaConfigResponse): Promise<MoaConfigResponse & { ok: boolean }> {
+  return window.hermesDesktop.api<MoaConfigResponse & { ok: boolean }>({
+    ...profileScoped(),
+    path: '/api/model/moa',
+    method: 'PUT',
+    body
+  })
+}
+
 export function setModelAssignment(body: ModelAssignmentRequest): Promise<ModelAssignmentResponse> {
  return window.hermesDesktop.api<ModelAssignmentResponse>({
    ...profileScoped(),
--- a/apps/desktop/src/types/hermes.ts
+++ b/apps/desktop/src/types/hermes.ts
@ -725,6 +725,30 @@ export interface AuxiliaryModelsResponse {
  tasks: AuxiliaryTaskAssignment[]
 }

+export interface MoaModelSlot {
+  provider: string
+  model: string
+}
+
+export interface MoaConfigResponse {
+  default_preset: string
+  active_preset: string
+  presets: Record<string, {
+    aggregator: MoaModelSlot
+    aggregator_temperature: number
+    enabled: boolean
+    max_tokens: number
+    reference_models: MoaModelSlot[]
+    reference_temperature: number
+  }>
+  aggregator: MoaModelSlot
+  aggregator_temperature: number
+  enabled: boolean
+  max_tokens: number
+  reference_models: MoaModelSlot[]
+  reference_temperature: number
+}
+
 export interface ModelAssignmentRequest {
  /** Optional API key for a custom/local endpoint. Persisted to model.api_key
   *  (where the runtime reads it) for self-hosted endpoints that require auth.
--- a/cli-config.yaml.example
+++ b/cli-config.yaml.example
@ -783,7 +783,6 @@ platform_toolsets:
 #   image_gen    - image_generate  (requires FAL_KEY)
 #   skills       - skills_list, skill_view
 #   skills_hub   - skill_hub (search/install/manage from online registries — user-driven only)
-#   moa          - mixture_of_agents  (requires OPENROUTER_API_KEY)
 #   todo         - todo (in-memory task planning, no deps)
 #   tts          - text_to_speech  (Edge TTS free, or ELEVENLABS/OPENAI/MINIMAX/MISTRAL key)
 #   cronjob      - cronjob (create/list/update/pause/resume/run/remove scheduled tasks)
@ -798,7 +797,7 @@ platform_toolsets:
 #
 # COMPOSITE:
 #   debugging    - terminal + web + file
-#   safe         - web + vision + moa (no terminal access)
+#   safe         - web + vision (no terminal access)
 #   all          - Everything available
 #
 #   web          - Web search and content extraction (web_search, web_extract)
@ -809,7 +808,6 @@ platform_toolsets:
 #   vision       - Image analysis (vision_analyze)
 #   image_gen    - Image generation with FLUX (image_generate)
 #   skills       - Load skill documents (skills_list, skill_view)
-#   moa          - Mixture of Agents reasoning (mixture_of_agents)
 #   todo         - Task planning and tracking for multi-step work
 #   memory       - Persistent memory across sessions (personal notes + user profile)
 #   session_search - Search and recall past conversations (FTS5 + Gemini Flash summarization)
@ -818,7 +816,7 @@ platform_toolsets:
 #
 # Composite toolsets:
 #   debugging    - terminal + web + file (for troubleshooting)
-#   safe         - web + vision + moa (no terminal access)
+#   safe         - web + vision (no terminal access)

 # NOTE: The top-level "toolsets" key is deprecated and ignored.
 # Tool configuration is managed per-platform via platform_toolsets above.
--- a/cli.py
+++ b/cli.py
@ -8422,6 +8422,51 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
                _cprint(f"  No agent running; queued as next turn: {payload[:80]}{'...' if len(payload) > 80 else ''}")
        elif canonical == "goal":
            self._handle_goal_command(cmd_original)
+        elif canonical == "moa":
+            from hermes_cli.moa_config import (
+                exact_moa_preset_name,
+                moa_usage,
+                normalize_moa_config,
+                resolve_moa_preset,
+            )
+
+            parts = cmd_original.split(None, 1)
+            payload = parts[1].strip() if len(parts) > 1 else ""
+            moa_cfg = self.config.get("moa") if isinstance(self.config, dict) else {}
+            normalized = normalize_moa_config(moa_cfg)
+            matched_preset = exact_moa_preset_name(normalized, payload) if payload else normalized["default_preset"]
+            if matched_preset:
+                self.requested_provider = "moa"
+                self.provider = "moa"
+                self.model = matched_preset
+                self.api_key = "moa-virtual-provider"
+                self.base_url = "moa://local"
+                self.api_mode = "chat_completions"
+                self.agent = None
+                _cprint(f"  Model switched to MoA preset: {matched_preset}.")
+            else:
+                if not payload:
+                    _cprint(f"  {moa_usage()}")
+                    return True
+                preset = normalized["default_preset"]
+                self._pending_moa_restore_model = {
+                    "requested_provider": getattr(self, "requested_provider", None),
+                    "provider": getattr(self, "provider", None),
+                    "model": getattr(self, "model", None),
+                    "api_key": getattr(self, "api_key", None),
+                    "base_url": getattr(self, "base_url", None),
+                    "api_mode": getattr(self, "api_mode", None),
+                }
+                self.requested_provider = "moa"
+                self.provider = "moa"
+                self.model = preset
+                self.api_key = "moa-virtual-provider"
+                self.base_url = "moa://local"
+                self.api_mode = "chat_completions"
+                self.agent = None
+                self._pending_moa_disable_after_turn = True
+                self._pending_agent_seed = payload
+                _cprint(f"  MoA one-shot queued with preset {preset}; previous model will be restored after this turn.")
        elif canonical == "subgoal":
            self._handle_subgoal_command(cmd_original)
        elif canonical == "skin":
@ -11672,6 +11717,10 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
                if _srn:
                    agent_message = _prepend_note_to_message(agent_message, _srn)
                    self._pending_skills_reload_note = None
+                # Consume any pending MoA config exactly once; None means no
+                # per-turn override and is passed through unchanged.
+                _moa_cfg = getattr(self, "_pending_moa_config", None)
+                self._pending_moa_config = None
                try:
                    result = self.agent.run_conversation(
                        user_message=agent_message,
@ -11679,7 +11728,16 @@ class HermesCLI(CLIAgentSetupMixin, CLICommandsMixin):
                        stream_callback=stream_callback,
                        task_id=self.session_id,
                        persist_user_message=message if _voice_prefix else None,
+                        moa_config=_moa_cfg,
                    )
+                    if getattr(self, "_pending_moa_disable_after_turn", False):
+                        _restore = getattr(self, "_pending_moa_restore_model", None) or {}
+                        # Restore every saved field (None included) so no MoA value lingers.
+                        for _key, _value in _restore.items():
+                            setattr(self, _key, _value)
+                        self.agent = None
+                        self._pending_moa_restore_model = None
+                        self._pending_moa_disable_after_turn = False
                except Exception as exc:
                    logging.error("run_conversation raised: %s", exc, exc_info=True)
                    _summary = getattr(self.agent, '_summarize_api_error', lambda e: str(e)[:300])(exc)
--- a/gateway/run.py
+++ b/gateway/run.py
@ -8028,6 +8028,9 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
                    return await self._handle_goal_command(event)
                return "Agent is running — use /goal status / pause / clear / wait mid-run, or /stop before setting a new goal."

+            if _cmd_def_inner and _cmd_def_inner.name == "moa":
+                return "Agent is running — wait or /stop first, then run /moa."
+
            # /subgoal is safe mid-run — it only modifies the goal's
            # subgoals list, which the judge reads at the next turn
            # boundary. No race with the running turn.
@ -8532,6 +8535,50 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
        if canonical == "goal":
            return await self._handle_goal_command(event)

+        if canonical == "moa":
+            from hermes_cli.moa_config import (
+                exact_moa_preset_name,
+                moa_usage,
+                normalize_moa_config,
+                resolve_moa_preset,
+            )
+            from hermes_cli.config import load_config
+
+            moa_payload = event.get_command_args().strip()
+            try:
+                cfg = load_config()
+                moa_cfg = normalize_moa_config(cfg.get("moa") if isinstance(cfg, dict) else {})
+            except Exception:
+                moa_cfg = normalize_moa_config({})
+            matched_preset = exact_moa_preset_name(moa_cfg, moa_payload) if moa_payload else moa_cfg["default_preset"]
+            if matched_preset:
+                self._session_model_overrides[_quick_key] = {
+                    "provider": "moa",
+                    "model": matched_preset,
+                    "base_url": "moa://local",
+                    "api_key": "moa-virtual-provider",
+                    "api_mode": "chat_completions",
+                }
+                self._evict_cached_agent(_quick_key)
+                return f"Model switched to MoA preset: {matched_preset}."
+            if not moa_payload:
+                return moa_usage()
+            preset = moa_cfg["default_preset"]
+            try:
+                event.text = moa_payload
+                event._moa_restore_override = self._session_model_overrides.get(_quick_key)
+                self._session_model_overrides[_quick_key] = {
+                    "provider": "moa",
+                    "model": preset,
+                    "base_url": "moa://local",
+                    "api_key": "moa-virtual-provider",
+                    "api_mode": "chat_completions",
+                }
+                self._evict_cached_agent(_quick_key)
+                event._moa_disable_after_turn = True
+            except Exception:
+                return "Failed to prepare MoA turn."
+
        if canonical == "subgoal":
            return await self._handle_subgoal_command(event)

@ -8741,6 +8788,16 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew

        try:
            _agent_result = await self._handle_message_with_agent(event, source, _quick_key, _run_generation)
+            if getattr(event, "_moa_disable_after_turn", False):
+                try:
+                    _restore = getattr(event, "_moa_restore_override", None)
+                    if _restore is None:
+                        self._session_model_overrides.pop(_quick_key, None)
+                    else:
+                        self._session_model_overrides[_quick_key] = _restore
+                    self._evict_cached_agent(_quick_key)
+                except Exception:
+                    pass
            # Goal continuation: after the agent returns a final response
            # for this turn, check any standing /goal — the judge will
            # either mark it done, pause it (budget), or enqueue a
@ -9866,6 +9923,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
                run_generation=run_generation,
                event_message_id=self._reply_anchor_for_event(event),
                channel_prompt=event.channel_prompt,
+                moa_config=getattr(event, "_moa_config", None),
                persist_user_message=persist_user_message,
                persist_user_timestamp=persist_user_timestamp,
            )
@ -14681,6 +14739,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
        _interrupt_depth: int = 0,
        event_message_id: Optional[str] = None,
        channel_prompt: Optional[str] = None,
+        moa_config: Optional[dict] = None,
        persist_user_message: Optional[str] = None,
        persist_user_timestamp: Optional[float] = None,
    ) -> Dict[str, Any]:
@ -14698,7 +14757,8 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
                message, context_prompt, history, source, session_id,
                session_key=session_key, run_generation=run_generation,
                _interrupt_depth=_interrupt_depth, event_message_id=event_message_id,
-                channel_prompt=channel_prompt, persist_user_message=persist_user_message,
+                channel_prompt=channel_prompt, moa_config=moa_config,
+                persist_user_message=persist_user_message,
                persist_user_timestamp=persist_user_timestamp,
            )

@ -14708,7 +14768,8 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
                message, context_prompt, history, source, session_id,
                session_key=session_key, run_generation=run_generation,
                _interrupt_depth=_interrupt_depth, event_message_id=event_message_id,
-                channel_prompt=channel_prompt, persist_user_message=persist_user_message,
+                channel_prompt=channel_prompt, moa_config=moa_config,
+                persist_user_message=persist_user_message,
                persist_user_timestamp=persist_user_timestamp,
            )

@ -14739,6 +14800,7 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
        _interrupt_depth: int = 0,
        event_message_id: Optional[str] = None,
        channel_prompt: Optional[str] = None,
+        moa_config: Optional[dict] = None,
        persist_user_message: Optional[str] = None,
        persist_user_timestamp: Optional[float] = None,
    ) -> Dict[str, Any]:
@ -16322,6 +16384,8 @@ class GatewayRunner(GatewayAuthorizationMixin, GatewayKanbanWatchersMixin, Gatew
                    _conversation_kwargs["persist_user_message"] = _persist_user_message_override
                elif observed_group_context:
                    _conversation_kwargs["persist_user_message"] = message
+                if moa_config is not None:
+                    _conversation_kwargs["moa_config"] = moa_config
                if _persist_user_timestamp_override is not None:
                    _conversation_kwargs["persist_user_timestamp"] = _persist_user_timestamp_override
                result = agent.run_conversation(_api_run_message, **_conversation_kwargs)
--- a/hermes_cli/commands.py
+++ b/hermes_cli/commands.py
@ -109,6 +109,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
               args_hint="<prompt>"),
    CommandDef("goal", "Set a standing goal Hermes works on across turns until achieved", "Session",
               args_hint="[text | draft <text> | show | pause | resume | clear | status | wait <pid> | unwait]"),
+    CommandDef("moa", "Switch to a Mixture of Agents preset, or run one prompt through the default preset", "Session",
+               args_hint="[preset | prompt]"),
    CommandDef("subgoal", "Add or manage extra criteria on the active goal", "Session",
               args_hint="[text | remove N | clear]"),
    CommandDef("status", "Show session, model, token, and context info", "Session"),
@ -1153,8 +1155,10 @@ _SLACK_PRIORITY_ALIASES = ("btw", "bg")
 # "Slack-via-/hermes" decision, not a silent clamp.
 #   - credits: the billing/top-up surface; reached via /hermes credits on Slack.
 #   - billing: the terminal-billing surface (buy/auto-reload/limit); /hermes billing.
+#   - moa: high-cost slash mode, available through /hermes moa to avoid
+#     displacing existing native Slack slash commands at the 50-command cap.
 #   - debug: the log/report upload surface; reached via /hermes debug on Slack.
-_SLACK_VIA_HERMES_ONLY = frozenset({"credits", "billing", "debug"})
+_SLACK_VIA_HERMES_ONLY = frozenset({"credits", "billing", "moa", "debug"})


 def _sanitize_slack_name(raw: str) -> str:
--- a/hermes_cli/config.py
+++ b/hermes_cli/config.py
@ -1576,6 +1576,22 @@ DEFAULT_CONFIG = {
            "timeout": 120,
            "extra_body": {},
        },
+        "moa_reference": {
+            "provider": "auto",
+            "model": "",
+            "base_url": "",
+            "api_key": "",
+            "timeout": 600,
+            "extra_body": {},
+        },
+        "moa_aggregator": {
+            "provider": "auto",
+            "model": "",
+            "base_url": "",
+            "api_key": "",
+            "timeout": 600,
+            "extra_body": {},
+        },
    },
    
    "display": {
@ -2054,6 +2070,27 @@ DEFAULT_CONFIG = {
        "max_turns": 20,
    },

+    # Mixture of Agents: named presets, each selectable as a model under the
+    # virtual "moa" provider (and via /moa). The preset's aggregator is the
+    # acting model; reference models run in parallel and feed it their analysis.
+    "moa": {
+        "default_preset": "default",
+        "active_preset": "",
+        "presets": {
+            "default": {
+                "reference_models": [
+                    {"provider": "openai-codex", "model": "gpt-5.5"},
+                    {"provider": "openrouter", "model": "deepseek/deepseek-v4-pro"},
+                ],
+                "aggregator": {"provider": "openrouter", "model": "anthropic/claude-opus-4.8"},
+                "reference_temperature": 0.6,
+                "aggregator_temperature": 0.4,
+                "max_tokens": 4096,
+                "enabled": True,
+            }
+        },
+    },
+
    # Skills — external skill directories for sharing skills across tools/agents.
    # Each path is expanded (~, ${VAR}) and resolved.  Read-only — skill creation
    # always goes to ~/.hermes/skills/.
@ -2953,7 +2990,7 @@ OPTIONAL_ENV_VARS = {
        "prompt": "OpenRouter API key",
        "url": "https://openrouter.ai/keys",
        "password": True,
-        "tools": ["vision_analyze", "mixture_of_agents"],
+        "tools": ["vision_analyze"],
        "category": "provider",
        "advanced": True,
    },
@ -4503,7 +4540,7 @@ _KNOWN_ROOT_KEYS = {
    "_config_version", "model", "providers", "fallback_model",
    "fallback_providers", "credential_pool_strategies", "toolsets",
    "agent", "terminal", "display", "compression", "delegation",
-    "auxiliary", "custom_providers", "context", "memory", "gateway",
+    "auxiliary", "moa", "custom_providers", "context", "memory", "gateway",
    "sessions", "streaming", "updates", "mcp_servers",
 }

--- a/hermes_cli/inventory.py
+++ b/hermes_cli/inventory.py
@ -163,6 +163,10 @@ def build_models_payload(
        refresh=refresh,
    )

+    moa_row = _moa_provider_row(ctx)
+    if moa_row is not None:
+        rows = [moa_row] + [r for r in rows if str(r.get("slug", "")).lower() != "moa"]
+
    # --- Deduplicate: remove models from aggregators that overlap with
    # user-defined providers.  When a local proxy (e.g. litellm-proxy)
    # serves a model whose name also appears in an aggregator's curated
@ -209,7 +213,7 @@ def build_models_payload(
                    row["total_models"] = len(filtered)

    if include_unconfigured:
-        rows = list(rows) + _append_unconfigured_rows(rows, ctx)
+        rows = list(rows) + [r for r in _append_unconfigured_rows(rows, ctx) if str(r.get("slug", "")).lower() != "moa"]
    if picker_hints:
        _apply_picker_hints(rows)
    if canonical_order:
@ -436,3 +440,28 @@ def _apply_pricing(
                # is never blocked from picking a model.
                row["free_tier"] = False
                row["unavailable_models"] = []
+
+
+def _moa_provider_row(ctx: ConfigContext) -> dict | None:
+    try:
+        from hermes_cli.config import load_config
+        from hermes_cli.moa_config import normalize_moa_config
+
+        cfg = normalize_moa_config(load_config().get("moa") or {})
+        models = list(cfg.get("presets", {}).keys())
+        if not models:
+            return None
+        return {
+            "slug": "moa",
+            "name": "Mixture of Agents",
+            "is_current": (ctx.current_provider or "").lower() == "moa",
+            "is_user_defined": False,
+            "models": models,
+            "total_models": len(models),
+            "source": "virtual",
+            "authenticated": True,
+            "auth_type": "virtual",
+            "warning": "Aggregator acts as the selected model; references provide analysis before each call.",
+        }
+    except Exception:
+        return None
--- a/hermes_cli/main.py
+++ b/hermes_cli/main.py
@ -11579,7 +11579,7 @@ _BUILTIN_SUBCOMMANDS = frozenset(
        "computer-use",
        "config", "cron", "curator", "dashboard", "debug", "doctor",
        "dump", "fallback", "gateway", "hooks", "import", "insights",
-        "gui", "desktop", "kanban", "login", "logout", "logs", "lsp", "mcp", "memory", "migrate",
+        "gui", "desktop", "kanban", "login", "logout", "logs", "lsp", "mcp", "memory", "migrate", "moa",
        "model", "pairing", "pets", "plugins", "portal", "postinstall", "profile", "proxy",
        "prompt-size",
        "send", "sessions", "setup",
@ -12104,6 +12104,21 @@ def main():
    # =========================================================================
    build_model_parser(subparsers, cmd_model=cmd_model)

+    from hermes_cli.moa_cmd import cmd_moa
+
+    moa_parser = subparsers.add_parser(
+        "moa",
+        help="Manage Mixture of Agents presets",
+        description="List, configure, or delete the named MoA presets (reference models + aggregator) used by /moa.",
+    )
+    moa_subparsers = moa_parser.add_subparsers(dest="moa_command")
+    moa_subparsers.add_parser("list", aliases=["ls"], help="Show current MoA model slots")
+    moa_configure = moa_subparsers.add_parser("configure", aliases=["config"], help="Interactively pick MoA models")
+    moa_configure.add_argument("name", nargs="?", help="Preset name to create or update")
+    moa_delete = moa_subparsers.add_parser("delete", aliases=["rm"], help="Delete a MoA preset")
+    moa_delete.add_argument("name", help="Preset name to delete")
+    moa_parser.set_defaults(func=cmd_moa)
+
    # =========================================================================
    # fallback command — manage the fallback provider chain
    # =========================================================================
--- a/hermes_cli/moa_cmd.py
+++ b/hermes_cli/moa_cmd.py
@ -0,0 +1,135 @@
+"""CLI helpers for configuring Mixture of Agents."""
+
+from __future__ import annotations
+
+from typing import Any
+
+from hermes_cli.config import load_config, save_config
+from hermes_cli.inventory import build_models_payload, load_picker_context
+from hermes_cli.moa_config import DEFAULT_MOA_PRESET_NAME, normalize_moa_config
+
+
+def _prompt_choice(title: str, rows: list[str], default: int = 0) -> int:
+    try:
+        from hermes_cli.curses_ui import curses_radiolist
+
+        return curses_radiolist(title, rows, selected=default, cancel_returns=default)
+    except Exception:
+        for idx, row in enumerate(rows, start=1):
+            print(f"{idx}. {row}")
+        raw = input(f"{title} [{default + 1}]: ").strip()
+        if not raw:
+            return default
+        try:
+            return max(0, min(len(rows) - 1, int(raw) - 1))
+        except ValueError:
+            return default
+
+
+def _model_options() -> list[dict[str, Any]]:
+    payload = build_models_payload(
+        load_picker_context(),
+        include_unconfigured=True,
+        picker_hints=True,
+        canonical_order=True,
+        pricing=True,
+        capabilities=True,
+        max_models=200,
+    )
+    providers = payload.get("providers") or []
+    return [p for p in providers if p.get("slug") and p.get("models")]
+
+
+def _pick_slot(current: dict[str, str] | None = None) -> dict[str, str]:
+    providers = _model_options()
+    if not providers:
+        raise RuntimeError("No configured model providers found. Run `hermes model` first.")
+    current_provider = (current or {}).get("provider", "")
+    provider_default = next(
+        (idx for idx, p in enumerate(providers) if p.get("slug") == current_provider),
+        0,
+    )
+    provider_rows = [f"{p.get('name') or p.get('slug')}  ({p.get('slug')})" for p in providers]
+    provider = providers[_prompt_choice("Select provider", provider_rows, provider_default)]
+    models = list(provider.get("models") or [])
+    if not models:
+        raise RuntimeError(f"Provider {provider.get('slug')} has no selectable models")
+    current_model = (current or {}).get("model", "")
+    model_default = models.index(current_model) if current_model in models else 0
+    model = models[_prompt_choice(f"Select model for {provider.get('slug')}", models, model_default)]
+    return {"provider": str(provider.get("slug") or ""), "model": str(model)}
+
+
+def _print_config(config: dict[str, Any]) -> None:
+    cfg = normalize_moa_config(config.get("moa") if isinstance(config, dict) else {})
+    print("Mixture of Agents presets")
+    print(f"Default: {cfg['default_preset']}")
+    active = cfg.get("active_preset") or "(off)"
+    print(f"Active in config: {active}")
+    for name, preset in cfg["presets"].items():
+        marker = "*" if name == cfg["default_preset"] else " "
+        print(f"\n{marker} {name}")
+        print("  Reference models:")
+        for idx, slot in enumerate(preset["reference_models"], start=1):
+            print(f"    {idx}. {slot['provider']}:{slot['model']}")
+        agg = preset["aggregator"]
+        print(f"  Aggregator: {agg['provider']}:{agg['model']}")
+
+
+def cmd_moa(args) -> None:
+    """Manage Mixture of Agents model presets."""
+    cfg = load_config()
+    sub = getattr(args, "moa_command", None) or "list"
+
+    if sub in {"list", "ls"}:
+        _print_config(cfg)
+        return
+
+    if sub in {"config", "configure"}:
+        moa = normalize_moa_config(cfg.get("moa") if isinstance(cfg, dict) else {})
+        preset_name = (getattr(args, "name", None) or moa.get("default_preset") or DEFAULT_MOA_PRESET_NAME).strip()
+        current = moa["presets"].get(preset_name, moa["presets"][moa["default_preset"]])
+        print(f"Configure MoA preset: {preset_name}")
+        print("Pick at least one reference model; choose Done when finished.")
+        refs: list[dict[str, str]] = []
+        existing = list(current.get("reference_models") or [])
+        idx = 0
+        while True:
+            base = existing[idx] if idx < len(existing) else None
+            refs.append(_pick_slot(base))
+            idx += 1
+            choice = _prompt_choice("Add another reference model?", ["Add another", "Done"], 1)
+            if choice == 1:
+                break
+        print("Configure aggregator model.")
+        current = dict(current)
+        current["reference_models"] = refs
+        current["aggregator"] = _pick_slot(current.get("aggregator"))
+        moa["presets"][preset_name] = current
+        moa.setdefault("default_preset", preset_name)
+        cfg["moa"] = normalize_moa_config(moa)
+        save_config(cfg)
+        print(f"Saved MoA preset: {preset_name}")
+        _print_config(cfg)
+        return
+
+    if sub == "delete":
+        moa = normalize_moa_config(cfg.get("moa") if isinstance(cfg, dict) else {})
+        preset_name = (getattr(args, "name", None) or "").strip()
+        if not preset_name:
+            raise SystemExit("Usage: hermes moa delete <name>")
+        if preset_name not in moa["presets"]:
+            raise SystemExit(f"Unknown MoA preset: {preset_name}")
+        if len(moa["presets"]) <= 1:
+            raise SystemExit("Cannot delete the only MoA preset")
+        del moa["presets"][preset_name]
+        if moa["default_preset"] == preset_name:
+            moa["default_preset"] = next(iter(moa["presets"]))
+        if moa.get("active_preset") == preset_name:
+            moa["active_preset"] = ""
+        cfg["moa"] = normalize_moa_config(moa)
+        save_config(cfg)
+        print(f"Deleted MoA preset: {preset_name}")
+        return
+
+    raise SystemExit(f"Unknown moa subcommand: {sub}")
--- a/hermes_cli/moa_config.py
+++ b/hermes_cli/moa_config.py
@ -0,0 +1,174 @@
+"""Mixture-of-Agents configuration and slash-command helpers."""
+
+from __future__ import annotations
+
+import base64
+import json
+from copy import deepcopy
+from typing import Any
+
+MOA_MARKER_PREFIX = "__HERMES_MOA_TURN_V1__"
+DEFAULT_MOA_PRESET_NAME = "default"
+
+DEFAULT_MOA_REFERENCE_MODELS: list[dict[str, str]] = [
+    {"provider": "openai-codex", "model": "gpt-5.5"},
+    {"provider": "openrouter", "model": "deepseek/deepseek-v4-pro"},
+]
+
+DEFAULT_MOA_AGGREGATOR: dict[str, str] = {
+    "provider": "openrouter",
+    "model": "anthropic/claude-opus-4.8",
+}
+
+
+def _clean_slot(slot: Any) -> dict[str, str] | None:
+    if not isinstance(slot, dict):
+        return None
+    provider = str(slot.get("provider") or "").strip()
+    model = str(slot.get("model") or "").strip()
+    if not provider or not model:
+        return None
+    return {"provider": provider, "model": model}
+
+
+def _default_preset() -> dict[str, Any]:
+    return {
+        "reference_models": deepcopy(DEFAULT_MOA_REFERENCE_MODELS),
+        "aggregator": deepcopy(DEFAULT_MOA_AGGREGATOR),
+        "reference_temperature": 0.6,
+        "aggregator_temperature": 0.4,
+        "max_tokens": 4096,
+        "enabled": True,
+    }
+
+
+def _normalize_preset(raw: Any) -> dict[str, Any]:
+    if not isinstance(raw, dict):
+        raw = {}
+
+    refs = [_clean_slot(item) for item in raw.get("reference_models") or []]
+    refs = [item for item in refs if item is not None]
+    if not refs:
+        refs = deepcopy(DEFAULT_MOA_REFERENCE_MODELS)
+
+    aggregator = _clean_slot(raw.get("aggregator")) or deepcopy(DEFAULT_MOA_AGGREGATOR)
+
+    return {
+        "enabled": bool(raw.get("enabled", True)),
+        "reference_models": refs,
+        "aggregator": aggregator,
+        "reference_temperature": 0.6 if raw.get("reference_temperature") in (None, "") else float(raw["reference_temperature"]),  # keep explicit 0
+        "aggregator_temperature": 0.4 if raw.get("aggregator_temperature") in (None, "") else float(raw["aggregator_temperature"]),  # keep explicit 0
+        "max_tokens": int(raw.get("max_tokens", 4096) or 4096),
+    }
+
+
+def normalize_moa_config(raw: Any) -> dict[str, Any]:
+    """Return validated MoA config with named presets.
+
+    Backward compatible with the first PR shape where ``moa`` itself contained
+    ``reference_models`` and ``aggregator`` directly.
+    """
+    if not isinstance(raw, dict):
+        raw = {}
+
+    presets_raw = raw.get("presets")
+    presets: dict[str, dict[str, Any]] = {}
+    if isinstance(presets_raw, dict):
+        for name, preset in presets_raw.items():
+            clean_name = str(name or "").strip()
+            if clean_name:
+                presets[clean_name] = _normalize_preset(preset)
+
+    # Legacy flat config becomes the default preset.
+    if not presets:
+        presets[DEFAULT_MOA_PRESET_NAME] = _normalize_preset(raw)
+
+    default_name = str(raw.get("default_preset") or "").strip()
+    if not default_name or default_name not in presets:
+        default_name = next(iter(presets), DEFAULT_MOA_PRESET_NAME)
+    if default_name not in presets:
+        presets[default_name] = _default_preset()
+
+    active_name = str(raw.get("active_preset") or "").strip()
+    if active_name not in presets:
+        active_name = ""
+
+    active = presets[default_name]
+    return {
+        "default_preset": default_name,
+        "active_preset": active_name,
+        "presets": presets,
+        # Compatibility/flattened view of the default preset (not active_preset) for existing dashboard/desktop callers.
+        "reference_models": deepcopy(active["reference_models"]),
+        "aggregator": deepcopy(active["aggregator"]),
+        "reference_temperature": active["reference_temperature"],
+        "aggregator_temperature": active["aggregator_temperature"],
+        "max_tokens": active["max_tokens"],
+        "enabled": active["enabled"],
+    }
+
+
+def list_moa_presets(config: Any) -> list[str]:
+    cfg = normalize_moa_config(config)
+    return list(cfg["presets"].keys())
+
+
+def resolve_moa_preset(config: Any, name: str | None = None) -> dict[str, Any]:
+    cfg = normalize_moa_config(config)
+    preset_name = str(name or cfg.get("default_preset") or DEFAULT_MOA_PRESET_NAME).strip()
+    preset = cfg["presets"].get(preset_name)
+    if preset is None:
+        raise KeyError(preset_name)
+    return deepcopy(preset)
+
+
+def exact_moa_preset_name(config: Any, text: str) -> str | None:
+    wanted = str(text or "").strip()
+    if not wanted:
+        return None
+    cfg = normalize_moa_config(config)
+    return wanted if wanted in cfg["presets"] else None
+
+
+def set_active_moa_preset(config: Any, name: str | None) -> dict[str, Any]:
+    cfg = normalize_moa_config(config)
+    clean = str(name or "").strip()
+    if clean and clean not in cfg["presets"]:
+        raise KeyError(clean)
+    cfg["active_preset"] = clean
+    return cfg
+
+
+def encode_moa_turn(prompt: str, config: Any = None, preset: str | None = None) -> str:
+    """Encode a /moa one-shot turn for frontends that can only send text."""
+    payload = {
+        "prompt": str(prompt or ""),
+        "config": resolve_moa_preset(config or {}, preset),
+    }
+    encoded = base64.urlsafe_b64encode(
+        json.dumps(payload, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
+    ).decode("ascii")
+    return f"{MOA_MARKER_PREFIX}{encoded}"
+
+
+def decode_moa_turn(message: Any) -> tuple[str, dict[str, Any] | None]:
+    """Decode a hidden /moa one-shot marker."""
+    if not isinstance(message, str) or not message.startswith(MOA_MARKER_PREFIX):
+        return message, None
+    encoded = message[len(MOA_MARKER_PREFIX):].strip()
+    try:
+        payload = json.loads(base64.urlsafe_b64decode(encoded.encode("ascii")).decode("utf-8"))
+        decoded = str(payload.get("prompt") or ""), _normalize_preset(payload.get("config") or {})
+    except Exception:  # bad base64/JSON, non-dict payload, or unparseable preset values
+        return message, None
+    return decoded
+
+
+def build_moa_turn_prompt(user_prompt: str, config: Any = None, preset: str | None = None) -> str:
+    """Build the hidden one-shot payload used by TUI/gateway routing."""
+    return encode_moa_turn(user_prompt, config, preset=preset)
+
+
+def moa_usage() -> str:
+    return "Usage: /moa [preset-name | prompt]  (bare /moa toggles the default preset)"
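
The one-shot marker above is a fixed prefix followed by URL-safe base64 of a compact JSON payload. A minimal standalone sketch of that round trip, assuming only the prefix string from the diff; `encode_turn` and `decode_turn` are hypothetical names for illustration and skip the preset normalization the real helpers perform:

```python
import base64
import json

MARKER = "__HERMES_MOA_TURN_V1__"  # same value as MOA_MARKER_PREFIX in the diff


def encode_turn(prompt: str, config: dict) -> str:
    """Pack prompt and preset into marker-prefixed URL-safe base64 (illustrative)."""
    raw = json.dumps({"prompt": prompt, "config": config}, separators=(",", ":"), ensure_ascii=False)
    return MARKER + base64.urlsafe_b64encode(raw.encode("utf-8")).decode("ascii")


def decode_turn(message):
    """Return (prompt, config), or (message, None) when no valid marker is present."""
    if not isinstance(message, str) or not message.startswith(MARKER):
        return message, None
    try:
        body = message[len(MARKER):].strip().encode("ascii")
        payload = json.loads(base64.urlsafe_b64decode(body).decode("utf-8"))
        return str(payload.get("prompt") or ""), dict(payload.get("config") or {})
    except Exception:
        # Malformed base64/JSON or a non-dict payload: treat the text as a normal message.
        return message, None


preset = {"aggregator": {"provider": "openrouter", "model": "anthropic/claude-opus-4.8"}}
message = encode_turn("inspect the flaky test", preset)
print(decode_turn(message))       # ('inspect the flaky test', {...preset...})
print(decode_turn("plain text"))  # ('plain text', None)
```

Keeping every failure path on `(message, None)` means a frontend that can only send text never turns a stray or corrupted marker into an error.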
--- a/hermes_cli/model_switch.py
+++ b/hermes_cli/model_switch.py
@ -807,6 +807,7 @@ def switch_model(
    resolved_alias = ""
    new_model = raw_input.strip()
    target_provider = current_provider
+    resolved_moa_preset = False

    # =================================================================
    # PATH A: Explicit --provider given
@ -843,6 +844,14 @@ def switch_model(
            )

        target_provider = pdef.id
+        if target_provider == "moa" and not new_model:
+            try:
+                from hermes_cli.config import load_config
+                from hermes_cli.moa_config import normalize_moa_config
+
+                new_model = normalize_moa_config(load_config().get("moa") or {})["default_preset"]
+            except Exception:
+                new_model = "default"

        # Guard against silent aggregator hops. A vendor name like bare
        # "openai" is an alias that resolves to an aggregator ("openrouter").
@ -925,10 +934,28 @@ def switch_model(
    # PATH B: No explicit provider — resolve from model input
    # =================================================================
    else:
-        # --- Step a: Try alias resolution on current provider ---
-        alias_result = resolve_alias(raw_input, current_provider)
+        try:
+            from hermes_cli.config import load_config
+            from hermes_cli.moa_config import exact_moa_preset_name, normalize_moa_config

-        if alias_result is not None:
+            _moa_cfg = normalize_moa_config(load_config().get("moa") or {})
+            _moa_match = exact_moa_preset_name(_moa_cfg, raw_input)
+            if _moa_match:
+                target_provider = "moa"
+                new_model = _moa_match
+                resolved_alias = ""
+                resolved_moa_preset = True
+                alias_result = None
+            else:
+                alias_result = resolve_alias(raw_input, current_provider)
+        except Exception:
+            alias_result = resolve_alias(raw_input, current_provider)
+
+        # --- Step a: Use the alias result resolved above, unless a MoA preset matched ---
+
+        if resolved_moa_preset:
+            pass
+        elif alias_result is not None:
            target_provider, new_model, resolved_alias = alias_result
            logger.debug(
                "Alias '%s' resolved to %s on %s",
@ -961,7 +988,7 @@ def switch_model(
                            f"Try specifying the full model name."
                        ),
                    )
-            else:
+            elif not resolved_moa_preset:
                # --- Step c: On aggregator, convert vendor:model to vendor/model ---
                # Only convert when there's no slash — a slash means the name
                # is already in vendor/model format and the colon is a variant
--- a/hermes_cli/models.py
+++ b/hermes_cli/models.py
@ -173,6 +173,7 @@ def _xai_curated_models() -> list[str]:


 _PROVIDER_MODELS: dict[str, list[str]] = {
+    "moa": ["default"],
    "nous": [
        # Anthropic
        "anthropic/claude-opus-4.8",
@ -1003,6 +1004,7 @@ class ProviderEntry(NamedTuple):
 CANONICAL_PROVIDERS: list[ProviderEntry] = [
    ProviderEntry("nous",           "Nous Portal",              "Nous Portal (Everything your agent needs, 300+ models with bundled tool use)"),
    ProviderEntry("openrouter",     "OpenRouter",               "OpenRouter (Pay-per-use API aggregator)"),
+    ProviderEntry("moa",            "Mixture of Agents",        "Mixture of Agents (named presets; aggregator acts after reference models)"),
    ProviderEntry("novita",         "NovitaAI",                 "NovitaAI (Cloud: Model API, Agent Sandbox, GPU Cloud)"),
    ProviderEntry("lmstudio",       "LM Studio",                "LM Studio (Local desktop app with built-in model server)"),
    ProviderEntry("anthropic",      "Anthropic",                "Anthropic (Claude models via API key or Claude Code)"),
@ -3663,6 +3665,24 @@ def validate_requested_model(
            "message": "Model name cannot be empty.",
        }

+    if normalized == "moa":
+        try:
+            from hermes_cli.config import load_config
+            from hermes_cli.moa_config import normalize_moa_config
+
+            cfg = normalize_moa_config(load_config().get("moa") or {})
+            if requested in cfg["presets"]:
+                return {"accepted": True, "persist": True, "recognized": True, "message": None}
+            return {
+                "accepted": False, "persist": False, "recognized": False,
+                "message": f"MoA preset `{requested}` was not found. Run `hermes moa list`.",
+            }
+        except Exception as exc:
+            return {
+                "accepted": False, "persist": False, "recognized": False,
+                "message": f"Could not read MoA presets: {exc}",
+            }
+
    if any(ch.isspace() for ch in requested):
        return {
            "accepted": False,
--- a/hermes_cli/provider_catalog.py
+++ b/hermes_cli/provider_catalog.py
@ -111,16 +111,27 @@ def provider_catalog() -> list[ProviderDescriptor]:
    except Exception:
        OPTIONAL_ENV_VARS = {}

+    # Hermes overlays carry auth_type for providers that have no registry/profile
+    # entry of their own — notably the ``moa`` virtual provider (auth_type
+    # "virtual"), which has no real credential and no network endpoint.
+    try:
+        from hermes_cli.providers import HERMES_OVERLAYS
+    except Exception:
+        HERMES_OVERLAYS = {}
+
    out: list[ProviderDescriptor] = []
    for order, entry in enumerate(CANONICAL_PROVIDERS):
        slug = entry.slug
        cfg = PROVIDER_REGISTRY.get(slug)
        prof = profiles.get(slug)
+        overlay = HERMES_OVERLAYS.get(slug)

-        # auth_type: registry is authoritative; fall back to profile, then api_key.
+        # auth_type: registry is authoritative; fall back to profile, then the
+        # Hermes overlay (e.g. moa → "virtual"), then api_key.
        auth_type = (
            (getattr(cfg, "auth_type", "") if cfg else "")
            or (getattr(prof, "auth_type", "") if prof else "")
+            or (getattr(overlay, "auth_type", "") if overlay else "")
            or "api_key"
        )

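
The auth_type lookup in this hunk is a first-non-empty-wins chain: registry, then profile, then Hermes overlay, then the `api_key` default. A small sketch of that precedence, using `SimpleNamespace` objects as hypothetical stand-ins for the real registry, profile, and overlay entries; `resolve_auth_type` is an illustrative name, not a function in the codebase:

```python
from types import SimpleNamespace
from typing import Any, Optional


def resolve_auth_type(cfg: Optional[Any], prof: Optional[Any], overlay: Optional[Any]) -> str:
    """Return the first non-empty auth_type: registry, profile, overlay, then "api_key"."""
    return (
        (getattr(cfg, "auth_type", "") if cfg else "")
        or (getattr(prof, "auth_type", "") if prof else "")
        or (getattr(overlay, "auth_type", "") if overlay else "")
        or "api_key"
    )


# Hypothetical stand-ins for the real registry/profile/overlay objects.
moa_overlay = SimpleNamespace(auth_type="virtual")
registry_entry = SimpleNamespace(auth_type="oauth")

print(resolve_auth_type(None, None, moa_overlay))            # virtual
print(resolve_auth_type(registry_entry, None, moa_overlay))  # oauth
print(resolve_auth_type(None, None, None))                   # api_key
```

Because the overlay sits last before the default, it only affects providers such as `moa` that have no registry or profile entry.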
--- a/hermes_cli/providers.py
+++ b/hermes_cli/providers.py
@ -44,6 +44,11 @@ class HermesOverlay:


 HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
+    "moa": HermesOverlay(
+        transport="openai_chat",
+        auth_type="virtual",
+        base_url_override="moa://local",
+    ),
    "openrouter": HermesOverlay(
        transport="openai_chat",
        is_aggregator=True,
@ -355,6 +360,7 @@ ALIASES: Dict[str, str] = {
 # not in the catalog.

 _LABEL_OVERRIDES: Dict[str, str] = {
+    "moa": "Mixture of Agents",
    "nous": "Nous Portal",
    "openai-codex": "OpenAI Codex",
    "copilot-acp": "GitHub Copilot ACP",
--- a/hermes_cli/runtime_provider.py
+++ b/hermes_cli/runtime_provider.py
@ -1400,6 +1400,16 @@ def resolve_runtime_provider(
    """
    requested_provider = resolve_requested_provider(requested)

+    if requested_provider == "moa":
+        return {
+            "provider": "moa",
+            "api_mode": "chat_completions",
+            "base_url": "http://127.0.0.1/v1",
+            "api_key": "moa-virtual-provider",
+            "source": "moa-virtual-provider",
+            "requested_provider": requested_provider,
+        }
+
    # Azure Anthropic short-circuit: when explicitly targeting an Azure endpoint
    # with provider="anthropic", bypass _resolve_named_custom_runtime (which would
    # return provider="custom" with chat_completions api_mode and no valid key).
--- a/hermes_cli/setup.py
+++ b/hermes_cli/setup.py
@ -408,11 +408,6 @@ def _print_setup_summary(config: dict, hermes_home):
    else:
        tool_status.append(("Vision (image analysis)", False, "run 'hermes setup' to configure"))

-    # Mixture of Agents — requires OpenRouter specifically (calls multiple models)
-    if get_env_value("OPENROUTER_API_KEY"):
-        tool_status.append(("Mixture of Agents", True, None))
-    else:
-        tool_status.append(("Mixture of Agents", False, "OPENROUTER_API_KEY"))

    # Web tools (Exa, Parallel, Firecrawl, or Tavily)
    if subscription_features.web.managed_by_nous:
--- a/hermes_cli/tips.py
+++ b/hermes_cli/tips.py
@ -144,7 +144,7 @@ TIPS = [
    "The todo tool helps the agent track complex multi-step tasks during a session.",
    "session_search performs full-text search across ALL past conversations.",
    "The agent automatically saves preferences, corrections, and environment facts to memory.",
-    "mixture_of_agents routes hard problems through 4 frontier LLMs collaboratively.",
+    "/moa routes one hard prompt through your configured Mixture of Agents model set.",
    "Terminal commands support background mode with notify_on_complete for long-running tasks.",
    "Terminal background processes support watch_patterns to alert on specific output lines.",
    "The terminal tool supports 6 backends: local, Docker, SSH, Modal, Daytona, and Singularity.",
--- a/hermes_cli/tools_config.py
+++ b/hermes_cli/tools_config.py
@ -63,7 +63,6 @@ CONFIGURABLE_TOOLSETS = [
    ("image_gen",       "🎨 Image Generation",          "image_generate"),
    ("video_gen",       "🎬 Video Generation",          "video_generate (text-to-video + image-to-video)"),
    ("x_search",        "🐦 X (Twitter) Search",        "x_search (requires xAI OAuth or XAI_API_KEY)"),
-    ("moa",             "🧠 Mixture of Agents",         "mixture_of_agents"),
    ("tts",             "🔊 Text-to-Speech",            "text_to_speech"),
    ("skills",          "📚 Skills",                    "list, view, manage"),
    ("todo",            "📋 Task Planning",             "todo"),
@ -111,7 +110,7 @@ def gui_toolset_label(label: str) -> str:
 # `hermes tools` → X (Twitter) Search setup walks users through credential
 # setup. The tool's check_fn means the schema still won't appear to the
 # model if the credential later goes missing or expires.
-_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "spotify", "discord", "discord_admin", "video", "video_gen", "x_search"}
+_DEFAULT_OFF_TOOLSETS = {"homeassistant", "spotify", "discord", "discord_admin", "video", "video_gen", "x_search"}


 def _xai_credentials_present() -> bool:
@ -567,10 +566,9 @@ TOOL_CATEGORIES = {
 }

 # Simple env-var requirements for toolsets NOT in TOOL_CATEGORIES.
-# Used as a fallback for tools like vision/moa that just need an API key.
+# Used as a fallback for toolsets like vision that just need an API key.
 TOOLSET_ENV_REQUIREMENTS = {
    "vision":     [("OPENROUTER_API_KEY",   "https://openrouter.ai/keys")],
-    "moa":        [("OPENROUTER_API_KEY",   "https://openrouter.ai/keys")],
 }


--- a/hermes_cli/web_server.py
+++ b/hermes_cli/web_server.py
@ -831,6 +831,35 @@ class ModelAssignment(BaseModel):
    profile: Optional[str] = None


+class MoaModelSlot(BaseModel):
+    provider: str = ""
+    model: str = ""
+
+
+class MoaPresetPayload(BaseModel):
+    reference_models: list[MoaModelSlot] = []
+    aggregator: MoaModelSlot = MoaModelSlot()
+    reference_temperature: float = 0.6
+    aggregator_temperature: float = 0.4
+    max_tokens: int = 4096
+    enabled: bool = True
+
+
+class MoaConfigPayload(BaseModel):
+    default_preset: str = "default"
+    active_preset: str = ""
+    presets: dict[str, MoaPresetPayload] = {}
+    # Backward-compatible flat payload fields used by older dashboard/desktop
+    # clients during this PR's transition window.
+    reference_models: list[MoaModelSlot] = []
+    aggregator: MoaModelSlot = MoaModelSlot()
+    reference_temperature: float = 0.6
+    aggregator_temperature: float = 0.4
+    max_tokens: int = 4096
+    enabled: bool = True
+    profile: Optional[str] = None
+
+
 def _normalize_main_model_assignment(provider: str, model: str) -> tuple[str, str]:
    """Normalize a main-slot (provider, model) pair before persisting.

@ -3786,6 +3815,66 @@ def get_auxiliary_models(profile: Optional[str] = None):
        raise HTTPException(status_code=500, detail="Failed to read auxiliary config")


+@app.get("/api/model/moa")
+def get_moa_models(profile: Optional[str] = None):
+    """Return the normalized Mixture-of-Agents config: named presets plus the flattened default-preset view."""
+    try:
+        from hermes_cli.moa_config import normalize_moa_config
+
+        with _profile_scope(profile):
+            cfg = load_config()
+            return normalize_moa_config(cfg.get("moa") if isinstance(cfg, dict) else {})
+    except HTTPException:
+        raise
+    except Exception:
+        _log.exception("GET /api/model/moa failed")
+        raise HTTPException(status_code=500, detail="Failed to read MoA config")
+
+
+@app.put("/api/model/moa")
+def set_moa_models(body: MoaConfigPayload, profile: Optional[str] = None):
+    """Persist Mixture-of-Agents presets; a flat legacy payload is stored as the default preset."""
+    try:
+        from hermes_cli.moa_config import normalize_moa_config
+
+        with _profile_scope(body.profile or profile):
+            cfg = load_config()
+            if body.presets:
+                raw = {
+                    "default_preset": body.default_preset,
+                    "active_preset": body.active_preset,
+                    "presets": {
+                        name: {
+                            "reference_models": [slot.dict() for slot in preset.reference_models],
+                            "aggregator": preset.aggregator.dict(),
+                            "reference_temperature": preset.reference_temperature,
+                            "aggregator_temperature": preset.aggregator_temperature,
+                            "max_tokens": preset.max_tokens,
+                            "enabled": preset.enabled,
+                        }
+                        for name, preset in body.presets.items()
+                    },
+                }
+            else:
+                raw = {
+                    "reference_models": [slot.dict() for slot in body.reference_models],
+                    "aggregator": body.aggregator.dict(),
+                    "reference_temperature": body.reference_temperature,
+                    "aggregator_temperature": body.aggregator_temperature,
+                    "max_tokens": body.max_tokens,
+                    "enabled": body.enabled,
+                }
+            normalized = normalize_moa_config(raw)
+            cfg["moa"] = normalized
+            save_config(cfg)
+            return {"ok": True, **normalized}
+    except HTTPException:
+        raise
+    except Exception:
+        _log.exception("PUT /api/model/moa failed")
+        raise HTTPException(status_code=500, detail="Failed to save MoA config")
+
+
@app.post("/api/model/set")
 async def set_model_assignment(body: ModelAssignment, profile: Optional[str] = None):
    """Assign a model to the main slot or an auxiliary task slot.
--- a/model_tools.py
+++ b/model_tools.py
@ -225,7 +225,6 @@ _LEGACY_TOOLSET_MAP = {
    "web_tools": ["web_search", "web_extract"],
    "terminal_tools": ["terminal"],
    "vision_tools": ["vision_analyze"],
-    "moa_tools": ["mixture_of_agents"],
    "image_tools": ["image_generate"],
    "skills_tools": ["skills_list", "skill_view", "skill_manage"],
    "browser_tools": [
--- a/run_agent.py
+++ b/run_agent.py
@ -3709,6 +3709,8 @@ class AIAgent:
        from unittest.mock import Mock

        primary_client = self._ensure_primary_openai_client(reason=reason)
+        if self.provider == "moa":
+            return primary_client
        if isinstance(primary_client, Mock):
            return primary_client
        with self._openai_client_lock():
@ -5313,6 +5315,7 @@ class AIAgent:
        stream_callback: Optional[callable] = None,
        persist_user_message: Optional[str] = None,
        persist_user_timestamp: Optional[float] = None,
+        moa_config: Optional[dict[str, Any]] = None,
    ) -> Dict[str, Any]:
        """Forwarder — see ``agent.conversation_loop.run_conversation``."""
        from agent.conversation_loop import run_conversation
@ -5324,7 +5327,8 @@ class AIAgent:
            task_id,
            stream_callback,
            persist_user_message,
-            persist_user_timestamp,
+            persist_user_timestamp=persist_user_timestamp,
+            moa_config=moa_config,
        )

    def chat(self, message: str, stream_callback: Optional[callable] = None) -> str:
--- a/skills/autonomous-ai-agents/hermes-agent/SKILL.md
+++ b/skills/autonomous-ai-agents/hermes-agent/SKILL.md
@ -448,7 +448,6 @@ Enable/disable via `hermes tools` (interactive) or `hermes tools enable/disable
 | `feishu_drive` | Feishu (Lark) drive tools |
 | `yuanbao` | Yuanbao integration tools |
 | `rl` | Reinforcement learning tools (off by default) |
-| `moa` | Mixture of Agents (off by default) |

 Full enumeration lives in `toolsets.py` as the `TOOLSETS` dict; `_HERMES_CORE_TOOLS` is the default bundle most platforms inherit from.

--- a/tests/cli/test_moa_command.py
+++ b/tests/cli/test_moa_command.py
@ -0,0 +1,69 @@
+import queue
+from unittest.mock import patch
+
+from cli import HermesCLI
+from hermes_cli.moa_config import decode_moa_turn
+
+
+def _make_cli():
+    cli = HermesCLI.__new__(HermesCLI)
+    cli.config = {
+        "moa": {
+            "default_preset": "default",
+            "presets": {
+                "default": {
+                    "reference_models": [{"provider": "openai-codex", "model": "gpt-5.5"}],
+                    "aggregator": {"provider": "openrouter", "model": "anthropic/claude-opus-4.8"},
+                },
+                "review": {
+                    "reference_models": [{"provider": "openrouter", "model": "deepseek/deepseek-v4-pro"}],
+                    "aggregator": {"provider": "openrouter", "model": "anthropic/claude-opus-4.8"},
+                },
+            },
+        }
+    }
+    cli._pending_input = queue.Queue()
+    cli._pending_agent_seed = None
+    cli._pending_moa_config = None
+    cli._agent_running = False
+    cli.agent = None
+    return cli
+
+
+def test_moa_bare_switches_to_default_preset_model():
+    cli = _make_cli()
+    with patch("cli._cprint"):
+        assert cli.process_command("/moa") is True
+    assert cli.provider == "moa"
+    assert cli.requested_provider == "moa"
+    assert cli.model == "default"
+    assert cli.agent is None
+
+
+def test_moa_exact_preset_switches_to_named_preset_model():
+    cli = _make_cli()
+    with patch("cli._cprint"):
+        cli.process_command("/moa review")
+    assert cli.provider == "moa"
+    assert cli.model == "review"
+    assert cli.agent is None
+
+
+def test_moa_non_preset_is_one_shot_prompt():
+    cli = _make_cli()
+    with patch("cli._cprint"):
+        cli.process_command("/moa inspect the flaky test")
+    assert cli._pending_agent_seed == "inspect the flaky test"
+    assert cli._pending_moa_disable_after_turn is True
+    assert cli.provider == "moa"
+    assert cli.model == "default"
+    assert cli._pending_moa_restore_model["provider"] != "moa"
+
+
+def test_decode_legacy_encoded_moa_turn_still_works():
+    from hermes_cli.moa_config import build_moa_turn_prompt
+
+    encoded = build_moa_turn_prompt("hello", _make_cli().config["moa"], preset="review")
+    prompt, cfg = decode_moa_turn(encoded)
+    assert prompt == "hello"
+    assert cfg["reference_models"] == [{"provider": "openrouter", "model": "deepseek/deepseek-v4-pro"}]
--- a/tests/hermes_cli/test_inventory.py
+++ b/tests/hermes_cli/test_inventory.py
@ -165,7 +165,9 @@ def test_build_models_payload_returns_expected_shape():
    assert set(payload.keys()) == {"providers", "model", "provider"}
    assert payload["model"] == "m1"
    assert payload["provider"] == "openrouter"
-    assert payload["providers"] == rows
+    assert payload["providers"][0]["slug"] == "moa"
+    assert payload["providers"][0]["models"] == ["default"]
+    assert payload["providers"][1:] == rows


 def test_build_models_payload_does_not_call_provider_model_ids():
@ -586,7 +588,7 @@ def test_aggregator_dedup_no_user_providers_unchanged():
    with _list_auth_returning(rows):
        payload = build_models_payload(ctx)

-    or_row = payload["providers"][0]
+    or_row = next(r for r in payload["providers"] if r["slug"] == "openrouter")
    assert len(or_row["models"]) == 2


--- a/tests/hermes_cli/test_moa_config.py
+++ b/tests/hermes_cli/test_moa_config.py
@ -0,0 +1,97 @@
+from hermes_cli.moa_config import (
+    DEFAULT_MOA_AGGREGATOR,
+    DEFAULT_MOA_PRESET_NAME,
+    DEFAULT_MOA_REFERENCE_MODELS,
+    build_moa_turn_prompt,
+    decode_moa_turn,
+    exact_moa_preset_name,
+    normalize_moa_config,
+    resolve_moa_preset,
+    set_active_moa_preset,
+)
+
+
+def test_normalize_moa_config_uses_default_named_preset():
+    cfg = normalize_moa_config({})
+
+    assert cfg["default_preset"] == DEFAULT_MOA_PRESET_NAME
+    assert list(cfg["presets"]) == [DEFAULT_MOA_PRESET_NAME]
+    assert cfg["reference_models"] == DEFAULT_MOA_REFERENCE_MODELS
+    assert cfg["aggregator"] == DEFAULT_MOA_AGGREGATOR
+
+
+def test_normalize_moa_config_preserves_named_presets():
+    cfg = normalize_moa_config(
+        {
+            "default_preset": "coding",
+            "presets": {
+                "coding": {
+                    "reference_models": [{"provider": "openai-codex", "model": "gpt-5.5"}],
+                    "aggregator": {"provider": "openrouter", "model": "anthropic/claude-opus-4.8"},
+                },
+                "review": {
+                    "reference_models": [{"provider": "openrouter", "model": "deepseek/deepseek-v4-pro"}],
+                    "aggregator": {"provider": "openrouter", "model": "anthropic/claude-opus-4.8"},
+                },
+            },
+        }
+    )
+
+    assert cfg["default_preset"] == "coding"
+    assert set(cfg["presets"]) == {"coding", "review"}
+    assert cfg["reference_models"] == [{"provider": "openai-codex", "model": "gpt-5.5"}]
+
+
+def test_legacy_flat_config_becomes_default_preset():
+    cfg = normalize_moa_config(
+        {
+            "reference_models": [{"provider": "openai-codex", "model": "gpt-5.5"}],
+            "aggregator": {"provider": "openrouter", "model": "anthropic/claude-opus-4.8"},
+        }
+    )
+
+    assert cfg["presets"][DEFAULT_MOA_PRESET_NAME]["reference_models"] == [
+        {"provider": "openai-codex", "model": "gpt-5.5"}
+    ]
+
+
+def test_exact_preset_matching_is_not_fuzzy():
+    config = {"presets": {"coding": {}, "review": {}}}
+
+    assert exact_moa_preset_name(config, "coding") == "coding"
+    assert exact_moa_preset_name(config, "cod") is None
+    assert exact_moa_preset_name(config, "coding please fix this") is None
+
+
+def test_active_preset_toggle_validation():
+    config = {"default_preset": "coding", "presets": {"coding": {}, "review": {}}}
+
+    active = set_active_moa_preset(config, "review")
+    assert active["active_preset"] == "review"
+
+    inactive = set_active_moa_preset(active, "")
+    assert inactive["active_preset"] == ""
+
+
+def test_resolve_moa_preset_returns_requested_model_set():
+    cfg = normalize_moa_config(
+        {
+            "presets": {
+                "coding": {"reference_models": [{"provider": "openai-codex", "model": "gpt-5.5"}]},
+                "review": {"reference_models": [{"provider": "openrouter", "model": "deepseek/deepseek-v4-pro"}]},
+            }
+        }
+    )
+
+    assert resolve_moa_preset(cfg, "review")["reference_models"] == [
+        {"provider": "openrouter", "model": "deepseek/deepseek-v4-pro"}
+    ]
+
+
+def test_build_moa_turn_prompt_encodes_one_shot_default_preset():
+    prompt = build_moa_turn_prompt("write a file then inspect it")
+
+    decoded_prompt, cfg = decode_moa_turn(prompt)
+    assert decoded_prompt == "write a file then inspect it"
+    assert cfg is not None
+    assert cfg["reference_models"] == DEFAULT_MOA_REFERENCE_MODELS
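
The tests above rely on three fallbacks: a legacy flat mapping becomes the `default` preset, an unknown `default_preset` falls back to the first preset, and an unknown `active_preset` is cleared. A simplified standalone sketch of just those rules; `normalize_sketch` is a hypothetical name, and unlike `normalize_moa_config` it does not fill in default models, temperatures, or the flattened view:

```python
from __future__ import annotations

from copy import deepcopy
from typing import Any

DEFAULT_NAME = "default"  # mirrors DEFAULT_MOA_PRESET_NAME


def normalize_sketch(raw: Any) -> dict[str, Any]:
    """Accept named presets or a legacy flat mapping (simplified, hypothetical)."""
    raw = raw if isinstance(raw, dict) else {}
    presets_raw = raw.get("presets")
    presets: dict[str, dict[str, Any]] = {}
    if isinstance(presets_raw, dict):
        for name, body in presets_raw.items():
            clean = str(name or "").strip()
            if clean:
                presets[clean] = deepcopy(body) if isinstance(body, dict) else {}
    if not presets:
        # Legacy flat shape: the top-level mapping is itself the preset body.
        keys = ("reference_models", "aggregator")
        presets[DEFAULT_NAME] = {k: deepcopy(raw[k]) for k in keys if k in raw}
    default = str(raw.get("default_preset") or "").strip()
    if default not in presets:
        default = next(iter(presets))
    active = str(raw.get("active_preset") or "").strip()
    return {
        "default_preset": default,
        "active_preset": active if active in presets else "",
        "presets": presets,
    }


legacy = {"reference_models": [{"provider": "openai-codex", "model": "gpt-5.5"}]}
print(normalize_sketch(legacy)["default_preset"])  # default
```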
--- a/tests/hermes_cli/test_provider_parity.py
+++ b/tests/hermes_cli/test_provider_parity.py
@ -24,7 +24,14 @@ HEADERS = {"X-Hermes-Session-Token": _SESSION_TOKEN}
 # the model picker's local-endpoint flow, not a fixed credential card. It is in
 # the CLI picker's universe but intentionally has no dedicated Providers-tab
 # card. Exempt it from the union check.
-_EXEMPT = {"custom"}
+#
+# Virtual providers (auth_type "virtual", e.g. `moa`) are likewise in the CLI
+# picker universe but have no real credential and no Providers-tab card — they
+# are configured through their own feature UI (MoA presets). Exempt them too,
+# derived from the catalog so any future virtual provider is covered without a
+# hardcoded slug.
+_VIRTUAL = {d.slug for d in provider_catalog() if d.auth_type == "virtual"}
+_EXEMPT = {"custom"} | _VIRTUAL

 # Providers that legitimately offer BOTH auth methods and so intentionally
 # appear on both desktop tabs (an API-key card AND an account sign-in card).
--- a/tests/hermes_cli/test_web_server.py
+++ b/tests/hermes_cli/test_web_server.py
@@ -393,6 +393,36 @@ class TestWebServerEndpoints:
         assert fields["api_key"]["value"] == ""
         assert "secret-value" not in json.dumps(data)

+    def test_get_moa_models_returns_provider_model_slots(self):
+        resp = self.client.get("/api/model/moa")
+        assert resp.status_code == 200
+        data = resp.json()
+        assert data["reference_models"]
+        assert all(set(slot) == {"provider", "model"} for slot in data["reference_models"])
+        assert set(data["aggregator"]) == {"provider", "model"}
+
+    def test_put_moa_models_persists_provider_model_slots(self):
+        from hermes_cli.config import load_config
+
+        payload = {
+            "reference_models": [
+                {"provider": "openai-codex", "model": "gpt-5.5"},
+                {"provider": "openrouter", "model": "deepseek/deepseek-v4-pro"},
+            ],
+            "aggregator": {"provider": "openrouter", "model": "anthropic/claude-opus-4.8"},
+            "reference_temperature": 0.6,
+            "aggregator_temperature": 0.4,
+            "max_tokens": 4096,
+            "enabled": True,
+        }
+
+        resp = self.client.put("/api/model/moa", json=payload)
+        assert resp.status_code == 200
+        assert resp.json()["ok"] is True
+        cfg = load_config()
+        assert cfg["moa"]["reference_models"] == payload["reference_models"]
+        assert cfg["moa"]["aggregator"] == payload["aggregator"]
+
     # ── GET /api/media (remote image display) ───────────────────────────

     def test_get_media_serves_image_in_root(self):
--- a/tests/run_agent/test_moa_loop_mode.py
+++ b/tests/run_agent/test_moa_loop_mode.py
@@ -0,0 +1,224 @@
+from types import SimpleNamespace
+from unittest.mock import MagicMock
+
+from run_agent import AIAgent
+
+
+def _response(content="done", *, tool_calls=None):
+    message = SimpleNamespace(content=content, tool_calls=tool_calls or [])
+    choice = SimpleNamespace(message=message, finish_reason="stop")
+    return SimpleNamespace(choices=[choice], usage=None, model="fake-model")
+
+
+def test_moa_virtual_provider_aggregator_is_actor(monkeypatch, tmp_path):
+    home = tmp_path / ".hermes"
+    home.mkdir()
+    (home / "config.yaml").write_text(
+        """
+moa:
+  default_preset: review
+  presets:
+    review:
+      reference_models:
+        - provider: openai-codex
+          model: gpt-5.5
+      aggregator:
+        provider: openrouter
+        model: anthropic/claude-opus-4.8
+""".strip(),
+        encoding="utf-8",
+    )
+    monkeypatch.setenv("HERMES_HOME", str(home))
+    calls = []
+
+    def fake_call_llm(**kwargs):
+        calls.append(kwargs)
+        if kwargs["task"] == "moa_reference":
+            return _response("reference advice")
+        return _response("aggregator acted")
+
+    monkeypatch.setattr("agent.moa_loop.call_llm", fake_call_llm)
+
+    agent = AIAgent(
+        api_key="moa-virtual-provider",
+        base_url="moa://local",
+        model="review",
+        provider="moa",
+        quiet_mode=True,
+        skip_context_files=True,
+        skip_memory=True,
+        enabled_toolsets=["file"],
+        max_iterations=1,
+    )
+
+    result = agent.run_conversation("solve this")
+
+    assert result["final_response"] == "aggregator acted"
+    assert [(c["task"], c["provider"], c["model"]) for c in calls] == [
+        ("moa_reference", "openai-codex", "gpt-5.5"),
+        ("moa_aggregator", "openrouter", "anthropic/claude-opus-4.8"),
+    ]
+    assert calls[1]["tools"] is not None
+
+
+def test_reference_messages_strips_system_and_tool_history():
+    from agent.moa_loop import _reference_messages
+
+    messages = [
+        {"role": "system", "content": "huge hermes system prompt"},
+        {"role": "user", "content": "do the thing"},
+        {
+            "role": "assistant",
+            "content": "",
+            "tool_calls": [{"id": "c1", "function": {"name": "f", "arguments": "{}"}}],
+        },
+        {"role": "tool", "tool_call_id": "c1", "content": "tool result"},
+        {"role": "assistant", "content": "here is my answer"},
+    ]
+
+    trimmed = _reference_messages(messages)
+
+    # System prompt, tool-call-only assistant turn, and tool result are gone.
+    assert all(m["role"] in ("user", "assistant") for m in trimmed)
+    assert all("tool_calls" not in m for m in trimmed)
+    assert trimmed == [
+        {"role": "user", "content": "do the thing"},
+        {"role": "assistant", "content": "here is my answer"},
+    ]
+
+
+def test_moa_facade_references_get_trimmed_messages(monkeypatch, tmp_path):
+    home = tmp_path / ".hermes"
+    home.mkdir()
+    (home / "config.yaml").write_text(
+        """
+moa:
+  default_preset: review
+  presets:
+    review:
+      reference_models:
+        - provider: openai-codex
+          model: gpt-5.5
+      aggregator:
+        provider: openrouter
+        model: anthropic/claude-opus-4.8
+""".strip(),
+        encoding="utf-8",
+    )
+    monkeypatch.setenv("HERMES_HOME", str(home))
+    calls = []
+
+    def fake_call_llm(**kwargs):
+        calls.append(kwargs)
+        return _response("ok")
+
+    monkeypatch.setattr("agent.moa_loop.call_llm", fake_call_llm)
+
+    from agent.moa_loop import MoAChatCompletions
+
+    facade = MoAChatCompletions("review")
+    facade.create(
+        messages=[
+            {"role": "system", "content": "system prompt"},
+            {"role": "user", "content": "question"},
+            {"role": "tool", "tool_call_id": "x", "content": "leftover"},
+        ],
+        tools=[{"type": "function"}],
+    )
+
+    ref_call = next(c for c in calls if c["task"] == "moa_reference")
+    # Reference never sees system prompt or tool-role messages.
+    assert all(m["role"] == "user" for m in ref_call["messages"])
+    assert ref_call.get("tools") in (None, [])
+    # Aggregator still receives the original messages + tool schema.
+    agg_call = next(c for c in calls if c["task"] == "moa_aggregator")
+    assert agg_call["tools"] is not None
+
+
+def test_moa_disabled_preset_skips_references(monkeypatch, tmp_path):
+    home = tmp_path / ".hermes"
+    home.mkdir()
+    (home / "config.yaml").write_text(
+        """
+moa:
+  default_preset: review
+  presets:
+    review:
+      enabled: false
+      reference_models:
+        - provider: openai-codex
+          model: gpt-5.5
+      aggregator:
+        provider: openrouter
+        model: anthropic/claude-opus-4.8
+""".strip(),
+        encoding="utf-8",
+    )
+    monkeypatch.setenv("HERMES_HOME", str(home))
+    calls = []
+
+    def fake_call_llm(**kwargs):
+        calls.append(kwargs)
+        return _response("aggregator only")
+
+    monkeypatch.setattr("agent.moa_loop.call_llm", fake_call_llm)
+
+    from agent.moa_loop import MoAChatCompletions
+
+    facade = MoAChatCompletions("review")
+    facade.create(messages=[{"role": "user", "content": "question"}], tools=[{"type": "function"}])
+
+    tasks = [c["task"] for c in calls]
+    # No reference fan-out — only the aggregator runs.
+    assert tasks == ["moa_aggregator"]
+    # Aggregator gets the unmodified user message (no MoA guidance appended).
+    agg_call = calls[0]
+    assert agg_call["messages"][-1]["content"] == "question"
+
+
+def test_references_run_in_parallel(monkeypatch):
+    """References fan out concurrently (delegate-batch semantics), not serially.
+
+    Each reference sleeps; wall-time must approximate the slowest single call,
+    not the sum. Order is preserved and a failing reference is isolated.
+    """
+    import time
+
+    from agent import moa_loop
+
+    # Force _extract_text down its fallback path (no transport normalize).
+    monkeypatch.setattr(moa_loop, "get_transport", lambda *_a, **_k: None)
+
+    barrier_hits = []
+
+    def slow_call_llm(**kwargs):
+        barrier_hits.append(time.monotonic())
+        model = kwargs["model"]
+        if model == "boom":
+            raise RuntimeError("kaboom")
+        time.sleep(0.5)
+        return _response(f"resp-{kwargs['provider']}")
+
+    monkeypatch.setattr(moa_loop, "call_llm", slow_call_llm)
+
+    refs = [
+        {"provider": "p1", "model": "ok"},
+        {"provider": "moa", "model": "preset"},  # recursion guard, not dispatched
+        {"provider": "p2", "model": "boom"},  # failure isolated
+        {"provider": "p3", "model": "ok"},
+    ]
+
+    start = time.monotonic()
+    out = moa_loop._run_references_parallel(
+        refs, [{"role": "user", "content": "hi"}], temperature=0.6, max_tokens=64
+    )
+    elapsed = time.monotonic() - start
+
+    # Two 0.5s sleeps run concurrently → well under the 1.0s serial floor.
+    assert elapsed < 0.9, f"references did not run in parallel (took {elapsed:.2f}s)"
+    # Output order matches input order (stable Reference N labelling).
+    assert [label for label, _ in out] == ["p1:ok", "moa:preset", "p2:boom", "p3:ok"]
+    assert "recursively reference MoA" in out[1][1]
+    assert out[2][1].startswith("[failed:")
+    assert out[0][1] == "resp-p1"
+
--- a/tests/test_model_tools.py
+++ b/tests/test_model_tools.py
@@ -375,7 +375,7 @@ class TestPreToolCallBlocking:
 class TestLegacyToolsetMap:
     def test_expected_legacy_names(self):
         expected = [
-            "web_tools", "terminal_tools", "vision_tools", "moa_tools",
+            "web_tools", "terminal_tools", "vision_tools",
             "image_tools", "skills_tools", "browser_tools", "cronjob_tools",
             "file_tools", "tts_tools",
         ]
--- a/tests/tools/test_llm_content_none_guard.py
+++ b/tests/tools/test_llm_content_none_guard.py
@@ -36,52 +36,6 @@ def _run(coro):
     return asyncio.get_event_loop().run_until_complete(coro)


-# ── mixture_of_agents_tool — reference model (line 146) ───────────────────
-
-class TestMoAReferenceModelContentNone:
-    """tools/mixture_of_agents_tool.py — _query_model()"""
-
-    def test_none_content_raises_before_fix(self):
-        """Demonstrate that None content from a reasoning model crashes."""
-        response = _make_response(None)
-
-        # Simulate the exact line: response.choices[0].message.content.strip()
-        with pytest.raises(AttributeError):
-            response.choices[0].message.content.strip()
-
-    def test_none_content_safe_with_or_guard(self):
-        """The ``or ""`` guard should convert None to empty string."""
-        response = _make_response(None)
-
-        content = (response.choices[0].message.content or "").strip()
-        assert content == ""
-
-    def test_normal_content_unaffected(self):
-        """Regular string content should pass through unchanged."""
-        response = _make_response("  Hello world  ")
-
-        content = (response.choices[0].message.content or "").strip()
-        assert content == "Hello world"
-
-
-# ── mixture_of_agents_tool — aggregator (line 214) ────────────────────────
-
-class TestMoAAggregatorContentNone:
-    """tools/mixture_of_agents_tool.py — _run_aggregator()"""
-
-    def test_none_content_raises_before_fix(self):
-        response = _make_response(None)
-
-        with pytest.raises(AttributeError):
-            response.choices[0].message.content.strip()
-
-    def test_none_content_safe_with_or_guard(self):
-        response = _make_response(None)
-
-        content = (response.choices[0].message.content or "").strip()
-        assert content == ""
-
-
 # ── web_tools — LLM content processor (line 419) ─────────────────────────

 class TestWebToolsProcessorContentNone:
@@ -170,14 +124,6 @@ class TestSourceLinesAreGuarded:
         with open(os.path.join(base, rel_path)) as f:
             return f.read()

-    def test_mixture_of_agents_reference_model_guarded(self):
-        src = self._read_file("tools/mixture_of_agents_tool.py")
-        # The unguarded pattern should NOT exist
-        assert ".message.content.strip()" not in src, (
-            "tools/mixture_of_agents_tool.py still has unguarded "
-            ".content.strip() — apply `(... or \"\").strip()` guard"
-        )
-
     def test_web_tools_guarded(self):
         src = self._read_file("tools/web_tools.py")
         assert ".message.content.strip()" not in src, (
--- a/tests/tools/test_mixture_of_agents_tool.py
+++ b/tests/tools/test_mixture_of_agents_tool.py
@@ -1,85 +0,0 @@
-import importlib
-import json
-from types import SimpleNamespace
-from unittest.mock import AsyncMock, MagicMock
-
-import pytest
-
-moa = importlib.import_module("tools.mixture_of_agents_tool")
-
-
-def test_moa_defaults_are_well_formed():
-    # Invariants, not a catalog snapshot: the exact model list churns with
-    # OpenRouter availability (see PR #6636 where gemini-3-pro-preview was
-    # removed upstream). What we care about is that the defaults are present
-    # and valid vendor/model slugs.
-    assert isinstance(moa.REFERENCE_MODELS, list)
-    assert len(moa.REFERENCE_MODELS) >= 1
-    for m in moa.REFERENCE_MODELS:
-        assert isinstance(m, str) and "/" in m and not m.startswith("/")
-    assert isinstance(moa.AGGREGATOR_MODEL, str)
-    assert "/" in moa.AGGREGATOR_MODEL
-
-
-@pytest.mark.asyncio
-async def test_reference_model_retry_warnings_avoid_exc_info_until_terminal_failure(monkeypatch):
-    fake_client = SimpleNamespace(
-        chat=SimpleNamespace(
-            completions=SimpleNamespace(
-                create=AsyncMock(side_effect=RuntimeError("rate limited"))
-            )
-        )
-    )
-    warn = MagicMock()
-    err = MagicMock()
-
-    monkeypatch.setattr(moa, "_get_openrouter_client", lambda: fake_client)
-    monkeypatch.setattr(moa.logger, "warning", warn)
-    monkeypatch.setattr(moa.logger, "error", err)
-
-    model, message, success = await moa._run_reference_model_safe(
-        "openai/gpt-5.4-pro", "hello", max_retries=2
-    )
-
-    assert model == "openai/gpt-5.4-pro"
-    assert success is False
-    assert "failed after 2 attempts" in message
-    assert warn.call_count == 2
-    assert all(call.kwargs.get("exc_info") is None for call in warn.call_args_list)
-    err.assert_called_once()
-    assert err.call_args.kwargs.get("exc_info") is True
-
-
-@pytest.mark.asyncio
-async def test_moa_top_level_error_logs_single_traceback_on_aggregator_failure(monkeypatch):
-    monkeypatch.setenv("OPENROUTER_API_KEY", "test-key")
-    monkeypatch.setattr(
-        moa,
-        "_run_reference_model_safe",
-        AsyncMock(return_value=("anthropic/claude-opus-4.6", "ok", True)),
-    )
-    monkeypatch.setattr(
-        moa,
-        "_run_aggregator_model",
-        AsyncMock(side_effect=RuntimeError("aggregator boom")),
-    )
-    monkeypatch.setattr(
-        moa,
-        "_debug",
-        SimpleNamespace(log_call=MagicMock(), save=MagicMock(), active=False),
-    )
-
-    err = MagicMock()
-    monkeypatch.setattr(moa.logger, "error", err)
-
-    result = json.loads(
-        await moa.mixture_of_agents_tool(
-            "solve this",
-            reference_models=["anthropic/claude-opus-4.6"],
-        )
-    )
-
-    assert result["success"] is False
-    assert "Error in MoA processing" in result["error"]
-    err.assert_called_once()
-    assert err.call_args.kwargs.get("exc_info") is True
--- a/tests/tui_gateway/test_goal_command.py
+++ b/tests/tui_gateway/test_goal_command.py
@@ -202,3 +202,77 @@ def test_pending_input_commands_includes_goal(server):
     """Guard: _PENDING_INPUT_COMMANDS must list 'goal' — removing it would
     silently re-break the TUI."""
     assert "goal" in server._PENDING_INPUT_COMMANDS
+
+
+# ── command.dispatch /moa ────────────────────────────────────────────
+
+def _write_moa_config(home, text):
+    cfg_path = home / "config.yaml"
+    cfg_path.write_text(text)
+
+
+def test_moa_bare_switches_to_default_preset_model(server, session, hermes_home):
+    _write_moa_config(hermes_home, """
+moa:
+  default_preset: default
+  presets:
+    default:
+      reference_models:
+        - provider: openai-codex
+          model: gpt-5.5
+      aggregator:
+        provider: openrouter
+        model: anthropic/claude-opus-4.8
+""")
+    sid, _, s = session
+    r = _call(server, "command.dispatch", name="moa", arg="", session_id=sid)
+    assert r["result"]["type"] == "exec"
+    assert "Model switched to MoA preset: default" in r["result"]["output"]
+    assert s["model_override"]["provider"] == "moa"
+    assert s["model_override"]["model"] == "default"
+
+
+def test_moa_exact_preset_switches_to_named_preset_model(server, session, hermes_home):
+    _write_moa_config(hermes_home, """
+moa:
+  default_preset: default
+  presets:
+    default: {}
+    review:
+      reference_models:
+        - provider: openrouter
+          model: deepseek/deepseek-v4-pro
+      aggregator:
+        provider: openrouter
+        model: anthropic/claude-opus-4.8
+""")
+    sid, _, s = session
+    r = _call(server, "command.dispatch", name="moa", arg="review", session_id=sid)
+    assert r["result"]["type"] == "exec"
+    assert s["model_override"]["provider"] == "moa"
+    assert s["model_override"]["model"] == "review"
+
+
+def test_moa_non_preset_returns_one_shot_send(server, session, hermes_home):
+    _write_moa_config(hermes_home, """
+moa:
+  default_preset: default
+  presets:
+    default:
+      reference_models:
+        - provider: openai-codex
+          model: gpt-5.5
+      aggregator:
+        provider: openrouter
+        model: anthropic/claude-opus-4.8
+""")
+    sid, _, _ = session
+    r = _call(server, "command.dispatch", name="moa", arg="inspect this project", session_id=sid)
+    result = r["result"]
+    assert result["type"] == "send"
+    assert result["message"] == "inspect this project"
+    assert "one-shot" in result["notice"]
+
+
+def test_pending_input_commands_includes_moa(server):
+    assert "moa" in server._PENDING_INPUT_COMMANDS
--- a/tools/debug_helpers.py
+++ b/tools/debug_helpers.py
@@ -2,7 +2,7 @@

 Replaces the identical DEBUG_MODE / _log_debug_call / _save_debug_log /
 get_debug_session_info boilerplate previously duplicated across web_tools,
-vision_tools, mixture_of_agents_tool, and image_generation_tool.
+vision_tools, and image_generation_tool.

 Usage in a tool module:

--- a/tools/delegate_tool.py
+++ b/tools/delegate_tool.py
@@ -120,7 +120,7 @@ def _get_subagent_approval_callback():
 # toolset to request explicitly — the correct mechanism for nested
 # delegation is role='orchestrator', which re-adds "delegation" in
 # _build_child_agent regardless of this exclusion.
-_EXCLUDED_TOOLSET_NAMES = frozenset({"debugging", "safe", "delegation", "moa", "rl"})
+_EXCLUDED_TOOLSET_NAMES = frozenset({"debugging", "safe", "delegation", "rl"})
 _SUBAGENT_TOOLSETS = sorted(
     name
     for name, defn in TOOLSETS.items()
--- a/tools/mixture_of_agents_tool.py
+++ b/tools/mixture_of_agents_tool.py
@@ -1,542 +0,0 @@
-#!/usr/bin/env python3
-"""
-Mixture-of-Agents Tool Module
-
-This module implements the Mixture-of-Agents (MoA) methodology that leverages
-the collective strengths of multiple LLMs through a layered architecture to
-achieve state-of-the-art performance on complex reasoning tasks.
-
-Based on the research paper: "Mixture-of-Agents Enhances Large Language Model Capabilities"
-by Junlin Wang et al. (arXiv:2406.04692v1)
-
-Key Features:
-- Multi-layer LLM collaboration for enhanced reasoning
-- Parallel processing of reference models for efficiency
-- Intelligent aggregation and synthesis of diverse responses
-- Specialized for extremely difficult problems requiring intense reasoning
-- Optimized for coding, mathematics, and complex analytical tasks
-
-Available Tool:
-- mixture_of_agents_tool: Process complex queries using multiple frontier models
-
-Architecture:
-1. Reference models generate diverse initial responses in parallel
-2. Aggregator model synthesizes responses into a high-quality output
-3. Multiple layers can be used for iterative refinement (future enhancement)
-
-Models Used (via OpenRouter):
-- Reference Models: claude-opus-4.6, gemini-3-pro-preview, gpt-5.4-pro, deepseek-v3.2
-- Aggregator Model: claude-opus-4.6 (highest capability for synthesis)
-
-Configuration:
-    To customize the MoA setup, modify the configuration constants at the top of this file:
-    - REFERENCE_MODELS: List of models for generating diverse initial responses
-    - AGGREGATOR_MODEL: Model used to synthesize the final response
-    - REFERENCE_TEMPERATURE/AGGREGATOR_TEMPERATURE: Sampling temperatures
-    - MIN_SUCCESSFUL_REFERENCES: Minimum successful models needed to proceed
-
-Usage:
-    from mixture_of_agents_tool import mixture_of_agents_tool
-    import asyncio
-    
-    # Process a complex query
-    result = await mixture_of_agents_tool(
-        user_prompt="Solve this complex mathematical proof..."
-    )
-"""
-
-import json
-import logging
-import os
-import asyncio
-import datetime
-from typing import Dict, Any, List, Optional
-from tools.openrouter_client import get_async_client as _get_openrouter_client, check_api_key as check_openrouter_api_key
-from agent.auxiliary_client import extract_content_or_reasoning
-from tools.debug_helpers import DebugSession
-import sys
-
-logger = logging.getLogger(__name__)
-
-# Configuration for MoA processing
-# Reference models - these generate diverse initial responses in parallel.
-# Keep this list aligned with current top-tier OpenRouter frontier options.
-REFERENCE_MODELS = [
-    "anthropic/claude-opus-4.6",
-    "google/gemini-2.5-pro",
-    "openai/gpt-5.4-pro",
-    "deepseek/deepseek-v3.2",
-]
-
-# Aggregator model - synthesizes reference responses into final output.
-# Prefer the strongest synthesis model in the current OpenRouter lineup.
-AGGREGATOR_MODEL = "anthropic/claude-opus-4.6"
-
-# Temperature settings optimized for MoA performance
-REFERENCE_TEMPERATURE = 0.6  # Balanced creativity for diverse perspectives
-AGGREGATOR_TEMPERATURE = 0.4  # Focused synthesis for consistency
-
-# Failure handling configuration
-MIN_SUCCESSFUL_REFERENCES = 1  # Minimum successful reference models needed to proceed
-
-# System prompt for the aggregator model (from the research paper)
-AGGREGATOR_SYSTEM_PROMPT = """You have been provided with a set of responses from various open-source models to the latest user query. Your task is to synthesize these responses into a single, high-quality response. It is crucial to critically evaluate the information provided in these responses, recognizing that some of it may be biased or incorrect. Your response should not simply replicate the given answers but should offer a refined, accurate, and comprehensive reply to the instruction. Ensure your response is well-structured, coherent, and adheres to the highest standards of accuracy and reliability.
-
-Responses from models:"""
-
-_debug = DebugSession("moa_tools", env_var="MOA_TOOLS_DEBUG")
-
-
-def _construct_aggregator_prompt(system_prompt: str, responses: List[str]) -> str:
-    """
-    Construct the final system prompt for the aggregator including all model responses.
-    
-    Args:
-        system_prompt (str): Base system prompt for aggregation
-        responses (List[str]): List of responses from reference models
-        
-    Returns:
-        str: Complete system prompt with enumerated responses
-    """
-    response_text = "\n".join([f"{i+1}. {response}" for i, response in enumerate(responses)])
-    return f"{system_prompt}\n\n{response_text}"
-
-
-async def _run_reference_model_safe(
-    model: str,
-    user_prompt: str,
-    temperature: float = REFERENCE_TEMPERATURE,
-    max_tokens: int = 32000,
-    max_retries: int = 6
-) -> tuple[str, str, bool]:
-    """
-    Run a single reference model with retry logic and graceful failure handling.
-    
-    Args:
-        model (str): Model identifier to use
-        user_prompt (str): The user's query
-        temperature (float): Sampling temperature for response generation
-        max_tokens (int): Maximum tokens in response
-        max_retries (int): Maximum number of retry attempts
-        
-    Returns:
-        tuple[str, str, bool]: (model_name, response_content_or_error, success_flag)
-    """
-    for attempt in range(max_retries):
-        try:
-            logger.info("Querying %s (attempt %s/%s)", model, attempt + 1, max_retries)
-            
-            # Build parameters for the API call
-            api_params = {
-                "model": model,
-                "messages": [{"role": "user", "content": user_prompt}],
-                "max_tokens": max_tokens,
-                "extra_body": {
-                    "reasoning": {
-                        "enabled": True,
-                        "effort": "xhigh"
-                    }
-                }
-            }
-            
-            # GPT models (especially gpt-4o-mini) don't support custom temperature values
-            # Only include temperature for non-GPT models
-            if not model.lower().startswith('gpt-'):
-                api_params["temperature"] = temperature
-            
-            response = await _get_openrouter_client().chat.completions.create(**api_params)
-            
-            content = extract_content_or_reasoning(response)
-            if not content:
-                # Reasoning-only response — let the retry loop handle it
-                logger.warning("%s returned empty content (attempt %s/%s), retrying", model, attempt + 1, max_retries)
-                if attempt < max_retries - 1:
-                    await asyncio.sleep(min(2 ** (attempt + 1), 60))
-                    continue
-            logger.info("%s responded (%s characters)", model, len(content))
-            return model, content, True
-            
-        except Exception as e:
-            error_str = str(e)
-            # Keep retry-path logging concise; full tracebacks are reserved for
-            # terminal failure paths so long-running MoA retries don't flood logs.
-            if "invalid" in error_str.lower():
-                logger.warning("%s invalid request error (attempt %s): %s", model, attempt + 1, error_str)
-            elif "rate" in error_str.lower() or "limit" in error_str.lower():
-                logger.warning("%s rate limit error (attempt %s): %s", model, attempt + 1, error_str)
-            else:
-                logger.warning("%s unknown error (attempt %s): %s", model, attempt + 1, error_str)
-
-            if attempt < max_retries - 1:
-                # Exponential backoff for rate limiting: 2s, 4s, 8s, 16s, 32s, 60s
-                sleep_time = min(2 ** (attempt + 1), 60)
-                logger.info("Retrying in %ss...", sleep_time)
-                await asyncio.sleep(sleep_time)
-            else:
-                error_msg = f"{model} failed after {max_retries} attempts: {error_str}"
-                logger.error("%s", error_msg, exc_info=True)
-                return model, error_msg, False
-
-
-async def _run_aggregator_model(
-    system_prompt: str,
-    user_prompt: str,
-    temperature: float = AGGREGATOR_TEMPERATURE,
-    max_tokens: int = None
-) -> str:
-    """
-    Run the aggregator model to synthesize the final response.
-    
-    Args:
-        system_prompt (str): System prompt with all reference responses
-        user_prompt (str): Original user query
-        temperature (float): Focused temperature for consistent aggregation
-        max_tokens (int): Maximum tokens in final response
-        
-    Returns:
-        str: Synthesized final response
-    """
-    logger.info("Running aggregator model: %s", AGGREGATOR_MODEL)
-
-    # Build parameters for the API call
-    api_params = {
-        "model": AGGREGATOR_MODEL,
-        "messages": [
-            {"role": "system", "content": system_prompt},
-            {"role": "user", "content": user_prompt}
-        ],
-        "max_tokens": max_tokens,
-        "extra_body": {
-            "reasoning": {
-                "enabled": True,
-                "effort": "xhigh"
-            }
-        }
-    }
-
-    # GPT models (especially gpt-4o-mini) don't support custom temperature values
-    # Only include temperature for non-GPT models
-    if not AGGREGATOR_MODEL.lower().startswith('gpt-'):
-        api_params["temperature"] = temperature
-
-    response = await _get_openrouter_client().chat.completions.create(**api_params)
-
-    content = extract_content_or_reasoning(response)
-
-    # Retry once on empty content (reasoning-only response)
-    if not content:
-        logger.warning("Aggregator returned empty content, retrying once")
-        response = await _get_openrouter_client().chat.completions.create(**api_params)
-        content = extract_content_or_reasoning(response)
-
-    logger.info("Aggregation complete (%s characters)", len(content))
-    return content
-
-
-async def mixture_of_agents_tool(
-    user_prompt: str,
-    reference_models: Optional[List[str]] = None,
-    aggregator_model: Optional[str] = None
-) -> str:
-    """
-    Process a complex query using the Mixture-of-Agents methodology.
-    
-    This tool leverages multiple frontier language models to collaboratively solve
-    extremely difficult problems requiring intense reasoning. It's particularly
-    effective for:
-    - Complex mathematical proofs and calculations
-    - Advanced coding problems and algorithm design
-    - Multi-step analytical reasoning tasks
-    - Problems requiring diverse domain expertise
-    - Tasks where single models show limitations
-    
-    The MoA approach uses a fixed 2-layer architecture:
-    1. Layer 1: Multiple reference models generate diverse responses in parallel (temp=0.6)
-    2. Layer 2: Aggregator model synthesizes the best elements into final response (temp=0.4)
-    
-    Args:
-        user_prompt (str): The complex query or problem to solve
-        reference_models (Optional[List[str]]): Custom reference models to use
-        aggregator_model (Optional[str]): Custom aggregator model to use
-    
-    Returns:
-        str: JSON string containing the MoA results with the following structure:
-             {
-                 "success": bool,
-                 "response": str,
-                 "models_used": {
-                     "reference_models": List[str],
-                     "aggregator_model": str
-                 },
-                 "processing_time": float
-             }
-    
-    Raises:
-        Exception: If MoA processing fails or API key is not set
-    """
-    start_time = datetime.datetime.now()
-    
-    debug_call_data = {
-        "parameters": {
-            "user_prompt": user_prompt[:200] + "..." if len(user_prompt) > 200 else user_prompt,
-            "reference_models": reference_models or REFERENCE_MODELS,
-            "aggregator_model": aggregator_model or AGGREGATOR_MODEL,
-            "reference_temperature": REFERENCE_TEMPERATURE,
-            "aggregator_temperature": AGGREGATOR_TEMPERATURE,
-            "min_successful_references": MIN_SUCCESSFUL_REFERENCES
-        },
-        "error": None,
-        "success": False,
-        "reference_responses_count": 0,
-        "failed_models_count": 0,
-        "failed_models": [],
-        "final_response_length": 0,
-        "processing_time_seconds": 0,
-        "models_used": {}
-    }
-    
-    try:
-        logger.info("Starting Mixture-of-Agents processing...")
-        logger.info("Query: %s", user_prompt[:100])
-        
-        # Validate API key availability
-        if not os.getenv("OPENROUTER_API_KEY"):
-            raise ValueError("OPENROUTER_API_KEY environment variable not set")
-        
-        # Use provided models or defaults
-        ref_models = reference_models or REFERENCE_MODELS
-        agg_model = aggregator_model or AGGREGATOR_MODEL
-        
-        logger.info("Using %s reference models in 2-layer MoA architecture", len(ref_models))
-        
-        # Layer 1: Generate diverse responses from reference models (with failure handling)
-        logger.info("Layer 1: Generating reference responses...")
-        model_results = await asyncio.gather(*[
-            _run_reference_model_safe(model, user_prompt, REFERENCE_TEMPERATURE)
-            for model in ref_models
-        ])
-        
-        # Separate successful and failed responses
-        successful_responses = []
-        failed_models = []
-        
-        for model_name, content, success in model_results:
-            if success:
-                successful_responses.append(content)
-            else:
-                failed_models.append(model_name)
-        
-        successful_count = len(successful_responses)
-        failed_count = len(failed_models)
-        
-        logger.info("Reference model results: %s successful, %s failed", successful_count, failed_count)
-        
-        if failed_models:
-            logger.warning("Failed models: %s", ', '.join(failed_models))
-        
-        # Check if we have enough successful responses to proceed
-        if successful_count < MIN_SUCCESSFUL_REFERENCES:
-            raise ValueError(f"Insufficient successful reference models ({successful_count}/{len(ref_models)}). Need at least {MIN_SUCCESSFUL_REFERENCES} successful responses.")
-        
-        debug_call_data["reference_responses_count"] = successful_count
-        debug_call_data["failed_models_count"] = failed_count
-        debug_call_data["failed_models"] = failed_models
-        
-        # Layer 2: Aggregate responses using the aggregator model
-        logger.info("Layer 2: Synthesizing final response...")
-        aggregator_system_prompt = _construct_aggregator_prompt(
-            AGGREGATOR_SYSTEM_PROMPT, 
-            successful_responses
-        )
-        
-        final_response = await _run_aggregator_model(
-            aggregator_system_prompt,
-            user_prompt,
-            AGGREGATOR_TEMPERATURE
-        )
-        
-        # Calculate processing time
-        end_time = datetime.datetime.now()
-        processing_time = (end_time - start_time).total_seconds()
-        
-        logger.info("MoA processing completed in %.2f seconds", processing_time)
-        
-        # Prepare successful response (only final aggregated result, minimal fields)
-        result = {
-            "success": True,
-            "response": final_response,
-            "models_used": {
-                "reference_models": ref_models,
-                "aggregator_model": agg_model
-            }
-        }
-        
-        debug_call_data["success"] = True
-        debug_call_data["final_response_length"] = len(final_response)
-        debug_call_data["processing_time_seconds"] = processing_time
-        debug_call_data["models_used"] = result["models_used"]
-        
-        # Log debug information
-        _debug.log_call("mixture_of_agents_tool", debug_call_data)
-        _debug.save()
-        
-        return json.dumps(result, indent=2, ensure_ascii=False)
-        
-    except Exception as e:
-        error_msg = f"Error in MoA processing: {str(e)}"
-        logger.error("%s", error_msg, exc_info=True)
-        
-        # Calculate processing time even for errors
-        end_time = datetime.datetime.now()
-        processing_time = (end_time - start_time).total_seconds()
-        
-        # Prepare error response (minimal fields)
-        result = {
-            "success": False,
-            "response": "MoA processing failed. Please try again or use a single model for this query.",
-            "models_used": {
-                "reference_models": reference_models or REFERENCE_MODELS,
-                "aggregator_model": aggregator_model or AGGREGATOR_MODEL
-            },
-            "error": error_msg
-        }
-        
-        debug_call_data["error"] = error_msg
-        debug_call_data["processing_time_seconds"] = processing_time
-        _debug.log_call("mixture_of_agents_tool", debug_call_data)
-        _debug.save()
-        
-        return json.dumps(result, indent=2, ensure_ascii=False)
-
-
-def check_moa_requirements() -> bool:
-    """
-    Check if all requirements for MoA tools are met.
-    
-    Returns:
-        bool: True if requirements are met, False otherwise
-    """
-    return check_openrouter_api_key()
-
-
-
-def get_moa_configuration() -> Dict[str, Any]:
-    """
-    Get the current MoA configuration settings.
-    
-    Returns:
-        Dict[str, Any]: Dictionary containing all configuration parameters
-    """
-    return {
-        "reference_models": REFERENCE_MODELS,
-        "aggregator_model": AGGREGATOR_MODEL,
-        "reference_temperature": REFERENCE_TEMPERATURE,
-        "aggregator_temperature": AGGREGATOR_TEMPERATURE,
-        "min_successful_references": MIN_SUCCESSFUL_REFERENCES,
-        "total_reference_models": len(REFERENCE_MODELS),
-        "failure_tolerance": f"{len(REFERENCE_MODELS) - MIN_SUCCESSFUL_REFERENCES}/{len(REFERENCE_MODELS)} models can fail"
-    }
-
-
-if __name__ == "__main__":
-    """
-    Simple test/demo when run directly
-    """
-    print("🤖 Mixture-of-Agents Tool Module")
-    print("=" * 50)
-    
-    # Check if API key is available
-    api_available = check_openrouter_api_key()
-    
-    if not api_available:
-        print("❌ OPENROUTER_API_KEY environment variable not set")
-        print("Please set your API key: export OPENROUTER_API_KEY='your-key-here'")
-        print("Get API key at: https://openrouter.ai/")
-        sys.exit(1)
-    else:
-        print("✅ OpenRouter API key found")
-    
-    print("🛠️  MoA tools ready for use!")
-    
-    # Show current configuration
-    config = get_moa_configuration()
-    print("\n⚙️  Current Configuration:")
-    print(f"  🤖 Reference models ({len(config['reference_models'])}): {', '.join(config['reference_models'])}")
-    print(f"  🧠 Aggregator model: {config['aggregator_model']}")
-    print(f"  🌡️  Reference temperature: {config['reference_temperature']}")
-    print(f"  🌡️  Aggregator temperature: {config['aggregator_temperature']}")
-    print(f"  🛡️  Failure tolerance: {config['failure_tolerance']}")
-    print(f"  📊 Minimum successful models: {config['min_successful_references']}")
-    
-    # Show debug mode status
-    if _debug.active:
-        print(f"\n🐛 Debug mode ENABLED - Session ID: {_debug.session_id}")
-        print(f"   Debug logs will be saved to: ./logs/moa_tools_debug_{_debug.session_id}.json")
-    else:
-        print("\n🐛 Debug mode disabled (set MOA_TOOLS_DEBUG=true to enable)")
-    
-    print("\nBasic usage:")
-    print("  from mixture_of_agents_tool import mixture_of_agents_tool")
-    print("  import asyncio")
-    print("")
-    print("  async def main():")
-    print("      result = await mixture_of_agents_tool(")
-    print("          user_prompt='Solve this complex mathematical proof...'")
-    print("      )")
-    print("      print(result)")
-    print("  asyncio.run(main())")
-    
-    print("\nBest use cases:")
-    print("  - Complex mathematical proofs and calculations")
-    print("  - Advanced coding problems and algorithm design")
-    print("  - Multi-step analytical reasoning tasks")
-    print("  - Problems requiring diverse domain expertise")
-    print("  - Tasks where single models show limitations")
-    
-    print("\nPerformance characteristics:")
-    print("  - Higher latency due to multiple model calls")
-    print("  - Significantly improved quality for complex tasks")
-    print("  - Parallel processing for efficiency")
-    print(f"  - Optimized temperatures: {REFERENCE_TEMPERATURE} for reference models, {AGGREGATOR_TEMPERATURE} for aggregation")
-    print("  - Token-efficient: only returns final aggregated response")
-    print("  - Resilient: continues with partial model failures")
-    print("  - Configurable: easy to modify models and settings at top of file")
-    print("  - State-of-the-art results on challenging benchmarks")
-    
-    print("\nDebug mode:")
-    print("  # Enable debug logging")
-    print("  export MOA_TOOLS_DEBUG=true")
-    print("  # Debug logs capture all MoA processing steps and metrics")
-    print("  # Logs saved to: ./logs/moa_tools_debug_UUID.json")
-
-
-# ---------------------------------------------------------------------------
-# Registry
-# ---------------------------------------------------------------------------
-from tools.registry import registry
-
-MOA_SCHEMA = {
-    "name": "mixture_of_agents",
-    "description": "Route a hard problem through multiple frontier LLMs collaboratively. Makes 5 API calls (4 reference models + 1 aggregator) with maximum reasoning effort — use sparingly for genuinely difficult problems. Best for: complex math, advanced algorithms, multi-step analytical reasoning, problems benefiting from diverse perspectives.",
-    "parameters": {
-        "type": "object",
-        "properties": {
-            "user_prompt": {
-                "type": "string",
-                "description": "The complex query or problem to solve using multiple AI models. Should be a challenging problem that benefits from diverse perspectives and collaborative reasoning."
-            }
-        },
-        "required": ["user_prompt"]
-    }
-}
-
-registry.register(
-    name="mixture_of_agents",
-    toolset="moa",
-    schema=MOA_SCHEMA,
-    handler=lambda args, **kw: mixture_of_agents_tool(user_prompt=args.get("user_prompt", "")),
-    check_fn=check_moa_requirements,
-    requires_env=["OPENROUTER_API_KEY"],
-    is_async=True,
-    emoji="🧠",
-)
--- a/toolset_distributions.py
+++ b/toolset_distributions.py
@@ -36,7 +36,6 @@ DISTRIBUTIONS = {
            "image_gen": 100,
            "terminal": 100,
            "file": 100,
-            "moa": 100,
            "browser": 100
        }
    },
@@ -48,8 +47,7 @@ DISTRIBUTIONS = {
            "image_gen": 90,  # 80% chance of image generation tools
            "vision": 90,      # 60% chance of vision tools
            "web": 55,         # 40% chance of web tools
-            "terminal": 45,
-            "moa": 10          # 20% chance of reasoning tools
+            "terminal": 45
        }
    },
    
@@ -60,7 +58,6 @@ DISTRIBUTIONS = {
            "web": 90,       # 90% chance of web tools
            "browser": 70,   # 70% chance of browser tools for deep research
            "vision": 50,    # 50% chance of vision tools
-            "moa": 40,       # 40% chance of reasoning tools
            "terminal": 10   # 10% chance of terminal tools
        }
    },
@@ -74,8 +71,7 @@ DISTRIBUTIONS = {
            "file": 94,      # 94% chance of file tools
            "vision": 65,    # 65% chance of vision tools
            "browser": 50,   # 50% chance of browser for accessing papers/databases
-            "image_gen": 15, # 15% chance of image generation tools
-            "moa": 10        # 10% chance of reasoning tools
+            "image_gen": 15  # 15% chance of image generation tools
        }
    },

@@ -85,7 +81,6 @@ DISTRIBUTIONS = {
        "toolsets": {
            "terminal": 80,  # 80% chance of terminal tools
            "file": 80,      # 80% chance of file tools (read, write, patch, search)
-            "moa": 60,       # 60% chance of reasoning tools
            "web": 30,       # 30% chance of web tools
            "vision": 10     # 10% chance of vision tools
        }
@@ -98,8 +93,7 @@ DISTRIBUTIONS = {
            "web": 80,
            "browser": 70,   # Browser is safe (no local filesystem access)
            "vision": 60,
-            "image_gen": 60,
-            "moa": 50
+            "image_gen": 60
        }
    },
    
@@ -112,7 +106,6 @@ DISTRIBUTIONS = {
            "image_gen": 50,
            "terminal": 50,
            "file": 50,
-            "moa": 50,
            "browser": 50
        }
    },
@@ -156,14 +149,15 @@ DISTRIBUTIONS = {
    
    # Reasoning heavy
    "reasoning": {
-        "description": "Heavy mixture of agents usage with minimal other tools",
+        "description": "Heavy research/reasoning distribution with minimal other tools",
        "toolsets": {
-            "moa": 90,
-            "web": 30,
+            "web": 90,
+            "file": 60,
            "terminal": 20
        }
    },
-    
+
+
    # Browser-based web interaction
    "browser_use": {
        "description": "Full browser-based web interaction with search, vision, and page control",
--- a/toolsets.py
+++ b/toolsets.py
@@ -156,12 +156,6 @@ TOOLSETS = {
        "includes": []
    },
    
-    "moa": {
-        "description": "Advanced reasoning and problem-solving tools",
-        "tools": ["mixture_of_agents"],
-        "includes": []
-    },
-    
    "skills": {
        "description": "Access, create, edit, and manage skill documents with specialized instructions and knowledge",
        "tools": ["skills_list", "skill_view", "skill_manage"],
--- a/tui_gateway/server.py
+++ b/tui_gateway/server.py
@@ -8163,6 +8163,12 @@ def _run_prompt_submit(rid, sid: str, session: dict, text: Any) -> None:
            except (TypeError, ValueError):
                pass
            result = agent.run_conversation(run_message, **run_kwargs)
+            if "moa_one_shot_restore" in session:
+                _restore = session.pop("moa_one_shot_restore", None)
+                if _restore is None:
+                    session.pop("model_override", None)
+                else:
+                    session["model_override"] = _restore

            last_reasoning = None
            status_note = None
@@ -10223,6 +10229,7 @@ _PENDING_INPUT_COMMANDS: frozenset[str] = frozenset(
        "steer",
        "plan",
        "goal",
+        "moa",
        "undo",
        "learn",
    }
@@ -10495,6 +10502,49 @@ def _(rid, params: dict) -> dict:
        from agent.learn_prompt import build_learn_prompt

        return _ok(rid, {"type": "send", "message": build_learn_prompt(arg)})
+    if name == "moa":
+        try:
+            from hermes_cli.moa_config import (
+                build_moa_turn_prompt, exact_moa_preset_name, moa_usage, normalize_moa_config
+            )
+
+            moa_cfg = normalize_moa_config(_load_cfg().get("moa") or {})
+            matched = exact_moa_preset_name(moa_cfg, arg) if arg else moa_cfg["default_preset"]
+            if matched:
+                if not session:
+                    return _err(rid, 4001, "no active session")
+                session["model_override"] = {
+                    "model": matched,
+                    "provider": "moa",
+                    "base_url": "moa://local",
+                    "api_key": "moa-virtual-provider",
+                    "api_mode": "chat_completions",
+                }
+                session["moa_active_preset"] = matched
+                return _ok(rid, {"type": "exec", "output": f"Model switched to MoA preset: {matched}."})
+            if not arg:
+                return _err(rid, 4004, moa_usage())
+            if not session:
+                return _err(rid, 4001, "no active session")
+            preset = moa_cfg["default_preset"]
+            session["moa_one_shot_restore"] = session.get("model_override")
+            session["model_override"] = {
+                "model": preset,
+                "provider": "moa",
+                "base_url": "moa://local",
+                "api_key": "moa-virtual-provider",
+                "api_mode": "chat_completions",
+            }
+            return _ok(
+                rid,
+                {
+                    "type": "send",
+                    "notice": f"MoA one-shot queued with preset {preset}; previous model will be restored after this turn.",
+                    "message": arg,
+                },
+            )
+        except Exception as exc:
+            return _err(rid, 5030, f"moa unavailable: {exc}")

    if name == "retry":
        if not session:
--- a/web/src/lib/api.ts
+++ b/web/src/lib/api.ts
@@ -76,6 +76,7 @@ const PROFILE_SCOPED_PREFIXES = [
  "/api/model/info",
  "/api/model/set",
  "/api/model/auxiliary",
+  "/api/model/moa",
  "/api/model/options",
 ];

@@ -472,6 +473,13 @@ export const api = {
  getModelInfo: () => fetchJSON<ModelInfoResponse>("/api/model/info"),
  getModelOptions: () => fetchJSON<ModelOptionsResponse>("/api/model/options"),
  getAuxiliaryModels: () => fetchJSON<AuxiliaryModelsResponse>("/api/model/auxiliary"),
+  getMoaModels: () => fetchJSON<MoaConfigResponse>("/api/model/moa"),
+  saveMoaModels: (body: MoaConfigResponse) =>
+    fetchJSON<MoaConfigResponse & { ok: boolean }>("/api/model/moa", {
+      method: "PUT",
+      headers: { "Content-Type": "application/json" },
+      body: JSON.stringify(body),
+    }),
  setModelAssignment: (body: ModelAssignmentRequest) =>
    fetchJSON<ModelAssignmentResponse>("/api/model/set", {
      method: "POST",
@@ -2061,6 +2069,30 @@ export interface AuxiliaryModelsResponse {
  main: { provider: string; model: string };
 }

+export interface MoaModelSlot {
+  provider: string;
+  model: string;
+}
+
+export interface MoaConfigResponse {
+  default_preset: string;
+  active_preset: string;
+  presets: Record<string, {
+    reference_models: MoaModelSlot[];
+    aggregator: MoaModelSlot;
+    reference_temperature: number;
+    aggregator_temperature: number;
+    max_tokens: number;
+    enabled: boolean;
+  }>;
+  reference_models: MoaModelSlot[];
+  aggregator: MoaModelSlot;
+  reference_temperature: number;
+  aggregator_temperature: number;
+  max_tokens: number;
+  enabled: boolean;
+}
+
 export interface ModelAssignmentRequest {
  confirm_expensive_model?: boolean;
  scope: "main" | "auxiliary";
--- a/web/src/pages/ModelsPage.tsx
+++ b/web/src/pages/ModelsPage.tsx
@@ -16,6 +16,8 @@ import { api } from "@/lib/api";
 import type {
  AuxiliaryModelsResponse,
  AuxiliaryTaskAssignment,
+  MoaConfigResponse,
+  MoaModelSlot,
  ModelsAnalyticsModelEntry,
  ModelsAnalyticsResponse,
 } from "@/lib/api";
@@ -534,6 +536,10 @@ type PickerTarget =
  | { kind: "main" }
  | { kind: "aux"; task: string };

+type MoaPickerTarget =
+  | { kind: "reference"; index: number }
+  | { kind: "aggregator" };
+
 function AuxiliaryTasksModal({
  aux,
  refreshKey,
@@ -687,6 +693,174 @@ function AuxiliaryTasksModal({
  );
 }

+function MoaModelsModal({
+  config,
+  refreshKey,
+  onClose,
+  onSaved,
+}: {
+  config: MoaConfigResponse;
+  refreshKey: number;
+  onClose(): void;
+  onSaved(next: MoaConfigResponse): void;
+}) {
+  const [draft, setDraft] = useState<MoaConfigResponse>(config);
+  const [selected, setSelected] = useState(config.default_preset || Object.keys(config.presets)[0] || "default");
+  const [newName, setNewName] = useState("");
+  const [picker, setPicker] = useState<MoaPickerTarget | null>(null);
+  const [busy, setBusy] = useState(false);
+  const [error, setError] = useState<string | null>(null);
+
+  const presetNames = Object.keys(draft.presets || {});
+  const preset = draft.presets[selected] || draft.presets[presetNames[0]];
+  const slotLabel = (slot: MoaModelSlot) => `${slot.provider || "(provider)"} · ${slot.model || "(model)"}`;
+
+  const updateSelectedPreset = (updater: (preset: MoaConfigResponse["presets"][string]) => MoaConfigResponse["presets"][string]) => {
+    setDraft((prev) => ({
+      ...prev,
+      presets: {
+        ...prev.presets,
+        [selected]: updater(prev.presets[selected]),
+      },
+    }));
+  };
+
+  const save = async () => {
+    setBusy(true);
+    setError(null);
+    try {
+      const saved = await api.saveMoaModels(draft);
+      onSaved(saved);
+      onClose();
+    } catch (e) {
+      setError(e instanceof Error ? e.message : String(e));
+    } finally {
+      setBusy(false);
+    }
+  };
+
+  const addPreset = () => {
+    const name = newName.trim();
+    if (!name || draft.presets[name]) return;
+    const seed = preset || {
+      reference_models: draft.reference_models,
+      aggregator: draft.aggregator,
+      reference_temperature: draft.reference_temperature,
+      aggregator_temperature: draft.aggregator_temperature,
+      max_tokens: draft.max_tokens,
+      enabled: draft.enabled,
+    };
+    setDraft((prev) => ({
+      ...prev,
+      default_preset: prev.default_preset || name,
+      presets: { ...prev.presets, [name]: { ...seed, reference_models: [...seed.reference_models] } },
+    }));
+    setSelected(name);
+    setNewName("");
+  };
+
+  const deletePreset = () => {
+    if (presetNames.length <= 1) return;
+    const remaining = presetNames.filter((name) => name !== selected);
+    const nextSelected = remaining[0];
+    setDraft((prev) => {
+      const next = { ...prev.presets };
+      delete next[selected];
+      return {
+        ...prev,
+        presets: next,
+        default_preset: prev.default_preset === selected ? nextSelected : prev.default_preset,
+        active_preset: prev.active_preset === selected ? "" : prev.active_preset,
+      };
+    });
+    setSelected(nextSelected);
+  };
+
+  if (!preset) return null;
+
+  return (
+    <div className="fixed inset-0 z-50 flex items-center justify-center bg-background/80 p-4 backdrop-blur-sm">
+      <Card className="max-h-[85vh] w-full max-w-2xl overflow-auto">
+        <CardHeader>
+          <CardTitle className="text-sm">Configure Mixture of Agents presets</CardTitle>
+        </CardHeader>
+        <CardContent className="space-y-4">
+          <p className="text-xs text-text-secondary">
+            Presets appear as models under the Mixture of Agents provider. References produce perspectives; the aggregator is the acting model that answers and calls tools.
+          </p>
+
+          <div className="flex flex-wrap items-center gap-2">
+            <select
+              className="border border-border bg-background px-2 py-1 text-xs"
+              value={selected}
+              onChange={(event) => setSelected(event.target.value)}
+            >
+              {presetNames.map((name) => <option key={name} value={name}>{name}</option>)}
+            </select>
+            <Button size="sm" outlined onClick={() => setDraft((prev) => ({ ...prev, default_preset: selected }))}>Set default</Button>
+            <Button size="sm" ghost disabled={presetNames.length <= 1} onClick={deletePreset}>Delete</Button>
+            <input
+              className="border border-border bg-background px-2 py-1 text-xs"
+              placeholder="new preset name"
+              value={newName}
+              onChange={(event) => setNewName(event.target.value)}
+            />
+            <Button size="sm" outlined disabled={!newName.trim() || !!draft.presets[newName.trim()]} onClick={addPreset}>Add preset</Button>
+          </div>
+
+          <div className="text-xs text-text-secondary">
+            Default: <span className="font-mono">{draft.default_preset}</span>
+          </div>
+
+          <div className="space-y-2">
+            <div className="text-display text-xs font-medium tracking-wider">Reference models</div>
+            {preset.reference_models.map((slot, index) => (
+              <div key={`${selected}-${slot.provider}-${slot.model}-${index}`} className="flex items-center gap-2 border border-border/50 bg-muted/20 px-3 py-2">
+                <div className="min-w-0 flex-1 truncate font-mono text-xs text-text-secondary">{slotLabel(slot)}</div>
+                <Button size="sm" outlined onClick={() => setPicker({ kind: "reference", index })}>Change</Button>
+                <Button size="sm" ghost disabled={preset.reference_models.length <= 1} onClick={() => updateSelectedPreset((prev) => ({ ...prev, reference_models: prev.reference_models.filter((_, i) => i !== index) }))}>Remove</Button>
+              </div>
+            ))}
+            <Button size="sm" outlined onClick={() => updateSelectedPreset((prev) => ({ ...prev, reference_models: [...prev.reference_models, prev.aggregator] }))}>Add reference model</Button>
+          </div>
+
+          <div className="space-y-2">
+            <div className="text-display text-xs font-medium tracking-wider">Aggregator</div>
+            <div className="flex items-center gap-2 border border-border/50 bg-muted/20 px-3 py-2">
+              <div className="min-w-0 flex-1 truncate font-mono text-xs text-text-secondary">{slotLabel(preset.aggregator)}</div>
+              <Button size="sm" outlined onClick={() => setPicker({ kind: "aggregator" })}>Change</Button>
+            </div>
+          </div>
+
+          {error && <div className="text-xs text-destructive">{error}</div>}
+          <div className="flex justify-end gap-2 pt-2">
+            <Button ghost onClick={onClose} disabled={busy}>Cancel</Button>
+            <Button onClick={save} disabled={busy}>{busy ? "Saving…" : "Save"}</Button>
+          </div>
+        </CardContent>
+      </Card>
+      {picker && (
+        <ModelPickerDialog
+          key={`moa-picker-${refreshKey}-${selected}-${picker.kind}-${picker.kind === "reference" ? picker.index : "agg"}`}
+          loader={api.getModelOptions}
+          alwaysGlobal
+          title="Select MoA Model"
+          onApply={async ({ provider, model }) => {
+            updateSelectedPreset((prev) => {
+              if (picker.kind === "aggregator") return { ...prev, aggregator: { provider, model } };
+              return {
+                ...prev,
+                reference_models: prev.reference_models.map((slot, i) => i === picker.index ? { provider, model } : slot),
+              };
+            });
+          }}
+          onClose={() => setPicker(null)}
+        />
+      )}
+    </div>
+  );
+}
+
 function ModelSettingsPanel({
  aux,
  refreshKey,
@@ -697,6 +871,8 @@ function ModelSettingsPanel({
  onSaved(): void;
 }) {
  const [auxModalOpen, setAuxModalOpen] = useState(false);
+  const [moaModalOpen, setMoaModalOpen] = useState(false);
+  const [moa, setMoa] = useState<MoaConfigResponse | null>(null);
  const [picker, setPicker] = useState<PickerTarget | null>(null);
  const [pendingReloadModel, setPendingReloadModel] = useState<string | null>(
    null,
@@ -705,6 +881,10 @@ function ModelSettingsPanel({
  const mainProv = aux?.main.provider ?? "";
  const mainModel = aux?.main.model ?? "";

+  useEffect(() => {
+    api.getMoaModels().then(setMoa).catch(() => setMoa(null));
+  }, [refreshKey]);
+
  const applyAssignment = async ({
    scope,
    task,
@@ -796,6 +976,31 @@ function ModelSettingsPanel({
          </Button>
        </div>

+        <div className="flex min-w-0 flex-col gap-2 bg-muted/20 border border-border/50 px-3 py-2 sm:flex-row sm:items-center sm:justify-between sm:gap-3">
+          <div className="min-w-0 flex-1">
+            <div className="flex items-center gap-2 mb-0.5">
+              <Brain className="h-3 w-3 text-text-tertiary" />
+              <span className="text-display text-xs font-medium tracking-wider">
+                Mixture of Agents
+              </span>
+            </div>
+            <div className="text-xs font-mono text-text-secondary truncate">
+              {moa
+                ? `${moa.reference_models.length} reference${moa.reference_models.length === 1 ? "" : "s"} · ${moa.aggregator.provider}/${shortModelName(moa.aggregator.model)}`
+                : "not loaded"}
+            </div>
+          </div>
+          <Button
+            size="sm"
+            outlined
+            onClick={() => setMoaModalOpen(true)}
+            disabled={!moa}
+            className="shrink-0 self-start text-xs uppercase sm:self-center"
+          >
+            Configure
+          </Button>
+        </div>
+
        {picker && (
          <ModelPickerDialog
            key={`picker-${refreshKey}`}
@@ -832,6 +1037,17 @@ function ModelSettingsPanel({
          model={pendingReloadModel}
          onCancel={() => setPendingReloadModel(null)}
        />
+        {moaModalOpen && moa && (
+          <MoaModelsModal
+            config={moa}
+            refreshKey={refreshKey}
+            onSaved={(next) => {
+              setMoa(next);
+              onSaved();
+            }}
+            onClose={() => setMoaModalOpen(false)}
+          />
+        )}
      </CardContent>
    </Card>
  );
--- a/website/docs/reference/cli-commands.md
+++ b/website/docs/reference/cli-commands.md
@@ -39,6 +39,7 @@ hermes [global-options] <command> [subcommand/options]
 |---------|---------|
 | `hermes chat` | Interactive or one-shot chat with the agent. |
 | `hermes model` | Interactively choose the default provider and model. |
+| `hermes moa` | Configure named Mixture of Agents presets used by `/moa`. |
 | `hermes fallback` | Manage fallback providers tried when the primary model errors. |
 | `hermes gateway` | Run or manage the messaging gateway service. |
 | `hermes proxy` | Local OpenAI-compatible proxy that attaches OAuth provider credentials. See [Subscription Proxy](../user-guide/features/subscription-proxy.md). |
@@ -1119,6 +1120,18 @@ On a fresh install the first scheduled pass is deferred by one full `interval_ho

 See [Curator](../user-guide/features/curator.md) for behavior and config.

+## `hermes moa`
+
+Configure named Mixture of Agents presets used by the `/moa` slash command.
+
+```bash
+hermes moa list
+hermes moa configure [name]
+hermes moa delete <name>
+```
+
+`hermes moa configure` reuses Hermes' provider → model picker for each reference model and the aggregator. Each saved preset then appears as a selectable model under the virtual `moa` provider, with the aggregator acting as the model that answers and calls tools.
+
 ## `hermes fallback`

 ```bash
--- a/website/docs/reference/tools-reference.md
+++ b/website/docs/reference/tools-reference.md
@@ -8,7 +8,7 @@ description: "Authoritative reference for Hermes built-in tools, grouped by tool

 This page documents Hermes' built-in tools, grouped by toolset. Availability varies by platform, credentials, and enabled toolsets.

-**Quick counts (current registry):** ~71 tools — 10 browser tools (core) + 2 CDP-gated browser tools, 4 file tools, 4 Home Assistant tools, 2 terminal tools, 2 web tools, 5 Feishu tools, 7 Spotify tools (registered by the bundled `spotify` plugin), 5 Yuanbao tools, 9 kanban tools (registered when the kanban dispatcher spawns the agent), 2 Discord tools, and a handful of standalone tools (`memory`, `clarify`, `delegate_task`, `execute_code`, `cronjob`, `session_search`, `skill_view`/`skill_manage`/`skills_list`, `text_to_speech`, `image_generate`, `video_generate`, `vision_analyze`, `video_analyze`, `mixture_of_agents`, `send_message`, `todo`, `computer_use`, `process`).
+**Quick counts (current registry):** ~71 tools — 10 browser tools (core) + 2 CDP-gated browser tools, 4 file tools, 4 Home Assistant tools, 2 terminal tools, 2 web tools, 5 Feishu tools, 7 Spotify tools (registered by the bundled `spotify` plugin), 5 Yuanbao tools, 9 kanban tools (registered when the kanban dispatcher spawns the agent), 2 Discord tools, and a handful of standalone tools (`memory`, `clarify`, `delegate_task`, `execute_code`, `cronjob`, `session_search`, `skill_view`/`skill_manage`/`skills_list`, `text_to_speech`, `image_generate`, `video_generate`, `vision_analyze`, `video_analyze`, `send_message`, `todo`, `computer_use`, `process`).

 :::tip MCP Tools
 In addition to built-in tools, Hermes can load tools dynamically from MCP servers. MCP tools appear with the prefix `mcp_<server>_` (e.g., `mcp_github_create_issue` for the `github` MCP server). See [MCP Integration](/user-guide/features/mcp) for configuration.
@ -144,12 +144,6 @@ Registered when the agent is either (a) spawned by the kanban dispatcher (`HERME
 |------|-------------|----------------------|
 | `send_message` | Send a message to a connected messaging platform, or list available targets. IMPORTANT: When the user asks to send to a specific channel or person (not just a bare platform name), call send_message(action='list') FIRST to see available tar… | — |

-## `moa` toolset
-
-| Tool | Description | Requires environment |
-|------|-------------|----------------------|
-| `mixture_of_agents` | Route a hard problem through multiple frontier LLMs collaboratively. Makes 5 API calls (4 reference models + 1 aggregator) with maximum reasoning effort — use sparingly for genuinely difficult problems. Best for: complex math, advanced alg… | OPENROUTER_API_KEY |
-
 ## `session_search` toolset

 | Tool | Description | Requires environment |
--- a/website/docs/reference/toolsets-reference.md
+++ b/website/docs/reference/toolsets-reference.md
@ -71,7 +71,6 @@ Or in-session:
 | `kanban` | `kanban_block`, `kanban_comment`, `kanban_complete`, `kanban_create`, `kanban_heartbeat`, `kanban_link`, `kanban_list`, `kanban_show`, `kanban_unblock` | Multi-agent coordination tools. Registered for dispatcher-spawned task workers (`HERMES_KANBAN_TASK`) and for profiles that explicitly list the `kanban` toolset by name (the `all`/`*` wildcard does **not** enable it). Workers mark tasks done, block, heartbeat, comment, and create/link follow-up tasks; orchestrator profiles additionally get board-routing tools like list/unblock. |
 | `memory` | `memory` | Persistent cross-session memory management. |
 | `messaging` | `send_message` | Send messages to other platforms (Telegram, Discord, etc.) from within a session. |
-| `moa` | `mixture_of_agents` | Multi-model consensus via Mixture of Agents. |
 | `safe` | `image_generate`, `vision_analyze`, `web_extract`, `web_search` (via `includes`) | Read-only research + media generation. No file writes, no terminal, no code execution. |
 | `search` | `web_search` | Web search only (without extract). |
 | `session_search` | `session_search` | Search past conversation sessions. |
--- a/website/docs/user-guide/features/cron.md
+++ b/website/docs/user-guide/features/cron.md
@ -552,7 +552,7 @@ cronjob(action="create", name="weekly-news-summary",
        prompt="Summarize this week's AI news: ...")
 ```

-When `enabled_toolsets` is set on a job it wins; otherwise the `hermes tools` cron-platform config wins; otherwise Hermes falls back to the built-in defaults. This matters for cost control: carrying `moa`, `browser`, `delegation` into every tiny "fetch news" job bloats the tool-schema prompt on every LLM call.
+When `enabled_toolsets` is set on a job it wins; otherwise the `hermes tools` cron-platform config wins; otherwise Hermes falls back to the built-in defaults. This matters for cost control: carrying `browser` and `delegation` into every tiny "fetch news" job bloats the tool-schema prompt on every LLM call.

 ### Skipping the agent entirely: `wakeAgent`

--- a/website/docs/user-guide/features/mixture-of-agents.md
+++ b/website/docs/user-guide/features/mixture-of-agents.md
@ -0,0 +1,115 @@
+---
+sidebar_position: 7
+title: "Mixture of Agents"
+description: "Create named MoA presets that appear as selectable models under the Mixture of Agents provider"
+---
+
+# Mixture of Agents
+
+Mixture of Agents (MoA) is a virtual model provider. Each named MoA preset appears as a selectable model under the `moa` provider.
+
+When you select a MoA preset, the preset's aggregator is the acting model. It is the model that writes the assistant response and emits tool calls. Reference models run first and provide analysis for the aggregator to use.
+
+Use MoA when a hard task benefits from multiple model perspectives but still needs Hermes' normal agent loop: tool calls, follow-up iterations, interrupts, transcript persistence, and the same session context as any other message.
+
+## Select a MoA preset as your model
+
+You can select a preset through the normal model picker surfaces:
+
+```bash
+/model default --provider moa
+/model review --provider moa
+```
+
+The Dashboard, TUI, and Desktop model pickers also show a `Mixture of Agents` provider row. Its models are your configured preset names.
+
+## Slash command shortcut
+
+`/moa` is a convenience shortcut for model selection:
+
+```bash
+/moa
+```
+
+Switches the current session to the default MoA preset.
+
+```bash
+/moa review
+```
+
+If `review` exactly matches a preset name, switches the current session to provider `moa`, model `review`.
+
+```bash
+/moa design and implement a migration plan for this flaky test cluster
+```
+
+If the text does not exactly match a preset name, Hermes treats it as a one-shot prompt. It switches to the default MoA preset for that turn only, sends the prompt, and restores the previous model once the turn completes.
+
+Preset matching is exact on purpose. Hermes does not fuzzy-match preset names, so normal prompts cannot accidentally become model switches.
+
+## How it works in the agent loop
+
+For each main model call when provider `moa` is selected, Hermes:
+
+1. resolves the selected preset by name;
+2. runs the configured reference models in parallel, without tool schemas (they receive only the conversation's user/assistant text, not the Hermes system prompt or tool-call transcript, so reference calls stay cheap and avoid strict-provider rejections);
+3. appends the reference outputs as private context for the aggregator;
+4. calls the configured aggregator with the normal Hermes tool schema;
+5. treats the aggregator response as the real model response;
+6. executes any tools the aggregator calls, exactly as it would for a plain model;
+7. repeats the same MoA process on the next model iteration, over the updated conversation including tool results.
+
+Because MoA is selected through the normal model system, it composes automatically with `/goal`, gateway sessions, TUI sessions, and Desktop chat.
+
+## Configure presets
+
+You can configure named MoA presets from:
+
+- Dashboard → Models → Model Settings → Mixture of Agents
+- Desktop app → Settings → Model → Mixture of Agents
+- `hermes moa configure [name]`
+- `config.yaml`
+
+The config stores explicit provider/model pairs, so you can mix providers and use multiple models from the same provider:
+
+```yaml
+moa:
+  default_preset: default
+  presets:
+    default:
+      reference_models:
+        - provider: openai-codex
+          model: gpt-5.5
+        - provider: openrouter
+          model: deepseek/deepseek-v4-pro
+      aggregator:
+        provider: openrouter
+        model: anthropic/claude-opus-4.8
+      reference_temperature: 0.6
+      aggregator_temperature: 0.4
+      max_tokens: 4096
+      enabled: true
+```
+
+Default preset:
+
+- reference: `openai-codex:gpt-5.5`
+- reference: `openrouter:deepseek/deepseek-v4-pro`
+- aggregator / acting model: `openrouter:anthropic/claude-opus-4.8`
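+
+As an illustrative sketch, a second named preset sits next to `default` under `presets`. The `review` name is borrowed from the `/moa review` example; the model choices are assumptions that simply reuse the identifiers from the default preset, arranged to show two reference models from the same provider:
+
+```yaml
+moa:
+  default_preset: default
+  presets:
+    # `default` preset omitted for brevity
+    review:                      # hypothetical preset; selectable as provider `moa`, model `review`
+      reference_models:
+        - provider: openrouter   # two references from the same provider
+          model: deepseek/deepseek-v4-pro
+        - provider: openrouter
+          model: anthropic/claude-opus-4.8
+      aggregator:                # acting model: writes the response and emits tool calls
+        provider: openai-codex
+        model: gpt-5.5
+      reference_temperature: 0.6
+      aggregator_temperature: 0.4
+      max_tokens: 4096
+      enabled: true
+```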
+
+## Terminal preset management
+
+```bash
+hermes moa list
+hermes moa configure              # update the default preset
+hermes moa configure review       # create or update a named preset
+hermes moa delete review
+```
+
+## Notes
+
+- MoA is no longer listed under `hermes tools`; there is no `moa` toolset to enable.
+- Setting `enabled: false` on a preset disables the reference fan-out for that preset: the aggregator acts alone, exactly as if you selected it as a plain model. This is the per-preset off switch surfaced in the dashboard and desktop settings.
+- A preset's aggregator cannot be another MoA preset. Recursive MoA trees are intentionally blocked.
+- Credential failures on one reference model do not abort the turn. Hermes includes the failure in the reference context and continues with whatever models returned.
+- MoA increases the model-call count. A single model iteration involves one call per reference model plus the aggregator call.
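+
+For the `enabled: false` note, a minimal sketch of a preset with its reference fan-out switched off. The `review` preset name and the model choices are illustrative assumptions, not shipped defaults:
+
+```yaml
+moa:
+  presets:
+    review:
+      reference_models:
+        - provider: openai-codex
+          model: gpt-5.5
+      aggregator:
+        provider: openrouter
+        model: anthropic/claude-opus-4.8
+      enabled: false   # references are skipped; the aggregator acts alone
+```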
--- a/website/docs/user-guide/features/tools.md
+++ b/website/docs/user-guide/features/tools.md
@ -49,7 +49,7 @@ hermes tools
 hermes tools
 ```

-Common toolsets include `web`, `search`, `terminal`, `file`, `browser`, `vision`, `image_gen`, `moa`, `skills`, `tts`, `todo`, `memory`, `session_search`, `cronjob`, `code_execution`, `delegation`, `clarify`, `homeassistant`, `messaging`, `spotify`, `discord`, `discord_admin`, `debugging`, and `safe`.
+Common toolsets include `web`, `search`, `terminal`, `file`, `browser`, `vision`, `image_gen`, `skills`, `tts`, `todo`, `memory`, `session_search`, `cronjob`, `code_execution`, `delegation`, `clarify`, `homeassistant`, `messaging`, `spotify`, `discord`, `discord_admin`, `debugging`, and `safe`.

 See [Toolsets Reference](/reference/toolsets-reference) for the full set, including platform presets such as `hermes-cli`, `hermes-telegram`, and dynamic MCP toolsets like `mcp-<server>`.

--- a/website/docs/user-guide/skills/bundled/autonomous-ai-agents/autonomous-ai-agents-hermes-agent.md
+++ b/website/docs/user-guide/skills/bundled/autonomous-ai-agents/autonomous-ai-agents-hermes-agent.md
@ -455,7 +455,6 @@ Enable/disable via `hermes tools` (interactive) or `hermes tools enable/disable
 | `feishu_drive` | Feishu (Lark) drive tools |
 | `yuanbao` | Yuanbao integration tools |
 | `rl` | Reinforcement learning tools (off by default) |
-| `moa` | Mixture of Agents (off by default) |

 Full enumeration lives in `toolsets.py` as the `TOOLSETS` dict; `_HERMES_CORE_TOOLS` is the default bundle most platforms inherit from.

--- a/website/i18n/zh-Hans/docusaurus-plugin-content-docs/current/reference/tools-reference.md
+++ b/website/i18n/zh-Hans/docusaurus-plugin-content-docs/current/reference/tools-reference.md
@ -8,7 +8,7 @@ description: "Hermes 内置工具权威参考，按工具集分组"

 本页记录 Hermes 的内置工具，按工具集分组。可用性因平台、凭据和已启用的工具集而异。

-**当前注册表快速统计：** 约 71 个工具 —— 10 个浏览器工具（核心）+ 2 个 CDP 门控浏览器工具、4 个文件工具、4 个 Home Assistant 工具、2 个终端工具、2 个 Web 工具、5 个 Feishu 工具、7 个 Spotify 工具（由内置 `spotify` 插件注册）、5 个 Yuanbao 工具、9 个 kanban 工具（在 kanban 调度器生成 agent 时注册）、2 个 Discord 工具，以及若干独立工具（`memory`、`clarify`、`delegate_task`、`execute_code`、`cronjob`、`session_search`、`skill_view`/`skill_manage`/`skills_list`、`text_to_speech`、`image_generate`、`video_generate`、`vision_analyze`、`video_analyze`、`mixture_of_agents`、`send_message`、`todo`、`computer_use`、`process`）。
+**当前注册表快速统计：** 约 71 个工具 —— 10 个浏览器工具（核心）+ 2 个 CDP 门控浏览器工具、4 个文件工具、4 个 Home Assistant 工具、2 个终端工具、2 个 Web 工具、5 个 Feishu 工具、7 个 Spotify 工具（由内置 `spotify` 插件注册）、5 个 Yuanbao 工具、9 个 kanban 工具（在 kanban 调度器生成 agent 时注册）、2 个 Discord 工具，以及若干独立工具（`memory`、`clarify`、`delegate_task`、`execute_code`、`cronjob`、`session_search`、`skill_view`/`skill_manage`/`skills_list`、`text_to_speech`、`image_generate`、`video_generate`、`vision_analyze`、`video_analyze`、`send_message`、`todo`、`computer_use`、`process`）。

 :::tip MCP 工具
 除内置工具外，Hermes 还可从 MCP 服务器动态加载工具。MCP 工具以 `mcp_<server>_` 为前缀（例如，`github` MCP 服务器的 `mcp_github_create_issue`）。配置方法见 [MCP 集成](/user-guide/features/mcp)。
@ -143,12 +143,6 @@ description: "Hermes 内置工具权威参考，按工具集分组"
 |------|------|----------|
 | `send_message` | 向已连接的消息平台发送消息，或列出可用目标。重要：当用户要求发送到特定频道或人员（而非仅平台名称）时，请先调用 `send_message(action='list')` 查看可用目标… | — |

-## `moa` 工具集
-
-| 工具 | 描述 | 所需环境 |
-|------|------|----------|
-| `mixture_of_agents` | 将难题路由给多个前沿 LLM 协作处理。进行 5 次 API 调用（4 个参考模型 + 1 个聚合器），以最大推理力度运行——请谨慎用于真正困难的问题。最适合：复杂数学、高级算法… | OPENROUTER_API_KEY |
-
 ## `session_search` 工具集

 | 工具 | 描述 | 所需环境 |
--- a/website/i18n/zh-Hans/docusaurus-plugin-content-docs/current/reference/toolsets-reference.md
+++ b/website/i18n/zh-Hans/docusaurus-plugin-content-docs/current/reference/toolsets-reference.md
@ -70,7 +70,6 @@ hermes tools                            # curses UI to enable/disable per platfo
 | `kanban` | `kanban_block`, `kanban_comment`, `kanban_complete`, `kanban_create`, `kanban_heartbeat`, `kanban_link`, `kanban_list`, `kanban_show`, `kanban_unblock` | 多 agent 协调工具。为调度器生成的任务工作者（`HERMES_KANBAN_TASK`）以及显式启用 `kanban` 工具集的 profile 注册。工作者可标记任务完成、阻塞、心跳、评论以及创建/关联后续任务；编排器 profile 还额外获得看板路由工具，如 list/unblock。 |
 | `memory` | `memory` | 持久化跨会话记忆管理。 |
 | `messaging` | `send_message` | 在会话中向其他平台（Telegram、Discord 等）发送消息。 |
-| `moa` | `mixture_of_agents` | 通过 Mixture of Agents 实现多模型共识。 |
 | `safe` | `image_generate`, `vision_analyze`, `web_extract`, `web_search`（通过 `includes`） | 只读研究 + 媒体生成。无文件写入、无终端、无代码执行。 |
 | `search` | `web_search` | 仅网页搜索（不含提取）。 |
 | `session_search` | `session_search` | 搜索历史会话记录。 |
--- a/website/sidebars.ts
+++ b/website/sidebars.ts
@ -69,6 +69,7 @@ const sidebars: SidebarsConfig = {
            'user-guide/features/honcho',
            'user-guide/features/context-files',
            'user-guide/features/context-references',
+            'user-guide/features/mixture-of-agents',
            'user-guide/features/personality',
            'user-guide/features/skins',
            'user-guide/features/plugins',