fix(x_search): surface degraded results + validate dates

The xAI Responses API for x_search returns 200 OK with a synthesized fluff answer in two failure modes that callers currently cannot distinguish from a real, citation-backed result: 1. Any narrowing filter (allowed_x_handles, excluded_x_handles, from_date, to_date) was active, but the X index returned no matching posts. The model then answers from training data. 2. The date range is malformed, inverted, or pure-future (e.g. from_date=2030-01-01). The API call burns quota and Grok responds with a generic answer. Mitigations, both client-side: * Validate from_date / to_date before the HTTP call: - Strict YYYY-MM-DD. - from_date <= to_date when both set. - from_date <= today UTC (no posts in a window that hasn't started). to_date in the future remains allowed so callers can request 'from yesterday to tomorrow'. * Add 'degraded' + 'degraded_reason' to successful responses. degraded=True iff any narrowing filter was active AND both the top-level 'citations' array and inline 'url_citation' annotations came back empty. A broad query with no filters that returns no citations is *not* flagged degraded — that case is just an unsourced answer, not a filter miss. Tests cover all four validation paths plus six degraded-flag scenarios (each filter type, inline vs top-level citation recovery, broad query baseline). All existing tests continue to pass; the additions are purely additive on the success-path response shape. Discovered while testing the x_search toolset end-to-end: queries scoped to @Teknium1 returned confident-sounding generic text about Nous Research with zero citations, and from_date in 2030 produced sassy non-answers. Both are now detectable by the caller.
2026-06-09 08:21:50 +00:00 · 2026-05-21 02:38:45 +05:30 · 2026-05-21 02:38:45 +05:30 · 2a352f96ee
commit 2a352f96ee
parent 31a0100104
3 changed files with 412 additions and 0 deletions
--- a/tools/x_search_tool.py
+++ b/tools/x_search_tool.py
@ -18,6 +18,24 @@ auto-refreshes the OAuth access token when it's within the refresh skew
 window, so a ``True`` from :func:`check_x_search_requirements` means the
 bearer is fetchable AND non-empty.

+Defensive output
+----------------
+The tool surfaces two additional signals beyond xAI's raw response so callers
+can tell a real citation-backed answer from an unsourced one:
+
+* ``from_date`` / ``to_date`` are validated client-side before the HTTP call.
+  Malformed (non ``YYYY-MM-DD``), inverted (``from_date > to_date``), and
+  pure-future ranges (``from_date`` later than today UTC) fail fast with a
+  clear error instead of burning an API call. ``to_date`` in the future is
+  still allowed so callers can legitimately request "from yesterday to
+  tomorrow".
+* Successful responses carry ``degraded`` and ``degraded_reason`` fields.
+  ``degraded`` is ``True`` when any narrowing filter (handles or dates) was
+  active AND xAI returned no citations in either the top-level ``citations``
+  array or the inline ``url_citation`` annotations. In that case the
+  ``answer`` came from the model's own knowledge rather than the X index,
+  and the caller should treat the result as unsourced.
+
 Salvaged from PR #10786 (originally by @Jaaneek); credential resolution
 reworked to honor both auth modes per Teknium's design.
 """
@ -28,6 +46,7 @@ import json
 import logging
 import os
 import time
+from datetime import date, datetime, timezone
 from typing import Any, Dict, List, Optional, Tuple

 import requests
@ -136,6 +155,57 @@ def _normalize_handles(handles: Optional[List[str]], field_name: str) -> List[st
    return cleaned


+def _parse_iso_date(value: str, field_name: str) -> date:
+    """Parse a strict YYYY-MM-DD string into a ``date``.
+
+    xAI accepts any string in the ``from_date``/``to_date`` slots and silently
+    returns an answer with no citations when the value is malformed or refers
+    to a window where no posts can exist. That behavior burns a billable API
+    call and produces a confident-sounding fluff answer that's hard for callers
+    to distinguish from a real result. Validating client-side fails fast and
+    gives the agent a clear error to act on.
+    """
+    raw = value.strip()
+    try:
+        return datetime.strptime(raw, "%Y-%m-%d").date()
+    except ValueError as exc:
+        raise ValueError(
+            f"{field_name} must be YYYY-MM-DD (got {raw!r})"
+        ) from exc
+
+
+def _validate_date_range(from_date: str, to_date: str) -> None:
+    """Validate ``from_date`` / ``to_date`` before they reach xAI.
+
+    Rules:
+      * Either field, if non-empty, must parse as ``YYYY-MM-DD``.
+      * When both are set, ``from_date <= to_date``.
+      * ``from_date`` must not be later than today UTC — no posts can exist
+        in a window that hasn't started yet, so the call would be guaranteed
+        to return zero citations. ``to_date`` in the future is allowed
+        (callers may legitimately set "from yesterday to tomorrow").
+    """
+    parsed_from: Optional[date] = None
+    parsed_to: Optional[date] = None
+    if from_date.strip():
+        parsed_from = _parse_iso_date(from_date, "from_date")
+    if to_date.strip():
+        parsed_to = _parse_iso_date(to_date, "to_date")
+    if parsed_from and parsed_to and parsed_from > parsed_to:
+        raise ValueError(
+            f"from_date ({parsed_from.isoformat()}) must be on or before "
+            f"to_date ({parsed_to.isoformat()})"
+        )
+    if parsed_from is not None:
+        today_utc = datetime.now(timezone.utc).date()
+        if parsed_from > today_utc:
+            raise ValueError(
+                f"from_date ({parsed_from.isoformat()}) is in the future; "
+                f"X Search only indexes past posts (today UTC is "
+                f"{today_utc.isoformat()})"
+            )
+
+
 def _extract_response_text(payload: Dict[str, Any]) -> str:
    output_text = str(payload.get("output_text") or "").strip()
    if output_text:
@ -225,6 +295,11 @@ def x_search_tool(
        if allowed and excluded:
            return tool_error("allowed_x_handles and excluded_x_handles cannot be used together")

+        try:
+            _validate_date_range(from_date, to_date)
+        except ValueError as exc:
+            return tool_error(str(exc))
+
        tool_def: Dict[str, Any] = {"type": "x_search"}
        if allowed:
            tool_def["allowed_x_handles"] = allowed
@ -299,6 +374,31 @@ def x_search_tool(
        citations = list(data.get("citations") or [])
        inline_citations = _extract_inline_citations(data)

+        # Degraded-result detection.
+        #
+        # xAI returns 200 OK with a synthesized answer even when its X index
+        # has no posts matching the caller's narrowing filters. The answer
+        # then comes from the model's training data, which is misleading
+        # because it looks identical to a real, citation-backed result. When
+        # any narrowing filter is active AND both citation channels came back
+        # empty, mark the response as degraded so callers can decide to
+        # broaden filters, retry, or fall back to a different source.
+        active_filters: List[str] = []
+        if allowed:
+            active_filters.append("allowed_x_handles")
+        if excluded:
+            active_filters.append("excluded_x_handles")
+        if from_date.strip():
+            active_filters.append("from_date")
+        if to_date.strip():
+            active_filters.append("to_date")
+        degraded = bool(active_filters) and not citations and not inline_citations
+        degraded_reason = (
+            f"no citations returned despite filters: {', '.join(active_filters)}"
+            if degraded
+            else None
+        )
+
        return json.dumps(
            {
                "success": True,
@ -310,6 +410,8 @@ def x_search_tool(
                "answer": answer,
                "citations": citations,
                "inline_citations": inline_citations,
+                "degraded": degraded,
+                "degraded_reason": degraded_reason,
            },
            ensure_ascii=False,
        )