feat: add website blocklist enforcement for web/browser tools (#1064)

Adds security.website_blocklist config for user-managed domain blocking
across URL-capable tools. Enforced at the tool level (not monkey-patching)
so it's safe and predictable.

- tools/website_policy.py: shared policy loader with domain normalization,
  wildcard support (*.tracking.example), shared file imports, and
  structured block metadata
- web_extract: pre-fetch URL check + post-redirect recheck
- web_crawl: pre-crawl URL check + per-page URL recheck
- browser_navigate: pre-navigation URL check
- Blocked responses include blocked_by_policy metadata so the agent
  can explain exactly what was denied

Config:
  security:
    website_blocklist:
      enabled: true
      domains: ["evil.com", "*.tracking.example"]
      shared_files: ["team-blocklist.txt"]

Salvaged from PR #1086 by @kshitijk4poor. Browser post-redirect checks
deferred (browser_tool was fully rewritten since the PR branched).

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
This commit is contained in:
teknium1 2026-03-17 02:59:28 -07:00
parent 6020db0243
commit 30c417fe70
5 changed files with 758 additions and 2 deletions

View file

@ -65,6 +65,7 @@ import requests
from typing import Dict, Any, Optional, List
from pathlib import Path
from agent.auxiliary_client import call_llm
from tools.website_policy import check_website_access
from tools.browser_providers.base import CloudBrowserProvider
from tools.browser_providers.browserbase import BrowserbaseProvider
from tools.browser_providers.browser_use import BrowserUseProvider
@ -901,6 +902,19 @@ def browser_navigate(url: str, task_id: Optional[str] = None) -> str:
Returns:
JSON string with navigation result (includes stealth features info on first nav)
"""
# Website policy check — block before navigating
try:
blocked = check_website_access(url)
except Exception as _policy_err:
return json.dumps({"success": False, "error": f"Website policy error: {_policy_err}"})
if blocked:
logger.info("Blocked browser_navigate to %s by rule %s", blocked["host"], blocked["rule"])
return json.dumps({
"success": False,
"error": blocked["message"],
"blocked_by_policy": {"host": blocked["host"], "rule": blocked["rule"], "source": blocked["source"]},
})
effective_task_id = task_id or "default"
# Get session info to check if this is a new session