fix: allow self-hosted Firecrawl without API key + add self-hosting docs

On top of PR #460: self-hosted Firecrawl instances don't require an API
key (USE_DB_AUTHENTICATION=false), so don't force users to set a dummy
FIRECRAWL_API_KEY when FIRECRAWL_API_URL is set. Also adds a proper
self-hosting section to the configuration docs explaining what you get,
what you lose, and how to set it up (Docker stack, tradeoffs vs cloud).

Added 2 more tests (URL-only without key, neither-set raises).
teknium1 2026-03-05 16:44:21 -08:00
parent a41ba57a7a
commit 363633e2ba
4 changed files with 78 additions and 19 deletions

@@ -56,18 +56,29 @@ logger = logging.getLogger(__name__)
 _firecrawl_client = None
 
 def _get_firecrawl_client():
-    """Get or create the Firecrawl client (lazy initialization)."""
+    """Get or create the Firecrawl client (lazy initialization).
+
+    Uses the cloud API by default (requires FIRECRAWL_API_KEY).
+    Set FIRECRAWL_API_URL to point at a self-hosted instance instead;
+    in that case the API key is optional (set USE_DB_AUTHENTICATION=false
+    on your Firecrawl server to disable auth entirely).
+    """
     global _firecrawl_client
     if _firecrawl_client is None:
         api_key = os.getenv("FIRECRAWL_API_KEY")
-        if not api_key:
-            raise ValueError("FIRECRAWL_API_KEY environment variable not set")
         api_url = os.getenv("FIRECRAWL_API_URL")
+        if not api_key and not api_url:
+            raise ValueError(
+                "FIRECRAWL_API_KEY environment variable not set. "
+                "Set it for cloud Firecrawl, or set FIRECRAWL_API_URL "
+                "to use a self-hosted instance."
+            )
+        kwargs = {}
+        if api_key:
+            kwargs["api_key"] = api_key
         if api_url:
-            _firecrawl_client = Firecrawl(api_key=api_key, api_url=api_url)
-        else:
-            _firecrawl_client = Firecrawl(api_key=api_key)
+            kwargs["api_url"] = api_url
+        _firecrawl_client = Firecrawl(**kwargs)
     return _firecrawl_client
 
 DEFAULT_MIN_LENGTH_FOR_SUMMARIZATION = 5000
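
The resolution rule in this hunk can be sketched on its own, without importing Firecrawl. The helper below is hypothetical and not part of the commit: the name `resolve_firecrawl_kwargs` and the `env` parameter are assumptions made so the three cases (key only, URL only, neither) can be exercised in isolation, and `http://localhost:3002` is only a placeholder address for a self-hosted instance.

```python
import os
from typing import Dict, Mapping, Optional


def resolve_firecrawl_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper mirroring the kwargs logic in the diff.

    Returns the keyword arguments that would be passed to the client
    constructor, or raises ValueError when neither variable is set.
    """
    # Fall back to the real process environment when no mapping is given.
    env = os.environ if env is None else env
    api_key = env.get("FIRECRAWL_API_KEY")
    api_url = env.get("FIRECRAWL_API_URL")

    # Unset and empty values are both treated as missing.
    if not api_key and not api_url:
        raise ValueError(
            "FIRECRAWL_API_KEY environment variable not set. "
            "Set it for cloud Firecrawl, or set FIRECRAWL_API_URL "
            "to use a self-hosted instance."
        )

    # Only forward the values that are actually present.
    kwargs: Dict[str, str] = {}
    if api_key:
        kwargs["api_key"] = api_key
    if api_url:
        kwargs["api_url"] = api_url
    return kwargs


if __name__ == "__main__":
    # URL only: allowed, and no dummy key is forwarded.
    print(resolve_firecrawl_kwargs({"FIRECRAWL_API_URL": "http://localhost:3002"}))
    # prints {'api_url': 'http://localhost:3002'}
```

Building the kwargs dict conditionally means an absent key is never passed as `api_key=None`, which is the behavior change this commit is after for self-hosted setups.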