mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-08 03:01:47 +00:00
docs+skill: add searxng-search optional skill and documentation
Closes the remaining gaps from PR #11562 that weren't covered by the core SearXNG integration landed in #20823.

- optional-skills/research/searxng-search/: installable skill with SKILL.md (curl-based usage, category support, Python example) and searxng.sh helper script for health checks and instance queries
- website/docs/user-guide/configuration.md: SearXNG added to the Web Search Backends section (5 backends, backend table, per-capability split config example, correct search-only note)
- website/docs/reference/environment-variables.md: SEARXNG_URL row
- website/docs/reference/optional-skills-catalog.md: searxng-search entry

The core SearXNG code, OPTIONAL_ENV_VARS, hermes tools picker, and tests were already on main via #20823. This commit is purely additive docs + the optional skill scaffold.

Credits from #11562 salvage:
- @w4rum: original _searxng_search structure
- @nathansdev: tools_config.py integration
- @moyomartin: category support and result formatting
- @0xMihai: config/env var approach
- @nicobailon: skill and documentation structure
- @searxng-fan: error handling patterns
- @local-first: self-hosted-first philosophy and docs
This commit is contained in:
parent
5c906d7026
commit
94016dd1aa
5 changed files with 246 additions and 4 deletions
211
optional-skills/research/searxng-search/SKILL.md
Normal file
@@ -0,0 +1,211 @@
---
name: searxng-search
description: Free meta-search via SearXNG, which aggregates results from 70+ search engines. Use a self-hosted or public instance. No API key needed. Used automatically as a fallback when the web search toolset is unavailable.
version: 1.0.0
author: hermes-agent
license: MIT
metadata:
  hermes:
    tags: [search, searxng, meta-search, self-hosted, free, fallback]
    related_skills: [duckduckgo-search, domain-intel]
    fallback_for_toolsets: [web]
---

# SearXNG Search

Free meta-search using [SearXNG](https://searxng.org/), a privacy-respecting metasearch engine that you can self-host. It sends each query to the engines enabled on the instance (70+ are supported) and merges their results.

**No API key required** when using a public instance, and you can self-host for full control. The skill appears automatically as a fallback when the main web search toolset is not configured (for example, when `FIRECRAWL_API_KEY` is unset).

## Configuration

SearXNG requires a `SEARXNG_URL` environment variable pointing to your SearXNG instance:

```bash
# Public instance (no setup required; replace with a real instance URL)
SEARXNG_URL=https://searxng.example.com

# Self-hosted SearXNG
SEARXNG_URL=http://localhost:8888
```

If `SEARXNG_URL` is not set, this skill is unavailable and the agent falls back to other search options.
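
A minimal sketch of the "is this skill available?" check, assuming the agent reads `SEARXNG_URL` from its environment. The helper name `get_searxng_url` is hypothetical. It treats an unset or blank value as "unavailable", rejects values without an `http`/`https` scheme, and strips a trailing slash so later code can safely append `/search`.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse


def get_searxng_url(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured SearXNG base URL, or None if the skill is unavailable.

    Hypothetical helper: reads SEARXNG_URL, rejects blank or non-HTTP values,
    and strips a trailing slash so callers can append "/search".
    """
    raw = env.get("SEARXNG_URL", "").strip()
    if not raw:
        return None  # unset or blank: fall back to other search options
    parsed = urlparse(raw)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return None  # e.g. "localhost:8888" written without a scheme
    return raw.rstrip("/")
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise without touching the real environment.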

## Detection Flow

Check what is actually available before choosing an approach:

```bash
# Check that SEARXNG_URL is set and the instance answers JSON queries
curl -s --max-time 5 "${SEARXNG_URL}/search?q=test&format=json" | head -c 200
```

Decision tree:
1. If `SEARXNG_URL` is set and the instance returns JSON, use SearXNG.
2. If `SEARXNG_URL` is unset, the instance is unreachable, or it refuses JSON output (HTTP 403), fall back to other available search tools.
3. If the user wants SearXNG specifically, help them set up an instance or find a public one.
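
The decision tree can be sketched in Python under a few assumptions. The probe mirrors the curl check: it requests `/search?q=test&format=json` and treats a parseable JSON body as "responding". The function names and the returned labels (`"searxng"`, `"fallback"`, `"setup_help"`) are hypothetical. The probe is passed in as a parameter so the branching can be tested without a live instance.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional


def probe_searxng(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the instance answers a JSON search (same check as the curl probe)."""
    url = f"{base_url.rstrip('/')}/search?q=test&format=json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            json.load(resp)  # raises ValueError if the body is not JSON
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # URLError also covers HTTP errors such as 403; ValueError covers bad JSON or URLs
        return False


def choose_backend(
    searxng_url: Optional[str],
    user_wants_searxng: bool = False,
    probe: Callable[[str], bool] = probe_searxng,
) -> str:
    """Apply the decision tree and return 'searxng', 'setup_help', or 'fallback'."""
    if searxng_url and probe(searxng_url):
        return "searxng"     # step 1: configured and responding
    if user_wants_searxng:
        return "setup_help"  # step 3: help set up or find an instance
    return "fallback"        # step 2: use other available search tools
```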

## Method 1: CLI via curl (Preferred)

Use `curl` via `terminal` to call the SearXNG JSON API. This avoids assuming any particular Python package is installed.

```bash
# Text search (JSON output) restricted to specific engines
curl -s --max-time 10 \
  "${SEARXNG_URL}/search?q=python+async+programming&format=json&engines=google,bing"

# With SafeSearch off
curl -s --max-time 10 \
  "${SEARXNG_URL}/search?q=example&format=json&safesearch=0"

# Specific categories (general, news, science, etc.)
curl -s --max-time 10 \
  "${SEARXNG_URL}/search?q=AI+news&format=json&categories=news"

# Let curl URL-encode the query (-G appends the --data-urlencode fields to the URL)
curl -s --max-time 10 -G "${SEARXNG_URL}/search" \
  --data-urlencode "q=c++ & rust" --data-urlencode "format=json"
```
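
The curl examples above hand-encode spaces as `+`. The following sketch builds the same `/search` URLs with the standard library so that arbitrary queries (including `&`, `+`, and `#`) are escaped correctly. `build_search_url` is a hypothetical helper. The extra keyword arguments are passed through as query parameters, such as `categories` or `engines` from the examples.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, **params: str) -> str:
    """Build a SearXNG JSON search URL; urlencode escapes spaces, '&', '+', and so on."""
    query_params = {"q": query, "format": "json", **params}
    return f"{base_url.rstrip('/')}/search?{urlencode(query_params)}"


print(build_search_url("http://localhost:8888", "AI news", categories="news"))
# -> http://localhost:8888/search?q=AI+news&format=json&categories=news
```

`urlencode` also encodes the comma in `engines=google,bing` as `%2C`. The server decodes this back to a comma, so the result is equivalent to the hand-written form.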

### Common Query Parameters

| Parameter | Description | Example |
|-----------|-------------|---------|
| `q` | Query string (URL-encoded) | `q=python+async` |
| `format` | Output format: `json`, `csv`, `rss` (must be enabled on the instance) | `format=json` |
| `engines` | Comma-separated engine names | `engines=google,bing,duckduckgo` |
| `pageno` | Results page number (default 1) | `pageno=2` |
| `categories` | Filter by category | `categories=news,science` |
| `safesearch` | 0=none, 1=moderate, 2=strict | `safesearch=0` |
| `time_range` | Filter: `day`, `week`, `month`, `year` | `time_range=week` |

SearXNG has no per-request result-count parameter, and a `limit` value is ignored. To control how many results you get, page with `pageno` and trim the list on the client side.
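
Catching bad parameter values before sending a request avoids confusing empty or error responses. The sketch below is a conservative check. Its allowed values are copied from the parameter table above, and a given instance may accept more (for example `html` output). `validate_params` is a hypothetical helper.

```python
from typing import Dict, List, Union

# Allowed values copied from the parameter table; an instance may accept more,
# so treat this as a conservative pre-flight check rather than the server's rules.
ALLOWED_VALUES = {
    "format": {"json", "csv", "rss"},
    "safesearch": {"0", "1", "2"},
    "time_range": {"day", "week", "month", "year"},
}


def validate_params(params: Dict[str, Union[str, int]]) -> List[str]:
    """Return a list of problems with a SearXNG query dict; empty means it looks valid."""
    problems = []
    if not str(params.get("q", "")).strip():
        problems.append("q: a non-empty query is required")
    for key, allowed in ALLOWED_VALUES.items():
        # Compare as strings so safesearch=0 and safesearch="0" are treated alike
        if key in params and str(params[key]) not in allowed:
            problems.append(f"{key}: {params[key]!r} is not one of {sorted(allowed)}")
    return problems
```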

### Parsing JSON Results

```bash
# Extract titles, URLs, and snippets from the first 5 results
curl -s --max-time 10 "${SEARXNG_URL}/search?q=fastapi&format=json" \
  | python3 -c "
import json, sys
data = json.load(sys.stdin)
for r in data.get('results', [])[:5]:
    print(r.get('title', ''))
    print(r.get('url', ''))
    print((r.get('content') or '')[:200])
    print()
"
```

Each result typically includes `title`, `url`, `content` (snippet), `engine`, and `parsed_url`. Depending on the engine and result type, it may also include `img_src`, `thumbnail`, `author`, and `publishedDate`.
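
Because optional fields vary by engine, it helps to normalize results before showing them to a user or passing them to another tool. The sketch below reduces a parsed SearXNG JSON response to a few fixed keys and uses `.get()` throughout. The helper name `summarize_results` and the output keys (`title`, `url`, `snippet`, `engine`) are illustrative choices.

```python
from typing import Any, Dict, List


def summarize_results(
    data: Dict[str, Any], max_results: int = 5, snippet_len: int = 200
) -> List[Dict[str, str]]:
    """Reduce a SearXNG JSON response to title/url/snippet/engine dicts.

    Missing or null fields become empty strings, and snippets are truncated.
    """
    summary = []
    for r in data.get("results", [])[:max_results]:
        summary.append({
            "title": r.get("title") or "",
            "url": r.get("url") or "",
            "snippet": (r.get("content") or "")[:snippet_len],
            "engine": r.get("engine") or "",
        })
    return summary
```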

## Method 2: Python API via `requests`

Use the SearXNG REST API directly from Python with the `requests` library, which URL-encodes `params` for you:

```python
import os

import requests

base_url = os.environ.get("SEARXNG_URL", "").rstrip("/")
if not base_url:
    raise RuntimeError("SEARXNG_URL is not set")

params = {
    "q": "fastapi deployment guide",
    "format": "json",
    "engines": "google,bing",
}

resp = requests.get(f"{base_url}/search", params=params, timeout=10)
resp.raise_for_status()  # a 403 usually means JSON output is disabled on the instance
data = resp.json()

# SearXNG has no result-count parameter, so trim the list client-side
for r in data.get("results", [])[:5]:
    print(r.get("title", ""))
    print(r.get("url", ""))
    print((r.get("content") or "")[:200])
    print()
```
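
To get more results than one page returns, SearXNG's search API uses a `pageno` parameter. The following sketch collects up to `max_results` unique results across pages. It assumes a caller-supplied `fetch_page(pageno)` function, a hypothetical wrapper around the `requests.get` call above, so the paging logic itself can be tested offline. `collect_results` and its defaults are illustrative.

```python
from typing import Any, Callable, Dict, List

FetchPage = Callable[[int], Dict[str, Any]]


def collect_results(
    fetch_page: FetchPage, max_results: int = 20, max_pages: int = 3
) -> List[Dict[str, Any]]:
    """Gather up to max_results results, de-duplicated by URL, from pages 1..max_pages."""
    seen = set()
    collected: List[Dict[str, Any]] = []
    for pageno in range(1, max_pages + 1):
        results = fetch_page(pageno).get("results", [])
        if not results:
            break  # the instance has no more results for this query
        for r in results:
            url = r.get("url")
            if url and url not in seen:
                seen.add(url)
                collected.append(r)
                if len(collected) >= max_results:
                    return collected
    return collected


# Example wiring with requests (uncomment and set base_url to use a live instance):
# def fetch_page(pageno):
#     resp = requests.get(f"{base_url}/search",
#                         params={"q": "fastapi", "format": "json", "pageno": pageno},
#                         timeout=10)
#     resp.raise_for_status()
#     return resp.json()
```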

## Method 3: searxng-data Python Package

For more structured access, install the `searxng-data` package:

```bash
pip install searxng-data
```

```python
from searxng_data import engines

# List available engines
print(engines.list_engines())
```

Note: This package only provides engine metadata, not the search API itself.
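
To see which engines a particular instance actually has enabled, you can query the instance itself. The sketch below filters the JSON returned by the instance's `/config` endpoint, fetched with something like `curl -s "${SEARXNG_URL}/config"`. It assumes that each entry in `engines` carries `name`, `categories`, and `enabled` keys, as in current SearXNG releases. Check your instance's output before relying on that shape. `enabled_engines` is a hypothetical helper, and the sample payload is illustrative, not real output.

```python
import json
from typing import Any, Dict, List, Optional


def enabled_engines(config: Dict[str, Any], category: Optional[str] = None) -> List[str]:
    """Return sorted names of enabled engines from a /config response, optionally by category."""
    names = []
    for engine in config.get("engines", []):
        if not engine.get("enabled", False):
            continue
        if category is not None and category not in engine.get("categories", []):
            continue
        names.append(engine["name"])
    return sorted(names)


# Trimmed-down, illustrative /config payload (assumed shape, not real output)
sample = json.loads("""
{"categories": ["general", "news"],
 "engines": [
   {"name": "duckduckgo", "categories": ["general"], "enabled": true},
   {"name": "bing news", "categories": ["news"], "enabled": true},
   {"name": "google", "categories": ["general"], "enabled": false}
 ]}
""")
print(enabled_engines(sample))          # ['bing news', 'duckduckgo']
print(enabled_engines(sample, "news"))  # ['bing news']
```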

## Self-Hosting SearXNG

To run your own SearXNG instance:

```bash
# Using Docker (the container listens on port 8080)
docker run -d -p 8888:8080 \
  -v "$(pwd)/searxng:/etc/searxng" \
  searxng/searxng:latest

# Enable JSON output: add "json" under search.formats in
# ./searxng/settings.yml, then restart the container

# Then set
SEARXNG_URL=http://localhost:8888
```
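
A freshly started container can take a few seconds before it answers. The sketch below polls the instance until a probe succeeds. It assumes a `probe(url) -> bool` callable, such as a Python version of the JSON test search from the Detection Flow. `wait_until_ready` and its retry defaults are hypothetical choices. The `sleep` function is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_until_ready(
    url: str,
    probe: Callable[[str], bool],
    attempts: int = 10,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe(url) up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(1, attempts + 1):
        if probe(url):
            return True
        if attempt < attempts:
            sleep(delay)  # give the container time to finish starting
    return False
```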

Or install from source (SearXNG is installed from its Git repository rather than from a PyPI release; see the official installation docs for version-specific steps):

```bash
git clone https://github.com/searxng/searxng.git
cd searxng
pip install -U pip setuptools wheel pyyaml
pip install --use-pep517 --no-build-isolation -e .
# Edit /etc/searxng/settings.yml (enable "json" under search.formats)
searxng-run
```

Public SearXNG instances are listed at https://searx.space/. Many of them disable JSON output or rate-limit automated clients, so test an instance before relying on it.

## Workflow: Search then Extract

SearXNG returns titles, URLs, and snippets, not full page content. To get the full text, search first and then extract the most relevant URL with `web_extract`, browser tools, or `curl`.

```bash
# Search for relevant pages
curl -s --max-time 10 "${SEARXNG_URL}/search?q=fastapi+deployment&format=json"
# Output: JSON whose "results" list holds titles, URLs, and snippets

# Then extract the best URL with web_extract
```
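
Choosing which URLs to hand to `web_extract` can be done mechanically. The sketch below keeps SearXNG's result order, skips empty and non-web URLs, and takes at most one page per domain to cover more sources. It assumes results in the SearXNG JSON shape (a `url` key). The helper `pick_urls_to_extract` and the one-per-domain rule are illustrative choices, not part of SearXNG.

```python
from typing import Any, Dict, Iterable, List
from urllib.parse import urlparse


def pick_urls_to_extract(results: Iterable[Dict[str, Any]], n: int = 3) -> List[str]:
    """Pick up to n http(s) URLs, keeping the given order but at most one per domain."""
    picked: List[str] = []
    seen_domains = set()
    for r in results:
        url = r.get("url") or ""
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            continue  # skip empty or non-web URLs
        domain = parsed.netloc.lower()
        if domain in seen_domains:
            continue  # prefer breadth: one page per site
        seen_domains.add(domain)
        picked.append(url)
        if len(picked) == n:
            break
    return picked
```

Note that `www.example.com` and `example.com` count as different domains here. Normalize the host first if that matters for your use.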

## Limitations

- **Instance availability**: If the SearXNG instance is down or unreachable, search fails. Check that `SEARXNG_URL` is set and the instance is reachable before searching.
- **No content extraction**: SearXNG returns snippets, not full page content. Use `web_extract`, browser tools, or `curl` for full articles.
- **Rate limiting**: Some public instances limit requests or block automated clients. Self-hosting avoids this.
- **Engine coverage**: Available engines depend on the instance configuration, and some engines may be disabled.
- **Result freshness**: Meta-search aggregates external engines, so freshness depends on those engines.
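
Because a single instance can be down or rate-limited, one mitigation is to try several instances in order. The sketch below assumes a caller-supplied `search(base_url, query)` function that raises on any failure, for example a wrapper around the `requests` code in Method 2 that calls `raise_for_status()`. `search_with_fallback` and the instance URLs in any real call are hypothetical.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

SearchFn = Callable[[str, str], Dict[str, Any]]


def search_with_fallback(
    instances: Sequence[str], query: str, search: SearchFn
) -> Tuple[str, Dict[str, Any]]:
    """Return (instance_url, response) from the first instance that succeeds.

    Raises RuntimeError listing every failure if all instances fail.
    """
    errors: List[str] = []
    for base_url in instances:
        try:
            return base_url, search(base_url, query)
        except Exception as exc:  # connection errors, HTTP 403/429, bad JSON, ...
            errors.append(f"{base_url}: {exc}")
    raise RuntimeError("all SearXNG instances failed:\n" + "\n".join(errors))
```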

## Troubleshooting

| Problem | Likely Cause | What To Do |
|---------|--------------|------------|
| `SEARXNG_URL` not set | No instance configured | Use a public SearXNG instance or set up your own |
| Connection refused | Instance not running or wrong URL | Check that the URL is correct and the instance is running |
| Empty results | Instance blocks the query | Try a different instance or self-host |
| Slow responses | Public instance under load | Self-host or use a less-loaded public instance |
| HTTP 403 for `format=json` | JSON output not enabled in the instance's `settings.yml` | On your own instance, add `json` under `search.formats`; otherwise use another instance |
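
The table's logic can also be applied in code when a request fails. The sketch below maps an HTTP status (or `None` for a failed connection) and the parsed body to a suggested next step. The `diagnose` helper is hypothetical. Reading 403 as "JSON disabled" and 429 as rate limiting reflects common SearXNG behaviour, not a guarantee.

```python
from typing import Any, Dict, Optional


def diagnose(status: Optional[int], body: Optional[Dict[str, Any]] = None) -> str:
    """Suggest a next step for a SearXNG request.

    status is the HTTP status code, or None if the connection itself failed;
    body is the parsed JSON response, if any.
    """
    if status is None:
        return "connection failed: check SEARXNG_URL and that the instance is running"
    if status == 403:
        return "forbidden: JSON output is probably disabled; enable it under search.formats or use another instance"
    if status == 429:
        return "rate limited: slow down, try another instance, or self-host"
    if status >= 400:
        return f"HTTP {status}: try a different instance"
    if not (body or {}).get("results"):
        return "no results: the instance may block the query; try another instance or self-host"
    return "ok"
```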

## Pitfalls

- **Always set `SEARXNG_URL`**: Without it, the skill cannot function.
- **URL-encode queries**: Spaces and special characters such as `&`, `+`, and `#` must be URL-encoded. With curl, use `-G --data-urlencode "q=..."`; in Python, pass `params=` to `requests` or use `urllib.parse.quote_plus()`.
- **Use `format=json`**: The default output is HTML, which is not meant for machine parsing. Always request JSON explicitly.
- **Set a timeout**: Always use `--max-time` (curl) or `timeout=` (requests) to avoid hanging on unreachable instances.
- **Prefer self-hosting**: Public instances may go down, rate-limit, or block automated clients. A self-hosted instance is more reliable.

## Instance Discovery

If `SEARXNG_URL` is not set and the user asks about SearXNG, help them either:
1. Find a public SearXNG instance (search for "public searxng instance")
2. Set up their own with Docker or pip

Public instances are listed at: https://searx.space/
22
optional-skills/research/searxng-search/scripts/searxng.sh
Executable file
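@@ -0,0 +1,22 @@
#!/bin/bash
# Usage: ./searxng.sh <query> [max_results] [engines]
# Example: ./searxng.sh "python async" 10 "google,bing"
# Prints the SearXNG JSON response trimmed to max_results results.
# SearXNG has no result-count parameter, so trimming is done with python3.

QUERY="${1:-}"
MAX="${2:-5}"
ENGINES="${3:-google,bing}"

if [ -z "${SEARXNG_URL:-}" ]; then
  echo "Error: SEARXNG_URL is not set" >&2
  exit 1
fi

if [ -z "$QUERY" ]; then
  echo "Usage: $0 <query> [max_results] [engines]" >&2
  exit 1
fi

case "$MAX" in ''|*[!0-9]*) echo "Error: max_results must be a number" >&2; exit 1 ;; esac

# -G turns the --data-urlencode fields into a properly encoded GET query string;
# --fail makes HTTP errors (e.g. 403 when JSON output is disabled) return non-zero.
RESPONSE=$(curl -s --fail --max-time 10 -G "${SEARXNG_URL%/}/search" \
  --data-urlencode "q=${QUERY}" \
  --data-urlencode "format=json" \
  --data-urlencode "engines=${ENGINES}") || {
  echo "Error: search request to ${SEARXNG_URL} failed" >&2
  exit 1
}

printf '%s' "$RESPONSE" | python3 -c '
import json, sys
data = json.load(sys.stdin)
data["results"] = data.get("results", [])[: int(sys.argv[1])]
json.dump(data, sys.stdout, indent=2)
' "$MAX"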