* refactor: re-architect tests to mirror the codebase
* Update tests.yml
* fix: add missing tool_error imports after registry refactor
* fix(tests): replace patch.dict with monkeypatch to prevent env var leaks under xdist
patch.dict(os.environ) can leak TERMINAL_ENV across xdist workers,
causing test_code_execution tests to hit the Modal remote path.
* fix(tests): fix update_check and telegram xdist failures
- test_update_check: replace patch("hermes_cli.banner.os.getenv") with
monkeypatch.setenv("HERMES_HOME") — banner.py no longer imports os
directly, it uses get_hermes_home() from hermes_constants.
- test_telegram_conflict/approval_buttons: provide real exception classes
for telegram.error mock (NetworkError, TimedOut, BadRequest) so the
except clause in connect() doesn't fail with "catching classes that do
not inherit from BaseException" when xdist pollutes sys.modules.
* fix(tests): accept unavailable_models kwarg in _prompt_model_selection mock
* docs: update llama.cpp section with --jinja flag and tool calling guide
The llama.cpp docs were missing the --jinja flag which is required for
tool calling to work. Without it, models output tool calls as raw JSON
text instead of structured API responses, making Hermes unable to
execute them.
Changes:
- Add --jinja and -fa flags to the server startup example
- Replace deprecated env vars (OPENAI_BASE_URL, LLM_MODEL) with
hermes model interactive setup
- Add caution block explaining the --jinja requirement and symptoms
- List models with native tool calling support
- Add /props endpoint verification tip
* docs+feat: comprehensive local LLM provider guides and context length warning
Docs (providers.md):
- Rewrote Ollama section with context length warning (defaults to 4k on
<24GB VRAM), three methods to increase it, and verification steps
- Rewrote vLLM section with --max-model-len, tool calling flags
(--enable-auto-tool-choice, --tool-call-parser), and context guidance
- Rewrote SGLang section with --context-length, --tool-call-parser,
and warning about 128-token default max output
- Added LM Studio section (port 1234, context length defaults to 2048,
tool calling since 0.3.6)
- Added llama.cpp context length flag (-c) and GPU offload (-ngl)
- Added Troubleshooting Local Models section covering:
- Tool calls appearing as text (with per-server fix table)
- Silent context truncation and diagnosis commands
- Low detected context at startup
- Truncated responses
- Replaced all deprecated env vars (OPENAI_BASE_URL, LLM_MODEL) with
hermes model interactive setup and config.yaml examples
- Added deprecation warning for legacy env vars in General Setup
Code (cli.py):
- Added context length warning in show_banner() when detected context
is <= 8192 tokens, with server-specific fix hints:
- Ollama (port 11434): suggests OLLAMA_CONTEXT_LENGTH env var
- LM Studio (port 1234): suggests model settings adjustment
- Other servers: suggests config.yaml override
Tests:
- 9 new tests covering warning thresholds, server-specific hints,
and no-warning cases