mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-13 09:01:54 +00:00
* fix: bypass FTS5 for CJK queries in session_search FTS5 default tokenizer splits CJK characters into individual tokens, so multi-character queries like "大别山项目" become AND of single chars. This produces few/no results compared to LIKE substring search. For CJK queries, skip FTS5 entirely and use LIKE for accurate phrase matching. Fixes NousResearch/hermes-agent#15500 * fix: cache _contains_cjk, escape LIKE wildcards, add regression tests On top of the CJK FTS5 bypass from #15509: - Cache _contains_cjk() result in a local var to avoid redundant O(n) scans on every CJK query - Escape %, _ in LIKE queries so literal wildcards in user input are not treated as SQL wildcards (consistent with other LIKE queries in hermes_state.py that use ESCAPE '\') - Fix misleading comment ('or CJK fallback' → accurate description) - Add 3 regression tests: - test_cjk_partial_fts5_results_supplemented_by_like (#15500 / #14829) - test_cjk_like_dedup_no_duplicates - test_cjk_like_escapes_wildcards (new wildcard escaping) * feat: trigram FTS5 index for CJK search, replace LIKE fallback Replace the LIKE '%query%' full-table-scan fallback for CJK queries with a proper trigram FTS5 index (messages_fts_trigram). The trigram tokenizer creates overlapping 3-byte sequences so substring matching works natively for any script — CJK, Thai, etc. For queries with 3+ CJK characters: uses the trigram FTS5 table with proper ranking, snippets, and indexed lookups. For shorter queries (1-2 CJK chars): falls back to LIKE since the trigram tokenizer needs ≥9 UTF-8 bytes (3 CJK chars) minimum. Schema v10 migration creates the trigram table and backfills existing messages. Triggers keep the index in sync on INSERT/UPDATE/DELETE. Builds on top of #16276 (bypass FTS5 for CJK, escape LIKE wildcards). --------- Co-authored-by: vominh1919 <vominh1919@gmail.com> |
||
|---|---|---|
| .. | ||
| acp | ||
| agent | ||
| cli | ||
| cron | ||
| e2e | ||
| environments/benchmarks | ||
| fakes | ||
| gateway | ||
| hermes_cli | ||
| hermes_state | ||
| honcho_plugin | ||
| integration | ||
| plugins | ||
| run_agent | ||
| skills | ||
| tools | ||
| tui_gateway | ||
| website | ||
| __init__.py | ||
| conftest.py | ||
| run_interrupt_test.py | ||
| test_account_usage.py | ||
| test_base_url_hostname.py | ||
| test_batch_runner_checkpoint.py | ||
| test_cli_file_drop.py | ||
| test_cli_skin_integration.py | ||
| test_ctx_halving_fix.py | ||
| test_empty_model_fallback.py | ||
| test_evidence_store.py | ||
| test_hermes_constants.py | ||
| test_hermes_logging.py | ||
| test_hermes_state.py | ||
| test_honcho_client_config.py | ||
| test_ipv4_preference.py | ||
| test_mcp_serve.py | ||
| test_mini_swe_runner.py | ||
| test_minimax_model_validation.py | ||
| test_minisweagent_path.py | ||
| test_model_picker_scroll.py | ||
| test_model_tools.py | ||
| test_model_tools_async_bridge.py | ||
| test_ollama_num_ctx.py | ||
| test_packaging_metadata.py | ||
| test_plugin_skills.py | ||
| test_project_metadata.py | ||
| test_retry_utils.py | ||
| test_sql_injection.py | ||
| test_subprocess_home_isolation.py | ||
| test_timezone.py | ||
| test_toolset_distributions.py | ||
| test_toolsets.py | ||
| test_trajectory_compressor.py | ||
| test_trajectory_compressor_async.py | ||
| test_transform_tool_result_hook.py | ||
| test_tui_gateway_server.py | ||
| test_utils_truthy_values.py | ||
| test_yuanbao_integration.py | ||
| test_yuanbao_markdown.py | ||
| test_yuanbao_pipeline.py | ||
| test_yuanbao_proto.py | ||