The analytics dashboard had three accuracy issues:
1. TOTAL TOKENS excluded cache_read and cache_write tokens and counted only
the non-cached input portion of the prompt. With 90%+ cache hit rates
typical in Hermes, this dramatically undercounted actual token usage
(e.g. showing 9.1M when the real total was 169M+).
2. The 'API Calls' card displayed the session count (COUNT(*) on the
sessions table), not the number of actual LLM API requests. A single
session makes 10-90 API calls through the tool loop, so the displayed
figure was roughly 30x too low.
3. cache_write_tokens was stored in the DB but never exposed through the
analytics API endpoint or frontend.
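
A minimal sketch of the undercount in issue 1. It assumes the old card summed only non-cached input and output. The `Usage` class, its field names, and the numbers are illustrative and not taken from the codebase:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Token counters for one session (hypothetical field names)."""
    input_tokens: int        # non-cached prompt tokens
    cache_read_tokens: int   # prompt tokens served from the cache
    cache_write_tokens: int  # prompt tokens written to the cache
    output_tokens: int


def prompt_total(u: Usage) -> int:
    # Full prompt size: non-cached input plus both cache counters.
    return u.input_tokens + u.cache_read_tokens + u.cache_write_tokens


def total_tokens(u: Usage) -> int:
    # What the TOTAL TOKENS card is meant to show after this change.
    return prompt_total(u) + u.output_tokens


def uncached_total(u: Usage) -> int:
    # Assumed shape of the old calculation: cache tokens left out.
    return u.input_tokens + u.output_tokens


# Made-up numbers, not measurements from Hermes.
example = Usage(
    input_tokens=1_000,
    cache_read_tokens=17_000,
    cache_write_tokens=500,
    output_tokens=200,
)
print(uncached_total(example), total_tokens(example))
```

With a cache-heavy session like this one, the two totals differ by more than an order of magnitude, which is the kind of gap described above.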

Changes:
- Add api_call_count column to sessions table (schema v7 migration)
- Persist api_call_count=1 for each LLM API call in run_agent.py, so the
column accumulates one count per request
- Analytics SQL queries now include cache_write_tokens and api_call_count
in daily, by_model, and totals aggregations
- Frontend TOTAL TOKENS card now shows input + cache_read + cache_write +
output (the full prompt total + output)
- API CALLS card now uses real api_call_count from DB
- New Cache Hit Rate card shows cache efficiency percentage
- Bar chart, tooltips, daily table, model table all use prompt totals
(input + cache_read + cache_write) instead of just input
- Labels changed from 'Input' to 'Prompt' to reflect the full prompt total
- TypeScript interfaces and i18n strings updated (en + zh)
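
The backend side of these changes could look roughly like the sketch below, using an in-memory SQLite database. Only `sessions`, `api_call_count`, and `cache_write_tokens` come from this change description. The other column names, the `record_api_call` helper, and the cache hit rate formula (cache_read divided by the prompt total) are assumptions made for illustration, and schema version bookkeeping is omitted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row

# Simplified stand-in for the pre-migration sessions table (assumed columns).
conn.execute(
    """
    CREATE TABLE sessions (
        id TEXT PRIMARY KEY,
        model TEXT,
        input_tokens INTEGER DEFAULT 0,
        cache_read_tokens INTEGER DEFAULT 0,
        cache_write_tokens INTEGER DEFAULT 0,
        output_tokens INTEGER DEFAULT 0
    )
    """
)

# Migration step: existing rows start at 0 API calls.
conn.execute("ALTER TABLE sessions ADD COLUMN api_call_count INTEGER DEFAULT 0")


def record_api_call(conn, session_id, *, input_tokens, cache_read_tokens,
                    cache_write_tokens, output_tokens, api_call_count=1):
    """Hypothetical helper: add one LLM call's usage to its session row."""
    conn.execute(
        """
        UPDATE sessions SET
            input_tokens = input_tokens + :inp,
            cache_read_tokens = cache_read_tokens + :cr,
            cache_write_tokens = cache_write_tokens + :cw,
            output_tokens = output_tokens + :out,
            api_call_count = api_call_count + :calls
        WHERE id = :sid
        """,
        {"inp": input_tokens, "cr": cache_read_tokens, "cw": cache_write_tokens,
         "out": output_tokens, "calls": api_call_count, "sid": session_id},
    )


conn.execute("INSERT INTO sessions (id, model) VALUES (?, ?)", ("s1", "model-a"))
for _ in range(3):  # three calls through the tool loop, made-up usage
    record_api_call(conn, "s1", input_tokens=100, cache_read_tokens=900,
                    cache_write_tokens=50, output_tokens=20)

# Totals aggregation: sessions and API calls are now separate figures.
totals = dict(conn.execute(
    """
    SELECT COUNT(*) AS sessions,
           COALESCE(SUM(api_call_count), 0) AS api_calls,
           COALESCE(SUM(input_tokens + cache_read_tokens + cache_write_tokens), 0)
               AS prompt_tokens,
           COALESCE(SUM(cache_read_tokens), 0) AS cache_read_tokens,
           COALESCE(SUM(cache_write_tokens), 0) AS cache_write_tokens,
           COALESCE(SUM(output_tokens), 0) AS output_tokens
    FROM sessions
    """
).fetchone())

# Assumed definition of the Cache Hit Rate card: share of prompt tokens read from cache.
prompt = totals["prompt_tokens"]
cache_hit_rate = totals["cache_read_tokens"] / prompt if prompt else 0.0
print(totals, f"{cache_hit_rate:.1%}")
```

The same SELECT shape would extend to the daily and by_model aggregations by adding a GROUP BY on the date or model column.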