hermes-agent/website/docs/user-guide/skills/optional/dogfood/dogfood-adversarial-ux-test.md
Teknium 252d68fd45
docs: deep audit — fix stale config keys, missing commands, and registry drift (#22784)
* docs: deep audit — fix stale config keys, missing commands, and registry drift

Cross-checked ~80 high-impact docs pages (getting-started, reference, top-level
user-guide, user-guide/features) against the live registries:

  hermes_cli/commands.py    COMMAND_REGISTRY (slash commands)
  hermes_cli/auth.py        PROVIDER_REGISTRY (providers)
  hermes_cli/config.py      DEFAULT_CONFIG (config keys)
  toolsets.py               TOOLSETS (toolsets)
  tools/registry.py         get_all_tool_names() (tools)
  python -m hermes_cli.main <subcmd> --help (CLI args)

reference/
- cli-commands.md: drop duplicate hermes fallback row + duplicate section,
  add stepfun/lmstudio to --provider enum, expand auth/mcp/curator subcommand
  lists to match --help output (status/logout/spotify, login, archive/prune/
  list-archived).
- slash-commands.md: add missing /sessions and /reload-skills entries +
  correct the cross-platform Notes line.
- tools-reference.md: drop bogus '68 tools' headline, drop fictional
  'browser-cdp toolset' (these tools live in 'browser' and are runtime-gated),
  add missing 'kanban' and 'video' toolset sections, fix MCP example to use
  the real mcp_<server>_<tool> prefix.
- toolsets-reference.md: list browser_cdp/browser_dialog inside the 'browser'
  row, add missing 'kanban' and 'video' toolset rows, drop the stale
  '38 tools' count for hermes-cli.
- profile-commands.md: add missing install/update/info subcommands, document
  fish completion.
- environment-variables.md: dedupe GMI_API_KEY/GMI_BASE_URL rows (kept the
  one with the correct gmi-serving.com default).
- faq.md: Anthropic/Google/OpenAI examples — direct providers exist (not just
  via OpenRouter), refresh the OpenAI model list.

getting-started/
- installation.md: PortableGit (not MinGit) is what the Windows installer
  fetches; document the 32-bit MinGit fallback.
- installation.md / termux.md: installer prefers .[termux-all] then falls
  back to .[termux].
- nix-setup.md: Python 3.12 (not 3.11), Node.js 22 (not 20); fix invalid
  'nix flake update --flake' invocation.
- updating.md: 'hermes backup restore --state pre-update' doesn't exist —
  point at the snapshot/quick-snapshot flow; correct config key
  'updates.pre_update_backup' (was 'update.backup').

user-guide/
- configuration.md: api_max_retries default 3 (not 2); display.runtime_footer
  is the real key (not display.runtime_metadata_footer); checkpoints defaults
  enabled=false / max_snapshots=20 (not true / 50).
- configuring-models.md: 'hermes model list' / 'hermes model set ...' don't
  exist — hermes model is interactive only.
- tui.md: busy_indicator -> tui_status_indicator with values
  kaomoji|emoji|unicode|ascii (not kawaii|minimal|dots|wings|none).
- security.md: SSH backend keys (TERMINAL_SSH_HOST/USER/KEY) live in .env,
  not config.yaml.
- windows-wsl-quickstart.md: there is no 'hermes api' subcommand — the
  OpenAI-compatible API server runs inside hermes gateway.

user-guide/features/
- computer-use.md: approvals.mode (not security.approval_level); fix broken
  ./browser-use.md link to ./browser.md.
- fallback-providers.md: top-level fallback_providers (not
  model.fallback_providers); the picker is subcommand-based, not modal.
- api-server.md: API_SERVER_* are env vars — write to per-profile .env,
  not 'hermes config set' which targets YAML.
- web-search.md: drop web_crawl as a registered tool (it isn't); deep-crawl
  modes are exposed through web_extract.
- kanban.md: failure_limit default is 2, not '~5'.
- plugins.md: drop hard-coded '33 providers' count.
- honcho.md: fix unclosed quote in echo HONCHO_API_KEY snippet; document
  that 'hermes honcho' subcommand is gated on memory.provider=honcho;
  reconcile subcommand list with actual --help output.
- memory-providers.md: legacy 'hermes honcho setup' redirect documented.

Verified via 'npm run build' — site builds cleanly; broken-link count went
from 149 to 146 (no regressions, fixed a few in passing).

* docs: round 2 audit fixes + regenerate skill catalogs

Follow-up to the previous commit on this branch:

Round 2 manual fixes:
- quickstart.md: KIMI_CODING_API_KEY mentioned alongside KIMI_API_KEY;
  voice-mode and ACP install commands rewritten — bare 'pip install ...'
  doesn't work for curl-installed setups (no pip on PATH, not in repo
  dir); replaced with 'cd ~/.hermes/hermes-agent && uv pip install -e
  ".[voice]"'. ACP already ships in [all] so the curl install includes it.
- cli.md / configuration.md: 'auxiliary.compression.model' shown as
  'google/gemini-3-flash-preview' (the doc's own claimed default);
  actual default is empty (= use main model). Reworded as 'leave empty
  (default) or pin a cheap model'.
- built-in-plugins.md: added the bundled 'kanban/dashboard' plugin row
  that was missing from the table.

Regenerated skill catalogs:
- ran website/scripts/generate-skill-docs.py to refresh all 163 per-skill
  pages and both reference catalogs (skills-catalog.md,
  optional-skills-catalog.md). This adds the entries that were genuinely
  missing — productivity/teams-meeting-pipeline (bundled),
  optional/finance/* (entire category — 7 skills:
  3-statement-model, comps-analysis, dcf-model, excel-author, lbo-model,
  merger-model, pptx-author), creative/hyperframes,
  creative/kanban-video-orchestrator, devops/watchers,
  productivity/shop-app, research/searxng-search,
  apple/macos-computer-use — and rewrites every other per-skill page from
  the current SKILL.md. Most diffs are tiny (one line of refreshed
  metadata).

Validation:
- 'npm run build' succeeded.
- Broken-link count moved 146 -> 155 — the +9 are zh-Hans translation
  shells that lag every newly-added skill page (pre-existing pattern).
  No regressions on any en/ page.
2026-05-09 13:19:51 -07:00

9.8 KiB

title sidebar_label description
Adversarial Ux Test — Roleplay the most difficult, tech-resistant user for your product Adversarial Ux Test Roleplay the most difficult, tech-resistant user for your product

{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}

Adversarial Ux Test

Roleplay the most difficult, tech-resistant user for your product. Browse the app as that persona, find every UX pain point, then filter complaints through a pragmatism layer to separate real problems from noise. Creates actionable tickets from genuine issues only.

Skill metadata

Source Optional — install with hermes skills install official/dogfood/adversarial-ux-test
Path optional-skills/dogfood/adversarial-ux-test
Version 1.0.0
Author Omni @ Comelse
License MIT
Platforms linux, macos, windows
Tags qa, ux, testing, adversarial, dogfood, personas, user-testing
Related skills dogfood

Reference: full SKILL.md

:::info The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active. :::

Adversarial UX Test

Roleplay the worst-case user for your product — the person who hates technology, doesn't want your software, and will find every reason to complain. Then filter their feedback through a pragmatism layer to separate real UX problems from "I hate computers" noise.

Think of it as an automated "mom test" — but angry.

Why This Works

Most QA finds bugs. This finds friction. A technically correct app can still be unusable for real humans. The adversarial persona catches:

  • Confusing terminology that makes sense to developers but not users
  • Too many steps to accomplish basic tasks
  • Missing onboarding or "aha moments"
  • Accessibility issues (font size, contrast, click targets)
  • Cold-start problems (empty states, no demo content)
  • Paywall/signup friction that kills conversion

The pragmatism filter (Phase 3) is what makes this useful instead of just entertaining. Without it, you'd add a "print this page" button to every screen because Grandpa can't figure out PDFs.

How to Use

Tell the agent:

"Run an adversarial UX test on [URL]"
"Be a grumpy [persona type] and test [app name]"
"Do an asshole user test on my staging site"

You can provide a persona or let the agent generate one based on your product's target audience.

Step 1: Define the Persona

If no persona is provided, generate one by answering:

  1. Who is the HARDEST user for this product? (age 50+, non-technical role, decades of experience doing it "the old way")
  2. What is their tech comfort level? (the lower the better — WhatsApp-only, paper notebooks, wife set up their email)
  3. What is the ONE thing they need to accomplish? (their core job, not your feature list)
  4. What would make them give up? (too many clicks, jargon, slow, confusing)
  5. How do they talk when frustrated? (blunt, sweary, dismissive, sighing)

Good Persona Example

"Big Mick" McAllister — 58-year-old S&C coach. Uses WhatsApp and that's it. His "spreadsheet" is a paper notebook. "If I can't figure it out in 10 seconds I'm going back to my notebook." Needs to log session results for 25 players. Hates small text, jargon, and passwords.

Bad Persona Example

"A user who doesn't like the app" — too vague, no constraints, no voice.

The persona must be specific enough to stay in character for 20 minutes of testing.

Step 2: Become the Asshole (Browse as the Persona)

  1. Read any available project docs for app context and URLs

  2. Fully inhabit the persona — their frustrations, limitations, goals

  3. Navigate to the app using browser tools

  4. Attempt the persona's ACTUAL TASKS (not a feature tour):

    • Can they do what they came to do?
    • How many clicks/screens to accomplish it?
    • What confuses them?
    • What makes them angry?
    • Where do they get lost?
    • What would make them give up and go back to their old way?
  5. Test these friction categories:

    • First impression — would they even bother past the landing page?
    • Core workflow — the ONE thing they need to do most often
    • Error recovery — what happens when they do something wrong?
    • Readability — text size, contrast, information density
    • Speed — does it feel faster than their current method?
    • Terminology — any jargon they wouldn't understand?
    • Navigation — can they find their way back? do they know where they are?
  6. Take screenshots of every pain point

  7. Check browser console for JS errors on every page

Step 3: The Rant (Write Feedback in Character)

Write the feedback AS THE PERSONA — in their voice, with their frustrations. This is not a bug report. This is a real human venting.

[PERSONA NAME]'s Review of [PRODUCT]

Overall: [Would they keep using it? Yes/No/Maybe with conditions]

THE GOOD (grudging admission):
- [things even they have to admit work]

THE BAD (legitimate UX issues):
- [real problems that would stop them from using the product]

THE UGLY (showstoppers):
- [things that would make them uninstall/cancel immediately]

SPECIFIC COMPLAINTS:
1. [Page/feature]: "[quote in persona voice]" — [what happened, expected]
2. ...

VERDICT: "[one-line persona quote summarizing their experience]"

Step 4: The Pragmatism Filter (Critical — Do Not Skip)

Step OUT of the persona. Evaluate each complaint as a product person:

  • RED: REAL UX BUG — Any user would have this problem, not just grumpy ones. Fix it.
  • YELLOW: VALID BUT LOW PRIORITY — Real issue but only for extreme users. Note it.
  • WHITE: PERSONA NOISE — "I hate computers" talking, not a product problem. Skip it.
  • GREEN: FEATURE REQUEST — Good idea hidden in the complaint. Consider it.

Filter Criteria

  1. Would a 35-year-old competent-but-busy user have the same complaint? → RED
  2. Is this a genuine accessibility issue (font size, contrast, click targets)? → RED
  3. Is this "I want it to work like paper" resistance to digital? → WHITE
  4. Is this a real workflow inefficiency the persona stumbled on? → YELLOW or RED
  5. Would fixing this add complexity for the 80% who are fine? → WHITE
  6. Does the complaint reveal a missing onboarding moment? → GREEN

This filter is MANDATORY. Never ship raw persona complaints as tickets.

Step 5: Create Tickets

For RED and GREEN items only:

  • Clear, actionable title
  • Include the persona's verbatim quote (entertaining + memorable)
  • The real UX issue underneath (objective)
  • A suggested fix (actionable)
  • Tag/label: "ux-review"

For YELLOW items: one catch-all ticket with all notes.

WHITE items appear in the report only. No tickets.

Max 10 tickets per session — focus on the worst issues.

Step 6: Report

Deliver:

  1. The persona rant (Step 3) — entertaining and visceral
  2. The filtered assessment (Step 4) — pragmatic and actionable
  3. Tickets created (Step 5) — with links
  4. Screenshots of key issues

Tips

  • One persona per session. Don't mix perspectives.
  • Stay in character during Steps 2-3. Break character only at Step 4.
  • Test the CORE WORKFLOW first. Don't get distracted by settings pages.
  • Empty states are gold. New user experience reveals the most friction.
  • The best findings are RED items the persona found accidentally while trying to do something else.
  • If the persona has zero complaints, your persona is too tech-savvy. Make them older, less patient, more set in their ways.
  • Run this before demos, launches, or after shipping a batch of features.
  • Register as a NEW user when possible. Don't use pre-seeded admin accounts — the cold start experience is where most friction lives.
  • Zero WHITE items is a signal, not a failure. If the pragmatism filter finds no noise, your product has real UX problems, not just a grumpy persona.
  • Check known issues in project docs AFTER the test. If the persona found a bug that's already in the known issues list, that's actually the most damning finding — it means the team knew about it but never felt the user's pain.
  • Subscription/paywall testing is critical. Test with expired accounts, not just active ones. The "what happens when you can't pay" experience reveals whether the product respects users or holds their data hostage.
  • Count the clicks to accomplish the persona's ONE task. If it's more than 5, that's almost always a RED finding regardless of persona tech level.

Example Personas by Industry

These are starting points — customize for your specific product:

Product Type Persona Age Key Trait
CRM Retirement home director 68 Filing cabinet is the current CRM
Photography SaaS Rural wedding photographer 62 Books clients by phone, invoices on paper
AI/ML Tool Department store buyer 55 Burned by 3 failed tech startups
Fitness App Old-school gym coach 58 Paper notebook, thick fingers, bad eyes
Accounting Family bakery owner 64 Shoebox of receipts, hates subscriptions
E-commerce Market stall vendor 60 Cash only, smartphone is for calls
Healthcare Senior GP 63 Dictates notes, nurse handles the computer
Education Veteran teacher 57 Chalk and talk, worksheets in ring binders

Rules

  • Stay in character during Steps 2-3
  • Be genuinely mean but fair — find real problems, not manufactured ones
  • The pragmatism filter (Step 4) is MANDATORY
  • Screenshots required for every complaint
  • Max 10 tickets per session
  • Test on staging/deployed app, not local dev
  • One persona, one session, one report