---
name: hyperframes
description: Create HTML-based video compositions, animated title cards, social overlays, captioned talking-head videos, audio-reactive visuals, and shader transitions using HyperFrames. HTML is the source of truth for video. Use when the user wants a rendered MP4/WebM from an HTML composition, wants to animate text/logos/charts over media, needs captions synced to audio, wants TTS narration, or wants to convert a website into a video.
version: 1.0.0
author: heygen-com
license: Apache-2.0
---
# HyperFrames
HTML is the source of truth for video. A composition is an HTML file with `data-*` attributes for timing, a GSAP timeline for animation, and CSS for appearance. The HyperFrames engine captures the page frame-by-frame and encodes to MP4/WebM with FFmpeg.

**Complement to `manim-video`:** use `manim-video` for mathematical/geometric explainers (equations, 3B1B-style). Use hyperframes for motion graphics, talking-head with captions, product tours, social overlays, shader transitions, and anything driven by real video/audio media.
## When to Use
- User asks for a rendered video from text, a script, or a website
- Animated title cards, lower thirds, or typographic intros
- Captioned narration video (TTS + captions synced to waveform)
- Audio-reactive visuals (beat sync, spectrum bars, pulsing glow)
- Scene-to-scene transitions (crossfade, wipe, shader warp, flash-through-white)
- Social overlays (Instagram/TikTok/YouTube style)
- Website-to-video pipeline (capture a URL, produce a promo)
- Any HTML/CSS/JS animation that must render deterministically to a video file
Do not use this skill for:
- Pure math/equation animation (→ `manim-video`)
- Image generation or memes (→ `meme-generation`, image models)
- Live video conferencing or streaming
## Quick Reference
```bash
npx hyperframes init my-video             # scaffold a project
cd my-video
npx hyperframes lint                      # validate before preview/render
npx hyperframes preview                   # live-reload browser preview (port 3002)
npx hyperframes render --output final.mp4 # render to MP4
npx hyperframes doctor                    # diagnose environment issues
```

Render flags: `--quality draft|standard|high` · `--fps 24|30|60` · `--format mp4|webm` · `--docker` (reproducible) · `--strict`.

Full CLI reference: `references/cli.md`.
## Setup (one-time)
```bash
bash "$(dirname "$(find ~/.hermes/skills -path '*/hyperframes/SKILL.md' 2>/dev/null | head -1)")/scripts/setup.sh"
```
The script:
- Verifies Node.js >= 22 and FFmpeg are installed (prints fix instructions if not).
- Installs the `hyperframes` CLI globally (`npm install -g hyperframes@>=0.4.2`).
- Pre-caches `chrome-headless-shell` via Puppeteer — required for best-quality rendering via Chrome's `HeadlessExperimental.beginFrame` capture path.
- Runs `npx hyperframes doctor` and reports the result.

See `references/troubleshooting.md` if setup fails.
## Procedure
### 1. Plan before writing HTML
Before touching code, articulate at a high level:

- **What** — narrative arc, key moments, emotional beats
- **Structure** — compositions, tracks (video/audio/overlays), durations
- **Visual identity** — colors, fonts, motion character (explosive / cinematic / fluid / technical)
- **Hero frame** — for each scene, the moment when the most elements are simultaneously visible. This is the static layout you'll build first.
**Visual Identity Gate (HARD-GATE).** Before writing ANY composition HTML, a visual identity must be defined. Do NOT write compositions with default or generic colors (`#333`, `#3b82f6`, and Roboto are tells that this step was skipped). Check in order:
1. `DESIGN.md` at project root? → Use its exact colors, fonts, motion rules, and "What NOT to Do" constraints.
2. User named a style (e.g. "Swiss Pulse", "dark and techy", "luxury brand")? → Generate a minimal `DESIGN.md` with `## Style Prompt`, `## Colors` (3-5 hex with roles), `## Typography` (1-2 families), `## What NOT to Do` (3-5 anti-patterns).
3. None of the above? → Ask 3 questions before writing any HTML:
   - Mood? (explosive / cinematic / fluid / technical / chaotic / warm)
   - Light or dark canvas?
   - Any brand colors, fonts, or visual references?

   Then generate a `DESIGN.md` from the answers. Every composition must trace its palette and typography back to `DESIGN.md` or explicit user direction.
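To make the shape concrete, here is a minimal `DESIGN.md` sketch following the section layout above. Every value (palette, fonts, anti-patterns) is purely illustrative, not something the skill prescribes:

```markdown
## Style Prompt
Dark, technical, high-contrast motion graphics with restrained, precise movement.

## Colors
- #0B0F14 — canvas (near-black)
- #E8EDF2 — primary text
- #27E0A6 — accent / emphasis

## Typography
- Space Grotesk — headlines
- IBM Plex Mono — labels and data

## What NOT to Do
- No default blues (#3b82f6) or grays (#333)
- No bouncy/elastic eases on body text
- No more than one accent color per scene
```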
### 2. Scaffold

```bash
npx hyperframes init my-video --non-interactive
```

Templates: `blank`, `warm-grain`, `play-mode`, `swiss-grid`, `vignelli`, `decision-tree`, `kinetic-type`, `product-promo`, `nyt-graph`. Pass `--example <name>` to pick one, `--video clip.mp4` or `--audio track.mp3` to seed with media.
### 3. Layout before animation

Write the static HTML+CSS for the hero frame first — no GSAP yet. The `.scene-content` container must fill the scene (`width:100%; height:100%; padding:Npx`) with `display:flex` + `gap`. Use padding to push content inward — never `position: absolute; top: Npx` on a content container (content overflows when taller than the remaining space).

Only after the hero frame looks right, add `gsap.from()` entrances (animate to the CSS position) and `gsap.to()` exits (animate from it).

See `references/composition.md` for the full data-attribute schema and composition rules.
### 4. Animate with GSAP

Every composition must:
- Register its timeline: `window.__timelines["<composition-id>"] = tl`
- Start paused: `gsap.timeline({ paused: true })` — the player controls playback
- Use finite `repeat` values (no `repeat: -1` — breaks the capture engine). Calculate: `repeat: Math.ceil(duration / cycleDuration) - 1`.
- Be deterministic — no `Math.random()`, `Date.now()`, or wall-clock logic. Use a seeded PRNG if you need pseudo-randomness.
- Build synchronously — no `async`/`await`, `setTimeout`, or Promises around timeline construction.
See references/gsap.md for the core GSAP API (tweens, eases, stagger, timelines).
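The determinism and finite-repeat rules can be sketched in plain JavaScript. Mulberry32 is one common seeded PRNG, shown here as an illustration; the helper names are this example's, not part of the HyperFrames API:

```javascript
// Seeded PRNG (mulberry32): same seed, same sequence, on every render.
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Finite repeat count: enough cycles to cover the clip, never repeat: -1.
function finiteRepeat(duration, cycleDuration) {
  return Math.ceil(duration / cycleDuration) - 1;
}

const rand = mulberry32(42);
console.log(rand() === mulberry32(42)()); // true: identical across renders
console.log(finiteRepeat(10, 3));         // 3 (1 play + 3 repeats covers 12s)
```

Swap `Math.random()` calls for `rand()` and the capture engine sees identical frames on every pass.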
### 5. Transitions between scenes

Multi-scene compositions require transitions. Rules:
- Always use a transition between scenes — no jump cuts.
- Always use entrance animations on every scene element (`gsap.from(...)`).
- Never use exit animations except on the final scene — the transition IS the exit.
- The final scene may fade out.

Use `npx hyperframes add <transition-name>` to install shader transitions (`flash-through-white`, `liquid-wipe`, etc.). Full list: `npx hyperframes add --list`.
### 6. Audio, captions, TTS, audio-reactive, highlighting

- Audio: always a separate `<audio>` element (video is `muted playsinline`).
- TTS: `npx hyperframes tts "Script text" --voice af_nova --output narration.wav`. List voices with `--list`. The voice ID's first letter encodes language (`a`/`b` = English, `e` = Spanish, `f` = French, `j` = Japanese, `z` = Mandarin, etc.) — the CLI auto-infers the phonemizer locale; pass `--lang` only to override. Non-English phonemization requires `espeak-ng` installed system-wide.
- Captions: `npx hyperframes transcribe narration.wav` → word-level transcript. Pick a style from the transcript tone (hype / corporate / tutorial / storytelling / social — see the table in `references/features.md`). Language rule: never use `.en` whisper models unless the audio is confirmed English — `.en` translates non-English audio instead of transcribing it. Every caption group MUST have a hard `tl.set(el, { opacity: 0, visibility: "hidden" }, group.end)` kill after its exit tween — otherwise groups leak visible into later ones.
- Audio-reactive visuals: pre-extract audio bands (bass / mid / treble) and sample per-frame inside the timeline with a `for` loop of `tl.call(draw, [], f / fps)` — a single long tween does NOT react to audio. Map bass → `scale` (pulse), treble → `textShadow`/`boxShadow` (glow), overall amplitude → `opacity`/`y`/`backgroundColor`. Avoid equalizer-bar clichés — let content guide the visual, audio drive its behavior.
- Marker-style highlighting: highlight, circle, burst, scribble, and sketchout effects for text emphasis are deterministic CSS+GSAP — see `references/features.md#marker-highlighting`. Fully seekable, no animated SVG filters.
- Scene transitions: every multi-scene composition MUST use transitions (no jump cuts). Pick from CSS primitives (push slide, blur crossfade, zoom through, staggered blocks) or shader transitions (`flash-through-white`, `liquid-wipe`, `cross-warp-morph`, `chromatic-split`, etc.) via `npx hyperframes add`. Mood and energy tables live in `references/features.md#transitions`. Do not mix CSS and shader transitions in the same composition.
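The audio-reactive mapping above can be sketched as pure functions. Everything here is illustrative, not HyperFrames API: `bands`, `mapRange`, and `frameStyles` are this example's names, and the assumption is that per-frame band amplitudes have already been extracted into arrays of values in [0, 1]:

```javascript
// Linearly map a clamped [0, 1] amplitude into an output range.
function mapRange(v, outMin, outMax) {
  const clamped = Math.min(Math.max(v, 0), 1);
  return outMin + (outMax - outMin) * clamped;
}

// Bass drives scale (pulse); treble drives glow radius.
function frameStyles(bass, treble) {
  return {
    scale: mapRange(bass, 1, 1.15),
    glowPx: Math.round(mapRange(treble, 0, 24)),
  };
}

// Per-frame sampling inside the timeline (a single long tween cannot react):
//   for (let f = 0; f < totalFrames; f++) {
//     tl.call(draw, [frameStyles(bands.bass[f], bands.treble[f])], f / fps);
//   }
console.log(frameStyles(1, 0.5).glowPx); // 12
```

Keeping the mapping in small pure functions keeps the timeline deterministic and makes the bass/treble ranges easy to tune per composition.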
### 7. Lint, validate, inspect, preview, render

```bash
npx hyperframes lint      # catches missing data-composition-id, overlapping tracks, unregistered timelines
npx hyperframes validate  # WCAG contrast audit at 5 timestamps
npx hyperframes inspect   # visual layout audit — overflow, off-frame elements, occluded text
npx hyperframes preview   # live browser preview
npx hyperframes render --quality draft --output draft.mp4  # fast iteration
npx hyperframes render --quality high --output final.mp4   # final delivery
```

`hyperframes validate` samples background pixels behind every text element and warns on contrast ratios below 4.5:1 (or 3:1 for large text). `hyperframes inspect` is the layout-side companion — it runs the page at multiple timestamps and flags issues that a static lint can't see (a caption that wraps past the safe area only at 4.5s, a card that overflows when its title is the longest variant, an element that ends up behind a transition shader). Run `inspect` especially on compositions with speech bubbles, cards, captions, or tight typography.
### 8. Website-to-video (if the user gives a URL)

Use the 7-step capture-to-video workflow in `references/website-to-video.md`: capture → DESIGN.md → SCRIPT.md → storyboard → composition → render → deliver.
## Pitfalls

- `'HeadlessExperimental.beginFrame' wasn't found` — Chromium 147+ removed this protocol. Ensure you're on `hyperframes@>=0.4.2` (auto-detects and falls back to screenshot mode). Escape hatch: `export PRODUCER_FORCE_SCREENSHOT=true`. See hyperframes#294 and `references/troubleshooting.md`.
- System Chrome (not `chrome-headless-shell`) — renders hang for 120s then time out. Run `npx puppeteer browsers install chrome-headless-shell` (`setup.sh` does this). `hyperframes doctor` reports which binary will be used.
- `repeat: -1` anywhere — breaks the capture engine. Always compute a finite repeat count.
- `gsap.set()` on clip elements that enter later — the element doesn't exist at page load. Use `tl.set(selector, vars, timePosition)` inside the timeline instead, at or after the clip's `data-start`.
- `<br>` inside content text — forced breaks don't know the rendered font width, so natural wrap + `<br>` double-breaks. Use `max-width` to let text wrap. Exception: short display titles where each word is deliberately on its own line.
- Animating `visibility` or `display` — GSAP can't tween these. Use `autoAlpha` (handles both visibility and opacity).
- Calling `video.play()` or `audio.play()` — the framework owns playback. Never call these yourself.
- Building timelines async — the capture engine reads `window.__timelines` synchronously after page load. Never wrap timeline construction in `async`, `setTimeout`, or a Promise.
- Standalone `index.html` wrapped in `<template>` — hides all content from the browser. Only sub-compositions loaded via `data-composition-src` use `<template>`.
- Using video for audio — always muted `<video>` + separate `<audio>`.
## Verification

Before and after rendering:

- Lint + validate + inspect pass: `npx hyperframes lint --strict && npx hyperframes validate && npx hyperframes inspect` (lint catches structural issues, validate catches contrast, inspect catches visual layout / overflow issues — see `references/troubleshooting.md` if warnings appear).
- Animation choreography — for new compositions or significant animation changes, run the animation map. `npx hyperframes init` copies the skill scripts into the project, so the path is project-local:

  ```bash
  node skills/hyperframes/scripts/animation-map.mjs <composition-dir> \
    --out <composition-dir>/.hyperframes/anim-map
  ```

  Outputs a single `animation-map.json` with per-tween summaries, ASCII Gantt timeline, stagger detection, dead zones (>1s with no animation), element lifecycles, and flags (`offscreen`, `collision`, `invisible`, `paced-fast` <0.2s, `paced-slow` >2s). Scan summaries and flags — fix or justify each. Skip on small edits.
- File exists + non-zero: `ls -lh final.mp4`.
- Duration matches `data-duration`: `ffprobe -v error -show_entries format=duration -of default=nw=1:nk=1 final.mp4`.
- Visual check: extract a mid-composition frame: `ffmpeg -i final.mp4 -ss 00:00:05 -vframes 1 preview.png`.
- Audio present if expected: `ffprobe -v error -show_streams -select_streams a -of default=nw=1:nk=1 final.mp4 | head -1`.
If `hyperframes render` fails, run `npx hyperframes doctor` and attach its output when reporting.
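The duration check lends itself to a small script. A sketch, where the function name and the 0.1 s tolerance are this example's choices rather than HyperFrames conventions:

```javascript
// Compare ffprobe's reported duration (stdout like "12.000000\n") against
// the composition's data-duration, within a tolerance in seconds.
function durationMatches(ffprobeOutput, expectedSeconds, toleranceSec = 0.1) {
  const actual = parseFloat(ffprobeOutput.trim());
  if (Number.isNaN(actual)) return false; // ffprobe failed or emitted garbage
  return Math.abs(actual - expectedSeconds) <= toleranceSec;
}

// Feed it the output of:
//   ffprobe -v error -show_entries format=duration -of default=nw=1:nk=1 final.mp4
console.log(durationMatches("12.033000\n", 12)); // true
console.log(durationMatches("9.500000\n", 12));  // false
```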
## References

- `composition.md` — data attributes, timeline contract, non-negotiable rules, typography/asset rules
- `cli.md` — every CLI command (init, capture, lint, validate, inspect, preview, render, transcribe, tts, doctor, browser, info, upgrade, benchmark)
- `gsap.md` — GSAP core API for HyperFrames (tweens, eases, stagger, timelines, matchMedia)
- `features.md` — captions, TTS, audio-reactive, marker highlighting, transitions (load on demand)
- `website-to-video.md` — 7-step capture-to-video workflow
- `troubleshooting.md` — OpenClaw fix, env vars, common render errors