Pulls the hyperframes skill up to the latest state of heygen-com/hyperframes skill content. Opened 2026-04-17; upstream has shipped CLI, layout, and path changes since. - SKILL.md: promote the visual-style check to a proper HARD-GATE (DESIGN.md > named style > ask 3 questions, with the #333/#3b82f6/Roboto tells); expand Step 6 to cover audio-reactive (mandatory per-frame tl.call() sampling loop — a single long tween does NOT react to audio), caption exit guarantee (hard tl.set kill after group.end), marker highlighting, and scene transitions; add the animation-map script to Verification; link the new features.md. - references/cli.md: add capture and validate (both shipped commands, both referenced from the workflow but missing from the reference). Add --lang to tts with the voice-prefix auto-inference table and espeak-ng dependency note (heygen-com/hyperframes#351, 2026-04-20 — after this PR opened). - references/website-to-video.md: update all paths to the capture/ subfolder layout introduced in heygen-com/hyperframes#345 (capture/screenshots/, capture/assets/, capture/extracted/tokens.json). Old captured/ prefix was broken — agents following the skill were looking for files in wrong locations. - references/features.md (new): distilled coverage for captions (language rule, tone table, word grouping, fitTextFontSize, exit guarantee), TTS (multilingual phonemization, speed tuning), audio-reactive (data format, mapping table, sampling pattern), marker highlighting (highlight/circle/burst/scribble/sketchout), and transitions (energy/ mood tables, presets, shader-compatible CSS rules). Five topics the original PR didn't cover.
14 KiB
HyperFrames Feature Reference
Load this file when a composition needs captions, TTS narration, audio-reactive visuals, marker-style text highlighting, or scene transitions. All patterns here are deterministic (no Math.random(), no Date.now(), no runtime audio analysis) and live on the same GSAP timeline as the rest of the composition.
Captions
Language Rule (Non-Negotiable)
Never use .en whisper models unless the audio is confirmed English. .en models TRANSLATE non-English audio into English instead of transcribing it.
- User says the language →
npx hyperframes transcribe audio.mp3 --model small --language <code>(no.en) - User confirms English →
--model small.en - Language unknown →
--model small(auto-detects)
Style Detection
If the user doesn't specify a caption style, detect it from the transcript tone:
| Tone | Font mood | Animation | Color | Size |
|---|---|---|---|---|
| Hype / launch | Heavy condensed, 800-900 | Scale-pop, back.out(1.7), 0.1-0.2s |
Bright on dark | 72-96px |
| Corporate | Clean sans, 600-700 | Fade+slide, power3.out, 0.3s |
White / neutral + muted accent | 56-72px |
| Tutorial | Mono / clean sans, 500-600 | Typewriter or fade, 0.4-0.5s | High contrast, minimal | 48-64px |
| Storytelling | Serif / elegant, 400-500 | Slow fade, power2.out, 0.5-0.6s |
Warm muted tones | 44-56px |
| Social | Rounded sans, 700-800 | Bounce, elastic.out, word-by-word |
Playful, colored pills | 56-80px |
Word Grouping
- High energy: 2-3 words, quick turnover.
- Conversational: 3-5 words, natural phrases.
- Measured / calm: 4-6 words.
Break on sentence boundaries, 150ms+ pauses, or a max word count.
Positioning
- Landscape (1920x1080): bottom 80-120px, centered.
- Portrait (1080x1920): ~600-700px from bottom, centered.
- Never cover the subject's face.
position: absolute(never relative). One caption group visible at a time.
Text Overflow Prevention
Use the runtime helper so captions never overflow:
const result = window.__hyperframes.fitTextFontSize(group.text.toUpperCase(), {
fontFamily: "Outfit",
fontWeight: 900,
maxWidth: 1600, // 1600 landscape, 900 portrait
});
el.style.fontSize = result.fontSize + "px";
When per-word styling uses scale > 1.0, compute maxWidth = safeWidth / maxScale to leave headroom. Container needs overflow: visible (not hidden — hidden clips scaled emphasis words and glow).
Caption Exit Guarantee
Every group MUST have a hard kill after its exit tween — otherwise groups leak into later ones:
tl.to(groupEl, { opacity: 0, scale: 0.95, duration: 0.12, ease: "power2.in" }, group.end - 0.12);
tl.set(groupEl, { opacity: 0, visibility: "hidden" }, group.end); // deterministic kill
Per-Word Styling
Scan the transcript for words that deserve distinct treatment:
- Brand / product names — larger, unique color.
- ALL CAPS — scale boost, flash, accent color.
- Numbers / statistics — bold weight, accent color.
- Emotional keywords — exaggerated animation (overshoot, bounce).
- Call-to-action — highlight, underline, color pop.
TTS (Kokoro-82M)
Local, no API key. Runs on CPU. Model downloads on first use (~311 MB + ~27 MB voices, cached in ~/.cache/hyperframes/tts/).
Voice Selection
| Content type | Voice | Why |
|---|---|---|
| Product demo | af_heart / af_nova |
Warm, professional |
| Tutorial | am_adam / bf_emma |
Neutral, easy to follow |
| Marketing | af_sky / am_michael |
Energetic or authoritative |
| Documentation | bf_emma / bm_george |
Clear British English |
| Casual | af_heart / af_sky |
Approachable, natural |
Run npx hyperframes tts --list for all 54 voices across 8 languages.
Multilingual Phonemization
Voice ID first letter encodes language: a=American English, b=British English, e=Spanish, f=French, h=Hindi, i=Italian, j=Japanese, p=Brazilian Portuguese, z=Mandarin. The CLI auto-infers the phonemizer locale from that prefix — you don't need --lang when voice and text match.
npx hyperframes tts "La reunión empieza a las nueve" --voice ef_dora --output es.wav
npx hyperframes tts "今日はいい天気ですね" --voice jf_alpha --output ja.wav
Pass --lang only to override auto-detection (e.g. stylized accents):
npx hyperframes tts "Hello there" --voice af_heart --lang fr-fr --output accented.wav
Valid --lang codes: en-us, en-gb, es, fr-fr, hi, it, pt-br, ja, zh. Non-English phonemization requires espeak-ng installed system-wide (apt-get install espeak-ng / brew install espeak-ng).
Speed
0.7-0.8— tutorial, complex content1.0— natural (default)1.1-1.2— intros, upbeat content1.5+— rarely appropriate
TTS + Captions Workflow
npx hyperframes tts script.txt --voice af_heart --output narration.wav
npx hyperframes transcribe narration.wav # → transcript.json (word-level)
Audio-Reactive Visuals
Drive visuals from music, voice, or sound. Any GSAP-tweenable property can respond to pre-extracted audio data.
Data format
const AUDIO_DATA = {
fps: 30,
totalFrames: 900,
frames: [{ bands: [0.82, 0.45, 0.31, /* ... */] }, /* ... */],
};
frames[i].bands[] are frequency band amplitudes, 0-1. Index 0 = bass, higher indices = treble. Each band is normalized independently across the full track.
Mapping audio to visuals
| Audio signal | Visual property | Effect |
|---|---|---|
Bass (bands[0]) |
scale |
Pulse on beat |
Treble (bands[12-14]) |
textShadow, boxShadow |
Glow intensity |
| Overall amplitude | opacity, y, backgroundColor |
Breathe, lift, color shift |
Mid-range (bands[4-8]) |
borderRadius, width |
Shape morphing |
Any GSAP-tweenable property works — clipPath, filter, SVG attributes, CSS custom properties. Let content guide the visual and let audio drive its behavior. Never add equalizer bars, spectrum analyzers, waveform displays, rainbow cycling, or generic particle systems — they look cheap.
Sampling pattern (required)
Audio reactivity needs per-frame sampling via a for loop of tl.call(), NOT a single tween. A single long tween does NOT react to audio:
for (let f = 0; f < AUDIO_DATA.totalFrames; f++) {
tl.call(
((frame) => () => draw(frame))(AUDIO_DATA.frames[f]),
[],
f / AUDIO_DATA.fps,
);
}
Gotchas
- textShadow on a container with semi-transparent children (e.g. inactive caption words at
rgba(255,255,255,0.3)) renders a visible glow rectangle behind every child. Apply the glow to active words individually, not to the container. - Subtlety for text — 3-6% scale variation, soft glow. Heavy pulsing makes text unreadable.
- Go bigger on non-text — backgrounds and shapes can handle 10-30% swings.
- Deterministic only — pre-extracted audio data, no Web Audio API, no runtime analysis.
Marker-Style Highlighting
Deterministic CSS + GSAP implementations of the classic "highlight / circle / burst / scribble / sketchout" drawing modes for emphasizing text. Fully seekable — no animated SVG filters, no JS timers.
Highlight (yellow marker sweep)
<span class="mh-highlight-wrap">
<span class="mh-highlight-bar" id="hl-1"></span>
<span class="mh-highlight-text">highlighted text</span>
</span>
.mh-highlight-wrap { position: relative; display: inline; }
.mh-highlight-bar {
position: absolute; inset: 0 -6px;
background: #fdd835; opacity: 0.35;
transform: scaleX(0); transform-origin: left center;
border-radius: 3px; z-index: 0;
}
.mh-highlight-text { position: relative; z-index: 1; }
tl.to("#hl-1", { scaleX: 1, duration: 0.5, ease: "power2.out" }, 0.6);
Multi-line: apply to .mh-highlight-bar with stagger: 0.3.
Circle
Hand-drawn ellipse around a word. Use a positioned ::before with border-radius: 50%, slight rotation, and clip-path to avoid covering the letters. Animate clip-path or stroke-dashoffset on an inline SVG circle.
Burst
Short radiating lines around a word. Render 6-12 small <span> elements positioned in a radial pattern; animate scaleY from 0.
Scribble
A chaotic overlay created by animating stroke-dashoffset on an inline SVG <path> with a d attribute describing a zig-zag. Seed values, never Math.random().
Sketchout
A rough rectangle outline. Two <rect>s with slight transform offsets, animated via stroke-dashoffset.
All five modes tween CSS transforms or stroke-dashoffset only — both tween cleanly, are deterministic, and seek correctly.
Scene Transitions
Every multi-scene composition MUST use transitions. No jump cuts.
Energy → primary transition
| Energy | CSS primary | Shader primary | Accent | Duration | Easing |
|---|---|---|---|---|---|
| Calm (wellness, brand, luxury) | Blur crossfade, focus pull | Cross-warp morph, thermal distortion | Light leak, circle iris | 0.5-0.8s | sine.inOut, power1 |
| Medium (corporate, SaaS) | Push slide, staggered blocks | Whip pan, cinematic zoom | Squeeze, vertical push | 0.3-0.5s | power2, power3 |
| High (promos, sports, launch) | Zoom through, overexposure | Ridged burn, glitch, chromatic split | Staggered blocks, gravity drop | 0.15-0.3s | power4, expo |
Pick ONE primary (60-70% of scene changes) plus 1-2 accents. Never use a different transition for every scene.
Mood → transition type
| Mood | Transitions |
|---|---|
| Warm / inviting | Light leak, blur crossfade, focus pull, film burn · Shader: thermal distortion, cross-warp morph |
| Cold / clinical | Squeeze, zoom out, blinds, shutter, grid dissolve · Shader: gravitational lens |
| Editorial / magazine | Push slide, vertical push, diagonal split, shutter · Shader: whip pan |
| Tech / futuristic | Grid dissolve, staggered blocks, blinds · Shader: glitch, chromatic split |
| Tense / edgy | Glitch, VHS, chromatic aberration, ripple · Shader: ridged burn, domain warp |
| Playful / fun | Elastic push, 3D flip, circle iris, morph circle · Shader: swirl vortex, ripple waves |
| Dramatic / cinematic | Zoom through, gravity drop, overexposure · Shader: cinematic zoom, gravitational lens |
| Premium / luxury | Focus pull, blur crossfade, color dip to black · Shader: cross-warp morph |
| Retro / analog | Film burn, light leak, VHS, clock wipe · Shader: light leak |
Presets
| Preset | Duration | Easing |
|---|---|---|
snappy |
0.2s | power4.inOut |
smooth |
0.4s | power2.inOut |
gentle |
0.6s | sine.inOut |
dramatic |
0.5s | power3.in → out |
instant |
0.15s | expo.inOut |
luxe |
0.7s | power1.inOut |
Install a shader transition
npx hyperframes add flash-through-white
npx hyperframes add --list
CSS vs shader
- CSS transitions animate scene containers with opacity, transforms,
clip-path, and filters. Simpler to set up. - Shader transitions composite both scene textures per-pixel on a WebGL canvas — can warp, dissolve, and morph in ways CSS cannot. Import from
@hyperframes/shader-transitionsinstead of writing raw GLSL.
Don't mix CSS and shader transitions in the same composition — once a composition uses shader transitions, the WebGL canvas replaces DOM-based scene switching for every transition.
Shader-compatible CSS rules
Shader transitions capture DOM scenes to WebGL textures via html2canvas. The canvas 2D pipeline doesn't match CSS exactly:
- No
transparentkeyword in gradients — use the target color at zero alpha:rgba(200,117,51,0)nottransparent. (Canvas interpolatestransparentasrgba(0,0,0,0)creating dark fringes.) - No gradient backgrounds on elements thinner than 4px. Use solid
background-coloron thin accent lines. - No CSS variables (
var()) on elements visible during capture — html2canvas doesn't reliably resolve custom properties. Use literal color values. - Mark uncapturable decoratives with
data-no-capture— they stay on the live DOM but are absent from the shader texture. - No gradient opacity below 0.15 — renders differently in canvas vs CSS.
- Every
.scenediv must have explicitbackground-color, AND pass the same color asbgColorin theinit()config. Without either, the texture renders as black.
These rules only apply to shader transition compositions. CSS-only compositions have no restrictions.
Don't
- Mix CSS and shader transitions in one composition.
- Use exit animations on any scene except the final scene — the transition IS the exit.
- Introduce a new transition type every scene — pick one primary + 1-2 accents.
- Use transitions that create visible geometric repetition (grids, hex cells, uniform dots) — they look artificial regardless of the math behind them. Prefer organic noise (FBM, domain warping).