| name | description | version | author | license | metadata | prerequisites |
|---|---|---|---|---|---|---|
| songsee | Audio spectrograms/features (mel, chroma, MFCC) via CLI. | 1.0.0 | community | MIT | | |
songsee
Generate spectrograms and multi-panel audio feature visualizations from audio files.
Prerequisites
Requires Go:
go install github.com/steipete/songsee/cmd/songsee@latest
Optional: ffmpeg for formats beyond WAV/MP3.
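A quick way to confirm both prerequisites before running anything is to probe PATH. The sketch below assumes a POSIX shell; `have_cmd` and `check_songsee_deps` are hypothetical helper names, not part of songsee. Note that `go install` places binaries in `$GOBIN` or `$GOPATH/bin` (by default `$HOME/go/bin`), so that directory needs to be on PATH for `songsee` to be found.

```shell
# Succeed if the named command is available on PATH (POSIX `command -v`).
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print one status line per dependency.
check_songsee_deps() {
    if have_cmd songsee; then
        echo "songsee: found"
    else
        echo "songsee: missing (go install github.com/steipete/songsee/cmd/songsee@latest)"
    fi
    if have_cmd ffmpeg; then
        echo "ffmpeg: found"
    else
        echo "ffmpeg: missing (optional; needed for formats beyond WAV/MP3)"
    fi
}

check_songsee_deps
```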
Quick Start
# Basic spectrogram
songsee track.mp3
# Save to specific file
songsee track.mp3 -o spectrogram.png
# Multi-panel visualization grid
songsee track.mp3 --viz spectrogram,mel,chroma,hpss,selfsim,loudness,tempogram,mfcc,flux
# Time slice (start at 12.5s, 8s duration)
songsee track.mp3 --start 12.5 --duration 8 -o slice.jpg
# From stdin
cat track.mp3 | songsee - --format png -o out.png
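The single-file commands above extend naturally to a batch. The following is a minimal sketch, assuming a POSIX shell and using only the `-o` flag shown above; `png_name` and `render_all` are hypothetical helper names introduced for illustration.

```shell
# Map an input path to an output PNG name: "in/track.mp3" -> "track.png".
png_name() {
    base=$(basename "$1")
    printf '%s.png\n' "${base%.*}"
}

# Render one spectrogram per existing input file into the current directory.
render_all() {
    for f in "$@"; do
        [ -f "$f" ] || continue   # skip unmatched globs and missing files
        songsee "$f" -o "$(png_name "$f")"
    done
}
```

With songsee installed, `render_all ./*.mp3` would write one PNG next to the working directory for each MP3 found.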
Visualization Types
Use --viz with comma-separated values:
| Type | Description |
|---|---|
| spectrogram | Standard frequency spectrogram |
| mel | Mel-scaled spectrogram |
| chroma | Pitch class distribution |
| hpss | Harmonic/percussive separation |
| selfsim | Self-similarity matrix |
| loudness | Loudness over time |
| tempogram | Tempo estimation |
| mfcc | Mel-frequency cepstral coefficients |
| flux | Spectral flux (onset detection) |
Multiple --viz types render as a grid in a single image.
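When the panel selection is assembled by a script, the comma-separated value can be built from separate words. This is a sketch under the assumption of a POSIX shell; `join_viz` is a hypothetical helper, and the command is printed rather than executed so it can be reviewed first.

```shell
# Join all arguments with commas; the subshell body keeps the IFS change local.
join_viz() (
    IFS=,
    printf '%s\n' "$*"
)

viz=$(join_viz spectrogram mel mfcc)

# Print the resulting command line instead of running it.
echo "songsee track.mp3 --viz $viz -o panels.png"
# prints: songsee track.mp3 --viz spectrogram,mel,mfcc -o panels.png
```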
Common Flags
| Flag | Description |
|---|---|
| --viz | Visualization types (comma-separated) |
| --style | Color palette: classic, magma, inferno, viridis, gray |
| --width / --height | Output image dimensions |
| --window / --hop | FFT window and hop size |
| --min-freq / --max-freq | Frequency range filter |
| --start / --duration | Time slice of the audio |
| --format | Output format: jpg or png |
| -o | Output file path |
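Several of these flags can be combined in one invocation. The sketch below only prints a command line; the numeric values are illustrative assumptions (dimensions presumably in pixels, frequencies presumably in Hz), not documented defaults, and `mel_panel_cmd` is a hypothetical helper name.

```shell
# Print a songsee command for a mel spectrogram with the magma palette.
# All numeric values below are illustrative assumptions, not defaults.
mel_panel_cmd() {
    src=$1
    dst=$2
    echo songsee "$src" \
        --viz mel \
        --style magma \
        --width 1600 --height 900 \
        --min-freq 50 --max-freq 8000 \
        --format png \
        -o "$dst"
}

mel_panel_cmd track.mp3 mel.png
```

Printing first makes it easy to check the flag set before committing to a long render; piping the output to `sh` would run it, provided the file names contain no spaces or shell metacharacters.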
Notes
- WAV and MP3 are decoded natively; other formats require ffmpeg
- Output images can be inspected with vision_analyze for automated audio analysis
- Useful for comparing audio outputs, debugging synthesis, or documenting audio processing pipelines