Beat Grid
A beat grid is a list of precise timestamps for every beat and downbeat in a track. DJs use it to sync decks, jump between cue points, and automate transitions. TuneLab generates beat grids using the same BiLSTM ensemble as BPM detection, but with Viterbi decoding to produce globally-optimal beat positions — not just an average BPM.
What a beat grid contains
Every beat grid response is a small JSON document:
    {
      "tempo": 127.87,
      "downbeat": 0.413,
      "beats": [0.413, 0.882, 1.351, 1.820, 2.289, 2.758, ...],
      "downbeats": [0.413, 2.289, 4.165, 6.041, ...],
      "confidence": 0.94,
      "bpm_alt": null
    }
- `tempo` — Dominant BPM as a float (not rounded).
- `downbeat` — Timestamp (seconds) of the first downbeat in the track (beat 1 of bar 1).
- `beats` — Every beat timestamp, including downbeats.
- `downbeats` — Only the downbeats (beat 1 of each bar).
- `confidence` — Overall beat-tracking confidence, 0.0–1.0.
- `bpm_alt` — Alternative BPM if half/double is ambiguous; `null` when we're confident.
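The fields fit together arithmetically: consecutive beats are about 60 / tempo seconds apart, every downbeat also appears in `beats`, and the first beat coincides with `downbeat`. A quick sanity check in JavaScript, using illustrative values rather than real API output:

```javascript
// Hypothetical beat grid response (values illustrative, not real API output)
const grid = {
  tempo: 127.87,
  downbeat: 0.413,
  beats: [0.413, 0.882, 1.351, 1.820, 2.289],
  downbeats: [0.413, 2.289],
  confidence: 0.94,
  bpm_alt: null,
};

// Seconds per beat implied by the dominant tempo (~0.469 s at 127.87 BPM)
const secondsPerBeat = 60 / grid.tempo;

// Every downbeat must also appear in the beats array
const allPresent = grid.downbeats.every((t) => grid.beats.includes(t));

// The first beat and the first downbeat coincide
const firstIsDownbeat = grid.beats[0] === grid.downbeat;
```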
Why you need more than BPM
Static metadata APIs return something like “128 BPM” — useless for syncing. To actually beatmatch, you need:
- Phase offset — where does beat 1 actually start?
- Every individual beat — not computed from BPM, because real tempo wobbles.
- Downbeat detection — the “1” of each bar, essential for musically-aware cuts.
- Confidence — so you know when to trust the grid and when to fall back to manual.
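The second point is easy to demonstrate. This sketch (illustrative timestamps, not real output) computes the instantaneous BPM of each inter-beat interval; on human-played material these values wander around the reported average, which is why beats cannot simply be computed from a single BPM figure:

```javascript
// Illustrative beat timestamps with slight human timing drift (seconds)
const beats = [0.413, 0.882, 1.355, 1.820, 2.292, 2.758];

// Instantaneous BPM for each inter-beat interval
const instantBpm = beats.slice(1).map((t, i) => 60 / (t - beats[i]));

// Each interval yields a slightly different BPM; a single average hides this
console.log(instantBpm.map((b) => b.toFixed(1)));
```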
How we compute it
The pipeline has four stages:
- Input — Audio is resampled to 44.1 kHz and fed through a three-resolution spectrogram (1024 / 2048 / 4096 FFT) at 100 Hz, producing 314-dimensional feature vectors per frame.
- BiLSTM ensemble — An eight-model bidirectional LSTM ensemble outputs `(n_frames, 3)` probabilities per frame: `[no-beat, beat, downbeat]`.
- Dominant tempo — The inter-onset-interval (IOI) histogram is peak-picked on the beat channel to find the dominant BPM (same step as standalone BPM detection).
- Viterbi DBN — A Dynamic Bayesian Network with hidden state `(beat_position_in_bar, bar_boundary)` decodes the probabilities into a globally-optimal beat path. Transition probabilities constrain tempo to ±10 BPM around the dominant peak, enforcing smooth, consistent beat spacing.
The output of the Viterbi path is a “beat phase” at every frame (0.0 = on beat, 0.5 = between beats). Beats are sampled where phase crosses zero.
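The zero-crossing step can be sketched as follows. This is a simplified illustration, not the production decoder: `beatsFromPhase` is a hypothetical helper, and the phase array and frame rate are toy values.

```javascript
// Sample beat times from a per-frame beat phase, as a Viterbi path might
// produce. Phase ramps toward 1.0 and wraps back to 0.0 at each beat;
// frameRate is in Hz.
function beatsFromPhase(phase, frameRate) {
  const beats = [];
  for (let i = 1; i < phase.length; i++) {
    // A wrap (large negative jump) marks a beat boundary
    if (phase[i] < phase[i - 1] - 0.5) {
      // Linear interpolation between frames for sub-frame precision
      const frac = (1 - phase[i - 1]) / (1 - phase[i - 1] + phase[i]);
      beats.push((i - 1 + frac) / frameRate);
    }
  }
  return beats;
}

// Toy phase ramp at 100 Hz: one beat every 4 frames
const phase = [0.0, 0.25, 0.5, 0.75, 0.0, 0.25, 0.5, 0.75, 0.0];
const sampled = beatsFromPhase(phase, 100); // beats at 0.04 s and 0.08 s
```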
Why Viterbi over peak-picking?
Peak-picking is fast but greedy — it grabs the strongest local maxima, which might not form equally-spaced beats. Viterbi is slower but globally optimal: it finds the most-likely sequence of beats given smoothness constraints.
- Peak-picking is roughly 10× faster.
- But grids drift on low-confidence sections — quiet intros, breakdowns, bridges.
- For DJ use, drift is unacceptable. Viterbi every time.
Viterbi adds ~40 ms to a full-track analysis. In exchange, your beat grid stays locked through breakdowns and quiet passages — exactly where peak-pickers silently fail.
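For contrast, a naive greedy peak-picker looks roughly like this (a sketch, not our code). Nothing in it enforces even spacing, so a quiet stretch with low beat probabilities simply loses its beats:

```javascript
// Pick local maxima above a threshold from per-frame beat probabilities.
// Greedy: each peak is chosen in isolation, with no spacing constraint.
function pickPeaks(probs, threshold = 0.5) {
  const peaks = [];
  for (let i = 1; i < probs.length - 1; i++) {
    if (probs[i] > threshold && probs[i] >= probs[i - 1] && probs[i] > probs[i + 1]) {
      peaks.push(i);
    }
  }
  return peaks;
}

// A quiet section in the middle: the weak local maximum at frame 4 is
// below threshold, so the beat there is silently dropped
const probs = [0.1, 0.9, 0.1, 0.2, 0.3, 0.2, 0.1, 0.8, 0.1];
const picked = pickPeaks(probs);
```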
Half/double tempo detection
A 70 BPM hip-hop track can also be read as 140 BPM by counting hi-hats as beats. We detect this ambiguity by inspecting harmonic ratios in the IOI histogram:
- If both 70 and 140 show strong peaks, the track is tempo-ambiguous.
- In that case we return both values: `tempo: 70`, `bpm_alt: 140` (or the inverse).
- Clients decide which to use based on genre expectations — hip-hop usually half-time, drum & bass usually full-time.
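Client-side, that decision might look like the following sketch; the genre list is an illustrative assumption, not part of the API:

```javascript
// Genres that conventionally read at the lower (half-time) tempo.
// Illustrative defaults: tune this set for your own catalogue.
const HALF_TIME_GENRES = new Set(["hip-hop", "trap", "dubstep"]);

// Resolve an ambiguous tempo using a genre hint
function resolveTempo(grid, genre) {
  if (grid.bpm_alt === null) return grid.tempo; // unambiguous: trust the grid
  const low = Math.min(grid.tempo, grid.bpm_alt);
  const high = Math.max(grid.tempo, grid.bpm_alt);
  return HALF_TIME_GENRES.has(genre) ? low : high;
}
```

For example, `resolveTempo({ tempo: 70, bpm_alt: 140 }, "hip-hop")` picks 70, while a drum & bass hint picks 140.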
Accuracy benchmarks
Measured on standard MIR datasets with the standard tolerance windows:
| Task | Dataset | Metric | Score |
|---|---|---|---|
| Beat tracking | SMC | F1 (50 ms tolerance) | 0.89 |
| Beat tracking | Ballroom | F1 (50 ms tolerance) | 0.95 |
| Downbeat tracking | Beatles | F1 (100 ms tolerance) | 0.82 |
| Downbeat tracking | Ballroom | F1 (100 ms tolerance) | 0.91 |
Downbeat detection is consistently harder than beat detection because it requires understanding meter (4/4 vs 3/4 vs 6/8) rather than just periodicity.
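For reference, the F1 metric in the table can be computed like this simplified sketch. Standard MIR toolkits such as mir_eval are the authoritative implementations; this version uses a greedy one-to-one match between estimated and reference beats:

```javascript
// F1 score for beat tracking: an estimated beat matches a reference beat
// if within `tol` seconds; each reference beat may be matched at most once.
function beatF1(reference, estimated, tol = 0.05) {
  const used = new Array(reference.length).fill(false);
  let hits = 0;
  for (const e of estimated) {
    const i = reference.findIndex((r, j) => !used[j] && Math.abs(r - e) <= tol);
    if (i !== -1) { used[i] = true; hits++; }
  }
  const precision = estimated.length ? hits / estimated.length : 0;
  const recall = reference.length ? hits / reference.length : 0;
  return precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
}
```

A perfect grid scores 1.0; an estimate that lands two of three beats inside the window against four reference beats scores 4/7 ≈ 0.57.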
Time signatures
The current model assumes 4/4 meter. Non-4/4 time signatures (3/4 waltzes, 6/8 reggae, 5/4 prog) return beat grids that are still beat-accurate, but the `downbeats` array may be placed on the wrong beat of each bar. Explicit time signature detection is on the roadmap.
Known limitations
A few edge cases we're honest about:
- Tempo changes (accelerando, rallentando) — the reported `tempo` is an average; beats drift at the extremes.
- Free-time intros and outros — may include spurious beats before the groove starts.
- Half-time / double-time breakdowns — beats are correct but “feel” wrong until the groove resumes.
- Live recordings with audience noise — lower confidence, occasional extra beats.
- Tracks shorter than ~10 seconds — the DBN needs enough frames to stabilise; very short clips are unreliable.
Use cases
- Deck sync — align beats between two tracks for seamless mixing.
- Auto-cue — set playback cue points on specific beats or bars.
- Quantized loops — loop exactly 4, 8, or 16 beats with zero drift.
- Phrase-aware transitions — cut to the next track on a downbeat for musically-satisfying drops.
- Beat-synced visuals — flash lights, trigger VFX, or animate overlays on every beat.
- Drum replacement — quantize sample triggers to the grid instead of the waveform.
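As an example of the quantized-loops case, loop points come straight from the beats array rather than from BPM arithmetic, so the loop stays locked even when the tempo wobbles. Here `loopBounds` is a hypothetical helper and the timestamps are illustrative:

```javascript
// Loop boundaries spanning `beatCount` beats, starting from the first beat
// at or after `fromTime`. Uses grid timestamps directly, so there is no
// cumulative drift from multiplying out a rounded BPM.
function loopBounds(beats, fromTime, beatCount) {
  const start = beats.findIndex((t) => t >= fromTime);
  if (start === -1 || start + beatCount >= beats.length) return null;
  return { start: beats[start], end: beats[start + beatCount] };
}

// Illustrative grid: a 4-beat loop starting from 0.5 s
const gridBeats = [0.413, 0.882, 1.351, 1.820, 2.289, 2.758, 3.227, 3.696, 4.165];
const loop = loopBounds(gridBeats, 0.5, 4); // { start: 0.882, end: 2.758 }
```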
Code example
Fetching a beat grid by track ID:
    curl https://api.tunelab.dev/v1/beatgrid/spotify:2WfaOiMkCvy7F5fcp2zZ8L \
      -H "Authorization: Bearer tl_live_xxx"
A more complete consumer example — JavaScript that schedules a tone on every downbeat:
    const response = await fetch(
      "https://api.tunelab.dev/v1/beatgrid/spotify:2WfaOiMkCvy7F5fcp2zZ8L",
      { headers: { Authorization: "Bearer tl_live_xxx" } }
    );
    const grid = await response.json();

    // Schedule a click on every downbeat, assuming the track starts
    // playing at audioCtx.currentTime
    const audioCtx = new AudioContext();
    grid.downbeats.forEach((time) => {
      const osc = audioCtx.createOscillator();
      osc.frequency.value = 880; // A5 click tone
      osc.connect(audioCtx.destination);
      osc.start(audioCtx.currentTime + time);
      osc.stop(audioCtx.currentTime + time + 0.05); // 50 ms blip
    });
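Building on the same response shape, basic deck sync reduces to a playback-rate ratio plus a start offset between two grids. A sketch, under the simplifying assumption that both decks start playback at time zero; `syncDecks` is a hypothetical helper:

```javascript
// Playback-rate ratio and start offset that align deck B's first downbeat
// with deck A's, assuming both decks start playback at time zero and
// deck A plays at its natural rate.
function syncDecks(gridA, gridB) {
  const rate = gridA.tempo / gridB.tempo; // speed deck B up or down
  // At rate r, deck B's track time t plays at wall time offset + t / r;
  // solve so B's first downbeat lands on A's first downbeat
  const offset = gridA.downbeat - gridB.downbeat / rate;
  return { rate, offset };
}

// Toy grids: deck B runs at 125 BPM against deck A's 128 BPM
const sync = syncDecks(
  { tempo: 128, downbeat: 0.413 },
  { tempo: 125, downbeat: 0.2 }
);
```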
Further reading
- Böck & Schedl, Enhanced Beat Tracking with Context-Aware Neural Networks (DAFx 2011)
- Böck, Krebs & Widmer, A Multi-Model Approach to Beat Tracking Considering Heterogeneous Music Styles (ISMIR 2014) — the DBN approach we build on
- API reference — `GET /v1/beatgrid/{id}`
- Technology — BPM detection