
Beat Grid

A beat grid is a list of precise timestamps for every beat and downbeat in a track. DJs use it to sync decks, jump between cue points, and automate transitions. TuneLab generates beat grids using the same BiLSTM ensemble as BPM detection, but with Viterbi decoding to produce globally-optimal beat positions — not just an average BPM.

What a beat grid contains

Every beat grid response is a small JSON document:

JSON
{
  "tempo": 127.87,
  "downbeat": 0.413,
  "beats":     [0.413, 0.882, 1.351, 1.820, 2.289, 2.758, ...],
  "downbeats": [0.413, 2.289, 4.165, 6.041, ...],
  "confidence": 0.94,
  "bpm_alt": null
}

Why you need more than BPM

Static metadata APIs return something like “128 BPM”, which is not enough to sync against. To actually beatmatch, you need:

  - the timestamp of every beat, so decks stay phase-aligned even as the tempo wobbles;
  - downbeat positions, so transitions land on bar boundaries instead of mid-bar;
  - a confidence score, so your app can fall back gracefully on ambiguous material.
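For a sense of scale, here is how quickly a rounded metadata BPM drifts out of phase. This is a back-of-envelope sketch using the tempo from the sample response; the four-minute horizon is an arbitrary illustration:

```javascript
// Drift from beatmatching against a rounded metadata BPM instead of
// the grid's measured tempo.
const metadataBpm = 128;   // what a static metadata API reports
const gridBpm = 127.87;    // the "tempo" field of the beat grid

const periodErrorSec = 60 / gridBpm - 60 / metadataBpm; // error per beat
const beatsInFourMin = Math.round((240 / 60) * gridBpm); // ≈ 511 beats
const driftMs = periodErrorSec * beatsInFourMin * 1000;

console.log(driftMs.toFixed(0) + " ms"); // ≈ 240 ms: about half a beat off
```

Even a 0.13 BPM rounding error accumulates into an audibly flammed mix within a few minutes, which is why the grid carries measured beat positions rather than a single tempo.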

How we compute it

The pipeline has four stages:

  1. Input — Audio is resampled to 44.1 kHz and fed through a three-resolution spectrogram (1024 / 2048 / 4096 FFT) at 100 Hz, producing 314-dimensional feature vectors per frame.
  2. BiLSTM ensemble — An eight-model bidirectional LSTM ensemble outputs (n_frames, 3) probabilities per frame: [no-beat, beat, downbeat].
  3. Dominant tempo — The inter-onset-interval (IOI) histogram is peak-picked on the beat channel to find the dominant BPM (same step as standalone BPM detection).
  4. Viterbi DBN — A Dynamic Bayesian Network with hidden state (beat_position_in_bar, bar_boundary) decodes the probabilities into a globally-optimal beat path. Transition probabilities constrain tempo to ±10 BPM around the dominant peak, enforcing smooth, consistent beat spacing.

The output of the Viterbi path is a “beat phase” at every frame (0.0 = on beat, 0.5 = between beats). Beats are sampled where phase crosses zero.
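That sampling step can be sketched directly. The helper below is illustrative, not the production decoder: it assumes a per-frame phase at 100 Hz that ramps from 0.0 (on beat) toward 1.0 and wraps, and interpolates each wrap point to a timestamp:

```javascript
const frameRate = 100; // Hz, matching the 100 Hz feature frames

// Convert a per-frame beat phase (0.0 = on beat, ramping toward 1.0,
// then wrapping) into beat timestamps in seconds.
function beatsFromPhase(phase) {
  const beats = [];
  for (let i = 1; i < phase.length; i++) {
    // A beat lies where the phase wraps back toward zero.
    if (phase[i] < phase[i - 1]) {
      // Linearly interpolate the wrap point between the two frames.
      const frac = (1 - phase[i - 1]) / (1 - phase[i - 1] + phase[i]);
      beats.push((i - 1 + frac) / frameRate);
    }
  }
  return beats;
}

// Synthetic ramp at 120 BPM: phase advances 0.02 per 10 ms frame.
const ramp = Array.from({ length: 200 }, (_, i) => (i * 0.02) % 1);
console.log(beatsFromPhase(ramp).map((t) => t.toFixed(2))); // ≈ [0.50, 1.00, 1.50]
```

The interpolation is what makes beat timestamps finer-grained than the 10 ms frame hop.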

Why Viterbi over peak-picking?

Peak-picking is fast but greedy — it grabs the strongest local maxima, which might not form equally-spaced beats. Viterbi is slower but globally optimal: it finds the most-likely sequence of beats given smoothness constraints.

Sync matters more than speed.
Viterbi adds ~40 ms to a full-track analysis. In exchange, your beat grid stays locked through breakdowns and quiet passages — exactly where peak-pickers silently fail.

Half/double tempo detection

A 70 BPM hip-hop track can also be read as 140 BPM by counting hi-hats as beats. We detect this ambiguity by inspecting harmonic ratios in the IOI histogram: when a secondary peak at double or half the dominant tempo is nearly as strong as the primary peak, the alternate reading is reported in the bpm_alt field; otherwise bpm_alt stays null.
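A minimal sketch of that check, assuming a simplified IOI histogram keyed by integer BPM. The `altTempo` helper and the 60% threshold are illustrative, not the production values:

```javascript
// Report a half/double-tempo reading when its histogram peak is a
// sizable fraction of the dominant one.
function altTempo(histogram, dominantBpm) {
  const strength = (bpm) => histogram[Math.round(bpm)] ?? 0;
  const base = strength(dominantBpm);
  for (const ratio of [2, 0.5]) {
    const candidate = dominantBpm * ratio;
    // Illustrative threshold: at least 60% of the dominant peak.
    if (strength(candidate) >= 0.6 * base) return candidate;
  }
  return null; // unambiguous tempo → bpm_alt stays null
}

// A 70 BPM track whose hi-hats put a strong secondary peak at 140.
const hist = { 70: 1.0, 140: 0.8, 35: 0.1 };
console.log(altTempo(hist, 70)); // → 140
```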

Accuracy benchmarks

Measured on standard MIR datasets with the standard tolerance windows:

Task               Dataset    Metric                  Score
Beat tracking      SMC        F1 (50 ms tolerance)    0.89
Beat tracking      Ballroom   F1 (50 ms tolerance)    0.95
Downbeat tracking  Beatles    F1 (100 ms tolerance)   0.82
Downbeat tracking  Ballroom   F1 (100 ms tolerance)   0.91

Downbeat detection is consistently harder than beat detection because it requires understanding meter (4/4 vs 3/4 vs 6/8) rather than just periodicity.
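For reference, the metric itself is easy to sketch: F1 over beats matched one-to-one within the tolerance window. This is a simplified version of the standard MIR evaluation, using a greedy matcher:

```javascript
// F1 with a tolerance window: each estimated beat may claim at most
// one reference beat within toleranceSec of it.
function beatF1(reference, estimated, toleranceSec) {
  if (reference.length === 0 || estimated.length === 0) return 0;
  const used = new Set();
  let tp = 0;
  for (const est of estimated) {
    // Greedily match each estimate to the closest unused reference.
    let best = -1;
    let bestDist = Infinity;
    reference.forEach((ref, i) => {
      const d = Math.abs(ref - est);
      if (!used.has(i) && d < bestDist) { bestDist = d; best = i; }
    });
    if (best >= 0 && bestDist <= toleranceSec) { used.add(best); tp++; }
  }
  const precision = tp / estimated.length;
  const recall = tp / reference.length;
  return precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
}

// Perfect placement but one missed beat: precision 1.0, recall 0.75.
console.log(beatF1([0.5, 1.0, 1.5, 2.0], [0.5, 1.0, 1.5], 0.05)); // ≈ 0.857
```

Note how one missed beat in four already costs 14 points of F1, which is part of why the hardest dataset (SMC) sits at 0.89.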

Time signatures

The current model assumes 4/4 meter. Non-4/4 time signatures (3/4 waltzes, 6/8 reggae, 5/4 prog) return beat grids that are still beat-accurate, but the downbeats array may be placed on the wrong beat of each bar. Explicit time signature detection is on the roadmap.
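Until then, a client that knows a track's meter can rebuild downbeats from the beat-accurate beats array itself. A sketch; the helper and its anchor-index convention are hypothetical, not part of the API:

```javascript
// Take every Nth beat as a downbeat, starting from a user-supplied
// anchor (the beat known to open a bar).
function downbeatsForMeter(beats, beatsPerBar, anchorIndex = 0) {
  return beats.filter(
    (_, i) => i >= anchorIndex && (i - anchorIndex) % beatsPerBar === 0
  );
}

// A waltz (3/4) at 0.5 s per beat, bar starting on the first beat.
const beats = [0.4, 0.9, 1.4, 1.9, 2.4, 2.9];
console.log(downbeatsForMeter(beats, 3)); // → [0.4, 1.9]
```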

Known limitations

A few edge cases we're honest about:

Use cases

Code example

Fetching a beat grid by track ID:

cURL
curl https://api.tunelab.dev/v1/beatgrid/spotify:2WfaOiMkCvy7F5fcp2zZ8L \
  -H "Authorization: Bearer tl_live_xxx"

A more complete consumer example — JavaScript that schedules a tone on every downbeat:

javascript
const response = await fetch(
  "https://api.tunelab.dev/v1/beatgrid/spotify:2WfaOiMkCvy7F5fcp2zZ8L",
  { headers: { Authorization: "Bearer tl_live_xxx" } }
);
const grid = await response.json();

// Schedule a click on every downbeat
const audioCtx = new AudioContext();
grid.downbeats.forEach((time) => {
  const osc = audioCtx.createOscillator();
  osc.frequency.value = 880;
  osc.connect(audioCtx.destination);
  osc.start(audioCtx.currentTime + time);
  osc.stop(audioCtx.currentTime + time + 0.05);
});
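Another common consumer pattern is snapping a cue point to the nearest grid beat (quantized cueing). A sketch against the same response shape, with illustrative beat values:

```javascript
// Snap an arbitrary playhead position to the nearest grid beat.
// Binary search keeps it O(log n) even for long tracks; assumes the
// beats array is sorted ascending, as the API returns it.
function snapToBeat(beats, t) {
  let lo = 0, hi = beats.length - 1;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (beats[mid] < t) lo = mid + 1; else hi = mid;
  }
  // lo is the first beat >= t; compare with the beat just before it.
  if (lo > 0 && t - beats[lo - 1] < beats[lo] - t) lo--;
  return beats[lo];
}

const beats = [0.413, 0.882, 1.351, 1.82, 2.289];
console.log(snapToBeat(beats, 1.0)); // → 0.882
```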
