
BPM Detection

TuneLab’s BPM detector uses an ensemble of bidirectional LSTM models originally trained for beat tracking. Unlike metadata lookups that return integer BPMs (“128”), we report float precision (127.87). That precision matters for DJ software, beatmatching, and remixing.

Why float precision matters

Real tempo is rarely exactly integer. A 127.87 BPM house track is subtly different from 128.00 — DJs feel it when syncing decks over long mixes, and producers need the extra decimal places to time-stretch accurately without phase artifacts. Metadata scrapers round because their source data is rounded. We compute from raw audio, so we don’t have to.
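To put a number on that drift, here is a worked example (our own arithmetic, not a figure from the pipeline):

```python
# Drift incurred when a 127.87 BPM track is mixed as if it were 128.00 BPM
true_bpm, rounded_bpm = 127.87, 128.00
seconds = 300                                # a five-minute mix
beats_played = seconds * true_bpm / 60       # 639.35 beats actually played
beats_expected = seconds * rounded_bpm / 60  # 640.00 beats the synced deck expects
drift_beats = beats_expected - beats_played  # 0.65 beats out of phase
drift_ms = drift_beats * 60_000 / true_bpm   # roughly 305 ms of phase error
```

Two-thirds of a beat of drift over one five-minute transition is clearly audible, which is why the rounded metadata value is not good enough.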

The pipeline

Each track goes through six stages:

  1. Audio loading. Decode to mono 44.1 kHz PCM. Clip to the first 60 seconds for efficiency — tempo almost never changes enough mid-track to matter.
  2. 3-resolution spectrogram. Compute STFT at three FFT sizes (1024, 2048, 4096) for different time/frequency trade-offs. Stack into a 314-dimensional feature vector per frame at 100 Hz frame rate.
  3. BiLSTM inference. Run a bidirectional LSTM ensemble (three models, averaged). Output shape is (n_frames, 3): probabilities for [no-beat, beat, downbeat] at each frame.
  4. Peak-picking. Find local maxima in the beat probability curve with a minimum distance of 7 frames (70 ms at the 100 Hz frame rate, a floor comfortably shorter than the beat interval of even the fastest genres).
  5. IOI histogram. Compute inter-onset intervals between consecutive beats. Build a weighted histogram. The dominant peak corresponds to the track’s tempo.
  6. Float refinement. Fit a parabola around the histogram peak to get sub-bin precision. This is where float precision comes from — we’re interpolating between histogram bins.
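Stages 4 through 6 can be sketched in a few lines. This is a simplified illustration, not TuneLab's implementation; the peak threshold, BPM range, and bin count are assumed values, and the function names are hypothetical.

```python
import numpy as np

def pick_peaks(beat_prob, min_dist=7, threshold=0.3):
    """Stage 4: local maxima of the beat-probability curve, at least
    min_dist frames (70 ms at 100 Hz) apart."""
    peaks = []
    for i in range(1, len(beat_prob) - 1):
        is_max = beat_prob[i - 1] < beat_prob[i] >= beat_prob[i + 1]
        if is_max and beat_prob[i] >= threshold:
            if not peaks or i - peaks[-1] >= min_dist:
                peaks.append(i)
    return np.array(peaks)

def tempo_from_peaks(peaks, frame_rate=100.0, n_bins=300, bpm_range=(40.0, 240.0)):
    """Stages 5 and 6: IOI histogram, then parabolic sub-bin refinement."""
    iois = np.diff(peaks) / frame_rate       # inter-onset intervals in seconds
    hist, edges = np.histogram(60.0 / iois, bins=n_bins, range=bpm_range)
    k = int(np.argmax(hist))                 # dominant (integer-resolution) bin
    # Fit a parabola through the peak bin and its neighbours; its vertex
    # gives a sub-bin offset in [-0.5, 0.5] -> float BPM
    offset = 0.0
    if 0 < k < n_bins - 1:
        y0, y1, y2 = float(hist[k - 1]), float(hist[k]), float(hist[k + 1])
        denom = y0 - 2.0 * y1 + y2
        if denom != 0.0:
            offset = 0.5 * (y0 - y2) / denom
    bin_width = edges[1] - edges[0]
    return edges[k] + (0.5 + offset) * bin_width
```

The parabolic fit in the last step is where the float precision comes from: three histogram values around the peak determine a parabola whose vertex lands between bin edges.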

Viterbi phase analysis

For the beat grid endpoint (not just BPM), we additionally run Viterbi decoding on the LSTM output. This finds the globally optimal sequence of beat positions given a narrow BPM search range (±10 BPM around the histogram peak). The result guarantees that beats are evenly spaced across the whole track, even when individual frame probabilities are noisy during quiet intros or sparse breakdowns.

Trade-off: Viterbi is O(n × k²) where k is the number of tempo candidates — roughly 10× slower than peak-picking. For the /v1/bpm endpoint we skip it. For /v1/beatgrid we always run it.
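To make the idea concrete, here is a minimal dynamic-programming decoder in the same spirit. It is closer to Ellis-style DP beat tracking than a full Viterbi over tempo states, and it is not TuneLab's decoder; the tightness constant and search window are assumptions.

```python
import numpy as np

def dp_beat_track(beat_prob, bpm, frame_rate=100.0, tightness=100.0):
    """Decode a globally consistent beat grid from frame-wise beat
    probabilities, given a tempo estimate (e.g. the IOI-histogram peak).
    Each frame's score is its local beat probability plus the best
    predecessor score, penalised for deviating from the beat period."""
    period = frame_rate * 60.0 / bpm          # expected frames per beat
    n = len(beat_prob)
    score = beat_prob.astype(float).copy()
    backlink = np.full(n, -1, dtype=int)
    for t in range(n):
        lo = max(0, int(t - 2 * period))      # predecessors between half
        hi = int(t - period / 2)              # and double the beat period back
        if hi <= lo:
            continue
        prev = np.arange(lo, hi)
        # log-Gaussian penalty on interval deviation from the period
        penalty = -tightness * np.log((t - prev) / period) ** 2
        cand = score[prev] + penalty
        best = int(np.argmax(cand))
        if cand[best] > 0:
            score[t] += cand[best]
            backlink[t] = prev[best]
    # Backtrace from the best-scoring frame to recover the beat grid
    beats = [int(np.argmax(score))]
    while backlink[beats[-1]] >= 0:
        beats.append(int(backlink[beats[-1]]))
    return np.array(beats[::-1])
```

Because every beat is chosen jointly with its predecessors, a few noisy frames in a quiet intro cannot pull the grid off the track-wide optimum.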

Accuracy

Dataset             Tolerance   Accuracy
GTZAN               ±2%         94.8%
Ballroom            ±2%         96.2%
MIREX-style         ±4%         98.1%
SMC (hard cases)    ±4%         79.3%

Accuracy degrades on electronic genres with complex rhythmic structures (trap, dubstep sub-bass drops, juke) where the perceptual “beat” is ambiguous. In those cases we report both primary and alternative BPM candidates so your application can decide.

Half/double tempo ambiguity

A classic problem: a 70 BPM hip-hop track could also be interpreted as 140 BPM (counting hi-hats as beats). We detect this by looking for harmonic ratios in the IOI histogram. When both 70 and 140 have strong peaks, we return:

ambiguous response
{
  "tempo": 70.14,
  "bpm_alt": 140.28,
  "confidence": 0.63
}

When confidence is high (>0.85), bpm_alt is null. Your client can use genre heuristics to pick the right one — hip-hop is usually half-time, drum & bass is usually full-time.
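A sketch of how the harmonic-ratio check might look. This is illustrative only: the bin count, the neighbourhood width around the harmonic bin, and the 1 - ratio confidence mapping are assumptions, and the function name is hypothetical.

```python
import numpy as np

def resolve_half_double(bpms, n_bins=200, bpm_range=(40.0, 240.0), conf_threshold=0.85):
    """Flag half/double-tempo ambiguity from per-interval BPM estimates.
    Returns (tempo, bpm_alt, confidence); bpm_alt is None when the
    dominant histogram peak has no strong harmonic partner."""
    hist, edges = np.histogram(bpms, bins=n_bins, range=bpm_range)
    centers = (edges[:-1] + edges[1:]) / 2
    k = int(np.argmax(hist))
    tempo = centers[k]
    # Look for a harmonic partner at 2x or 0.5x the dominant tempo
    best_alt, best_ratio = None, 0.0
    for factor in (2.0, 0.5):
        target = tempo * factor
        if not (bpm_range[0] <= target <= bpm_range[1]):
            continue
        j = int(np.argmin(np.abs(centers - target)))
        lo, hi = max(0, j - 2), min(n_bins, j + 3)   # small search window
        jj = lo + int(np.argmax(hist[lo:hi]))
        ratio = hist[jj] / hist[k] if hist[k] else 0.0
        if ratio > best_ratio:
            best_ratio, best_alt = ratio, centers[jj]
    confidence = 1.0 - best_ratio          # 1.0 when no harmonic peak exists
    bpm_alt = best_alt if confidence <= conf_threshold else None
    return round(float(tempo), 2), bpm_alt, round(float(confidence), 2)
```

When the 70 BPM and 140 BPM peaks are comparable in weight, the ratio is high, confidence drops, and both candidates are surfaced, matching the ambiguous response above.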

Known limitations

Confidence score

We report confidence as the ratio of the dominant IOI histogram peak to the second-highest peak. Rule of thumb: above 0.85 the tempo is unambiguous and bpm_alt is null; below that, check bpm_alt before trusting the primary value.
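The ratio itself is simple to compute. The sketch below uses the second-highest histogram bin as the runner-up for brevity; a real implementation would use the second-highest local maximum, since bins adjacent to the dominant peak are also tall. The 1 - ratio mapping into [0, 1] is our assumption to match the score range in the responses.

```python
import numpy as np

def confidence_from_hist(hist):
    """Confidence from the IOI histogram: how much the dominant peak
    towers over the runner-up, mapped into [0, 1]."""
    h = np.sort(np.asarray(hist, dtype=float))[::-1]  # descending bin heights
    if h[0] == 0.0:
        return 0.0          # empty histogram: no tempo evidence at all
    return float(1.0 - h[1] / h[0])
```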

Code example

cURL
curl https://api.tunelab.dev/v1/bpm \
  -H "Authorization: Bearer tl_live_xxx" \
  -H "Content-Type: audio/mpeg" \
  --data-binary @track.mp3
response
{
  "tempo": 127.87,
  "confidence": 0.93,
  "bpm_alt": null,
  "_meta": {
    "latency_ms": 1420,
    "credits_used": 2,
    "credits_remaining": 948
  }
}

Further reading