# BPM Detection

TuneLab’s BPM detector uses an ensemble of bidirectional LSTM models originally trained for beat tracking. Unlike metadata lookups that return integer BPMs (“128”), we report float precision (127.87). That precision matters for DJ software, beatmatching, and remixing.
## Why float precision matters
Real tempo is rarely exactly integer. A 127.87 BPM house track is subtly different from 128.00 — DJs feel it when syncing decks over long mixes, and producers need the extra decimal places to time-stretch accurately without phase artifacts. Metadata scrapers round because their source data is rounded. We compute from raw audio, so we don’t have to.
## The pipeline
Each track goes through six stages:
- Audio loading. Decode to mono 44.1 kHz PCM. Clip to the first 60 seconds for efficiency — tempo almost never changes enough mid-track to matter.
- 3-resolution spectrogram. Compute STFT at three FFT sizes (1024, 2048, 4096) for different time/frequency trade-offs. Stack into a 314-dimensional feature vector per frame at 100 Hz frame rate.
- BiLSTM inference. Run a bidirectional LSTM ensemble (three models, averaged). Output shape is (n_frames, 3): probabilities for [no-beat, beat, downbeat] at each frame.
- Peak-picking. Find local maxima in the beat probability curve with a minimum distance of 7 frames (70 ms, matching typical fastest beat intervals in extreme genres).
- IOI histogram. Compute inter-onset intervals between consecutive beats. Build a weighted histogram. The dominant peak corresponds to the track’s tempo.
- Float refinement. Fit a parabola around the histogram peak to get sub-bin precision. This is where float precision comes from — we’re interpolating between histogram bins.
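The last two stages can be sketched as follows. This is an illustrative sketch, not the production code: the function name, bin width, and tempo range are assumptions, and the probability weighting of the histogram is omitted for brevity.

```python
import numpy as np

def refine_tempo(beat_times, bin_width=0.005):
    """Estimate BPM from detected beat times: IOI histogram
    plus parabolic sub-bin refinement (stages 5 and 6 above)."""
    iois = np.diff(np.asarray(beat_times, dtype=float))
    # Bins cover plausible beat periods: 0.25-2.0 s, i.e. 30-240 BPM.
    bins = np.arange(0.25, 2.0 + bin_width, bin_width)
    hist, edges = np.histogram(iois, bins=bins)
    k = int(np.argmax(hist))
    # Parabolic interpolation around the dominant bin for sub-bin precision:
    # fit a parabola through the peak bin and its two neighbours.
    offset = 0.0
    if 0 < k < len(hist) - 1:
        a, b, c = float(hist[k - 1]), float(hist[k]), float(hist[k + 1])
        denom = a - 2.0 * b + c
        if denom != 0.0:
            offset = 0.5 * (a - c) / denom
    period = edges[k] + (0.5 + offset) * bin_width  # refined beat period (s)
    return 60.0 / period
```

With perfectly regular beats the parabolic offset collapses to zero and the estimate lands within half a histogram bin of the true tempo; the interpolation earns its keep when real inter-onset intervals jitter around the peak.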
## Viterbi phase analysis
For the beat grid endpoint (not just BPM), we additionally run Viterbi decoding on the LSTM output. This finds the globally-optimal sequence of beat positions given a narrow BPM search range (±10 BPM around the histogram peak). The result guarantees that beats are equally-spaced across the whole track, even if individual frame probabilities are noisy during quiet intros or sparse breakdowns.
Trade-off: Viterbi is O(n × k²) where k is the number of tempo candidates, roughly 10× slower than peak-picking. For the `/v1/bpm` endpoint we skip it; for `/v1/beatgrid` we always run it.
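The global decoding can be sketched in the dynamic-programming style of Ellis (2007), which a Viterbi decoder over tempo states generalizes. Everything here (names, the tolerance window, the fixed beat period) is illustrative; the real decoder additionally searches a ±10 BPM range of tempo states, which is where the O(n × k²) cost quoted above comes from.

```python
import numpy as np

def dp_beat_track(activations, period_frames, fps=100, tol=0.1):
    """Globally decode a beat sequence from per-frame beat
    probabilities: each beat must follow its predecessor by roughly
    one beat period, so the grid stays evenly spaced even where the
    activations are noisy (quiet intros, sparse breakdowns)."""
    n = len(activations)
    lo = int(period_frames * (1.0 - tol))   # shortest allowed gap
    hi = int(period_frames * (1.0 + tol))   # longest allowed gap
    score = np.asarray(activations, dtype=float).copy()
    back = np.full(n, -1)
    for t in range(lo, n):
        start = max(0, t - hi)
        window = score[start: t - lo + 1]   # candidate previous beats
        if window.size:
            j = start + int(np.argmax(window))
            score[t] += score[j]            # accumulate best chain score
            back[t] = j
    # Backtrack from the best-scoring final beat.
    beats = [int(np.argmax(score))]
    while back[beats[-1]] >= 0:
        beats.append(int(back[beats[-1]]))
    return np.array(beats[::-1], dtype=float) / fps  # beat times (s)
```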
## Accuracy
| Dataset | Tolerance | Accuracy |
|---|---|---|
| GTZAN | ±2% | 94.8% |
| Ballroom | ±2% | 96.2% |
| MIREX-style | ±4% | 98.1% |
| SMC (hard cases) | ±4% | 79.3% |
Accuracy degrades on electronic genres with complex rhythmic structures (trap, dubstep sub-bass drops, juke) where the perceptual “beat” is ambiguous. In those cases we report both primary and alternative BPM candidates so your application can decide.
## Half/double tempo ambiguity
A classic problem: a 70 BPM hip-hop track could also be interpreted as 140 BPM (counting hi-hats as beats). We detect this by looking for harmonic ratios in the IOI histogram. When both 70 and 140 have strong peaks, we return:
```json
{
  "tempo": 70.14,
  "bpm_alt": 140.28,
  "confidence": 0.63
}
```
When confidence is high (>0.85), `bpm_alt` is null. Your client can use genre heuristics to pick the right one: hip-hop is usually half-time, drum & bass is usually full-time.
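The harmonic-ratio check can be sketched directly on the histogram. The function name, tolerance, and the exact confidence normalization below are assumptions for illustration, not the production implementation.

```python
import numpy as np

def tempo_with_alternative(bin_bpms, weights, ratio_tol=0.04, cutoff=0.85):
    """Pick the dominant IOI-histogram peak; if a strong peak sits at
    a 2:1 (or 1:2) harmonic ratio, surface it as bpm_alt, mirroring
    the response shape above."""
    bin_bpms = np.asarray(bin_bpms, dtype=float)
    weights = np.asarray(weights, dtype=float)
    i = int(np.argmax(weights))
    tempo = bin_bpms[i]
    # Bins near double or half the primary tempo count as harmonic rivals.
    ratios = bin_bpms / tempo
    rivals = (np.abs(ratios - 2.0) < 2.0 * ratio_tol) | \
             (np.abs(ratios - 0.5) < 0.5 * ratio_tol)
    rival_w = weights[rivals].max() if rivals.any() else 0.0
    # Confidence: dominant peak's share of the two strongest candidates
    # (one plausible normalization; the real one is an internal detail).
    confidence = weights[i] / (weights[i] + rival_w) if weights[i] else 0.0
    alt = None
    if confidence <= cutoff and rival_w > 0:
        alt = float(bin_bpms[rivals][int(np.argmax(weights[rivals]))])
    return {"tempo": float(tempo), "bpm_alt": alt,
            "confidence": round(float(confidence), 2)}
```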
## Known limitations
- Variable-tempo tracks (classical with rubato, jazz ballads) — we report the average tempo; beats drift at extremes.
- Tracks shorter than 10 seconds — unreliable, low confidence.
- Heavily syncopated genres (juke, footwork, trap) — may return half/double.
- Silence or pure tones — low confidence, BPM reported as null.
## Confidence score
We report confidence as the dominant IOI histogram peak’s share of the combined weight of the two strongest peaks: 1.0 means no competing peak, and values near 0.5 mean two nearly equal candidates. Rule of thumb:
- > 0.85 — very confident. Use the BPM directly.
- 0.60 – 0.85 — probably right, but check `bpm_alt`.
- < 0.60 — ambiguous. Review with a human or fall back to metadata.
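Applied client-side, the rule of thumb becomes a small branch. This is a sketch: the function name and the `prefer_half_time` flag are illustrative stand-ins for real genre heuristics.

```python
def choose_bpm(resp, prefer_half_time=False):
    """Map a /v1/bpm response dict to a usable BPM (or None),
    following the confidence rule of thumb above."""
    tempo, alt, conf = resp["tempo"], resp["bpm_alt"], resp["confidence"]
    if conf > 0.85 or alt is None:
        return tempo            # very confident: use the BPM directly
    if conf >= 0.60:
        # Probably right; apply a genre heuristic between the candidates.
        return min(tempo, alt) if prefer_half_time else tempo
    return None                 # ambiguous: defer to a human or metadata
```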
## Code example
```bash
curl https://api.tunelab.dev/v1/bpm \
  -H "Authorization: Bearer tl_live_xxx" \
  -H "Content-Type: audio/mpeg" \
  --data-binary @track.mp3
```
Response:

```json
{
  "tempo": 127.87,
  "confidence": 0.93,
  "bpm_alt": null,
  "_meta": {
    "latency_ms": 1420,
    "credits_used": 2,
    "credits_remaining": 948
  }
}
```
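The same call from Python, using only the standard library. This is a sketch: it builds the request without sending it; pass the result to `urllib.request.urlopen` to execute, and substitute your real key for the placeholder.

```python
from urllib import request

API_BASE = "https://api.tunelab.dev"

def build_bpm_request(path, api_key):
    """Prepare the POST /v1/bpm call shown in the curl example:
    raw MP3 bytes as the body, bearer-token auth in the header."""
    with open(path, "rb") as f:
        body = f.read()
    return request.Request(
        f"{API_BASE}/v1/bpm",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "audio/mpeg",
        },
    )
```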
## Further reading
- Böck et al., Multi-Model Beat Tracking (ISMIR 2016) — the BiLSTM ensemble architecture we build on.
- Ellis, Beat Tracking by Dynamic Programming (J. New Music Research 2007) — the dynamic-programming formulation that Viterbi-style beat decoding builds on.
- Böck et al., A Multi-Model Approach to Beat Tracking Considering Heterogeneous Music Styles (ISMIR 2014) — the DBN approach for phase analysis.
- API reference: POST /v1/bpm
- API reference: GET /v1/beatgrid/{id}
- Related: Beat Grid deep dive