**Note:** For a full description of the data and processing, see Hosken, F., Bechtold, T., Hoesl, F., Kilchenmann, L., & Senn, O. (2022). Drum Groove Corpora. *Empirical Musicology Review*, 16(1).

# Introduction

This collection of corpora consists of timing information about drum performances. Three corpora have been brought together, filtered, and processed to provide a repository of drum beats for use in performance timing research. The drum grooves are drawn from performances in a variety of musical styles by a number of drummers, and the data were obtained by three different means: 1) human annotation, 2) automatic onset detection, and 3) MIDI data. The performances captured in this data set are intended to represent a range of performances built on the fundamental 4/4 Anglo-American popular music backbeat drumming pattern.

# Data

This collection features one novel corpus of drum grooves (The Loop Loft) as well as a repackaging of two existing corpora (Lucerne Groove Research Library and Google Magenta’s Groove MIDI Dataset). Timing data for the three corpora were obtained using different methods, and each corpus’s drummers performed under different conditions.

## Sources

* The Loop Loft is a commercial sample shop that provides short loops for DJs and producers to use in their creative work. The company invites performers into the studio to record short drum patterns while listening to a click track. A separate audio file is available for each microphone placed on each instrument of the drum kit, allowing clear identification of which drum was struck at what time. Here, the performances of four world-famous session musicians were purchased and analyzed using the MIRtoolbox in MATLAB (Lartillot et al., 2008). Onsets below a threshold of 10% of the maximum amplitude (obtained using the MIRtoolbox function `mirevents('FILENAME', 'Threshold', 0.1);`) were discarded to remove bleed from other drums.
* The Lucerne Groove Research Library is a corpus of 250 drum grooves drawn from commercial recordings played by 50 highly acclaimed drummers in the fields of pop, rock, funk, soul, disco, R&B, and heavy metal. Two professional musicians transcribed the drum patterns by ear. Microtiming measurements were carried out manually using the LARA software (version 2.6.3). Drum patterns are provided in MIDI and MP3 format. Since these drum performances are part of full-band recordings (i.e., not just the drums in isolation) made between 1956 and 2014, it cannot be known whether a click track was used in a given performance, nor the precise location in time of the click if one was used.
* Google Magenta’s Groove MIDI Dataset is a corpus of drum patterns performed on a Roland TD-11 electronic drum kit by five professional drummers and four amateur players (Google employees). Drummers played on this MIDI drum kit to a click track. They performed drum patterns and solos for as long as they desired. This corpus was initially created as training data for a machine learning project on expressive drum performances (Gillick et al., 2019).

Each corpus includes drum grooves performed in a variety of musical styles and at a variety of tempi.

## Data Types

* **Corpus**: the corpus to which the onset belongs (Loop, Lucerne, Magenta).
* **Drummer**: the surname of the drummer (for the Loop and Lucerne corpora) or an uppercase letter where the surname is unknown (Magenta).
* **Track**: a unique track name.
* **Year**: the recording year of the track.
* **Instrument**: the instrument of the drum kit on which the stroke is played (HH = Hi-Hat, SD = Snare Drum, or BD = Bass Drum).
* **MetricTime**: the metric time of the stroke (in beats), where 0.00 is bar 1, beat 1; 0.25 is one sixteenth note later; and 4.00 is bar 2, beat 1.
* **OnsetTime**: the onset time of the stroke (in seconds).
* **MetronomicOnsetTime**: the metronomic onset time (in seconds), estimated by linear regression.
* **Tempo**: the tempo (in beats per minute), estimated by linear regression.
* **MicrotimingSeconds**: the difference between OnsetTime and MetronomicOnsetTime (in seconds).
* **MicrotimingBeats**: the microtiming deviation as a proportion of the beat.
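
The relationship between the timing fields can be illustrated with a short Python sketch. It assumes an ordinary least-squares fit of OnsetTime against MetricTime over all strokes of a single track; the exact regression setup used for the corpora is described in the paper cited above and may differ (for example, in which strokes enter the fit). The function names `fit_metronomic_grid` and `add_microtiming` and the example strokes are hypothetical and are not part of the data set.

```python
def fit_metronomic_grid(metric_times, onset_times):
    """Least-squares line onset = intercept + seconds_per_beat * metric_time.

    Assumption: all strokes of one track are fitted together.
    Returns (seconds_per_beat, intercept).
    """
    n = len(metric_times)
    if n < 2 or n != len(onset_times):
        raise ValueError("need at least two strokes with matching lengths")
    mean_x = sum(metric_times) / n
    mean_y = sum(onset_times) / n
    sxx = sum((x - mean_x) ** 2 for x in metric_times)
    if sxx == 0:
        raise ValueError("all strokes share one metric time; slope undefined")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(metric_times, onset_times))
    seconds_per_beat = sxy / sxx
    intercept = mean_y - seconds_per_beat * mean_x
    return seconds_per_beat, intercept


def add_microtiming(strokes):
    """Return copies of the stroke dicts with the derived timing fields added.

    Each input dict needs 'MetricTime' (beats) and 'OnsetTime' (seconds).
    """
    metric = [s["MetricTime"] for s in strokes]
    onsets = [s["OnsetTime"] for s in strokes]
    seconds_per_beat, intercept = fit_metronomic_grid(metric, onsets)
    tempo = 60.0 / seconds_per_beat  # beats per minute
    rows = []
    for s in strokes:
        metronomic = intercept + seconds_per_beat * s["MetricTime"]
        deviation = s["OnsetTime"] - metronomic
        rows.append({
            **s,
            "MetronomicOnsetTime": metronomic,
            "Tempo": tempo,
            "MicrotimingSeconds": deviation,
            # Deviation expressed as a proportion of one beat.
            "MicrotimingBeats": deviation / seconds_per_beat,
        })
    return rows


# Hypothetical strokes: four quarter notes at roughly 120 bpm.
example_strokes = [
    {"Instrument": "BD", "MetricTime": 0.0, "OnsetTime": 0.000},
    {"Instrument": "SD", "MetricTime": 1.0, "OnsetTime": 0.520},
    {"Instrument": "BD", "MetricTime": 2.0, "OnsetTime": 1.000},
    {"Instrument": "SD", "MetricTime": 3.0, "OnsetTime": 1.480},
]

for row in add_microtiming(example_strokes):
    print(row["Instrument"], round(row["MicrotimingSeconds"], 4),
          round(row["MicrotimingBeats"], 4))
```

Under this reading, a positive MicrotimingSeconds value means the stroke sounds later than the fitted metronomic grid and a negative value means it sounds earlier. Dividing by the fitted beat duration gives MicrotimingBeats, which makes deviations comparable across tracks at different tempi.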