This document describes the dataset. Please do not change without good reason.
Go to
[Discussion][1]
[Research using the dataset][2]
[Related research][3]
**Phonation modes dataset**
-----------------------
This is a collection of datasets for training computational models for automated detection of the following phonation modes: **breathy**, **neutral**, **flow** and **pressed** (see Sundberg, J. (1987). *The science of the singing voice.* Illinois University Press).
The collection includes **four sets of recordings, each containing about 900 samples of sustained sung vowels**. These samples are about 750ms long. Nine different vowels are represented in all phonation modes and on all pitches between A3 and G5. All recordings were produced by one female singer under controlled conditions.
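
As a rough consistency check on these figures, the sketch below enumerates the pitches from A3 to G5 and counts the vowel, mode and pitch combinations. It assumes that "all pitches" means every semitone of the equal-tempered scale and that A4 is tuned to 440 Hz; neither is stated above, and the helper names are illustrative only.

```python
# Assumption: "all pitches" means every semitone of the equal-tempered scale,
# tuned to A4 = 440 Hz. Neither detail is stated in the dataset description.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_number(name, octave):
    """MIDI note number in the convention where C4 = 60 and A4 = 69."""
    return 12 * (octave + 1) + NOTE_NAMES.index(name)


def frequency_hz(midi, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note number."""
    return a4_hz * 2.0 ** ((midi - 69) / 12)


low, high = midi_number("A", 3), midi_number("G", 5)
pitches = list(range(low, high + 1))

n_vowels, n_modes = 9, 4  # nine vowels, four phonation modes (from the text)
print(len(pitches))                        # 23 semitone steps from A3 to G5
print(len(pitches) * n_vowels * n_modes)   # 828 combinations per recording set
print(round(frequency_hz(low), 1), round(frequency_hz(high), 1))  # 220.0 784.0
```

Under these assumptions one set would hold 828 vowel, mode and pitch combinations, which is of the same order as the "about 900 samples" quoted above.
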
Along with the four phonation modes breathy, neutral, flow and pressed, a second kind of pressed sound, labelled pressedta in the metadata, is included: while pressed vocalization was achieved by raising the larynx, pressedta was an attempt to raise the subglottal pressure directly, without raising the larynx.
License
-------
The datasets were created by Polina Proutskova in 2012 and are available for download under the Creative Commons **CC BY-NC-SA** license. This license allows free sharing of the datasets as well as altering them or building new work based upon them. The license sets the following conditions for the use of the datasets:
- attribution – reference the creators
- no commercial use
- share alike – if you alter, transform or build upon them, you may distribute the result only under the same license.
If you would like to **cite** this collection, please use either of the following:
Proutskova, P. (2012). Phonation modes dataset. [http://www.osf.io/pa3ha][4]
Proutskova, P., Rhodes, C., Wiggins, G., and Crawford, T. (2012). Breathy or resonant – a controlled and curated dataset for phonation mode detection in singing. In *Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR 2012).*
This collection of datasets is a work in progress, and we would appreciate your feedback. We would also welcome contributions from other researchers.
----------
Two recording events took place, each of which was captured by two sets of equipment:
Equipment set 1:
----------------
- Microphone: Electro-Voice N/D375A
- MobilePRE USB – USB bus-powered preamp and audio interface from M-Audio
- Mic → Channel 1 (mono)
- MobilePRE → USB port of a MacBookPro, OS X 5.8
- Recorded using Audacity software version 1.3.6
![frequency response N/D357A][5]
The recordings were made with a professional dynamic microphone from Electro-Voice, model no. N/D375A. The model was chosen because of its flat response: +10 dB ± 1 dB between 200 Hz and 15000 Hz. The microphone was positioned horizontally at the level of the singer's mouth, at a distance of 100 cm, which is the distance for which the response curve shown here is given.
Equipment set 2:
----------------
- Olympus LS10 PCM digital recording device
- automatic recording level settings
The built-in high-sensitivity, low-noise stereo microphone of the Olympus LS10 is a combination of two microphone heads positioned at a 90° angle. It has an overall frequency response of 20 – 44000 Hz. In the frequency range of 150 – 3000 Hz it displays a flat frequency response of ±2 dB, and in the range up to 20 kHz the response is ±5 dB.
![frequency response Zoom L10][6]
The recorder and the microphone were positioned horizontally at the level of the singer's mouth, at a distance of 50 cm, as recommended by the manufacturer for best voice capture.
I. Recording event 1, equipment set 1
=====================================
44.1 kHz sampling rate, 16 bit signed resolution, WAV format
- [**Processed dataset**][7] - original recordings cut into smaller files: each file contains all vowels recorded on one pitch in one phonation mode. Information on the pitch and the mode is given in the filename.
- [**Cut and trimmed dataset**][8] - one file per sung vowel, trimmed to the middle part of phonation. The metadata is stored externally in a spreadsheet. Currently only the recordings for the pitch C4 have been processed; the work will be continued.
- [**Metadata for cut and trimmed recordings**][9]: pitch, vowel, phonation mode, some comments available. Please scroll down to C4.
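
A downloaded file can be checked against the stated format with Python's standard `wave` module. This is a minimal sketch, not part of the dataset tooling: the function names are illustrative, and a synthetic 0.75 s file is written first so that the example runs without any dataset file.

```python
import os
import tempfile
import wave


def wav_format(path):
    """Return (sampling rate in Hz, bits per sample, channel count, duration in s)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        bits = 8 * wav.getsampwidth()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, bits, channels, duration


def matches_set_1(path):
    """True if the file is 44.1 kHz, 16 bit, as stated for this recording set."""
    rate, bits, _, _ = wav_format(path)
    return (rate, bits) == (44100, 16)


# Synthetic stand-in for a dataset file: 0.75 s of mono silence.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)       # 2 bytes = 16 bit
    wav.setframerate(44100)
    wav.writeframes(b"\x00\x00" * 33075)  # 33075 frames / 44100 Hz = 0.75 s

print(wav_format(demo_path))     # (44100, 16, 1, 0.75)
print(matches_set_1(demo_path))  # True
```
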
II. Recording event 1, equipment set 2
======================================
Olympus recorder in Zoom mode (highly directional mono recordings), 44 kHz, 16 bit, WAV format
- [**Original recordings**][10] - no metadata/labels; information on pitch, vowel and phonation mode is spoken during the recordings
- [**Cut and trimmed dataset**][11]: one file per sung vowel, trimmed to the middle part of phonation. The metadata is stored externally in a spreadsheet. Currently only the recordings for the pitch C4 have been processed; the work will be continued.
- [**Metadata for cut and trimmed recordings**][12]: pitch, vowel, phonation mode, some comments available. Please scroll down to C4.
III. Recording event 2, equipment set 1
=======================================
96 kHz sampling rate, 24 bit resolution, WAV format
- [**Original recordings**][13] - no metadata/labels; information on pitch, vowel and phonation mode is spoken during the recordings
- [**Cut and trimmed dataset**][14]: one file per one sung vowel, trimmed to the middle part of phonation. The metadata (pitch, vowel, phonation mode) is currently stored in the filenames. Here **the whole set of recordings has been processed.**
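
Since the labels for this set live in the filenames, they have to be parsed before training. The exact naming scheme is not documented on this page, so the sketch below makes no assumption about field order: it only searches for one of the five mode labels and for a pitch name such as `C4`. The example filenames are hypothetical, and the vowel is left out because its encoding is not known here.

```python
import re

# The five labels used in this dataset. "pressedta" comes first so that it is
# not mistaken for "pressed".
MODES = ("pressedta", "breathy", "neutral", "pressed", "flow")

# A pitch name such as C4 or F#3; only octaves 3 to 5 occur (A3 to G5).
PITCH = re.compile(r"(?<![A-Za-z])([A-G][#b]?)([3-5])(?![0-9])")


def parse_labels(filename):
    """Best-effort extraction of phonation mode and pitch from a filename."""
    lowered = filename.lower()
    mode = next((m for m in MODES if m in lowered), None)
    match = PITCH.search(filename)
    pitch = match.group(1) + match.group(2) if match else None
    return {"mode": mode, "pitch": pitch}


# Hypothetical filenames, for illustration only.
print(parse_labels("C4_pressedta_a.wav"))  # {'mode': 'pressedta', 'pitch': 'C4'}
print(parse_labels("breathy_A3.wav"))      # {'mode': 'breathy', 'pitch': 'A3'}
```

The pattern should be checked against the actual filenames of the download and adjusted if they follow a different convention.
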
IV. Recording event 2, equipment set 2
======================================
Olympus recorder in standard mode stereo, 96 kHz, 24 bit, WAV format
- [**Original recordings**][15] - no metadata/labels; information on pitch, vowel and phonation mode is spoken during the recordings
----------
More information on phonation modes, on the recordings and the singer can be found here:
[ISMIR paper][16]
Contact me: proutskova (at) googlemail.com
----------
UPDATE 1/12/2016
Recordings in flow mode do not represent the flow phonation described in Sundberg's book
------------------------------------------------------------------------
When I first produced the recordings for the dataset, I sent them to Johan Sundberg for verification, and he was happy with them. I published the dataset on the basis of his verification. Later I had a chance to take part in one of his summer schools and to learn about phonation modes not just from the book but from him personally. I found that what he was aiming at with his flow mode was very much the classical Western vocal production with a strong fundamental. What I recorded for the dataset was something different. Therefore I would not recommend using my recordings of flow mode for recognition purposes. I leave them in the dataset though: you can use them if your task does not involve flow mode classification.
All other modes - breathy, neutral, pressed - correspond to the terms in Sundberg's book and to general understanding of these terms.
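
For classification experiments, the flow recordings can be dropped when the labels are loaded. A minimal sketch, assuming the labels are available as (filename, mode) pairs; the pairs shown are hypothetical placeholders, not real dataset entries.

```python
# Hypothetical (filename, mode) pairs standing in for the real metadata.
samples = [
    ("a.wav", "breathy"),
    ("b.wav", "flow"),
    ("c.wav", "neutral"),
    ("d.wav", "pressed"),
]

# Modes to leave out of training and evaluation.
EXCLUDED_MODES = {"flow"}

usable = [(name, mode) for name, mode in samples if mode not in EXCLUDED_MODES]
print([mode for _, mode in usable])  # ['breathy', 'neutral', 'pressed']
```
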
----------
UPDATE 17/01/2017
Loudness
--------
Thanks to Daniel Stoller and QM colleagues, I'd like to mention the factor of loudness here. Phonation modes differ in their loudness: a breathy sound would be softer than a neutral one, and a pressed sound would be louder. This is in line with Sundberg's model, which holds that loudness is directly related to subglottal pressure, and that subglottal pressure rises from breathy to neutral to pressed. Please bear this in mind if you work with amplitude-sensitive parameters.
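
One possible way to keep a model from relying on level alone is to scale every sample to the same RMS amplitude before extracting features. This is a sketch under that assumption, not a recommendation from the dataset authors; the target level of 0.1 and the synthetic signals are arbitrary illustrations.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_rms(samples, target=0.1):
    """Scale samples so that their RMS equals `target`; silence is returned unchanged."""
    level = rms(samples)
    if level == 0.0:
        return list(samples)
    gain = target / level
    return [s * gain for s in samples]


# Synthetic stand-ins: the same 220 Hz tone at two different levels.
tone = [math.sin(2 * math.pi * 220 * n / 44100) for n in range(4410)]
soft = [0.1 * s for s in tone]
loud = [0.4 * s for s in tone]

print(round(rms(normalize_rms(soft)), 6), round(rms(normalize_rms(loud)), 6))  # 0.1 0.1
```

Whether to normalize depends on the task: normalization removes loudness as a cue, which may or may not be wanted.
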
[1]: https://osf.io/pa3ha/wiki/Discussion/
[2]: https://osf.io/pa3ha/wiki/Research%20using%20the%20dataset/
[3]: https://osf.io/pa3ha/wiki/Singing%20and%20phonaiton%20modes%20related%20research%20and%20datasets/
[4]: http://www.osf.io/pa3ha
[5]: https://mfr.osf.io/export?url=https://osf.io/46na7/?action=download&direct&mode=render&initialWidth=684&childId=mfrIframe&format=1200x1200.jpeg
[6]: https://mfr.osf.io/export?url=https://osf.io/qzy3t/?action=download&direct&mode=render&initialWidth=684&childId=mfrIframe&format=1200x1200.jpeg
[7]: https://osf.io/q72jg/
[8]: https://osf.io/zgrdc/
[9]: https://osf.io/8gvdq/
[10]: https://osf.io/7jyuu/
[11]: https://osf.io/48pg7/
[12]: https://osf.io/8gvdq/
[13]: https://osf.io/mm97m/
[14]: https://osf.io/cwquj/
[15]: https://osf.io/cwquj/
[16]: http://ismir2012.ismir.net/event/papers/589-ismir-2012.pdf