Breathy, Resonant, Pressed - Automatic Detection Of Phonation Mode From Audio Recordings of Singing
------------------------------------------------------------------------
by Polina Proutskova, Christophe Rhodes, Tim Crawford, Geraint Wiggins
**How to reproduce experiment results**
Paper citation:
Proutskova, P., Rhodes, C., Crawford, T., and Wiggins, G. (2013, in press). Breathy, resonant, pressed - automatic detection of phonation mode from audio recordings of singing. *Journal of New Music Research, special issue on Computational Ethnomusicology*.
For details of the dataset, please refer to [this project][1] as well as to the [ISMIR paper][2]:
Proutskova, P., Rhodes, C., Wiggins, G., and Crawford, T. (2012). Breathy or resonant – a controlled and curated dataset for phonation mode detection in singing. In *Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR 2012)*.
To reproduce the experiment you will need Matlab (we used version R2009aSV).
Please follow these steps:
1. [Download][3] the dataset
2. [Download][4] the Matlab code
This Matlab package contains the code of [TKK Aparat][5] by Matti Airas, together with our own functions for batch processing with TKK Aparat, for the grid searches described in the paper, and for plotting. We have included the TKK Aparat code that we used in order to ensure version compatibility.
3. Set paths:
- Include the location of the downloaded Matlab code in your Matlab path.
- Create a folder in which experiment logs and results will be stored.
- In the downloaded Matlab package, go to the folder “PP”. In the script “vowels_experiment_all.m”, set “path_experiment” (right after the global declarations) to the path of the new folder.
- In the next section of the same script, choose and uncomment (or rewrite) the set of vowels you would like the experiment to run for.
- Further down in the script, inside the for loop, set “audio_dir” to the location of the dataset. (This arrangement is unfortunate. It is left over from debugging the spelling of audio file names, which has since been completed. My Matlab installation stopped working after I upgraded to a new OS X version, so I am not able to correct the script at present. In future, the root folder of the dataset will be set in a global variable at the beginning of the script.)
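The settings above can be checked beforehand with a short sketch like the following. It is only an illustration under stated assumptions: the folder locations, the variable name `code_dir`, the vowel labels and the cell-array form of `vowels` are all hypothetical, and the actual values still have to be entered in “vowels_experiment_all.m” itself. Because the log file name in step 4 is built by plain concatenation, the sketch also assumes that “path_experiment” should end with a file separator.

```matlab
% Minimal setup sketch for step 3. All folder locations below are
% placeholders (assumptions), not paths shipped with the package.
code_dir        = fullfile(tempdir, 'phonation_code');     % unpacked Matlab package
path_experiment = fullfile(tempdir, 'phonation_results');  % experiment logs and results
audio_dir       = fullfile(tempdir, 'phonation_dataset');  % downloaded dataset

% Put the package and its subfolders (e.g. PP) on the Matlab path.
if exist(code_dir, 'dir')
    addpath(genpath(code_dir));
end

% Create the results folder if it does not exist yet.
if ~exist(path_experiment, 'dir')
    mkdir(path_experiment);
end

% File names are built by plain concatenation, so make sure the
% folder name ends with a file separator.
if path_experiment(end) ~= filesep
    path_experiment = [path_experiment, filesep];
end

% Example vowel set (assumed labels; use the sets offered in the script).
vowels = {'a', 'e'};
for k = 1:numel(vowels)
    log_file = [path_experiment, 'vowel_', vowels{k}, '_4class_noTA_log.txt'];
    fprintf('Log for vowel %s would be written to %s\n', vowels{k}, log_file);
end
```

If the printed log file names point into the intended results folder, the same values can be copied into the script.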
4. Run the “vowels_experiment_all.m” script from the PP folder. This will produce Matlab workspace output that documents the progress of the experiment. The same output is written to a log file named [path_experiment, 'vowel_', vowel, '_4class_noTA_log.txt']. The experiment runs sequentially for all vowels set in the “vowels” variable at the beginning of the script. For each vowel it performs the following steps:
- First, a coarse grid search for the best performing values of the variables “number of formants” and “lip radiation” is performed:
a) for each point in the grid, inverse filtering of audio signals is performed with the given values of “number of formants” and “lip radiation” to calculate six glottal waveform descriptors – the low-level features;
b) a mean classification accuracy is determined by cross-validation, based on the phonation mode labels stored in the audio file names. We use an SVM with a radial kernel for classification; its parameters C and gamma are optimised via grid search (see the “Methodology” and “Experiment” sections of the paper).
The results for classification accuracy and for the SVM parameters C and gamma are plotted, and the plots are stored in files with names of the form ['acc_plot_', vowel, '_4class_noTA.jpg']. Classification results are saved in files with names of the form ['results_acc_', vowel, '.mat'].
- Second, promising areas, that is, those with the highest accuracy, are chosen on the coarse grid.
- Third, fine grid searches are performed around the promising areas of the coarse grid. The point on the fine grid with the highest overall classification accuracy is determined and logged. A confusion matrix for the classification results at this grid point is calculated and also written to the output and to the log file.
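The coarse-then-fine search described in step 4 can be pictured with the toy sketch below. It is not the code from the package: `toy_accuracy` is a made-up stand-in for inverse filtering followed by cross-validated SVM classification, the grid ranges and step sizes are hypothetical, and a single best coarse point is used in place of the several promising areas examined in the actual experiment.

```matlab
% Stand-in for steps a) and b): returns an invented "accuracy" for a
% grid point. In the real experiment this value comes from inverse
% filtering followed by cross-validated SVM classification.
toy_accuracy = @(n_formants, lip_rad) ...
    1 - 0.002 * (n_formants - 13).^2 - 4 * (lip_rad - 0.97).^2;

% Coarse grid over both variables (hypothetical ranges and steps).
coarse_formants = 6:4:22;
coarse_lip      = 0.90:0.03:0.99;
[F, L] = meshgrid(coarse_formants, coarse_lip);
acc_coarse = toy_accuracy(F, L);

% Promising area: simplified here to the single best coarse point.
[best_coarse, idx] = max(acc_coarse(:));
f0 = F(idx);
l0 = L(idx);

% Fine grid centred on the best coarse point.
fine_formants = (f0 - 3):(f0 + 3);
fine_lip      = (l0 - 0.02):0.005:(l0 + 0.02);
[Ff, Lf] = meshgrid(fine_formants, fine_lip);
acc_fine = toy_accuracy(Ff, Lf);

% Best point on the fine grid.
[best_fine, idx] = max(acc_fine(:));
best_formants = Ff(idx);
best_lip      = Lf(idx);

fprintf('Best coarse point: formants = %d, lip radiation = %.3f\n', f0, l0);
fprintf('Best fine point:   formants = %d, lip radiation = %.3f\n', ...
    best_formants, best_lip);
```

The fine grid only covers the neighbourhood of the chosen coarse point, which keeps the number of expensive evaluations low compared with running the fine resolution over the whole range.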
[1]: https://osf.io/pa3ha/
[2]: http://ismir2012.ismir.net/event/papers/589-ismir-2012.pdf
[3]: https://osf.io/cwquj/
[4]: https://osf.io/wxjfw/
[5]: http://sourceforge.net/projects/aparat/