# Overview

This provides an open resource for the primary materials used for *"Comparison of Performance of Automatic Recognition procedures for Identifying Stutters in Speech Trained with Event or Interval markers"* - Barrett, Tang & Howell (2024).

**Liam Barrett, Kevin Tang & Peter Howell. 2024. Comparison of performance of automatic recognizers for stutters in speech trained with event or interval markers. *Frontiers in Psychology* 15. http://dx.doi.org/10.3389/fpsyg.2024.1155285**

The features, labels, models and metrics for the paper can be accessed herein under the Files tab. The files are divided into five folders:

1. Dataset Descriptive Statistics
2. Features
3. Labels
4. Metrics
5. Models

## Dataset Descriptive Statistics

An Excel file containing the descriptive statistics for each sub-dataset. For the paper, two datasets were used to train, validate and test models: UCLASS and KSoF. For the UCLASS dataset, four subsets were created: three interval-based subsets, where data were segmented into 2-, 3- and 4-second intervals, and one subset where data were segmented based on events within the audio signal (syllables). For the KSoF dataset, only 3-second intervals were used. The Excel file provides the frequency of prolongations, part-word repetitions, whole-word repetitions, breaks and fluent intervals/events.

## Features

For each subset, the features used to train, validate and test the models are available as `.csv` files. These are subdivided into KSoF and UCLASS feature sets. Under the `Files/OSF Storage/features/uclass` path, folders containing the acoustic-only features for the three interval subsets and one event subset are available. Each feature file contains a header with the observation identifier (`sess_id`), the feature names (`PC of zcr`, `PC of energy`, etc.) and the class of the observation (`target`). For example, for acoustic features from the 3-sec interval UCLASS dataset:

| sess_id | PC of zcr | PC of energy | ... | PC of delta mfcc_13 | target |
| :--------------------------- | :-------- | :----------- | :-- | :------------------ | :----- |
| M_1106_25y0m_1_49.5_52.5.wav | -2.234535 | -3.231726 | ... | 3.291570 | 0 |
| M_1106_25y0m_1_51.0_54.0.wav | 1.631374 | 0.764057 | ... | 3.866490 | 0 |
| M_1106_25y0m_1_57.0_60.0.wav | 2.248585 | 2.624673 | ... | 5.415814 | 0 |

Additionally, the language model features are available in any `.csv` with `_whisper_` in the handle. Similarly, the wav2vec features are available in any `.csv` with `_wav2vec_` in the handle.

### A note on language model features

The `.csv` files for the language model features include neither the acoustic features nor the `target` column. The features therefore need to be concatenated if using both acoustic and language features, as shown in the worked example at the bottom.

## Labels

The class of each interval/event is available under the labels section. This is available for all UCLASS data. For each data subset, a `.csv` is available containing the onset and offset of each segment (either interval or event), along with the numerical and string labels. For example, for 3-second intervals of the audio file `M_0030_16y4m_1.wav`:

| t_min | t_max | labels_numerical | labels_string |
| :---- | :---- | :--------------- | :------------ |
| 0 | 3 | 5 | PWR + Block |
| 1.5 | 4.5 | 4 | Block |
| 3 | 6 | 0 | Fluent |

Note that the labels files contain some instances of multi-stuttering intervals. For the current paper, all such instances were dropped. Future analyses could utilise this multi-label information.
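As a minimal sketch of how such rows could be excluded (the file name below is hypothetical; the column names are those shown in the table above), one might filter on the `+` separator in `labels_string`:

```python
import pandas as pd

# Hypothetical file name; use the labels .csv for your chosen subset
labels_df = pd.read_csv('M_0030_16y4m_1_3sec_labels.csv')

# Multi-stuttering intervals carry compound string labels such as 'PWR + Block';
# dropping rows whose labels_string contains '+' keeps single-label segments only
single_label_df = labels_df[~labels_df['labels_string'].str.contains('+', regex=False)]
```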
For KSoF labels, please request access from Bayerl et al. at [https://zenodo.org/record/6801844#.Y9PWmXDP1D8](https://zenodo.org/record/6801844#.Y9PWmXDP1D8). Specifically, request the splits used in Schuller et al. ([2022](https://dl.acm.org/doi/abs/10.1145/3503161.3551591?casa_token=VSUAYPZBnz8AAAAA:6lOtEupPrJHRxo607I602KGZs6ySo03bpK956oQuTsx07_v_GtxGXRWBJ5ajACjRSkN-Ag33VaQL)).

## Metrics

A `.csv` file containing the primary metrics for each and every model is available under `Files/OSF Storage/metrics/full_class_long.csv`. The primary metrics are:

1. Precision, recall and F1-score for each class of speech
2. Weighted and unweighted precision, recall and F1-score
3. Accuracy

## Models

The saved instance of each model reported in the paper is available under `Files/OSF Storage/models/`. Model files can be differentiated by their file handle: `.h5` files are TensorFlow instances of a multilayer perceptron; `.joblib` files are scikit-learn instances of a Gaussian support vector machine.

## A note on audio

While the audio files are not directly available in this storage, the KSoF audio files are available from the authors at [https://zenodo.org/record/6801844#.Y9PWmXDP1D8](https://zenodo.org/record/6801844#.Y9PWmXDP1D8) and the UCLASS audio files are available at [https://www.uclass.psychol.ucl.ac.uk/](https://www.uclass.psychol.ucl.ac.uk/).

## Worked example

Using the materials available from this storage, one can reproduce the results from the original paper. This worked example will:

1. Load the acoustic and language features as well as the target vector
2. Load the G-SVM trained on the 3-sec interval subset
3. Concatenate the acoustic and language features
4. Input the test features to the G-SVM
5. Output the results of the G-SVM's predictions on the test set

It is assumed that the necessary Python modules are installed and that the following files have been downloaded and are in the current working directory:

- `Files/OSF Storage/features/uclass/Principal Components of Acoustic Features from 3-sec Intervals/test_pc_classic_features.csv`
- `Files/OSF Storage/features/uclass/whisper/3sec_scaled_whisper_test.csv`
- `Files/OSF Storage/models/acoustic and language/rbf_svm_uclass_3sec_interval_pc_classic_whisper.joblib`

```python
# Import modules
import pandas as pd
from sklearn.metrics import classification_report
from joblib import load

# Load in data
## Acoustic features
classic_test_feat_df = pd.read_csv('test_pc_classic_features.csv')
## Language features
asr_test_feat_df = pd.read_csv('3sec_scaled_whisper_test.csv')

## Concatenate acoustic and language features
## (the first and final columns of the acoustic dataframe are excluded,
## as these are the sess_id and target columns, respectively)
test_feat_df = pd.concat([classic_test_feat_df.iloc[:, 1:-1],
                          asr_test_feat_df],
                         axis=1)

## Convert to numpy arrays
X_test = test_feat_df.to_numpy()
y_test = classic_test_feat_df['target'].to_numpy()
target_names = ['Fluent', 'Prolongation', 'PWR', 'WWR', 'Block']

# Load in the SVM instance
svm = load('rbf_svm_uclass_3sec_interval_pc_classic_whisper.joblib')

# Run the SVM on the test data
y_pred = svm.predict(X_test)

## Get classification results
report = classification_report(y_test, y_pred, target_names=target_names)
print('Classification report \n', report)
```
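The worked example uses a `.joblib` SVM instance. As a minimal sketch of loading one of the `.h5` multilayer perceptron instances instead (the file name below is hypothetical, and it is assumed the network's final layer outputs one probability per class), reusing `X_test`, `y_test` and `target_names` from above:

```python
import numpy as np
from tensorflow.keras.models import load_model
from sklearn.metrics import classification_report

# Hypothetical file name; substitute the .h5 model for your chosen subset
mlp = load_model('mlp_uclass_3sec_interval_pc_classic_whisper.h5')

# The MLP outputs class probabilities; take the most probable class per row
y_prob = mlp.predict(X_test)
y_pred = np.argmax(y_prob, axis=1)

print('Classification report \n',
      classification_report(y_test, y_pred, target_names=target_names))
```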