# Overview

This provides an open resource for the primary materials used for *"Comparison of Performance of Automatic Recognition procedures for Identifying Stutters in Speech Trained with Event or Interval markers"* - Barrett, Tang & Howell (2024).

**Liam Barrett, Kevin Tang & Peter Howell. 2024. Comparison of performance of automatic recognizers for stutters in speech trained with event or interval markers. *Frontiers in Psychology* 15. http://dx.doi.org/10.3389/fpsyg.2024.1155285**

The features, labels, models and metrics for the paper can be accessed herein under the Files tab. The files are divided into five folders:

1. Dataset Descriptive Statistics
2. Features
3. Labels
4. Metrics
5. Models

## Dataset Descriptive Statistics

An Excel file containing the descriptive statistics for each sub-dataset. For the paper, two datasets were used to train, validate and test models: UCLASS and KSoF. For the UCLASS dataset, four subsets were created: three interval-based subsets, where data were segmented into 2-, 3- and 4-second intervals, and one subset where data were segmented based on events within the audio signal (syllables). For the KSoF dataset, only 3-second intervals were used. The Excel file provides the frequency of prolongations, part-word repetitions, whole-word repetitions, breaks and fluent intervals/events.

## Features

For each subset, the features used to train, validate and test the models are available as `.csv` files. These are subdivided into KSoF and UCLASS feature sets. Under the `Files/OSF Storage/features/uclass` path, folders containing the acoustic-only features for the three interval subsets and one event subset are available. Each feature file contains a header with the observation identifier (`sess_id`), the feature names (`PC of zcr`, `PC of energy`, etc.) and the class of the observation (`target`). For example, for acoustic features from the 3-sec interval UCLASS dataset:

| sess_id | PC of zcr | PC of energy | ... | PC of delta mfcc_13 | target |
| :--------------------------- | :-------- | :----------- | :-- | :------------------ | :----- |
| M_1106_25y0m_1_49.5_52.5.wav | -2.234535 | -3.231726 | ... | 3.291570 | 0 |
| M_1106_25y0m_1_51.0_54.0.wav | 1.631374 | 0.764057 | ... | 3.866490 | 0 |
| M_1106_25y0m_1_57.0_60.0.wav | 2.248585 | 2.624673 | ... | 5.415814 | 0 |

Additionally, the language model features are available in any `.csv` with `_whisper_` in the handle. Similarly, the wav2vec features are available in any `.csv` with `_wav2vec_` in the handle.

### A note on language model features

The `.csv` files for the language model features include neither the acoustic features nor the `target` column. The features therefore need to be concatenated if using both acoustic and language features, as shown in the worked example at the bottom.

## Labels

The class of each interval/event is available under the labels section. This is available for all UCLASS data. For each data subset, a `.csv` is available containing the onset and offset of each segment (either interval or event), along with the numerical and string labels. For example, for 3-second intervals of the audio file `M_0030_16y4m_1.wav`:

| t_min | t_max | labels_numerical | labels_string |
| :---- | :---- | :--------------- | :------------ |
| 0 | 3 | 5 | PWR + Block |
| 1.5 | 4.5 | 4 | Block |
| 3 | 6 | 0 | Fluent |

Note that the labels files contain some instances of multi-stuttering intervals. For the current paper, all such instances were dropped. Future analyses could utilise this multi-label information.
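As a minimal sketch of how such rows could be excluded (the file name below is hypothetical; the column names are those shown in the table above), one might filter on the `+` separator in `labels_string`:

```python
import pandas as pd

# Hypothetical file name; use the labels .csv for your chosen subset
labels_df = pd.read_csv('M_0030_16y4m_1_3sec_labels.csv')

# Multi-stuttering intervals carry compound string labels such as 'PWR + Block';
# dropping rows whose labels_string contains '+' keeps single-label segments only
single_label_df = labels_df[~labels_df['labels_string'].str.contains('+', regex=False)]
```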
For KSoF labels, please request access from Bayerl et al. at [https://zenodo.org/record/6801844#.Y9PWmXDP1D8](https://zenodo.org/record/6801844#.Y9PWmXDP1D8). Specifically, request the splits used in Schuller et al. ([2022](https://dl.acm.org/doi/abs/10.1145/3503161.3551591?casa_token=VSUAYPZBnz8AAAAA:6lOtEupPrJHRxo607I602KGZs6ySo03bpK956oQuTsx07_v_GtxGXRWBJ5ajACjRSkN-Ag33VaQL)).

## Metrics

A `.csv` file containing the primary metrics for each and every model is available under `Files/OSF Storage/metrics/full_class_long.csv`. The primary metrics are:

1. Precision, recall and F1-score for each class of speech
2. Weighted and unweighted precision, recall and F1-score
3. Accuracy

## Models

The saved instance of each model reported in the paper is available under `Files/OSF Storage/models/`. Model files can be differentiated by their file handle: `.h5` files are TensorFlow instances of a multilayer perceptron; `.joblib` files are scikit-learn instances of a Gaussian support vector machine.

## A note on audio

While the audio files are not directly available in this storage, the KSoF audio files are available from the authors at [https://zenodo.org/record/6801844#.Y9PWmXDP1D8](https://zenodo.org/record/6801844#.Y9PWmXDP1D8) and the UCLASS audio files are available at [https://www.uclass.psychol.ucl.ac.uk/](https://www.uclass.psychol.ucl.ac.uk/).

## Worked example

Using the materials available from this storage, one can reproduce the results from the original paper. This worked example will:

1. Load the acoustic and language features as well as the target vector
2. Load the G-SVM trained on the 3-sec interval subset
3. Concatenate the acoustic and language features
4. Input the test features to the G-SVM
5. Output the results of the G-SVM's predictions on the test set

It is assumed that the necessary Python modules are installed and that the following files have been downloaded and are in the current working directory:

- `Files/OSF Storage/features/uclass/Principal Components of Acoustic Features from 3-sec Intervals/test_pc_classic_features.csv`
- `Files/OSF Storage/features/uclass/whisper/3sec_scaled_whisper_test.csv`
- `Files/OSF Storage/models/acoustic and language/rbf_svm_uclass_3sec_interval_pc_classic_whisper.joblib`

```python
# Import modules
import pandas as pd
from sklearn.metrics import classification_report
from joblib import load

# Load in data
## Acoustic features
classic_test_feat_df = pd.read_csv('test_pc_classic_features.csv')
## Language features
asr_test_feat_df = pd.read_csv('3sec_scaled_whisper_test.csv')

## Concatenate acoustic and language features
## (the first and final columns of the acoustic dataframe are excluded,
## as these are the sess_id and target columns, respectively)
test_feat_df = pd.concat([classic_test_feat_df.iloc[:, 1:-1],
                          asr_test_feat_df],
                         axis=1)

## Convert to numpy arrays
X_test = test_feat_df.to_numpy()
y_test = classic_test_feat_df['target'].to_numpy()
target_names = ['Fluent', 'Prolongation', 'PWR', 'WWR', 'Block']

# Load in the SVM instance
svm = load('rbf_svm_uclass_3sec_interval_pc_classic_whisper.joblib')

# Run the SVM on the test data
y_pred = svm.predict(X_test)

## Get classification results
report = classification_report(y_test, y_pred, target_names=target_names)
print('Classification report \n', report)
```
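The worked example uses a `.joblib` SVM instance. As a minimal sketch of loading one of the `.h5` multilayer perceptron instances instead (the file name below is hypothetical, and it is assumed the network's final layer outputs one probability per class), reusing `X_test`, `y_test` and `target_names` from above:

```python
import numpy as np
from tensorflow.keras.models import load_model
from sklearn.metrics import classification_report

# Hypothetical file name; substitute the .h5 model for your chosen subset
mlp = load_model('mlp_uclass_3sec_interval_pc_classic_whisper.h5')

# The MLP outputs class probabilities; take the most probable class per row
y_prob = mlp.predict(X_test)
y_pred = np.argmax(y_prob, axis=1)

print('Classification report \n',
      classification_report(y_test, y_pred, target_names=target_names))
```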