DATA TREATMENT PIPELINE

This pipeline implements utterance segmentation, speaker diarization, automatic transcription, forced alignment, and acoustic analyses.

1. SPLIT BY UTTERANCE
Original script written by Mietta Lennes (25.1.2002)
input = long .wav sound file
output = textgrid with annotated utterance tier
    praat --run ~/utt_seg.praat 0 0 0.150 59 19 0 20 0 ~/outputdir/ ~/inputdir/

2. SPEAKER DIARIZATION
See for original code and more details.
input = long .wav sound file
output = table with diarization time stamps
    python ~/ ~/inputdir/

3. ADD DIARIZATION INTERVALS TO TEXTGRID
Original script written by Author1 (1.27.2024)
input = textgrid with annotated utterance tier (output from step 1) + table with diarization time stamps (output from step 2)
output = textgrid with annotated utterance tier and annotated diarization tiers
    praat --run ~/dia_to_existing_textgrid.praat ~/inputdir/

4. COMBINE DIARIZATION AND UTTERANCE TIERS
Original script written by Author1 (1.27.2024), toNonOverlappingIntervals.proc from
input = textgrid with annotated utterance tier and annotated diarization tiers (output from step 3), toNonOverlappingIntervals.proc
output = textgrid with single annotated utterance/diarization tier
    praat --run ~/combine_utt_dia_tiers.praat ~/inputdir/

5. SPLIT TEXTGRIDS & WAVS BY INTERVALS (ANNOTATED BY UTTERANCE AND DIARIZATION)
Original script written by Mietta Lennes (8.3.2002), modified by Danielle Daidone (4.27.2019), then modified by Author1 (1.27.2024).
input = long .wav sound file + textgrid with single annotated utterance/diarization tier (output from step 4)
output = split textgrids and corresponding .wav sound files
    praat --run ~/utterance_split_onetier.praat ~/inputdir/ ~/outputdir/ "_"

6. TRANSCRIPTION
See for original code and more details.
input = split .wav files for target speaker (output from step 5)
output = table with transcripts for each split utterance/speaker turn
    python ~/ ~/inputdir/

7. CREATE NEW TEXTGRID WITH ANNOTATION BY UTTERANCE SEGMENTATION AND DIARIZATION WITH TRANSCRIPTION
Original script written by Author1 (2.4.2024)
input = table with transcripts for each split utterance/speaker turn (output from step 6) + long .wav sound file
output = long .wav sound file + long textgrid with target speech annotated (by utterance segmentation and diarization) and transcribed
    praat --run ~/dia_trans_to_textgrid_EN.praat ~/inputdir/

8. SPLIT NEW TEXTGRIDS & WAVS BY INTERVAL
Original script written by Mietta Lennes (8.3.2002), modified by Danielle Daidone (4.27.2019), then modified by Author1 (1.27.2024).
input = long .wav sound file + long textgrid with target speech annotated (by utterance segmentation and diarization) and transcribed (output from step 7)
output = final split textgrids and corresponding split .wav sound files
    praat --run ~/utterance_split_onetier_2_EN.praat ~/inputdir/ ~/outputdir/ "_"

9. FORCE ALIGN
See for original code and more details.
input = final split textgrids and corresponding .wav sound files (output from step 8)
output = aligned split textgrids and .wav sound files
    mfa validate ~/inputdir/ english_us_arpa english_us_arpa
    mfa align ~/inputdir/ english_us_arpa english_us_arpa ~/inputdir/aligned/ --no_textgrid_cleanup --clean

10. ACOUSTIC ANALYSES: PRAATSAUCE
See for original scripts and more details.
input = aligned split .wav sound files and textgrids (output from step 9)
output = acoustic measures for each vowel (spectral_measures.txt)
    praat --run ~/shellSauce.praat ~/input.wavdir/ ~/input.textgriddir/ ~/outputdir/ filename_spectral_measures.txt 1 1 2 "^$|^\s+$|-|!|B$|CH$|D$|DH$|DX$|EL$|EM$|EN$|F$|G$|HH$|H$|JH$|K$|L$|M$|N$|NX$|NG$|P$|Q$|R$|S$|SH$|T$|TH$|V$|W$|WH$|Y$|Z$|ZH$|spn$|sp$|sil$" 0 "" "_" 0 "n equidistant points" 9 1 1 1 1 0.05 0.5 6000 0.005 0 20 320 0 0 5 50 1 500 1500 2500 0 0 0

11. ACOUSTIC ANALYSES: PITCH TRACKING, JITTER, SHIMMER, UTT POS
Original script written by Author1 (2.29.2024)
input = aligned split .wav sound files and textgrids (output from step 9)
output = acoustic measures for each vowel (pitchtrack_jitter_shimmer.txt)
    praat --run ~/extract_pitch_jitter_shimmer_EN.praat ~/inputdir/ ~/outputdir/

12. RUN DATA ANALYSIS R SCRIPTS: PRELIMS
Original script from CITATION, adapted for creak by Author1 (2023-2024).
input = acoustic measures for each vowel (spectral_measures.txt, pitchtrack_jitter_shimmer.txt) (outputs from steps 10 and 11)
output = data frames with socio and ling variables (PT_full.csv, PT.csv, PT_clean.csv, PS_long.csv, PS.csv)
    Rscript creak_script_prelim.R

13. RUN DATA ANALYSIS R SCRIPTS: CLEANING1
Original script from CITATION, adapted for creak by Author1 (2023-2024).
input = data frames with socio and ling variables (PS.csv) (output from step 12)
output = data frames after round 1 cleaning and vmeans calculated (PS_int.csv, PS_final.csv)
    Rscript creak_script_cleaning1.R

14. RUN DATA ANALYSIS R SCRIPTS: PLOTS
Original script from CITATION, adapted for creak by Author1 (2023-2024).
input = data frames after round 1 cleaning and vmeans calculated (PS_final.csv) (output from step 13)
output = plotted results (.bmp) + data frames after round 2 cleaning (PS_stats.csv, PS_cleanH1H2.csv, PS_cleanCPP.csv, PS_cleanHNR05.csv)
    Rscript creak_script_plots.R

15. RUN DATA ANALYSIS R SCRIPTS: STATS1
Original script from CITATION, adapted for creak by Author1 (2023-2024).
input = data frames after round 2 cleaning (PS_stats.csv, PS_cleanH1H2.csv, PS_cleanCPP.csv, PS_cleanHNR05.csv) (output from step 14)
output = data frames after round 3 cleaning (PS_stats.csv, PS_cleanH1H2.csv, PS_cleanCPP.csv, PS_cleanHNR05.csv) + lmer model summaries and visualizations
    Rscript creak_script_stats1.R

16. RUN DATA ANALYSIS R SCRIPTS: STATS2
Original script written by Author1 & Author2 (8.11.2024).
input = data frames after round 3 cleaning (PS_stats.csv, PS_cleanH1H2.csv, PS_cleanCPP.csv, PS_cleanHNR05.csv) (output from step 15)
output = brms model summaries and visualizations
    Rscript creak_script_stats2.R

17. RUN DATA ANALYSIS R SCRIPTS: RERUN PLOTS
Original script from CITATION, adapted for creak by Author1 (2023-2024).
input = data frames after round 3 cleaning (PS_stats.csv, PS_cleanH1H2.csv, PS_cleanCPP.csv, PS_cleanHNR05.csv) (output from step 15)
output = new plotted results from cleaned final data frames (.bmp)
    Rscript creak_script_plotsrerun.R
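
EXAMPLE: RUNNING THE R SCRIPTS IN ORDER
Each R script in steps 12 to 17 reads files written by an earlier step, so the scripts have to run in sequence and a failure should stop the chain. The sketch below is one possible way to drive them from Python. The script names come from the steps above; the helper names (R_SCRIPTS, build_commands, run_all) are hypothetical and not part of the original pipeline, and the sketch assumes Rscript is on the PATH and that the scripts sit in the current working directory.

```python
import subprocess

# Script names as listed in steps 12-17; everything else here is illustrative.
R_SCRIPTS = [
    "creak_script_prelim.R",      # step 12
    "creak_script_cleaning1.R",   # step 13
    "creak_script_plots.R",       # step 14
    "creak_script_stats1.R",      # step 15
    "creak_script_stats2.R",      # step 16
    "creak_script_plotsrerun.R",  # step 17
]


def build_commands(scripts=R_SCRIPTS):
    """Return one ['Rscript', name] command per script, preserving order."""
    return [["Rscript", name] for name in scripts]


def run_all(commands, dry_run=True):
    """Run the commands in order; with dry_run=True only print them.

    check=True makes subprocess raise CalledProcessError on a non-zero
    exit status, so later steps never run on stale or missing CSV files.
    """
    done = []
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
        done.append(cmd)
    return done


if __name__ == "__main__":
    run_all(build_commands(), dry_run=True)
```

Steps 16 and 17 both read the output of step 15, so their relative order is a convention of this sketch, not a requirement stated in the pipeline.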
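
EXAMPLE: CHECKING THE LABEL FILTER USED IN STEP 10
The long quoted argument in the step 10 command appears to be a regular expression for interval labels that PraatSauce should skip, which would explain why the output contains measures for vowels only. The sketch below tests a few ARPABET-style labels against that pattern. It uses Python's re module as a stand-in for Praat's regex engine, which is an assumption (the two should agree on a pattern this simple), and the helper name is_skipped is hypothetical.

```python
import re

# Pattern copied from the step 10 command, split in two only for line length.
SKIP_PATTERN = (
    r"^$|^\s+$|-|!|B$|CH$|D$|DH$|DX$|EL$|EM$|EN$|F$|G$|HH$|H$|JH$|K$|L$|M$|N$|NX$|NG$"
    r"|P$|Q$|R$|S$|SH$|T$|TH$|V$|W$|WH$|Y$|Z$|ZH$|spn$|sp$|sil$"
)


def is_skipped(label):
    """Return True if the label matches the skip pattern anywhere."""
    return re.search(SKIP_PATTERN, label) is not None


if __name__ == "__main__":
    for label in ["AA1", "IY0", "ER0", "S", "NG", "sil", "spn", ""]:
        print(repr(label), "skipped" if is_skipped(label) else "kept")
```

Under this reading, vowel labels survive only because they end in a stress digit (AA1, IY0). A vowel label without a digit, such as AH, would match H$ and be skipped, so the filter depends on the aligner writing stress-marked phone labels.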