We present here a description of the ACLEW DAS reliability protocols, a tutorial that teaches ACLEW coders the basics of reliability annotation, and an automated testing method for file comparisons. The reliability protocol called for a second, naïve annotator to apply the full ACLEW DAS to a selected segment.
<br>

**Part 1: Annotation Assignments**

This spreadsheet provides detailed information about which lab annotated which corpus. Additionally, for each participant, it gives a complete breakdown of which annotators completed a given segment.

[Annotation Tracker (EN): Lab annotation assignments and annotator tracking sheet](

Note: This spreadsheet has two tabs. The second tab contains a single row for each file, and in the corresponding columns each lab lists the initials of the annotator who completed a given segment.
<br>

**Part 2: Reliability Segment Sampling Criteria**

This section explains how the reliability annotation segments were identified. The selection criteria are listed by the original sampling technique.

**Random Sampling Dataset**

The random clips for reliability annotation were selected as follows:

1. For each clip type and each recording, randomly select an annotated clip that contains some speech (as indicated by the existing first-pass annotations).
2. Calculate the amount of speech in the first minute and in the second minute of the clip.
3. Calculate a difference score in speech volubility between the first and second minutes.
4. Within each corpus, rank the clips by their difference scores. Select the top half (i.e., clips whose first minute is speech-dense) for second-pass annotation of minute 1, and the bottom half (i.e., clips whose second minute is speech-dense) for second-pass annotation of minute 2.
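The selection steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script the ACLEW team used: the `Clip` class, the function names, and the representation of first-pass annotations as (onset, offset) pairs in seconds are all hypothetical. "Amount of speech" is assumed to mean the summed duration of annotated segments within each minute (so overlapping speakers are counted twice), and with an odd number of clips in a corpus the middle clip is assumed to go to minute 2.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Clip:
    """Hypothetical clip record with first-pass speech segments.

    `segments` holds (onset, offset) pairs in seconds, relative to the clip start.
    """
    corpus: str
    clip_id: str
    segments: List[Tuple[float, float]]


def pick_clip_with_speech(clips: List[Clip], rng: random.Random) -> Optional[Clip]:
    """Step 1: randomly choose one clip (of a given type, from one recording)
    that contains any first-pass speech."""
    candidates = [c for c in clips if c.segments]
    return rng.choice(candidates) if candidates else None


def speech_in_window(segments: List[Tuple[float, float]], start: float, end: float) -> float:
    """Step 2: total seconds of annotated speech overlapping [start, end)."""
    total = 0.0
    for onset, offset in segments:
        total += max(0.0, min(offset, end) - max(onset, start))
    return total


def assign_reliability_minutes(clips: List[Clip]) -> Dict[str, int]:
    """Steps 3-4: map each clip_id to the minute (1 or 2) to re-annotate."""
    by_corpus: Dict[str, List[Tuple[float, str]]] = {}
    for clip in clips:
        minute1 = speech_in_window(clip.segments, 0.0, 60.0)
        minute2 = speech_in_window(clip.segments, 60.0, 120.0)
        diff = minute1 - minute2  # positive: first minute is more speech-dense
        by_corpus.setdefault(clip.corpus, []).append((diff, clip.clip_id))

    assignment: Dict[str, int] = {}
    for scored in by_corpus.values():
        # Rank within the corpus, most first-minute-dense clips first.
        scored.sort(reverse=True)
        half = len(scored) // 2  # assumption: odd middle clip goes to minute 2
        for rank, (_, clip_id) in enumerate(scored):
            assignment[clip_id] = 1 if rank < half else 2
    return assignment
```

Because only the windows [0, 60) and [60, 120) seconds are measured, this sketch already restricts the ranking to the first two minutes of each clip.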
**Note:** Because clip durations differed for both the random and the high-volubility clips in the TSE and YEL corpora, these steps were adjusted as follows: for the random clips, only the first 2 minutes of each clip were considered for the first- and second-minute ranking and selection; for the high-volubility clips, only the turn-taking peak clips were used, and the whole clip (originally 1 minute long) served as the one minute for reliability annotation.
<br>

**Part 3: Reliability Annotation Tutorial**

This tutorial provides step-by-step instructions for setting up the reliability files in ELAN.

[Tutorial 1 (EN): Setting up a reliability ELAN file](
<br>

**Part 4: Calculate Reliability Scores**

Reliability scores are calculated using a combination of tools in Python and R [here]( For speaker type reliability (e.g., target child, female adult, etc.), we use three types of scores: Identification Error Rate, Precision & Recall, and kappa. For vocal maturity and addressee reliability (e.g., "non-canonical" vs. "canonical" and "child-directed" vs. "other-directed"), we use only kappa.

Details on the analytic decisions made for the reliability calculation, as well as additional background and history on computing reliability for the ACLEW project, can be found in the full reliability report. The newest version of the report can be found [here](, but it requires access to the ACLEW GitHub organization. For those without this access, a copy of the report (check the filename, as it may be out of date) is available with the files uploaded to this OSF repository.
<br>
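To make the three speaker-type scores more concrete, here is a simplified frame-level sketch in Python. It is not the ACLEW code linked above, and several assumptions are made: each annotation has already been converted into one speaker-type label per fixed-length frame, the empty string is a hypothetical label for silence, overlapping speech is ignored, and Cohen's kappa is used as the kappa variant. The identification error rate follows the usual definition (missed speech + false alarms + speaker confusion, divided by total reference speech), here counted in frames. The function names are hypothetical, and the same `cohen_kappa` function could be applied to per-utterance vocal maturity or addressee labels.

```python
from collections import Counter
from typing import List, Tuple

SILENCE = ""  # hypothetical label for frames with no speech


def cohen_kappa(a: List[str], b: List[str]) -> float:
    """Cohen's kappa between two equal-length categorical label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of each annotator's marginal label proportions.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # convention: both used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def precision_recall(reference: List[str], hypothesis: List[str], label: str) -> Tuple[float, float]:
    """Frame-level precision and recall for one speaker type,
    treating `reference` (first-pass annotator) as ground truth."""
    tp = sum(r == label and h == label for r, h in zip(reference, hypothesis))
    hyp_pos = sum(h == label for h in hypothesis)
    ref_pos = sum(r == label for r in reference)
    precision = tp / hyp_pos if hyp_pos else 0.0
    recall = tp / ref_pos if ref_pos else 0.0
    return precision, recall


def identification_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """(missed + false alarm + confusion) / total reference speech, in frames."""
    missed = false_alarm = confusion = total = 0
    for r, h in zip(reference, hypothesis):
        if r != SILENCE:
            total += 1
        if r != SILENCE and h == SILENCE:
            missed += 1
        elif r == SILENCE and h != SILENCE:
            false_alarm += 1
        elif r != SILENCE and r != h:
            confusion += 1
    if total == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / total
```

For example, with `ref = ["CHI", "CHI", "FEM", "FEM", "", ""]` from the first annotator and `hyp = ["CHI", "FEM", "FEM", "", "", "CHI"]` from the reliability annotator, the sketch gives an identification error rate of 0.75 (one confusion, one miss, one false alarm over four reference speech frames), precision and recall of 0.5 for `CHI`, and a kappa of 0.25.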