Synthetic Data in Communication Sciences and Disorders: Promoting an Open, Reproducible, and Cumulative Science
Category: Project
Description: Reproducibility is a core principle of science, and access to a study’s data is essential to reproduce its findings. However, data sharing is uncommon in the field of Communication Sciences and Disorders (CSD), often because of concerns about privacy and disclosure risks. Synthetic data offers a potential solution to this barrier by generating artificial datasets that do not represent real individuals yet retain the statistical properties and relationships of the original data. This study evaluates the performance of synthetic data generation using open data from previously published studies across the American Speech-Language-Hearing Association (ASHA) ‘Big Nine’ domains. Findings suggest that synthetic data can effectively maintain statistical properties and relationships across a wide range of data commonly seen in the field of CSD. Some studies with fewer observations than recommended (i.e., n < 130) showed lower agreement and greater variability in p-values and effect size estimates, but this pattern was not consistent across studies. Therefore, researchers who use synthetic data should assess how stably it preserves their results. This study concludes with a general framework for sharing open data to facilitate computational reproducibility and foster a cumulative science in the field of CSD.
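
The recommendation to assess stability can be illustrated with a minimal sketch. It is not the generation method used in this study: as a simple stand-in, it assumes a multivariate normal model fitted to the original data. The dataset and the helper names `synthesize_gaussian` and `correlation_gap` are hypothetical. The idea is to generate many synthetic datasets from the same original and check how much a statistic of interest (here, a correlation) varies across them.

```python
import numpy as np

def synthesize_gaussian(original, n_rows, seed=0):
    """Draw synthetic rows from a multivariate normal fitted to `original`.

    Assumption: a parametric stand-in chosen for brevity, not the study's method.
    """
    rng = np.random.default_rng(seed)
    mean = original.mean(axis=0)
    cov = np.cov(original, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_rows)

def correlation_gap(original, synthetic):
    """Largest absolute difference between the two correlation matrices."""
    r_orig = np.corrcoef(original, rowvar=False)
    r_syn = np.corrcoef(synthetic, rowvar=False)
    return float(np.max(np.abs(r_orig - r_syn)))

# Hypothetical stand-in for a real dataset: two correlated scores, 200 rows.
rng = np.random.default_rng(42)
x = rng.normal(50, 10, size=200)
y = 0.6 * x + rng.normal(0, 8, size=200)
original = np.column_stack([x, y])

# Repeat the synthesis with different seeds to see how stable the result is.
gaps = [
    correlation_gap(original, synthesize_gaussian(original, 200, seed=s))
    for s in range(20)
]
print(f"median gap: {np.median(gaps):.3f}, max gap: {np.max(gaps):.3f}")
```

A small and consistent gap across repetitions suggests the synthetic data preserves the relationship; a large or highly variable gap, which is more likely with few observations, suggests the synthetic data should be checked further before it is shared in place of the original.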