The ManDi Corpus is a spoken corpus of regional Mandarin dialects and Standard Mandarin. The corpus currently contains a total of 357 recordings from 36 speakers of six Mandarin dialects.
The speakers read monosyllabic words, disyllabic words, short sentences, the short passage *North Wind and the Sun*, and the modern Chinese poem *Wo Chun*, in both Standard Mandarin and their own regional dialect. The six regional dialects are Beijing, Chengdu, Jinan, Taiyuan, Wuhan, and Xi’an Mandarin.
The corpus was collected remotely using participant-controlled smartphone recording apps. Word- and phone-level alignments were generated using Praat and the Montreal Forced Aligner.
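
Both Praat and the Montreal Forced Aligner work with Praat TextGrid files, so the alignments can most likely be read with a few lines of Python. The sketch below is only an illustration under stated assumptions: it assumes the alignments are stored as long-format TextGrids with interval tiers named `words` and `phones` (the aligner's usual tier names, not confirmed for this corpus), and the sample content, labels, and the helper name `read_interval_tiers` are made up for the example.

```python
import re

# Hypothetical TextGrid content in Praat's long text format (not taken from the corpus).
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 0.9
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 0.9
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.4
            text = "bei"
        intervals [2]:
            xmin = 0.4
            xmax = 0.9
            text = "feng"
    item [2]:
        class = "IntervalTier"
        name = "phones"
        xmin = 0
        xmax = 0.9
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.2
            text = "p"
        intervals [2]:
            xmin = 0.2
            xmax = 0.9
            text = ""
'''

NUMBER = r'([-+.\deE]+)'


def read_interval_tiers(textgrid_text):
    """Return {tier_name: [(start, end, label), ...]} for a long-format TextGrid."""
    tiers = {}
    current = None        # interval list of the tier currently being read
    in_interval = False   # True between an "intervals [n]:" header and its text line
    start = end = None
    for raw in textgrid_text.splitlines():
        line = raw.strip()
        match = re.match(r'name = "(.*)"$', line)
        if match:
            current = tiers.setdefault(match.group(1), [])
            in_interval = False
            continue
        if re.match(r'intervals \[\d+\]:$', line):
            in_interval = True
            continue
        if not in_interval or current is None:
            continue  # skip file-level and tier-level xmin/xmax lines
        match = re.match(r'xmin = ' + NUMBER + r'$', line)
        if match:
            start = float(match.group(1))
            continue
        match = re.match(r'xmax = ' + NUMBER + r'$', line)
        if match:
            end = float(match.group(1))
            continue
        match = re.match(r'text = "(.*)"$', line)
        if match:
            # Praat escapes a double quote inside a label by doubling it.
            current.append((start, end, match.group(1).replace('""', '"')))
            in_interval = False
    return tiers


tiers = read_interval_tiers(SAMPLE)
```

With a real file, the same function could be called on the file's contents, for example `read_interval_tiers(open(path, encoding="utf-8").read())`, where `path` is a placeholder. The parser is deliberately minimal: it handles only the long text format and interval tiers, so a dedicated TextGrid library may be a better fit for anything beyond quick inspection.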
The paper "*The ManDi Corpus: A Spoken Corpus of Mandarin Regional Dialects*" has been submitted to the LREC 2022 conference and is currently under review. A preprint of the paper is included in this project.