Automatic Speech Recognition (ASR) in medical contexts has the potential to save time, cut costs, increase report accuracy, and reduce physician burnout. However, the healthcare industry has been slower than others to adopt this technology, in part due to the importance of avoiding medically relevant transcription mistakes.
We collected the Clinician Transcript Preference benchmark (CTP), comprising the preferences of 18 clinicians on 149 realistic medical sentences, and used it to evaluate our proposed ASR metric, Clinical BERTScore (CBERTScore), which penalizes clinically relevant mistakes more heavily than other mistakes, alongside standard metrics (WER, BLEU, METEOR, etc.). CBERTScore matches the clinician preferences in this survey more closely than the other metrics, and we release the benchmark for the community to further develop clinically-aware ASR metrics.