This dataset includes:

1. Transcriptions of programmers speaking one or a few lines of Java code, paired with the actual programming statements they were speaking.
2. Single-line comments extracted from the [CodeSearchNet][1] dataset, taken from the curated version of CodeSearchNet in [CodeXGLUE][2] that was used for a code summarization task.
3. A 4-gram word-level mixture language model, created by mixing a 4-gram [LibriSpeech][3] model, a 4-gram model trained on the single-line comment dataset from [CodeXGLUE][2], and a 4-gram model trained on the SpokenJava transcripts. The language model has a 203K-word vocabulary.

You can read more about the research leading to this dataset in these papers:

- Nowrin, S. and Vertanen, K. [Leveraging Large Pretrained Models for Line-by-Line Spoken Program Recognition][4]. Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing (2024).
- Nowrin, S. and Vertanen, K. [Programming by Voice: Exploring User Preferences and Speaking Styles][5]. Proceedings of the 5th Conference on Conversational User Interfaces (2023).
- Nowrin, S., Ordóñez, P. and Vertanen, K. [Exploring Motor-impaired Programmers' Use of Speech Recognition][6]. Proceedings of the ACM SIGACCESS Conference on Computers and Accessibility (2022).

[1]: https://github.com/github/CodeSearchNet
[2]: https://github.com/microsoft/CodeXGLUE
[3]: https://www.openslr.org/12
[4]: https://keithv.com/pub/linebyline/
[5]: https://keithv.com/pub/progvoice/
[6]: https://keithv.com/pub/progspeech/
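Mixing language models, as in the mixture model described above, is commonly done by linear interpolation: each component model's probability for a word w given its history h is multiplied by a weight λ_i, and the results are summed, so P_mix(w | h) = Σ_i λ_i P_i(w | h) with λ_i ≥ 0 and Σ_i λ_i = 1. This page does not state the interpolation weights or the toolkit used, so the following is only a minimal sketch of linear interpolation under that assumption. The function name `interpolate`, the example word and history, and all probabilities and weights are hypothetical.

```python
from typing import Sequence


def interpolate(component_probs: Sequence[float], weights: Sequence[float]) -> float:
    """Return the linearly interpolated probability P_mix(w | h).

    component_probs[i] is P_i(w | h) from the i-th component model;
    weights[i] is its mixture weight lambda_i (non-negative, summing to 1).
    """
    if len(component_probs) != len(weights):
        raise ValueError("need exactly one weight per component model")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must be non-negative and sum to 1")
    return sum(w * p for w, p in zip(weights, component_probs))


# Hypothetical P("int" | "public static") from each component model,
# in the order: LibriSpeech, code comments, SpokenJava transcripts.
probs = [0.001, 0.01, 0.2]
# Hypothetical mixture weights in the same order; the real weights are not stated.
weights = [0.2, 0.3, 0.5]

p_mix = interpolate(probs, weights)
print(f"P_mix = {p_mix:.4f}")  # prints "P_mix = 0.1032"
```

Producing a single merged 4-gram model, as the released file appears to be, additionally requires recomputing back-off weights for the combined model; this sketch covers only the per-word mixture probability.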