SpokenJava
Category: Data
Description: A dataset for investigating automatic speech recognition in the domain of spoken programming languages.
This dataset includes:
- Transcriptions of programmers speaking one or a few lines of Java code, paired with the corresponding program statements.
- Single-line comments extracted from the CodeXGLUE dataset. The CodeXGLUE dataset is derived from the curated CodeSearchNet dataset, which was used for a code summarization task.
- A 4-gram word-level mixture language model created by…