Main content



Loading wiki pages...

Wiki Version:
**Draper VDISC Dataset - Vulnerability Detection in Source Code** The dataset consists of the source code of 1.27 million functions mined from open source software, labeled by static analysis for potential vulnerabilities. For more details on the dataset and benchmark results, see The data is provided in three HDF5 files corresponding to an 80:10:10 train/validate/test split, matching the splits used in our paper. The combined file size is roughly 1 GB. Each function's raw source code, starting from the function name, is stored as a variable-length UTF-8 string. Five binary 'vulnerability' labels are provided for each function, corresponding to the four most common CWEs in our data plus all others: CWE-120 (3.7% of functions) CWE-119 (1.9% of functions) CWE-469 (0.95% of functions) CWE-476 (0.21% of functions) CWE-other (2.7% of functions) Functions may have more than one detected CWE each. **Please cite our paper if you use this dataset in a publication:** This project was sponsored by the Air Force Research Laboratory (AFRL) as part of the DARPA MUSE ( program. About Draper ( - Draper is an independent, not-for-profit corporation, which means its primary commitment is to the success of customers' missions rather than to shareholders. For either government or private sector customers, Draper leverages its deep experience and innovative thinking to be an effective engineering research and development partner, designing solutions or objectively evaluating the ideas or products of others. Draper will partner with other organizations — from large for-profit prime contractors, to government agencies, to university researchers — in a variety of capacities. Services Draper provides range from concept development through delivered solution and lifecycle support. Draper's multidisciplinary teams of engineers and scientists can deliver useful solutions to even the most critical problems.
OSF does not support the use of Internet Explorer. For optimal performance, please switch to another browser.
This website relies on cookies to help provide a better user experience. By clicking Accept or continuing to use the site, you agree. For more information, see our Privacy Policy and information on cookie use.

Start managing your projects on the OSF today.

Free and easy to use, the Open Science Framework supports the entire research lifecycle: planning, execution, reporting, archiving, and discovery.