**Voting data from U.S. elections, showing how each voter voted on multiple issues**

This is a collection of data files containing vote information from U.S. General and Primary Elections from 14 counties in Colorado and 2 counties in California. For each voter, it contains their votes for several different offices (e.g. U.S. president, representatives, state senators and assembly members, judges, school board members) and ballot propositions (yes/no). For some offices, voters are allowed to vote for several candidates. The dataset is intended for use in research on collective decision making with multiple issues.

The dataset was collected for the experiments performed in the paper

> Nikhil Chandak, Shashwat Goel, and Dominik Peters. "Proportional Aggregation of Preferences for Sequential Decision Making." arXiv:[2306.14858](https://arxiv.org/abs/2306.14858) (2023).

The data was collected from the public web pages of election officers of the different counties. Counties that provided disaggregated Cast Vote Record (CVR) data were included. The collected raw files are available in `raw-cvr-files.zip` (135 MB, unzips to 1.85 GB), where Excel files have been converted to CSV files.

For each county and each election for which CVR data was available, we parsed the votes and separated them into different ballot types: since some offices are local, different voters will receive ballot papers with different issues on them. For each ballot type, we generated a JSON file which lists all applicable issues and all available alternatives, as well as all the votes. An example file name is `colorado_elpaso_2022_general_14.json`, indicating the county (El Paso County, Colorado), the election (2022 General election), and an ID of the ballot type (14).

A small number of votes were redacted by the election officers for various reasons such as privacy, and those votes are not mentioned or included in the JSON files.
Here is an excerpt of `colorado_elpaso_2022_general_14.json`:

```json
{
  "issues": [
    {
      "name": "United States Senator (Vote For=1)",
      "alternatives": [
        "Michael Bennet (DEM)",
        "Joe O'Dea (REP)",
        "T.J. Cole (UNI)",
        "Brian Peotter (LBR)",
        ...
      ]
    },
    {
      "name": "Representative to the 118th United States Congress - District 5 (Vote For=1)",
      "alternatives": [
        "David Torres (DEM)",
        "Doug Lamborn (REP)",
        ...
      ]
    },
    ...
    {
      "name": "Amendment D (Constitutional) (Vote For=1)",
      "alternatives": [
        "Yes/For",
        "No/Against"
      ]
    },
    ...
  ],
  "ballots": [
    {"count": 34, "ballot": [[1], [1], [0], [0], [1], ... ]},
    ...
  ]
}
```

The first entry of the `ballots` list is to be understood as follows. According to the `count` field, there were 34 voters who submitted identical votes. These votes are specified in the `ballot` field, which is a list of the same length as the `issues` list, with its entries in the same order as the issues. Each entry is a list of indices into the `alternatives` list of the corresponding issue (using 0-indexing). For the first issue (United States Senator), these 34 voters voted `[1]`, that is, for the alternative at index `1`, which is `Joe O'Dea (REP)`.

An entry with a single index, like `[1]`, is a vote for one alternative. An entry can also be empty, `[]`, meaning the voter abstained on that issue, or contain several indices, like `[0,3]`, meaning the voter voted for several alternatives.

The script `cvr-to-json.py` was used to translate the CSV files to JSON files. Not all obtained CSV files were successfully parsed.
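
The ballot encoding described above can be decoded with a few lines of Python. The following is a minimal sketch, not part of the dataset's own tooling: the function name `tally` and the inline sample (with made-up issue names, alternatives, and counts) are assumptions for illustration. It weights each distinct ballot by its `count` and records abstentions under the key `None`.

```python
import json
from collections import Counter

# Hypothetical sample in the format described above. Names and counts are
# invented for illustration; a real file would be read with json.load().
SAMPLE = """
{
  "issues": [
    {"name": "Office A (Vote For=1)", "alternatives": ["Alice", "Bob", "Carol"]},
    {"name": "Measure 1 (Vote For=1)", "alternatives": ["Yes/For", "No/Against"]}
  ],
  "ballots": [
    {"count": 3, "ballot": [[1], [0]]},
    {"count": 2, "ballot": [[0, 2], []]},
    {"count": 1, "ballot": [[], [1]]}
  ]
}
"""


def tally(data):
    """Return one Counter per issue, mapping alternative name to vote total.

    Abstentions (empty index lists) are counted under the key None.
    """
    issues = data["issues"]
    totals = [Counter() for _ in issues]
    for entry in data["ballots"]:
        count, ballot = entry["count"], entry["ballot"]
        if len(ballot) != len(issues):
            raise ValueError("ballot length does not match number of issues")
        for issue, chosen, total in zip(issues, ballot, totals):
            if not chosen:
                # Empty list: these voters abstained on this issue.
                total[None] += count
            for index in chosen:
                # Each index is a 0-based position in the issue's alternatives.
                total[issue["alternatives"][index]] += count
    return totals


data = json.loads(SAMPLE)
totals = tally(data)
for issue, total in zip(data["issues"], totals):
    print(issue["name"], dict(total))
```

To run this on one of the dataset's files, replace `json.loads(SAMPLE)` with `json.load(f)` on an opened file such as `colorado_elpaso_2022_general_14.json`. Note that for issues where several alternatives may be chosen, the per-alternative totals can sum to more than the number of voters.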