# Data is Personal: Attitudes and Perceptions of Data Visualization in Rural Pennsylvania
**Abstract:** Many of the guidelines that inform how designers create data visualizations originate in studies that unintentionally exclude populations that are most likely to be among the "data poor". In this paper, we explore which factors may drive attention and trust in rural populations with diverse socioeconomic and educational backgrounds - a segment that is largely underrepresented in the data visualization literature. In 42 semi-structured interviews in rural Pennsylvania (USA), we find that a complex set of factors intermix to inform attitudes and perceptions of data visualization - including educational background, political affiliation, and personal experience.
## Stimuli
The following stimuli were collected from a variety of sources. **Important Note:** in this study, _source_ means _the outlet where the visualization was found_, **not** _the source of the underlying data or analysis_.
All images are included in the `stimuli` folder:
- `Diagram A (NIDA).png`
  - Source: National Institute on Drug Abuse (NIDA)
  - Type: Bar
  - Topic: Severity of cannabis vs. other drugs
- `Diagram B (Breitbart).png`
  - Source: Breitbart
  - Type: Bar / Line
  - Topic: Comparison of drug, vehicle, and firearm deaths over time
- `Diagram C (NIDA).gif`
  - Source: National Institute on Drug Abuse (NIDA)
  - Type: Bar / Pictograph
  - Topic: Drug use among youths on the street vs. in households
- `Diagram D (Economist).png`
  - Source: The Economist
  - Type: Map
  - Topic: Overdose deaths involving opioids by county
- `Diagram E (Drexel University).jpg`
  - Source: Drexel University
  - Type: Heat map
  - Topic: Deaths involving opioids by county
- `Diagram F (AGRiMED Industries).jpg`
  - Source: AGRiMED Industries (medical cannabis)
  - Type: Infographic
  - Topic: Overdose increase from pain medication
- `Diagram G (National Vital Statistics System_CDC).gif`
  - Source: National Vital Statistics System (NVSS), Centers for Disease Control and Prevention (CDC)
  - Type: Line
  - Topic: Drug overdoses over time
- `Diagram H (NY Times).png`
  - Source: The New York Times
  - Type: Map
  - Topic: Overdose deaths by county (15- to 44-year-olds)
- `Diagram I (Business Insider).png`
  - Source: Business Insider
  - Type: Line
  - Topic: Overdose death rates over time
- `Diagram J (Alternatives in Treatment).jpg`
  - Source: Alternatives in Treatment (rehab center)
  - Type: Infographic
  - Topic: The science of drug abuse
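Because every stimulus file follows the pattern `Diagram <letter> (<source>).<ext>`, the letter can be recovered from the file name and matched to the `A`–`J` columns used in the data files. The sketch below assumes that naming convention holds for every file in the `stimuli` folder. `parse_stimulus_name` and `index_stimuli` are hypothetical helpers, not part of the released materials.

```python
import re
from pathlib import Path

# Hypothetical pattern: assumes every stimulus file is named
# "Diagram <letter A-J> (<source label>).<png|gif|jpg>", as in the list above.
STIMULUS_NAME = re.compile(r"^Diagram ([A-J]) \((.+)\)\.(png|gif|jpg)$")


def parse_stimulus_name(filename):
    """Return (letter, source_label) for a stimulus file name, or None if it does not match."""
    match = STIMULUS_NAME.match(Path(filename).name)
    if match is None:
        return None
    letter, source_label, _extension = match.groups()
    return letter, source_label


def index_stimuli(folder="stimuli"):
    """Map each graph letter to its file path inside `folder`, skipping non-matching files."""
    index = {}
    for path in sorted(Path(folder).iterdir()):
        parsed = parse_stimulus_name(path.name)
        if parsed is not None:
            index[parsed[0]] = path
    return index
```

For example, `parse_stimulus_name("Diagram A (NIDA).png")` yields `("A", "NIDA")`. Note that the source label in the file name is an abbreviated form (e.g. `National Vital Statistics System_CDC`), so the list above remains the authoritative description of each source.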
## Data
Because many participants live in small, rural communities where they could be easily identified, we do not include `school district`, `career`, or `earnings` information in this data.
**Note:** The data below cover 42 interviews. An additional participant (PID 23) was excluded because they abandoned the study almost immediately after it started. As a result, PIDs range from 1 to 43, with PID 23 absent.
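The PID arithmetic in the note (IDs 1 to 43, minus the excluded PID 23, gives 42 participants) can be turned into a quick sanity check after loading any of the files. This is a minimal sketch. `check_pids` is a hypothetical helper, and it assumes `pid` values are integers or integer strings as read from the CSV.

```python
# Documented range: PIDs 1 to 43, with PID 23 excluded, which leaves 42 participants.
EXCLUDED_PIDS = {23}
EXPECTED_PIDS = set(range(1, 44)) - EXCLUDED_PIDS


def check_pids(observed_pids):
    """Compare observed PIDs against the documented set.

    Returns (missing, unexpected): PIDs that should appear but do not,
    and PIDs that appear but should not (for example, 23).
    """
    observed = {int(pid) for pid in observed_pids}
    return EXPECTED_PIDS - observed, observed - EXPECTED_PIDS
```

For `ranking_data.csv`, both returned sets should be empty. For `ranking_change.csv`, a non-empty `missing` set is expected, because that file only includes participants who changed their rankings.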
### ranking_data.csv
Contains each participant's initial ranking of the data visualizations, along with basic demographic information. The file is structured for easy import into R.
- `pid` Participant ID
- `graphname` label of the graph
- `g_rank` the _original_ ranking given to that graph, from 1 (most useful) to 10 (least useful)
- `age` age range of the participant. Responses include:
  - `18-24`, `25-34`, `35-44`, `45-54`, `55-64`, `65-74`, `75 and Over`
- `ed` the highest level of education attained by the participant. Responses include:
  - `Some high school, no diploma`, `High school graduate`, `Some college credit, no degree`, `Professional degree`, `Associate degree`, `Bachelor's degree`, `Postgraduate (Masters)`, `Doctorate degree`
- `fam` professed familiarity with charts and graphs. 1 is low, 7 is high.
- `pol` professed political identity on a scale from `very liberal (1)` to `very conservative (7)`
- `impact` the extent to which participants had been personally impacted by substance abuse (the topic of the graphs). 1 is low, 7 is high.
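As a starting point for analysis outside R, the sketch below reads the file with Python's standard `csv` module and averages `g_rank` per graph, optionally split by a demographic column such as `pol` or `ed`. It assumes the header names match the list above exactly and that the file is UTF-8 encoded. `load_rows` and `mean_rank_by` are hypothetical helpers.

```python
import csv
from collections import defaultdict
from statistics import mean


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def mean_rank_by(rows, group_column=None):
    """Average g_rank per graph, optionally split by a column such as "pol" or "ed".

    Keys are graph names, or (group value, graph name) pairs when group_column is given.
    Lower averages mean the graph was judged more useful (1 = most useful, 10 = least useful).
    """
    ranks = defaultdict(list)
    for row in rows:
        graph = row["graphname"]
        key = graph if group_column is None else (row[group_column], graph)
        ranks[key].append(int(row["g_rank"]))
    return {key: mean(values) for key, values in ranks.items()}
```

Typical use would be `rows = load_rows("ranking_data.csv")` followed by `mean_rank_by(rows)` or `mean_rank_by(rows, "pol")`. Averaging ranks is only one simple summary; ordinal methods may suit these 1-to-10 rankings better.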
### ranking_change.csv
Records how participants changed their graph rankings after the sources were revealed. Only participants who modified at least one ranking are included.
- `pid` Participant ID
- `graphname` label of the graph
- `change_rank` how a participant's initial ranking of the graph changed after the source was revealed.
  - **Important Note:** for simplicity during analysis, we recorded changes toward _less useful_ as negative shifts and changes toward _more useful_ as positive shifts. If you merge this data with `ranking_data.csv`, be aware that on the 1-to-10 scale of `g_rank` these shifts are inverted: a move toward _more useful_ lowers the rank number, and a move toward _less useful_ raises it.
- `pol` professed political identity on a scale from `very liberal (1)` to `very conservative (7)`
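The sign inversion described in the note can be handled by subtracting `change_rank` from `g_rank` when the two files are merged. The sketch below assumes that `change_rank` counts ranking positions, that `pid` and `graphname` are written identically in both files, and that a graph absent from `ranking_change.csv` kept its original rank. `revised_ranks` is a hypothetical helper, and both arguments are lists of rows as produced by `csv.DictReader`.

```python
def revised_ranks(ranking_rows, change_rows):
    """Apply change_rank (ranking_change.csv) to g_rank (ranking_data.csv).

    A positive change_rank means "more useful", which on the 1-to-10 scale is a
    smaller rank number, so the shift is subtracted rather than added.
    Returns {(pid, graphname): revised rank}.
    """
    # Assumption: at most one change row per (pid, graphname) pair.
    shifts = {
        (row["pid"], row["graphname"]): int(row["change_rank"])
        for row in change_rows
    }
    revised = {}
    for row in ranking_rows:
        key = (row["pid"], row["graphname"])
        # Graphs with no recorded change keep their original rank (shift of 0).
        revised[key] = int(row["g_rank"]) - shifts.get(key, 0)
    return revised
```

For example, a graph initially ranked 5 with a `change_rank` of 2 (moved toward more useful) ends up at rank 3, while a graph ranked 2 with a `change_rank` of -3 ends up at rank 5.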
### codes_by_graph.csv
Data derived from our qualitative analysis. For each graph, the file gives the number of instances in which participant perceptions were labeled with each code. Full details of our analysis process are provided in the paper.
- `Code` the list of codes used to label interview transcripts regarding participant perceptions of graphs.
  - **Note:** the `Uncategorized` code covers perceptions that the research team considered valuable but that did not fit cleanly with any other code in this list. These decisions were often difficult, so researchers seeking to reanalyze or reinterpret this work may want to examine this category closely.
- `A` to `J` one column per graph. Each letter corresponds to the chart as it is labeled in our materials folder (and in the paper).
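Because the file is in wide format (one row per code, one column per graph), it can be read into a nested dictionary and summed per graph to see how much coded discussion each chart attracted. This sketch assumes the header is `Code` followed by columns named exactly `A` through `J`, and it treats empty cells as zero, which is an assumption rather than a documented convention. `read_code_counts`, `parse_code_counts`, and `totals_per_graph` are hypothetical helpers.

```python
import csv

GRAPH_COLUMNS = [chr(code) for code in range(ord("A"), ord("J") + 1)]  # "A" ... "J"


def parse_code_counts(rows):
    """Turn CSV rows into {code: {graph letter: count}}; empty cells count as 0."""
    counts = {}
    for row in rows:
        counts[row["Code"]] = {g: int(row[g] or 0) for g in GRAPH_COLUMNS}
    return counts


def read_code_counts(path):
    """Read codes_by_graph.csv from disk and parse it."""
    with open(path, newline="", encoding="utf-8") as handle:
        return parse_code_counts(csv.DictReader(handle))


def totals_per_graph(counts):
    """Sum coded instances across all codes for each graph."""
    return {g: sum(by_graph[g] for by_graph in counts.values()) for g in GRAPH_COLUMNS}
```

Given the note above, it may be worth computing totals both with and without the `Uncategorized` row, e.g. `totals_per_graph({c: v for c, v in counts.items() if c != "Uncategorized"})`.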