# **Revealing Perceptual Proxies with Adversarial Examples**
_In Proceedings of IEEE Conference on Information Visualization (InfoVis)_, _IEEE Transactions on Visualization and Computer Graphics (TVCG)_
**Authors**: Brian D. Ondov, Fumeng Yang, Matthew Kay, Niklas Elmqvist, Steven Franconeri
**Abstract**:
Data visualizations convert numbers into visual marks so that our visual system can extract data from an image instead of raw numbers. Clearly, the visual system does not compute these values as a computer would, as an arithmetic mean or a correlation. Instead, it extracts these patterns using _perceptual proxies_: heuristic shortcuts of the visual marks, such as a center of mass or a shape envelope.
Understanding which proxies people use would lead to more effective visualizations.
We present the results of a series of crowdsourced experiments that measure how powerfully a set of candidate proxies can explain human performance when comparing the mean and range of pairs of data series presented as bar charts.
We generated datasets where the correct answer---the series with the larger arithmetic mean or range---was pitted against an "adversarial" series that should be seen as larger if the viewer uses a particular candidate proxy.
We used both Bayesian logistic regression models and a robust Bayesian mixed-effects linear model to measure how strongly each adversarial proxy could drive viewers to answer incorrectly and whether different individuals may use different proxies.
Finally, we attempt to construct adversarial datasets from scratch, using an iterative crowdsourcing procedure to perform black-box optimization.
**DOI**: TODO
## **bibtex**
```
@article{ondovb2020revealing,
title={Revealing Perceptual Proxies with Adversarial Examples},
author={Ondov, Brian D. and Yang, Fumeng and Kay, Matthew and Elmqvist, Niklas and Franconeri, Steven},
journal={IEEE transactions on visualization and computer graphics},
volume={25},
number={1},
year={2020},
publisher={IEEE}
}
```
## **analysis-final**
This folder contains R scripts for analyzing the data of both experiments.
The `.Rmd` (R Markdown) files are the source code, and the `.html` files are their compiled outputs for easier reading. The `models` folder contains all the fitted brms models. The scripts rely on the **data-final** and **images-final** folders to read and write files.
- **experimental{1|2}-final.{Rmd|html}**: analysis scripts
- **models**
- **m\_brms\_s\_{mean|range}\_P{0...64}\_{proxy}.rds**: the logistic regression models used to infer titer thresholds for each participant x condition. There are 645 such files in total. The analysis scripts do not necessarily need these model files.
- **m\_brms\_student\_titers\_{mean|range}\_brms.rds**: the robust linear mixed-effects models used to infer the effect of each proxy, one per task (2 files in total).
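
This README does not state exactly how a titer threshold is read off each per-participant logistic regression. As a minimal sketch only, assume a fit of the form logit P(correct) = α + β · titer, and define the threshold as the titer at which the predicted probability reaches a criterion p* (the default of 0.5 below is a hypothetical choice, not taken from the paper). Inverting the logistic curve then gives titer* = (logit(p*) - α) / β. The helper names are hypothetical, and Python is used only for illustration; the repository's own analysis is written in R.

```python
import math


def titer_threshold(intercept, slope, criterion=0.5):
    """Invert logit(p) = intercept + slope * titer to find the titer at which
    the predicted probability equals `criterion`.

    Hypothetical helper: the threshold definition (and the 0.5 default) is an
    assumption, not the paper's stated procedure.
    """
    if not 0.0 < criterion < 1.0:
        raise ValueError("criterion must lie strictly between 0 and 1")
    if slope == 0:
        raise ValueError("slope must be non-zero to invert the curve")
    logit_p = math.log(criterion / (1.0 - criterion))
    return (logit_p - intercept) / slope


def predicted_probability(intercept, slope, titer):
    """Logistic curve, used here to check that the inversion round-trips."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * titer)))


if __name__ == "__main__":
    # Made-up coefficients, purely to exercise the functions.
    t = titer_threshold(intercept=-1.0, slope=2.0)
    print(t)                                    # 0.5
    print(predicted_probability(-1.0, 2.0, t))  # 0.5
```

If a fitted model provides posterior draws of (α, β), applying the helper to each draw would yield a distribution of thresholds rather than a single value; whether the stored measurement error was derived this way is not stated here.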
## **data-final**
- **2019**
- **2019-{mean|range}.csv**: the data from Jardine et al. (2019). These two datasets were used to compute correlation matrices that helped select the candidate proxies.
- **experiment1**
- **df\_trials\_{mean|range}\_exp1.csv**: the trial-level data from Experiment 1, one file per task. The columns are largely self-explanatory: **participantID, task, proxy, trialIndex, titer, response, actual.** The **response** column holds the participant's answer and **actual** holds the correct answer; when the two match, the participant selected the correct chart.
- **df\_titer\_thresholds\_from\_small\_brms\_{mean|range}.csv**: the titer thresholds derived from the experimental data, along with their measurement error. These two files can be regenerated by running the **experiment1-final.Rmd** script on the 645 brms models.
- **experiment2**: these files contain all trials for Experiment 2, which performed the crowdsourced black-box optimization.
- **exp2-{mean|range}-all.csv**

  "PID" is the anonymized participant ID and matches the participant IDs in the Experiment 1 tables, since both experiments were run together with the same participants.

  For each trial, the "data" columns hold the values of d in Algorithm 1, and the "data_pert" columns hold the perturbed vector d'. Both d and d' were converted from data space into bar lengths, given in the "bar" columns. Data space lies within 0-1, and bar length is in units of chart width (1 being the entire width). "Pert_loc" records whether the perturbed chart (d') was shown to the participant to the left or the right of the unperturbed chart (d_i). The first "data" entry for a given participant (d_0 in Algorithm 1) is either randomly initialized (in Epoch 1) or the final value (d_k) from the previous epoch. The value of d_k for the final epoch is listed in a separate row for each thread, where the value of "epoch" is "[final]".

  The files exp2-mean-all.csv and exp2-range-all.csv contain trials for all participants who performed Experiment 2 optimizations. If a participant started from a previous participant's result (Epoch > 1), that previous participant is listed in "PID_prev". The state of the chart after a participant completed their trials (that is, after the final adjustment step following their last trial) is given in the row where "trial" is "[final]". If this final chart was evaluated by other participants, the fraction of times it was chosen over a random chart is given in "wins".

  The "name" field lists which named optimization thread from the paper (M1-M4; R1-R4) the participant belongs to, if any.
- **exp2-{mean|range}.csv**: the named optimization threads, in order by epoch.
- **exp2-random-all-{mean|range}.csv**: simulated data. We simulated 1000 epochs; each epoch starts from a random initialization and is passed through 100 (20 x 5) random-guessing trials. These simulated data show what each proxy looks like under random guessing.

  Columns:
  - **iteration**: which epoch
  - **task**: mean or range
  - **trialIndex**: which trial, out of 20 x 5
  - **trials**: the total number of trials in the epoch (always 100)
  - **a{1..7}**: the value of each of the 7 bars
  - **mean, ..., bar_mid**: the values of the proxies
  - **b{1..7}**: the value of each bar in the other chart
  - **b_mean, ..., b_bar_mid**: the values of the proxies in the other chart
  - **pick**: which of the two charts was picked
- **meta.csv**: the formatted proxy values in the final charts. These can be recomputed from the raw Experiment 2 data.

  Columns:
  - **proxy**: which proxy
  - **value**: the value of that proxy
  - **status**: whether this is the final optimized chart or the initialization chart (not used)
  - **task**: mean or range
  - **batch**: which epoch
  - **batch_num**: which epoch (numeric)
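
The Experiment 1 trial files (**df\_trials\_{mean|range}\_exp1.csv**) count a trial as correct when **response** equals **actual**. A minimal standard-library sketch of tallying accuracy per proxy and titer might look like the following. The function name is hypothetical; it assumes the column names listed above, assumes **response** and **actual** are written in the same format, and keeps titers as the raw strings found in the file. The toy rows in the usage block are invented and only mimic the documented layout.

```python
import csv
import io
from collections import defaultdict


def accuracy_by_proxy_titer(csv_text):
    """Return {(proxy, titer): fraction of correct trials} for an Experiment 1 table.

    Hypothetical helper: assumes the columns participantID, task, proxy,
    trialIndex, titer, response, actual, and counts a trial as correct when
    `response` equals `actual` (compared as strings).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["proxy"], row["titer"])
        total[key] += 1
        correct[key] += int(row["response"] == row["actual"])
    return {key: correct[key] / total[key] for key in total}


if __name__ == "__main__":
    # Toy rows in the documented column layout; the values are invented.
    toy = (
        "participantID,task,proxy,trialIndex,titer,response,actual\n"
        "P1,mean,proxyA,1,0.1,left,left\n"
        "P1,mean,proxyA,2,0.1,right,left\n"
        "P2,mean,proxyA,1,0.2,left,left\n"
    )
    print(accuracy_by_proxy_titer(toy))
    # {('proxyA', '0.1'): 0.5, ('proxyA', '0.2'): 1.0}
```

To run it on the real data, pass it the text of, for example, `data-final/experiment1/df_trials_mean_exp1.csv`.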
## **experiment-final**
This folder contains the scripts used to run the experiments. `www/` holds the content to be hosted by an HTTP server, and `workflow/` holds scripts that manage submission, which can be run from any machine with internet access.
Note that the scripts in both folders assume that the site is hosted at https://legacydirs.umiacs.umd.edu/~ondovb/mturk/ and that the results for each participant reside in an Amazon S3 bucket named `ondovb-mturk`.
Also note that some scripts require S3 credentials, which have been removed from this archive (marked with "REDACTED").
## **images-final**
The images generated by the analysis scripts