Main content



Loading wiki pages...

Wiki Version:
# QueryVis SIGMOD 2020 User Study This permanent [**OSF wiki page**]( contains a detailed description of the full experimental pipeline (pre-registration, data preparation, data analysis and result presentation) of the user study reported in SIGMOD 2020 paper #637 titled "QueryVis: Logic-based Diagrams help Users Understand Complicated SQL Queries Faster" as submitted to [**ACM SIGMOD 2021 Reproducibility**]( In particular, the page contains all necessary code to reproduce the data analysis using the experimental data gathered during the user study, as well as the code and documents deployed for the study on [**Amazon Mechanical Turk** (AMT)]( All our files can be viewed by clicking the [**files tab**]( in the top menu. Please notice that we [**preregistered**]( our study and data analysis scripts before we began collecting data. ## Research Paper The official and full versions of the paper are available via: - [Official version]( in the ACM Digital Library - [Full version on OSF]( - [Full version as arXiv 2004.11375]( When citing the work, we suggest the [bib file from DBLP]( ## User Study Documents - [`Qualification Test Questions.pdf`]( shows the 6 SQL qualification questions. AMT workers needed to get 4/6 correct in order to be able to participate. - [`User Study Tutorial.pdf`]( shows the tutorial presented to each participant before they took our test. - [`User Study Questions.pdf`]( shows the 12 questions we asked all our participants. The questions are shown in the 'BOTH' condition where both the SQL and respective QV diagram were shown. Each participant however sees each question in one of three conditions depending on their assigned sequence number as we explain in section 1.2 in our [pre-registration][8]. ## Experimental data - [`pilot_data_fall_2019/*.csv`]( contains all the per-participant data collected during our pilot user study. - [`full_study_data_winter_2020/*.csv`]( contains all the per-participant data collected for the full user study. ## Pre-registrations Our pre-registrations are available in the Registrations tab and at - [`Spring 2020 AMT Study Preregistration - Executed.ipynb`][11] This file contains the pre-registration of our user study with details of our experimental analysis pipeline and implementation. Please see [our immutable timestamped pre-registration][12] for more details, including alternate versions of the file. - [`Spring 2020 AMT Study Preregistration Amended Stopping Rule - Executed.ipynb`][13] This file contains our second pre-registration of our user study with details of our experimental analysis pipeline and implementation. It includes an amendment to our stopping rule as described at the beginning of the document. Please see [our second immutable timestamped pre-registration][14] for more details, including alternate versions of the file. Our pre-registration notebooks use pilot data for our user study. We registered them before we collected any data analyzed in the full user study. ## Data Analysis Code * **Dependencies**: Our data analysis uses Python version 3.6.9 with following packages needed (a full list is available at ``` altair==3.1.0 altair-data-server==0.1.0 jupyterlab==1.2.4 matplotlib==3.0.2 numpy==1.15.4 pandas==0.23.4 scipy==1.3.0 seaborn==0.9.0 statsmodels==0.9.0 resample==0.21 ``` * **Using our Jupyther notebook**: The data analysis of our full experiment (as specified by our pre-registrations that we registered before we collected and analyzed our experimental data) is available as one notebook: [`Spring 2020 Data Analysis Full Study - Executed.ipynb`]( This is the file that reproduces our figures we discuss in our paper. * **Variant including Group by questions**: By default the notebook follows the analysis in our SIGMOD paper with 9 Multiple Choice Questions (MCQs). To reproduce the figures of the user study as reported in the appendix of our [full preprint version of the paper][16] (thus the analysis of the results on all 12 Multiple Choice Questions, including the 3 Group-By questions) you will need to change `remove_group_by_questions` variable to False. The variable can be found in the **Global Variables Setup** section: ``` # Set this Variable to true to remove the Group By questions (7, 8, 9) from the analysis pipeline remove_group_by_questions = True ``` * **Notebook Versions**: Note that for all our data analysis notebooks we provided several versions. A version with just the code (`.ipynb`) is provided to support easy diffing of the analysis code used in the paper against our preregistered analysis code. A separate version includes all the cell output (`- Executed.ipynb`). For long-term preservation, each of these are also saved as PDFs (`.pdf`, `- Executed.pdf`). ### Setup instructions 1. Download all files. To do that go to [the project on OSF][19], click on **OSF Storage** under the files tab, then click the **Download as zip** button. 2. Unzip the downloaded files and CD into the root of the downloaded directory. 3. To ensure correct package behavior we require the use of Python3. We also recommend the use of pip to install python package dependencies. For more details on using virtual environments check the documentation at * On macOS or Linux, run these three commands *separately* in case there are errors: ``` python3 -m venv env source env/bin/activate which python ``` * On Windows, run these three commands *separately* in case there are errors: ``` python -m venv .\env\Scripts\activate.bat where.exe python ``` Check the path(s) provided by `which python` or `where.exe python` — the first one listed *should* be inside the `env` folder you just created. ### Run instructions 1. Run `jupyter lab`. It should open your browser and let you select any Jupyter Notebook .ipynb file. 2. Run individual cells with ctrl+enter. In the menu you can run all cells and restart the kernel to clear variables. 3. Make sure that jupyter lab has enabled as its kernel the virtual environment you created (you can set the kernel through `Kernel β†’ Change Kernel`). 4. For convenience we re-display at the very end of our notebook a set of the final graphs that we used in our paper. For more details on how to run Jupyter Notebooks check the documentation page at: More detailed instructions and troubleshooting with running the jupyter notebook files please check our [][20] file. ## User Study Implementation (AMT) code Our user study was conducted on Amazon Mechanical Turk (AMT), a popular crowdsourcing platform. Our user study implementation code can be found at on github. The code can be run using Python version 3.6.9 and uses the packages specified in For convenience the code can also be viewed under the [files][17] tab and can be easily downloaded from the [``][18] file. [1]: [2]: [3]: [4]: [5]: [6]: [7]: [8]: [9]: [10]: [11]: [12]: [13]: [14]: [15]: [16]: [17]: [18]: [19]: [20]: