**KLI Workshop May 17 - 20: The forgotten art of exploratory data analysis**

**Teachers:** Anna Szabelska, Olivier Dujols, and Hans IJzerman

**Times and Dates:** 9am - 12pm on May 17, 18, 19, 20, and 21, 2021

**Description:** Most of social psychology takes a confirmatory approach to research. And yet, “science does not begin with a tidy question” (Tukey, 1980). But how do we get from the messy social world to tidy questions? In this workshop, we will familiarize you with exploratory research and show why it is a necessary step before confirmatory approaches. We will highlight specific examples of exploratory research, such as data profiling. We will also discuss the importance of visualisations and compare various types of regression with machine learning tree methods. If nothing else, you will pick up some of the basics of R by following this workshop, allowing further self-learning afterwards (we will provide a list of useful resources at the end of the workshop).

All information for the workshop can be accessed via our OSF Page (all exercises and slides are here; note that the OSF page will be updated until the day of the workshop, and we will make the page public right before the workshop).

**Content:**

- We will go through basic statistics (much of this you will know already, but going through the basics will help contextualize our later information).
- To prepare for exploratory research, we will run through basic R for descriptive statistics (we will try to convince you that it is more versatile than point-and-click software, and show you how to use packages and the help section). We will go through the research cycle and point out why exploratory research is vital for a mature research workflow.
- We then finally get to the basics of exploration and show how to go from descriptive statistics to inferential statistics.
- We will dedicate at least two days to machine learning and to explaining how to build machine learning models. This will culminate in each student building their own machine learning model based on the dataset of their choice, and having the model ready for making predictions.
- If time permits, we will show how to rely on abductive inference, meaning that we will generate formal predictions from data in a way that sharpens theoretical principles generated before seeing the data.

**Literature/preparation (if you do not know this yet and/or if you do not have the software yet):**

- Please watch [this quick (15 minutes) introduction to R as a programming language, to R as software, and to RStudio][1]. This video will also guide you through downloading and installing R and RStudio.
- Please download [R][2] and [RStudio][3].
- Please complete [this primer][4].
- If your schedule permits, you can go through [our exploratory data analysis tutorial][5]. This tutorial forms the basis for this workshop.

IMPORTANT: Before the workshop, please identify and download a dataset you would like to work on (you can work on a project in a group if you’d prefer). If you don’t know one that you would like to work on, a good resource for finding publicly available datasets is [Cameron Brick’s Google Sheet][6]. From the dataset you choose, we will rely on no more than 20 predictors for the analyses. Also, please make sure that, for those 20 predictors and your dependent variable, your data is tidy, so that:

1. Every column is a variable
2. Every row is an observation
3. Every cell is a single value

[1]:
[2]:
[3]:
[4]:
[5]:
[6]:
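As a rough illustration of the three tidy-data criteria listed above, the following base R sketch contrasts an untidy and a tidy version of the same small dataset. The data, the column names, and the helper `has_packed_cells()` are all hypothetical and made up for this example. The helper is only a heuristic for the third criterion: it flags text columns whose cells appear to hold several values separated by `;`, `,`, or `|`. It is not a complete tidiness check.

```r
# Hypothetical example data: names and values are invented for illustration.
untidy <- data.frame(
  participant = c(1, 2, 3),
  scores      = c("4;5", "3", "2;2;1"),  # several values packed into one cell
  stringsAsFactors = FALSE
)

# The same information in tidy form: one row per observation,
# one column per variable, one value per cell.
tidy <- data.frame(
  participant = c(1, 1, 2, 3, 3, 3),
  score       = c(4, 5, 3, 2, 2, 1)
)

# Hypothetical helper (not from any package): returns one TRUE/FALSE per
# column, TRUE when a character column contains a common separator.
has_packed_cells <- function(df, separators = "[;,|]") {
  vapply(df, function(col) {
    is.character(col) && any(grepl(separators, col))
  }, logical(1))
}

has_packed_cells(untidy)  # flags the 'scores' column
has_packed_cells(tidy)    # flags nothing

# A quick look at column types and missing values before the workshop.
str(tidy)
colSums(is.na(tidy))
```

Running `str()` and `colSums(is.na(...))` on your own dataset is one simple way to confirm that each of your (at most 20) predictors and your dependent variable sits in its own column with a sensible type.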