**An Introduction to Machine Learning in R**
The social sciences focus on explaining human experience and behavior with methods of inferential statistics. This is not always in line with the intention to predict variables and associated outcomes with utmost precision (Yarkoni & Westfall, 2017). Models and techniques from the field of machine learning were developed to maximize predictive performance. Although machine learning models have long been considered black boxes, recent developments have greatly increased their interpretability. Therefore, researchers in the social sciences show increasing interest in adopting these methods.
In this workshop, we will give a non-technical introduction to the basic concepts and ideas of machine learning. For demonstration, we will use the openly available PhoneStudy dataset on personality prediction from everyday smartphone behavior (Stachl et al., 2020). We will discuss the bias-variance tradeoff, overfitting, resampling techniques, model evaluation, and variable selection. Participants will be introduced to the Random Forest (Breiman, 2001), a powerful, nonlinear machine learning algorithm that is known for its high predictive performance in many application settings. To demonstrate the strengths of the Random Forest, we will compare its performance with (regularized) linear regression models in a series of benchmark experiments. In addition to performance evaluation, researchers are often interested in the importance and the effects of predictors. We will introduce how to quantify and visualize predictor effects in machine learning models. We will also discuss model fairness.
After this workshop, participants will be able to responsibly apply basic machine learning techniques to their own research.
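
To give a first impression of what overfitting and resampling mean in practice, here is a minimal sketch in base R. It is not taken from the workshop materials: the data are simulated rather than drawn from the PhoneStudy dataset, polynomial regression stands in for the Random Forest and regularized regression models covered in the workshop, and the helpers `rmse`, `cv_rmse`, and `in_sample_rmse` are hypothetical names introduced only for this illustration.

```r
# Simulated data (an assumption for illustration; NOT the PhoneStudy dataset)
set.seed(42)
n <- 60
dat <- data.frame(x = runif(n, min = -2, max = 2))
dat$y <- sin(2 * dat$x) + rnorm(n, sd = 0.3)  # nonlinear signal plus noise

# Assign every observation to one of k folds, once, so that all models
# are evaluated on identical train/test splits
k <- 5
folds <- sample(rep(seq_len(k), length.out = n))

# Root mean squared error
rmse <- function(truth, pred) sqrt(mean((truth - pred)^2))

# Hypothetical helper: k-fold cross-validated RMSE of a linear model
cv_rmse <- function(formula, data, folds) {
  fold_errors <- vapply(sort(unique(folds)), function(i) {
    train <- data[folds != i, , drop = FALSE]
    test  <- data[folds == i, , drop = FALSE]
    fit   <- lm(formula, data = train)
    rmse(test$y, predict(fit, newdata = test))
  }, numeric(1))
  mean(fold_errors)
}

# Hypothetical helper: RMSE on the same data the model was fitted to
in_sample_rmse <- function(formula, data) {
  fit <- lm(formula, data = data)
  rmse(data$y, fitted(fit))
}

# Three models of increasing flexibility
models <- list(
  linear    = y ~ x,
  degree_5  = y ~ poly(x, 5),
  degree_15 = y ~ poly(x, 15)
)

results <- data.frame(
  model           = names(models),
  in_sample       = vapply(models, in_sample_rmse, numeric(1), data = dat),
  cross_validated = vapply(models, cv_rmse, numeric(1), data = dat, folds = folds),
  row.names       = NULL
)
print(results)
```

Because the three models are nested, the in-sample error can only shrink as the polynomial degree grows. The cross-validated error is the more honest estimate of predictive performance and will typically show when extra flexibility stops helping.
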
References:
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Stachl, C., Au, Q., Schoedel, R., Gosling, S. D., Harari, G. M., Buschek, D., ... & Bühner, M. (2020). Predicting personality from patterns of behavior collected with smartphones. Proceedings of the National Academy of Sciences, 117(30), 17680–17687.
Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, 12(6), 1100–1122.
----------
Prior to the workshop, please prepare the following things:
- install the *newest* versions of R and RStudio:
    - R from https://cran.r-project.org/
    - RStudio Desktop (free) from https://www.rstudio.com/products/rstudio/download/
- download and unpack the **MLWorkshop.zip** folder from this OSF repository
- open the file **MLWorkshop.Rproj**
  -> RStudio should start and automatically construct a project-specific *renv* directory for the packages used in this workshop
- run `renv::restore()` in the RStudio console
  -> RStudio should automatically download all required packages to the project-specific *renv* library
- Optional but recommended:
    - if you do not already have LaTeX installed on your computer, we recommend installing *TinyTeX* by running the following commands in RStudio:
      `install.packages('tinytex')`
      `tinytex::install_tinytex()`
    - open the file **ml_workshop_slides.Rmd** and click on the *knit* button in RStudio (the knitting symbol close to the save button)
      -> RStudio will try to build the slides used for the workshop; if this runs without errors (warnings are ok), you can be sure that your full setup works
- Now you are ready for the workshop :)
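
If you would like to double-check the setup from the R console, something along the following lines may help. It is only a sketch: `check_packages` is a hypothetical helper written for this illustration, and the package names passed to it are examples, not the full list of packages used in the workshop (that list lives in the project's *renv* lockfile). Any `FALSE` in the printed result means the corresponding package could not be loaded.

```r
# Hypothetical helper: reports, for each package name, whether it can be loaded
check_packages <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

# Show which R version is running
cat("R version:", R.version.string, "\n")

# Example package names only (assumption): renv manages the project library,
# rmarkdown is needed for knitting, tinytex is the optional LaTeX helper
pkgs <- c("renv", "rmarkdown", "tinytex")
status <- check_packages(pkgs)
print(status)

# Inside the workshop project, renv can report whether the project library
# matches the lockfile; this is skipped when no lockfile is found
if (status[["renv"]] && file.exists("renv.lock")) {
  renv::status()
}
```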
----------
For questions or comments, you can contact the authors at
**florian.pargent@psy.lmu.de**
**clemens.stachl@unisg.ch**
**ramona.schoedel@psy.lmu.de**
----------
**Interesting Papers:**
Breiman, L. (2001). Statistical Modeling: The Two Cultures. Statistical Science, 16(3), 199–215. https://doi.org/10.2307/2676681
Yarkoni, T., & Westfall, J. (2017). Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning. Perspectives on Psychological Science, 12(6), 1100–1122. https://doi.org/10.1177/1745691617693393
Pargent, F., & Albert-von der Gönna, J. (2018). Predictive Modeling with Psychological Panel Data. Zeitschrift für Psychologie / Journal of Psychology, 226(4), 246–258. https://doi.org/10.1027/2151-2604/a000343
Stachl, C., Pargent, F., Hilbert, S., Harari, G. M., Schoedel, R., Vaid, S., Gosling, S. D., & Bühner, M. (2020). Personality Research and Assessment in the Era of Machine Learning. European Journal of Personality, 34(5), 613–631. https://doi.org/10.1002/per.2257
Stachl, C., Au, Q., Schoedel, R., Gosling, S. D., Harari, G. M., Buschek, D., Völkel, S. T., Schuwerk, T., Oldemeier, M., Ullmann, T., Hussmann, H., Bischl, B., & Bühner, M. (2020). Predicting personality from patterns of behavior collected with smartphones. Proceedings of the National Academy of Sciences of the United States of America, 117(30), 17680–17687. https://doi.org/10.1073/pnas.1920484117