# Multivariate statistics practical in R 2018/autumn - Syllabus

Neptun codes:

- PSZM17-105 (MA)
- DPSZ16-ISMF-101 (PhD)
- DPSZ16-KVAN-105 (PhD)

This class is supported by [DataCamp][1], an online learning platform for data science.

![][2]

**Lecturer:** Tamás Nagy, PhD
**Email:** nagy.tamas@ppk.elte.hu
**Course material:** [osf.io/nt8ps/][3]
**Time:** Thursday 15:00-16:30 (first occasion: Sep 13)
**Place:** Izabella street 46, Room: IZU 402
**Personal consultation**: https://nagytamas_stathelp.youcanbook.me/

----------

## Aim of the course

The course aims to provide general knowledge of and hands-on experience in data analysis, showing the current best practices in R. After the course, students should be able to collect, organize, and share data. Moreover, students should be able to analyze and visualize data in R, interpret results, and write up a data analysis.

## How to successfully complete the course

- Midterm exam (30%)
- Final exam (30%)
- Homework assignments (30%) in DataCamp (based on **chapters** finished by the deadline. Note: a full course is 4-5 chapters)
- Active participation (10%)
- A proactive attitude toward keeping up with the class

## Grading principles

0-60%: 1 (fail), 61-70%: 2 (pass), 71-80%: 3, 81-90%: 4, 91-100%: 5

## Course material

- Grolemund, G., & Wickham, H. (2017). *R for Data Science.* Retrieved from [http://r4ds.had.co.nz/index.html][4]
- Ismay, C., & Kim, A. Y. (2018). *An Introduction to Statistical and Data Sciences via R.* https://moderndive.com/
- You will complete the homework through a companion online course, made free for the participants of the class.

## Optional material

- Wheelan, C. (2013). *Naked Statistics: Stripping the Dread from the Data*. New York: W. W. Norton & Company.

----------

## [How to ask questions][5]

1. Make your question as simple and concise as possible
2. Introduce the background of your question, but keep only the relevant parts
3. Search and research to check if your question has any answers online. In your question, please include the steps you took to find the answer so far
4. If your question is programming related, try to create a [minimal reproducible example][6]

----------

## Software requirements

- [Latest version of R][7]
- [Latest version of RStudio][8]

----------

## Syllabus

### 0. Introduction to the course (Sep 13)

### 1. Working with data and R (Sep 20)

- What is data?
- Collecting data
- Types of data
- How to use basic spreadsheets ([Google sheets][9])
- Preparing a dataset and documentation
- Sharing data
- Main steps in data analysis

Reading:

- [Wickham, H. (2014). Tidy data. *Journal of Statistical Software*, 59(10), 1–23. https://doi.org/10.18637/jss.v059.i10][10]
- [Data import in r4ds][11]

Practice:

- Imputing data manually
- Creating a sharable dataset
- Manual transformations on datasets
- Making sense of long/wide/tidy formats

Homework:

- First two chapters of the [Introduction to R][12] course (Intro to basics, Vectors)

### 2. Visualizing and understanding data (Sep 27)

- Summary tables
- Most common plots, and how to understand them (scatter plot/bin plot, histogram/density plot, time series plot, bar plot, pie/tile chart, area plot/stacked bar chart, box/violin plot, heatmap, connectivity plot, geoplot, surface plot)
- Reporting data

Practice:

- Plotting datasets
- Exploring and understanding figures
- Writing summaries of visualizations

Homework:

- [Data Visualization with ggplot2 (Part 1)][16] (whole course)
- [Data Visualization with ggplot2 (Part 2)][17] (whole course)

Recommended:

- [Data visualization in r4ds][13]
- [Graphics for communication in r4ds][14]
- [ggplot2 cheat sheet][15]

### 3. Preparing data for analysis (Oct 4)

- Reading and importing datasets from various formats
- Cleaning data
- How to handle missing data
- Exploratory data analysis (distributions, descriptive statistics)
- Examining data for anomalies
- Transforming data (common transformations, centering, scaling)

Homework:

- [Data Manipulation in R with dplyr in DataCamp][22] (whole course)
- [Importing Data in R (Part 1) in DataCamp][23] (chapters 2 and 3)

Recommended:

- Introduction to the tidyverse in DataCamp (1st and 3rd chapter)
- [Data transformation in r4ds][20]
- [Tidy data in r4ds][18]
- [Data wrangling cheat sheet][21]

Practice:

- Cleaning and wrangling a messy dataset
- Performing exploratory data analysis
- Correcting data using transformations

### 4. Exploratory data analysis (Oct 11)

- Exploratory data analysis (distributions, missing values, descriptive statistics, associations)
- Examining data for anomalies

Reading:

- [Exploratory data analysis in r4ds][24]

Practice:

- Cleaning and wrangling a messy dataset
- Performing exploratory data analysis
- Correcting data using transformations

### 5. Association of variables (Oct 18)

- Examining the association between two variables (covariance, Pearson, Spearman, and Kendall's correlation)
- Checking and correcting for normality
- Comparing correlations
- Association between categorical variables (chi-square test)

Reading:

- Field, A., Miles, J., & Field, Z. (2012). Chapter 6: Correlation. In *Discovering Statistics Using R* (pp. 205–244).

Practice:

- Performing and plotting correlations
- Checking normality
- Reporting correlation

### 6. Linear regression (Oct 25)

- Controlling variables in statistical models
- Linear regression
- Making predictions
- Diagnostics and reliability
- How to build a statistical model?
- Model selection approaches
- Interactions
- Using regression in longitudinal data

Reading:

- Field, A., Miles, J., & Field, Z. (2012). Chapter 7: Regression. In *Discovering Statistics Using R* (pp. 245–311).

Practice:

- Building linear models
- Model diagnostics
- Predicting the outcome variable based on the predictor(s)
- Interpreting and writing up results

### Autumn break (Nov 1)

### 7. Midterm exam (Nov 8)

- Classroom online exam
- Will consist of multiple-choice questions, numeric questions, short essay questions, and practical tasks in which datasets have to be submitted and statistical analyses have to be performed
- Data transformation and analysis will be carried out on datasets custom made for each student, so the answers will be different for everyone
- Everything can be used, except for communication with others
- Plagiarism and cheating will result in a failed exam

### 8. Further variants of regression (Nov 15)

- Logistic regression for binary data
- Poisson and negative binomial regression for count data
- A glimpse into non-linear regression and generalized additive models

Reading:

- Field, A., Miles, J., & Field, Z. (2012). Chapter 8: Logistic regression. In *Discovering Statistics Using R* (pp. 312–358).
- [Perform Poisson regression in R][25]

Practice:

- Performing binomial logistic regression
- Performing Poisson regression

### 9. Comparing means (Nov 22)

- T-test
- ANOVA
- Correcting for multiple comparisons
- Non-parametric alternatives

Reading:

- Field, A., Miles, J., & Field, Z. (2012). Chapter 9: Comparing two means. In *Discovering Statistics Using R* (pp. 359–397).
- Field, A., Miles, J., & Field, Z. (2012). Chapter 10: Comparing several means: ANOVA (GLM 1). In *Discovering Statistics Using R* (pp. 398–461).

Practice:

- Performing a t-test and Welch's t-test
- Performing ANOVA
- Performing post-hoc tests and contrasts
- Plotting group means

### 10. Factorial ANOVA, ANCOVA, repeated-measures ANOVA (Nov 29)

- Interactions
- ANCOVA
- Repeated-measures ANOVA
- Sphericity
- Correcting for sphericity

Reading:

- Field, A., Miles, J., & Field, Z. (2012). Chapter 11: Analysis of covariance, ANCOVA (GLM 2). In *Discovering Statistics Using R* (pp. 462–497).
- Field, A., Miles, J., & Field, Z. (2012). Chapter 12: Factorial ANOVA (GLM 3). In *Discovering Statistics Using R* (pp. 498–548).
- Field, A., Miles, J., & Field, Z. (2012). Chapter 13: Repeated-measures designs (GLM 4). In *Discovering Statistics Using R* (pp. 549–603).

Practice:

- Performing ANCOVA
- Performing repeated-measures ANOVA, correcting for sphericity
- Plotting interactions
- Writing up results

### 11. Dimension reduction (Dec 6)

- Principal component analysis
- Exploratory factor analysis
- Selecting the number of components/factors

Reading:

- Field, A., Miles, J., & Field, Z. (2012). Chapter 17: Exploratory factor analysis. In *Discovering Statistics Using R* (pp. 749–811).

Practice:

- Performing PCA
- Performing EFA

### 12. Final exam (Dec 13)

- Questions and tasks will cover the **whole semester**
- Classroom online exam
- Will consist of multiple-choice questions, numeric questions, short essay questions, and practical tasks in which datasets have to be submitted and statistical analyses have to be performed
- Data transformation and analysis will be carried out on datasets custom made for each student, so the answers will be different for everyone
- Everything can be used, except for communication with others
- Plagiarism and cheating will result in a failed exam

----------

[1]: https://www.datacamp.com
[2]: https://osf.io/4xc9v/download
[3]: https://osf.io/nt8ps
[4]: http://r4ds.had.co.nz/index.html
[5]: https://stackoverflow.com/help/how-to-ask
[6]: https://stackoverflow.com/help/mcve
[7]: https://cran.r-project.org/bin/windows/base/
[8]: https://www.rstudio.com/products/rstudio/download/#download
[9]: https://www.google.com/sheets/about/
[10]: https://www.jstatsoft.org/article/view/v059i10/v59i10.pdf
[11]: http://r4ds.had.co.nz/data-import.html
[12]: https://www.datacamp.com/courses/free-introduction-to-r
[13]: http://r4ds.had.co.nz/data-visualisation.html
[14]: http://r4ds.had.co.nz/graphics-for-communication.html
[15]: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf
[16]: https://www.datacamp.com/courses/data-visualization-with-ggplot2-1
[17]: https://www.datacamp.com/courses/data-visualization-with-ggplot2-2
[18]: http://r4ds.had.co.nz/tidy-data.html
[19]: http://r4ds.had.co.nz/exploratory-data-analysis.html
[20]: http://r4ds.had.co.nz/transform.html
[21]: https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
[22]: https://www.datacamp.com/courses/dplyr-data-manipulation-r-tutorial
[23]: http://Importing%20Data%20in%20R%20%28Part%201%29
[24]: http://r4ds.had.co.nz/exploratory-data-analysis.html
[25]: https://stats.idre.ucla.edu/r/dae/poisson-regression/