During the workshop, there were a number of questions of the form "How do I do X in R?" that we could not answer at the time. We said we would put up those examples on this wiki, and here they are!
@[toc](Example Code Snippets)
----------
## Data Filtering ##
The following code shows you how to filter your dataset by a particular variable. A simple way to do this is to use the `subset()` function, which takes at least two arguments: (a) a dataset; (b) a logical statement describing how the filtering should be conducted. Note that the rows for which the logical statement is true are *kept*; rows for which it is false or missing (`NA`) are dropped.
Let's say I have a dataset named `dataset` and I want to keep only the rows where the participant passed the attention check, stored in a variable called `attention.check` where 1 means they passed. I would use the following code to create a new dataset called `filtered.dataset`. From that point onward, I would only use `filtered.dataset` in descriptives and analyses.
filtered.dataset <- subset(dataset, attention.check==1)
The logical operators that you can use to filter are:
|Meaning|Operator|
|-|-|
|equal to| ==|
|not equal to| !=|
|greater than| > |
|greater than or equal to | >=|
|less than| < |
|less than or equal to | <= |
Also, you can combine two conditions with `&` for "and" or a pipe `|` for "or." For example, let's say that you want your dataset to include only people who (a) passed the attention check and (b) did not ask for their data to be deleted after debriefing, which means keeping only people who do not have a value of 0 for the variable called `post.consent`. To filter the dataset by both attention check and post-consent, I would use this code:
filtered.dataset <- subset(dataset, attention.check==1&post.consent!=0)
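To see how `subset()` treats each row, here is a small self-contained sketch. The data frame `toy` and its values are made up for illustration; only the column names `attention.check` and `post.consent` come from the example above.

```r
# Hypothetical toy data (values invented for illustration)
toy <- data.frame(
  id              = 1:5,
  attention.check = c(1, 0, 1, NA, 1),
  post.consent    = c(1, 1, 0, 1, 1)
)

# Keep rows where the condition is TRUE; FALSE and NA rows are dropped
passed <- subset(toy, attention.check == 1)

# Combine conditions with & ("and"); swap in | for "or"
kept <- subset(toy, attention.check == 1 & post.consent != 0)

nrow(passed)  # 3 rows remain (ids 1, 3, 5)
kept$id       # ids 1 and 5
```

Notice that participant 4, whose attention check is missing, is dropped along with the participant who failed it, so it is worth checking for `NA` values before filtering.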
----------
## Confidence Intervals for Regression Slopes ##
Let's say that you've run the "SIPS R Workshop Syntax.R" file through line 262, so you have already run the moderated regression and saved it in an object named `moderated.model`. Normally, you would print the output with `summary(moderated.model)`, but that does not give you confidence intervals. To print the 95% confidence intervals for all the regression parameters, wrap the `confint()` function around your moderated regression object:
confint(moderated.model)
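If you want something other than 95% intervals, or an interval for just one term, `confint()` takes `level` and `parm` arguments. Here is a minimal sketch; because the workshop dataset isn't available here, it uses R's built-in `mtcars` data and a stand-in object I've called `example.model`.

```r
# Stand-in model using R's built-in mtcars data (not the workshop dataset)
example.model <- lm(mpg ~ wt * hp, data = mtcars)

ci.95 <- confint(example.model)                # default: 95% intervals
ci.90 <- confint(example.model, level = 0.90)  # 90% intervals instead
ci.wt <- confint(example.model, parm = "wt")   # interval for one term only

# Put the estimates and their 95% intervals side by side
cbind(estimate = coef(example.model), ci.95)
```

The same calls should work on `moderated.model` once you swap in the object name.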
----------
## Standardized Regression Slopes ##
You may also want standardized regression slopes. You can use the `lm.beta()` function from the `QuantPsyc` package. If you've never installed `QuantPsyc` before, then run these two lines of code to install and load it:
install.packages("QuantPsyc")
library(QuantPsyc)
Once you've installed `QuantPsyc`, you only have to load it in future sessions. To load the `QuantPsyc` package, type `library(QuantPsyc)` when you open R. Now you are ready to get standardized regression slopes with `lm.beta()`!
If you have a regression model that has no interaction terms (e.g., the model named `main.effect.model` on line 217 in "SIPS R Workshop Syntax.R"), then you can just wrap `lm.beta()` around it. The following syntax will give you standardized slopes for that model:
lm.beta(main.effect.model)
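If you're curious what a standardized slope is, here is a sketch in base R. As far as I understand it, `lm.beta()` rescales each slope by the ratio of the predictor's SD to the outcome's SD; this example does that by hand and checks it against a model refit on z-scored variables. It uses the built-in `mtcars` data as a stand-in, and the object names are my own.

```r
# Stand-in main-effects model using R's built-in mtcars data
example.model <- lm(mpg ~ wt + hp, data = mtcars)

b  <- coef(example.model)[-1]              # unstandardized slopes (intercept dropped)
sx <- sapply(mtcars[, c("wt", "hp")], sd)  # SD of each predictor
sy <- sd(mtcars$mpg)                       # SD of the outcome
manual.beta <- b * sx / sy                 # standardized slopes by hand

# Cross-check: refit the same model on z-scored variables
z.data  <- as.data.frame(scale(mtcars[, c("mpg", "wt", "hp")]))
z.model <- lm(mpg ~ wt + hp, data = z.data)
z.beta  <- coef(z.model)[-1]

manual.beta
z.beta
```

The two sets of slopes should match for a model with main effects only.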
However, if you have interaction terms in your model, then you will need to re-run your moderated model with an interaction variable you've created manually. This is similar to the way SPSS requires you to create interaction variables as the products of the relevant predictors. Our goal is to get standardized slopes for the model we've named "moderated.model" in the example syntax (line 262). It involved the predictors `condition` and `c.nfc` (mean-centered Need For Cognition), and we created the `moderated.model` object with the following syntax (line 262 of "SIPS R Workshop Syntax.R"):
moderated.model<-lm(argument.quality~condition*c.nfc, data=dataset)
We told `lm()` that we wanted the main effects and interaction term for `condition` and `c.nfc` by using the asterisk `*` to separate them. However, we are going to create a new variable that represents the interaction of condition and Need for Cognition, `conditionXc.nfc`, and then manually specify the model by adding all the variables:
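dataset$conditionXc.nfc <- dataset$condition * dataset$c.nfc  # product term stored in dataset; assumes condition is numeric (e.g., coded 0/1)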
manually.moderated.model<-lm(argument.quality ~ condition+c.nfc+conditionXc.nfc, data=dataset)
If you print the results of `summary(manually.moderated.model)`, then you will get the same results as `summary(moderated.model)` -- if you don't, then something is wrong. However, you are now able to get a correct estimate of the standardized slopes for the interaction term in our model by using `lm.beta`:
lm.beta(manually.moderated.model)
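Rather than comparing the two `summary()` printouts by eye, you can check that the coefficients agree. This is a sketch on the built-in `mtcars` data, where `am` (coded 0/1) stands in for `condition` and centered `wt` stands in for `c.nfc`; all object names here are hypothetical.

```r
toy <- within(mtcars, {
  c.wt    <- wt - mean(wt)  # mean-centered continuous predictor
  amXc.wt <- am * c.wt      # hand-made product term (am is coded 0/1)
})

star.model   <- lm(mpg ~ am * c.wt, data = toy)            # interaction via *
manual.model <- lm(mpg ~ am + c.wt + amXc.wt, data = toy)  # interaction by hand

# The coefficients should agree; only the interaction term's name differs
same.fit <- all.equal(unname(coef(star.model)), unname(coef(manual.model)))
same.fit
```

If `same.fit` is not `TRUE`, the product variable was probably built from the wrong columns.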
----------
## Interaction Plots with All Continuous Predictors ##
In the workshop, we demonstrated how to create interaction plots with `ggplot2` for interactions between one continuous variable and one categorical variable (see lines 312-316 of "SIPS R Workshop Syntax.R"). It's a little different when you have two continuous variables, though. The following code demonstrates how to estimate the simple slopes of one continuous predictor at +1 SD ("High") and -1 SD ("Low") values of another continuous predictor.
For this example, we will look at how the personality trait dimension agreeableness interacts with need for cognition in predicting perceived argument quality. I am running a new model because I needed an example of an interaction between two continuous variables. First, I center the `Agreeableness` variable. Then, I run a regression model that tests the interaction between agreeableness and need for cognition:
dataset <- within(dataset, {
  c.agreeableness <- Agreeableness - mean(Agreeableness, na.rm=TRUE)
})
average.model <- lm(argument.quality ~ c.agreeableness*c.nfc, data=dataset)
summary(average.model)
Now, to estimate the simple slopes of agreeableness for people who are high or low in need for cognition, we need to create two new variables that are centered around +1 SD and -1 SD of need for cognition. Note that *adding* one SD moves the zero point down to -1 SD, which is why `lo.nfc` is the variable that adds:
dataset <- within(dataset, {
  lo.nfc <- c.nfc + sd(c.nfc, na.rm=TRUE)  # equals zero when c.nfc is at -1 SD
  hi.nfc <- c.nfc - sd(c.nfc, na.rm=TRUE)  # equals zero when c.nfc is at +1 SD
})
In a regression model with an interaction term, the coefficient for each predictor in that interaction is its slope when the other predictor equals zero. So, we can find the simple slope of agreeableness when need for cognition is low (-1 SD) from the coefficient of `c.agreeableness` in a model that uses `lo.nfc` in the interaction term, because `lo.nfc` equals zero at -1 SD. Conversely, we find the simple slope of agreeableness when need for cognition is high (+1 SD) from the same coefficient in a model that uses `hi.nfc` in the interaction term:
lo.nfc.model <- lm(argument.quality ~ c.agreeableness*lo.nfc, data=dataset)
summary(lo.nfc.model)
hi.nfc.model <- lm(argument.quality ~ c.agreeableness*hi.nfc, data=dataset)
summary(hi.nfc.model)
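To convince yourself that the re-centering trick works, you can compute the simple slopes by hand from the original model and compare. This sketch uses the built-in `mtcars` data, with `wt` standing in for agreeableness and `hp` standing in for need for cognition; the object names are hypothetical.

```r
toy <- within(mtcars, {
  c.wt  <- wt - mean(wt)
  c.hp  <- hp - mean(hp)
  lo.hp <- c.hp + sd(c.hp)  # zero now sits at -1 SD of hp
  hi.hp <- c.hp - sd(c.hp)  # zero now sits at +1 SD of hp
})

avg.model <- lm(mpg ~ c.wt * c.hp,  data = toy)
lo.model  <- lm(mpg ~ c.wt * lo.hp, data = toy)
hi.model  <- lm(mpg ~ c.wt * hi.hp, data = toy)

b     <- coef(avg.model)
sd.hp <- sd(toy$c.hp)

# Simple slope of c.wt at a given value of c.hp:
# slope of c.wt + interaction coefficient * that value
by.hand.lo <- b["c.wt"] + b["c.wt:c.hp"] * (-sd.hp)
by.hand.hi <- b["c.wt"] + b["c.wt:c.hp"] * sd.hp

c(by.hand = unname(by.hand.lo), recentered = unname(coef(lo.model)["c.wt"]))
c(by.hand = unname(by.hand.hi), recentered = unname(coef(hi.model)["c.wt"]))
```

The advantage of refitting the model, rather than doing the arithmetic by hand, is that `summary()` also gives you the standard error and significance test for each simple slope.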
Finally, with `ggplot2` loaded, we are ready to create our interaction plot, drawing the slopes for agreeableness from the `lo.nfc.model`, `average.model`, and `hi.nfc.model` objects:
continuous.interaction.plot <- ggplot(data = dataset, aes(x=jitter(c.agreeableness), y=argument.quality))+  # same centered predictor and outcome as the models, so the lines match the points
  geom_point(alpha=.05)+
  geom_abline(aes(intercept=coef(lo.nfc.model)[1], slope=coef(lo.nfc.model)['c.agreeableness'], linetype='-1 SD NFC'))+
  geom_abline(aes(intercept=coef(average.model)[1], slope=coef(average.model)['c.agreeableness'], linetype='Mean NFC'))+
  geom_abline(aes(intercept=coef(hi.nfc.model)[1], slope=coef(hi.nfc.model)['c.agreeableness'], linetype='+1 SD NFC'))+
  labs(x = 'Agreeableness (mean-centered)', y = 'Perceived Argument Quality')+
  scale_linetype_manual(values=c('-1 SD NFC'='dotted','Mean NFC'='dashed','+1 SD NFC'='solid'),  # named values tie each line type to its label
                        breaks=c('-1 SD NFC','Mean NFC','+1 SD NFC'), name='Simple\nSlope')
continuous.interaction.plot