Dan Mirman
29 November 2019
Individual differences provide an additional level of analysis for understanding phenomena
Example: National Youth Survey longitudinal data on tolerance for deviant behavior, exposure to deviant behavior, and gender. Subset of 16 subjects printed in Willett (1997, Table 11.1). (DeviantBehavior
)
Tolerance
was measured by asking whether it is wrong for someone their age to: cheat on tests, purposely destroy property of others, use marijuana, steal something worth less than $5, hit or threaten someone without reason, use alcohol, break into a building or vehicle to steal, sell hard drugs, or steal something worth more than $50.
## SubjID Gender Exposure Age Tolerance
## S0009 : 5 Female:45 Min. :0.810 Min. :11 Min. :1.00
## S0045 : 5 Male :35 1st Qu.:0.922 1st Qu.:12 1st Qu.:1.22
## S0268 : 5 Median :1.145 Median :13 Median :1.50
## S0314 : 5 Mean :1.191 Mean :13 Mean :1.62
## S0442 : 5 3rd Qu.:1.395 3rd Qu.:14 3rd Qu.:1.99
## S0514 : 5 Max. :1.990 Max. :15 Max. :3.46
## (Other):50
m.base <- lmer(Tolerance ~ Age*Gender + (Age | SubjID),
contrasts=list(Gender="contr.sum"), data=DeviantBehavior)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00632735
## (tol = 0.002, component 1)
## Estimate Std..Error t.value p.normal p.normal.star
## (Intercept) -0.12563 0.52444 -0.2395 0.810682
## Age 0.13488 0.04403 3.0634 0.002189 **
## Gender1 0.35552 0.52444 0.6779 0.497834
## Age:Gender1 -0.03255 0.04403 -0.7392 0.459779
Significant increase with age, no significant effects of gender
Exposure
scale is not clear and other effects would be estimated at Exposure=0
, which is not an attested value.
If we center Exposure
, its effects and the effects of other predictors will be easier to interpret.
# full model
m.exp1 <- lmer(Tolerance ~ Age*Gender*Exposure.center + (Age | SubjID),
data=DeviantBehavior, contrasts=list(Gender="contr.sum"))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model failed to converge with max|grad| = 0.00536071 (tol = 0.002,
## component 1)
## Estimate Std..Error t.value p.normal p.normal.star
## (Intercept) -0.38880 0.43008 -0.904 3.660e-01
## Age 0.15603 0.03492 4.469 7.876e-06 ***
## Gender1 0.65192 0.43008 1.516 1.296e-01
## Exposure.center -4.22668 1.47540 -2.865 4.173e-03 **
## Age:Gender1 -0.05921 0.03492 -1.696 8.992e-02 .
## Age:Exposure.center 0.38532 0.11978 3.217 1.296e-03 **
## Gender1:Exposure.center 3.62705 1.47540 2.458 1.396e-02 *
## Age:Gender1:Exposure.center -0.28576 0.11978 -2.386 1.705e-02 *
The three-way Age-by-Gender-by-Exposure interaction is a relationship among four variables (Tolerance for deviant behavior, Exposure to deviant behavior, Age, and Gender), three of which are continuous variables. This is hard to visualize.
To make it easier to visualize, can split the Exposure
into levels
# a simple median split
DeviantBehavior$ExposureMedSpl <- ifelse(
DeviantBehavior$Exposure >= median(DeviantBehavior$Exposure), "High", "Low")
# plot the interaction
ggplot(DeviantBehavior, aes(Age, Tolerance, color=ExposureMedSpl)) + facet_wrap(~ Gender) +
stat_summary(fun.y=mean, geom="point", size=3) + stat_summary(fun.y=mean, geom="line") +
stat_summary(fun.data=mean_se, geom="errorbar", width=0.2) +
labs(x="Age", y="Tolerance for deviant behavior") +
theme_bw() + legend_positioning(c(0,1))
Adolescents with higher Exposure to deviant behavior tend to have increased Tolerance for deviant behavior as they get older, and this is more true for males than females.
To make it easier to visualize, can split the Exposure
into levels: tertile split
# define break points
b <- quantile(DeviantBehavior$Exposure, probs=seq(0, 1, by=1/3))
# split continuous predictor and provide level labels
DeviantBehavior$Exposure3 <- cut(DeviantBehavior$Exposure,
breaks=b, include.lowest=T,
labels=c("Low", "Medium", "High"))
# make the plot
ggplot(DeviantBehavior, aes(Age, Tolerance, color=Exposure3)) + facet_wrap(~ Gender) +
stat_summary(fun.y=mean, geom="point", size=3) + stat_summary(fun.y=mean, geom="line") +
stat_summary(fun.data=mean_se, geom="errorbar", width=0.2) +
labs(x="Age", y="Tolerance for deviant behavior") +
theme_bw() + legend_positioning(c(0,1))
For such situations, random effects provide a way to quantify individual effect sizes in the context of a model of overall group performance.
A simple example:
Participant A: \(\zeta_{A1} - \zeta_{A0} = 1 - (-1) = 2\)
Participant B: \(\zeta_{B1} - \zeta_{B0} = (-1) - 1 = -2\)
Data (Made-up): Effect of school mental health services on educational achievement (EducMH
)
Condition
= Treatment (students who received mental health services) vs. Control (academically matched group of students who did not receive services)SDQ
= Strengths and Difficulties Questionnaire: a brief behavioral screening for mental health, only available for Treatment group. Lower scores are better (Total difficulties).Math
= Score on standardized math test## ID Condition Year SDQ
## 101 : 6 Control :540 Min. :2009 Min. : 6.6
## 102 : 6 Treatment:540 1st Qu.:2010 1st Qu.:16.2
## 103 : 6 Median :2012 Median :17.9
## 104 : 6 Mean :2012 Mean :17.9
## 105 : 6 3rd Qu.:2013 3rd Qu.:19.6
## 106 : 6 Max. :2014 Max. :33.8
## (Other):1044 NA's :540
## Math
## Min. : 81.7
## 1st Qu.:147.5
## Median :165.1
## Mean :164.8
## 3rd Qu.:181.2
## Max. :259.3
##
Data (Made-up): Effect of school mental health services on educational achievement (EducMH
)
Question 1: Did the school mental health services improve academic achievement? That is, did the two groups differ on math achievement at baseline and over the 6 years of the study?
Question 2: For the treatment group, was individual-level improvement in mental health associated with improvement in math scores?
Question 1: Did the school mental health services improve academic achievement? That is, did the two groups differ on math achievement at baseline and over the 6 years of the study?
# adjust time variable to have a sensible intercept
EducMH$Time <- EducMH$Year - 2009
# fit the models
m.base <- lmer(Math ~ Time + (Time | ID), data=EducMH, REML=F)
m.0 <- lmer(Math ~ Time + Condition + (Time | ID), data=EducMH, REML=F)
m.1 <- lmer(Math ~ Time*Condition + (Time | ID), data=EducMH, REML=F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00387535
## (tol = 0.002, component 1)
## Data: EducMH
## Models:
## m.base: Math ~ Time + (Time | ID)
## m.0: Math ~ Time + Condition + (Time | ID)
## m.1: Math ~ Time * Condition + (Time | ID)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## m.base 6 7847 7876 -3917 7835
## m.0 7 7848 7883 -3917 7834 0.15 1 0.6947
## m.1 8 7843 7883 -3914 7827 7.02 1 0.0081 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Estimate Std..Error t.value p.normal p.normal.star
## (Intercept) 149.297 2.0297 73.5547 0.000000 ***
## Time 5.216 0.3737 13.9560 0.000000 ***
## ConditionTreatment 1.337 2.8705 0.4658 0.641352
## Time:ConditionTreatment 1.414 0.5285 2.6755 0.007461 **
There was no group difference at baseline, but there was a group difference on slope. That is, math achievement in the two groups started out the same, but increased more quickly in the Treatment group.
Question 2: For the treatment group, was individual-level improvement in mental health associated with improvement in math scores?
First step: For the treatment group, plot change in the SDQ over time showing both group mean and individual variability
ggplot(subset(EducMH, Condition == "Treatment"), aes(Year, SDQ)) +
geom_line(aes(group=ID), color="gray") +
stat_summary(fun.y=mean, geom="line") +
stat_summary(fun.data=mean_se, geom="errorbar", width=0.3) +
theme_bw(base_size=12)
Within the treatment group, it looks like there is lots of variability in response to the mental health services. Some people responded really well (big decreases in difficulties on SDQ), some people didn’t respond well (increased difficulties according to SDQ).
We want to know whether this variability is associated with variability in improved math achievement.
Question 2: For the treatment group, was individual-level improvement in mental health associated with improvement in math scores?
Analysis strategy:
Question 2: For the treatment group, was individual-level improvement in mental health associated with improvement in math scores?
Analysis strategy:
Question 2: For the treatment group, was individual-level improvement in mental health associated with improvement in math scores?
Analysis strategy:
#psy811::get_ranef() will extract the named random effect and clean them up a bit
re.math <- get_ranef(m.math, "ID")
re.sdq <- get_ranef(m.sdq, "ID")
#merge() will combine those into one data frame, but needs some help because the variable names are all the same
re <- merge(re.math, re.sdq, by="ID", suffixes = c(".math", ".sdq"))
summary(re)
## ID Intercept.math Time.math Intercept.sdq
## Min. :201 Min. :-59.50 Min. :-3.372 Min. :-2.0326
## 1st Qu.:223 1st Qu.: -9.85 1st Qu.:-0.733 1st Qu.:-0.7895
## Median :246 Median : -0.01 Median : 0.118 Median : 0.0341
## Mean :246 Mean : 0.00 Mean : 0.000 Mean : 0.0000
## 3rd Qu.:268 3rd Qu.: 11.48 3rd Qu.: 0.694 3rd Qu.: 0.6024
## Max. :290 Max. : 57.64 Max. : 2.747 Max. : 3.0828
## Time.sdq
## Min. :-2.4951
## 1st Qu.:-0.7468
## Median : 0.0989
## Mean : 0.0000
## 3rd Qu.: 0.5482
## Max. : 3.0944
## ID Intercept.math Time.math Intercept.sdq Time.sdq
## 1 201 0.8412 -1.542050 -0.9793 0.8381
## 2 202 9.6071 0.029529 0.5728 0.3952
## 3 203 -14.6657 -0.573849 0.5872 1.1176
## 4 204 -3.5781 -0.290299 -0.8056 0.4423
## 5 205 2.1411 -0.852670 0.4991 0.8463
## 6 206 31.4592 0.006922 0.2029 0.6119
Question 2: For the treatment group, was individual-level improvement in mental health associated with improvement in math scores?
Analysis strategy:
##
## Pearson's product-moment correlation
##
## data: re$Time.math and re$Time.sdq
## t = -11, df = 88, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.8455 -0.6749
## sample estimates:
## cor
## -0.7739
ggplot(re, aes(Time.math, Time.sdq)) + geom_point() + stat_smooth(method="lm") +
labs(x="Relative Rate of Increase in Math Score",
y="Relative Rate of Decrease in SDQ Total Difficulties Score") +
theme_bw(base_size=12)
Strong correlation (\(r = -0.77\), \(p < 0.0001\)) indicating that response to mental health intervention (decreased difficulties) was associated with larger increases in math achievement. Note that the key quantities here are slopes. That is, the rate of decreased mental health difficulties is associated with a higher rate of math achievement.