Individual differences

Individual differences provide an additional level of analysis for understanding phenomena

At a group level, a treatment works better than a placebo, but why does it work better for some people than for others?
People solve easy problems faster than hard problems, but why are some people a lot faster on the easy problems and other people only a little faster?
t-test and ANOVA methods treat individual differences as noise
Multilevel regression provides two ways to quantify and analyze individual differences

“External” individual differences can be added as fixed effects

Example: National Youth Survey longitudinal data on tolerance for deviant behavior, exposure to deviant behavior, and gender. Subset of 16 subjects printed in Willett (1997, Table 11.1). (DeviantBehavior)

Tolerance was measured by asking whether it is wrong for someone their age to: cheat on tests, purposely destroy property of others, use marijuana, steal something worth less than $5, hit or threaten someone without reason, use alcohol, break into a building or vehicle to steal, sell hard drugs, or steal something worth more than $50.

##      SubjID      Gender      Exposure          Age       Tolerance   
##  S0009  : 5   Female:45   Min.   :0.810   Min.   :11   Min.   :1.00  
##  S0045  : 5   Male  :35   1st Qu.:0.922   1st Qu.:12   1st Qu.:1.22  
##  S0268  : 5               Median :1.145   Median :13   Median :1.50  
##  S0314  : 5               Mean   :1.191   Mean   :13   Mean   :1.62  
##  S0442  : 5               3rd Qu.:1.395   3rd Qu.:14   3rd Qu.:1.99  
##  S0514  : 5               Max.   :1.990   Max.   :15   Max.   :3.46  
##  (Other):50

Does tolerance increase with age and is it modulated by gender?

m.base <- lmer(Tolerance ~ Age*Gender + (Age | SubjID),
                 contrasts=list(Gender="contr.sum"), data=DeviantBehavior)

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00632735 (tol = 0.002, component 1)

coef(summary(m.base))

##             Estimate Std. Error    df t value Pr(>|t|)
## (Intercept) -0.12563    0.52444 14.04 -0.2395 0.814143
## Age          0.13488    0.04403 14.04  3.0634 0.008399
## Gender1      0.35552    0.52444 14.04  0.6779 0.508852
## Age:Gender1 -0.03255    0.04403 14.04 -0.7392 0.471957

Significant increase with age, no significant effects of gender

Is this modulated by exposure to deviant behavior?

Exposure scale is not clear and other effects would be estimated at Exposure=0, which is not an attested value.

If we center Exposure, its effects and the effects of other predictors will be easier to interpret.

# center Exposure, but do not re-scale it
DeviantBehavior$Exposure.center <- scale(DeviantBehavior$Exposure, scale=FALSE)

Is this modulated by exposure to deviant behavior?

# full model
m.exp1 <- lmer(Tolerance ~ Age*Gender*Exposure.center + (Age | SubjID), 
               data=DeviantBehavior, contrasts=list(Gender="contr.sum"))
coef(summary(m.exp1))

##                             Estimate Std. Error df t value  Pr(>|t|)
## (Intercept)                 -0.38880    0.43008 12  -0.904 0.3837728
## Age                          0.15603    0.03492 12   4.468 0.0007678
## Gender1                      0.65192    0.43008 12   1.516 0.1554571
## Exposure.center             -4.22668    1.47540 12  -2.865 0.0142277
## Age:Gender1                 -0.05921    0.03492 12  -1.696 0.1156925
## Age:Exposure.center          0.38532    0.11979 12   3.217 0.0073996
## Gender1:Exposure.center      3.62705    1.47540 12   2.458 0.0301300
## Age:Gender1:Exposure.center -0.28576    0.11979 12  -2.386 0.0344162

Significant main effects: Age and Exposure
Significant interactions: Age-by-Exposure, Gender-by-Exposure, Age-by-Gender-by-Exposure

How to plot the three-way interaction?

The three-way Age-by-Gender-by-Exposure interaction is a relationship among four variables (Tolerance for deviant behavior, Exposure to deviant behavior, Age, and Gender), three of which are continuous variables. This is hard to visualize.

To make it easier to visualize, can split the Exposure into levels

# a simple median split
DeviantBehavior$ExposureMedSpl <- ifelse(
  DeviantBehavior$Exposure >= median(DeviantBehavior$Exposure), "High", "Low")
# plot the interaction
ggplot(DeviantBehavior, aes(Age, Tolerance, color=ExposureMedSpl)) +
  facet_wrap(~ Gender) + 
  stat_summary(fun=mean, geom="point", size=3) +
  stat_summary(fun=mean, geom="line") +
  stat_summary(fun.data=mean_se, geom="errorbar", width=0.2) + 
  labs(x="Age", y="Tolerance for deviant behavior") + 
  theme_bw() + legend_positioning(c(0,1))

Adolescents with higher Exposure to deviant behavior tend to have increased Tolerance for deviant behavior as they get older, and this is more true for males than females.

How to plot the three-way interaction?

To make it easier to visualize, can split the Exposure into levels: tertile split

# define break points
b <- quantile(DeviantBehavior$Exposure, probs=seq(0, 1, by=1/3)) 
# split continuous predictor and provide level labels
DeviantBehavior$Exposure3 <- cut(DeviantBehavior$Exposure, 
                                 breaks=b, include.lowest=T, 
                                 labels=c("Low", "Medium", "High"))
# make the plot
ggplot(DeviantBehavior, aes(Age, Tolerance, color=Exposure3)) + facet_wrap(~ Gender) + 
  stat_summary(fun.y=mean, geom="point", size=3) + stat_summary(fun.y=mean, geom="line") +
  stat_summary(fun.data=mean_se, geom="errorbar", width=0.2) + 
  labs(x="Age", y="Tolerance for deviant behavior") + 
  theme_bw() + legend_positioning(c(0,1))

“Internal” individual differences

No “external” measure that can be entered as a fixed effect
Individual differences needed for a different analysis (e.g., VLSM)

For such situations, random effects provide a way to quantify individual effect sizes in the context of a model of overall group performance.

A simple example:

Participant A: $\zeta_{A1} - \zeta_{A0} = 1 - (-1) = 2$

Participant B: $\zeta_{B1} - \zeta_{B0} = (-1) - 1 = -2$

“Internal” individual differences: Example

Data (Made-up): Effect of school mental health services on educational achievement (EducMH)

Condition = Treatment (students who received mental health services) vs. Control (academically matched group of students who did not receive services)
SDQ = Strengths and Difficulties Questionnaire: a brief behavioral screening for mental health, only available for Treatment group. Lower scores are better (Total difficulties).
Math = Score on standardized math test

summary(EducMH)

##        ID           Condition        Year           SDQ            Math      
##  101    :   6   Control  :540   Min.   :2009   Min.   : 6.6   Min.   : 81.7  
##  102    :   6   Treatment:540   1st Qu.:2010   1st Qu.:16.2   1st Qu.:147.5  
##  103    :   6                   Median :2012   Median :17.9   Median :165.1  
##  104    :   6                   Mean   :2012   Mean   :17.9   Mean   :164.8  
##  105    :   6                   3rd Qu.:2013   3rd Qu.:19.6   3rd Qu.:181.2  
##  106    :   6                   Max.   :2014   Max.   :33.8   Max.   :259.3  
##  (Other):1044                                  NA's   :540

“Internal” individual differences: Example

Data (Made-up): Effect of school mental health services on educational achievement (EducMH)

Question 1: Did the school mental health services improve academic achievement? That is, did the two groups differ on math achievement at baseline and over the 6 years of the study?

Question 2: For the treatment group, was individual-level improvement in mental health associated with improvement in math scores?

“Internal” individual differences: Example

Question 1: Did the school mental health services improve academic achievement? That is, did the two groups differ on math achievement at baseline and over the 6 years of the study?

# adjust time variable to have a sensible intercept
EducMH$Time <- EducMH$Year - 2009
# fit the models
m.base <- lmer(Math ~ Time + (Time | ID), data=EducMH, REML=F)
m.0 <- lmer(Math ~ Time + Condition + (Time | ID), data=EducMH, REML=F)
m.1 <- lmer(Math ~ Time*Condition + (Time | ID), data=EducMH, REML=F)
# compare the models
anova(m.base, m.0, m.1)

## Data: EducMH
## Models:
## m.base: Math ~ Time + (Time | ID)
## m.0: Math ~ Time + Condition + (Time | ID)
## m.1: Math ~ Time * Condition + (Time | ID)
##        npar  AIC  BIC logLik deviance Chisq Df Pr(>Chisq)   
## m.base    6 7847 7876  -3917     7835                       
## m.0       7 7848 7883  -3917     7834  0.15  1     0.6947   
## m.1       8 7843 7883  -3914     7827  7.02  1     0.0081 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#summary(m.1)
coef(summary(m.1))

##                         Estimate Std. Error  df t value   Pr(>|t|)
## (Intercept)              149.297     2.0295 180 73.5636 3.006e-136
## Time                       5.216     0.3737 180 13.9560  1.776e-30
## ConditionTreatment         1.337     2.8701 180  0.4659  6.419e-01
## Time:ConditionTreatment    1.414     0.5285 180  2.6755  8.149e-03

There was no group difference at baseline, but there was a group difference on slope. That is, math achievement in the two groups started out the same, but increased more quickly in the Treatment group.

“Internal” individual differences: Example

Question 2: For the treatment group, was individual-level improvement in mental health associated with improvement in math scores?

First step: For the treatment group, plot change in the SDQ over time showing both group mean and individual variability

ggplot(subset(EducMH, Condition == "Treatment"), aes(Year, SDQ)) + 
  geom_line(aes(group=ID), color="gray") +
  stat_summary(fun=mean, geom="line") +
  stat_summary(fun.data=mean_se, geom="errorbar", width=0.3) +
  theme_bw(base_size=12)

Within the treatment group, it looks like there is lots of variability in response to the mental health services. Some people responded really well (big decreases in difficulties on SDQ), some people didn’t respond well (increased difficulties according to SDQ).

We want to know whether this variability is associated with variability in improved math achievement.

“Internal” individual differences: Example

Question 2: For the treatment group, was individual-level improvement in mental health associated with improvement in math scores?

Analysis strategy:

Build separate models for change in SDQ and change in Math scores over time
Use random effects to quantify individual differences in change over time for the two scores
Test the correlation between change in SDQ and in Math achievement (and make a scatterplot showing this).

“Internal” individual differences: Example

Question 2: For the treatment group, was individual-level improvement in mental health associated with improvement in math scores?

Analysis strategy:

Build separate models for change in SDQ and change in Math scores over time

m.math <- lmer(Math ~ Time + (Time | ID), 
               data=subset(EducMH, Condition == "Treatment"), REML=F)
m.sdq <- lmer(SDQ ~ Time + (Time | ID), 
              data=subset(EducMH, Condition == "Treatment"), REML=F)

“Internal” individual differences: Example

Question 2: For the treatment group, was individual-level improvement in mental health associated with improvement in math scores?

Analysis strategy:

Build separate models for change in SDQ and change in Math scores over time
Use random effects to quantify individual differences in change over time for the two scores

#psy811::get_ranef() will extract the named random effect and clean them up a bit
re.math <- get_ranef(m.math, "ID")
re.sdq <- get_ranef(m.sdq, "ID")
#merge() will combine those into one data frame, but needs some help because the variable names are all the same
re <- merge(re.math, re.sdq, by="ID", suffixes = c(".math", ".sdq"))
summary(re)

##        ID      Intercept.math     Time.math      Intercept.sdq    
##  Min.   :201   Min.   :-59.50   Min.   :-3.372   Min.   :-2.0326  
##  1st Qu.:223   1st Qu.: -9.85   1st Qu.:-0.733   1st Qu.:-0.7895  
##  Median :246   Median : -0.01   Median : 0.118   Median : 0.0341  
##  Mean   :246   Mean   :  0.00   Mean   : 0.000   Mean   : 0.0000  
##  3rd Qu.:268   3rd Qu.: 11.48   3rd Qu.: 0.694   3rd Qu.: 0.6024  
##  Max.   :290   Max.   : 57.64   Max.   : 2.747   Max.   : 3.0828  
##     Time.sdq      
##  Min.   :-2.4951  
##  1st Qu.:-0.7468  
##  Median : 0.0989  
##  Mean   : 0.0000  
##  3rd Qu.: 0.5482  
##  Max.   : 3.0944

head(re)

##    ID Intercept.math Time.math Intercept.sdq Time.sdq
## 1 201         0.8412 -1.542050       -0.9793   0.8381
## 2 202         9.6071  0.029529        0.5728   0.3952
## 3 203       -14.6657 -0.573849        0.5872   1.1176
## 4 204        -3.5781 -0.290299       -0.8056   0.4423
## 5 205         2.1411 -0.852670        0.4991   0.8463
## 6 206        31.4592  0.006922        0.2029   0.6119

“Internal” individual differences: Example

Question 2: For the treatment group, was individual-level improvement in mental health associated with improvement in math scores?

Analysis strategy:

Build separate models for change in SDQ and change in Math scores over time
Use random effects to quantify individual differences in change over time for the two scores
Test the correlation between change in SDQ and in Math achievement (and make a scatterplot showing this).

cor.test(re$Time.math, re$Time.sdq)

## 
##  Pearson's product-moment correlation
## 
## data:  re$Time.math and re$Time.sdq
## t = -11, df = 88, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8455 -0.6749
## sample estimates:
##     cor 
## -0.7739

ggplot(re, aes(Time.math, Time.sdq)) + geom_point() + stat_smooth(method="lm") + 
  labs(x="Relative Rate of Increase in Math Score", 
       y="Relative Rate of Decrease in SDQ Total Difficulties Score") + 
  theme_bw(base_size=12)

## `geom_smooth()` using formula 'y ~ x'

Strong correlation ($r = -0.77$, $p < 0.0001$) indicating that response to mental health intervention (decreased difficulties) was associated with larger increases in math achievement. Note that the key quantities here are slopes. That is, the rate of decreased mental health difficulties is associated with a higher rate of math achievement.

Exercise 5

The CohortRhyme data set contains data from an eye-tracking experiment (Mirman et al., 2011) that investigated phonological competition between cohorts (e.g., penny - pencil) and rhymes (e.g., carrot - parrot).

Three groups of participants were tested: 5 Broca’s aphasics, 3 Wernicke’s aphasics, and 12 control participants.

summary(CohortRhyme)

##      subjID          Group          Time         timeBin             Object   
##  101    :  80   Control :960   Min.   :  50   Min.   : 1.00   Competitor:800  
##  102    :  80   Broca   :400   1st Qu.: 525   1st Qu.: 5.75   Unrelated :800  
##  103    :  80   Wernicke:240   Median :1000   Median :10.50   Target    :  0  
##  104    :  80                  Mean   :1000   Mean   :10.50                   
##  105    :  80                  3rd Qu.:1475   3rd Qu.:15.25                   
##  106    :  80                  Max.   :1950   Max.   :20.00                   
##  (Other):1120                                                                 
##      Type        FixProp      
##  Cohort:800   Min.   :0.0000  
##  Rhyme :800   1st Qu.:0.0000  
##               Median :0.0417  
##               Mean   :0.0665  
##               3rd Qu.:0.1000  
##               Max.   :0.5833  
##

Make a multi-panel plot showing each kind of competition for each group
Use fourth-order orthogonal polynomials to analyze (separately) the cohort and rhyme competition effects. Do the diagnosis groups differ in competition effect sizes?
Use random effects to compute individual competition effect sizes (note: you’ll need to use a model without any group fixed effects to get random effects relative to the overall sample rather than the diagnosis sub-group).
Test correlations between cohort and rhyme competition effects for patients and controls.
Make a multi-panel scatterplot showing these correlations.

Growth Curve Analysis, Part 5

Individual differences

“External” individual differences can be added as fixed effects

Does tolerance increase with age and is it modulated by gender?

Is this modulated by exposure to deviant behavior?

Is this modulated by exposure to deviant behavior?

How to plot the three-way interaction?

How to plot the three-way interaction?

“Internal” individual differences

“Internal” individual differences: Example

“Internal” individual differences: Example

“Internal” individual differences: Example

“Internal” individual differences: Example

“Internal” individual differences: Example

“Internal” individual differences: Example

“Internal” individual differences: Example

“Internal” individual differences: Example

Exercise 5

Coda

Thank you!