Dan Mirman
24 January 2019
Individual differences provide an additional level of analysis for understanding phenomena
Example: National Youth Survey longitudinal data on tolerance for deviant behavior, exposure to deviant behavior, and gender. Subset of 16 subjects printed in Willett (1997, Table 11.1). (DeviantBehavior: http://dmirman.github.io/ET2019/DeviantBehavior.rda)
Tolerance was measured by asking whether it is wrong for someone their age to: cheat on tests, purposely destroy property of others, use marijuana, steal something worth less than $5, hit or threaten someone without reason, use alcohol, break into a building or vehicle to steal, sell hard drugs, or steal something worth more than $50.
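The data can be loaded directly from the URL above. A minimal sketch (it assumes the .rda file creates the DeviantBehavior data frame in the workspace):
load(url("http://dmirman.github.io/ET2019/DeviantBehavior.rda"))
summary(DeviantBehavior)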
## SubjID Gender Exposure Age Tolerance
## S0009 : 5 Female:45 Min. :0.810 Min. :11 Min. :1.00
## S0045 : 5 Male :35 1st Qu.:0.922 1st Qu.:12 1st Qu.:1.22
## S0268 : 5 Median :1.145 Median :13 Median :1.50
## S0314 : 5 Mean :1.191 Mean :13 Mean :1.62
## S0442 : 5 3rd Qu.:1.395 3rd Qu.:14 3rd Qu.:1.99
## S0514 : 5 Max. :1.990 Max. :15 Max. :3.46
## (Other):50
m.base <- lmer(Tolerance ~ Age*Gender + (Age | SubjID),
contrasts=list(Gender="contr.sum"), data=DeviantBehavior)
get_pvalues(m.base)
## Estimate Std..Error t.value p.normal p.normal.star
## (Intercept) -0.12563 0.52513 -0.2392 0.810928
## Age 0.13488 0.04409 3.0593 0.002218 **
## Gender1 0.35552 0.52513 0.6770 0.498406
## Age:Gender1 -0.03255 0.04409 -0.7382 0.460371
Tolerance significantly increases with age; there are no significant effects of gender.
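get_pvalues() is a course helper function. If it is unavailable, a minimal sketch of what the output columns above suggest it computes (an assumption, using the normal approximation that treats t-values as z-scores):
# normal-approximation p-values for lmer fixed effects (hypothetical stand-in)
get_pvalues_sketch <- function(model) {
  coefs <- data.frame(coef(summary(model)))
  coefs$p.normal <- 2 * (1 - pnorm(abs(coefs$t.value)))
  coefs
}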
The Exposure scale is not clear, and other effects would be estimated at Exposure=0, which is not an attested value.
If we center Exposure, its effects and the effects of other predictors will be easier to interpret.
# center Exposure, but do not re-scale it
DeviantBehavior$Exposure.center <- scale(DeviantBehavior$Exposure, scale=FALSE)
# full model
m.exp1 <- lmer(Tolerance ~ Age*Gender*Exposure.center + (Age | SubjID),
data=DeviantBehavior, contrasts=list(Gender="contr.sum"))
get_pvalues(m.exp1)
## Estimate Std..Error t.value p.normal p.normal.star
## (Intercept) -0.38880 0.43008 -0.904 3.660e-01
## Age 0.15603 0.03492 4.468 7.881e-06 ***
## Gender1 0.65192 0.43008 1.516 1.296e-01
## Exposure.center -4.22668 1.47541 -2.865 4.173e-03 **
## Age:Gender1 -0.05921 0.03492 -1.696 8.993e-02 .
## Age:Exposure.center 0.38532 0.11979 3.217 1.297e-03 **
## Gender1:Exposure.center 3.62705 1.47541 2.458 1.396e-02 *
## Age:Gender1:Exposure.center -0.28576 0.11979 -2.385 1.706e-02 *
The three-way Age-by-Gender-by-Exposure interaction is a relationship among four variables (Tolerance for deviant behavior, Exposure to deviant behavior, Age, and Gender), three of which are continuous variables. This is hard to visualize.
To make it easier to visualize, we can split Exposure into levels.
# a simple median split
DeviantBehavior$ExposureMedSpl <- ifelse(
DeviantBehavior$Exposure >= median(DeviantBehavior$Exposure), "High", "Low")
# plot the interaction
ggplot(DeviantBehavior, aes(Age, Tolerance, color=ExposureMedSpl)) + facet_wrap(~ Gender) +
stat_summary(fun.y=mean, geom="point", size=3) + stat_summary(fun.y=mean, geom="line") +
stat_summary(fun.data=mean_se, geom="errorbar", width=0.2) +
labs(x="Age", y="Tolerance for deviant behavior") +
theme_bw() + legend_positioning(c(0,1))
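legend_positioning() is another course helper, not a ggplot2 function. A minimal equivalent (an assumption about its behavior) places the legend inside the plot panel, anchored at the given relative coordinates:
# place legend inside the panel at relative coordinates pos (hypothetical helper)
legend_positioning <- function(pos) {
  theme(legend.position = pos, legend.justification = pos)
}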
Adolescents with higher Exposure to deviant behavior tend to have increased Tolerance for deviant behavior as they get older, and this is more true for males than females.
To make it easier to visualize, we can split Exposure into levels, this time using a tertile split.
# define break points
b <- quantile(DeviantBehavior$Exposure, probs=seq(0, 1, by=1/3))
# split continuous predictor and provide level labels
DeviantBehavior$Exposure3 <- cut(DeviantBehavior$Exposure,
breaks=b, include.lowest=T,
labels=c("Low", "Medium", "High"))
# make the plot
ggplot(DeviantBehavior, aes(Age, Tolerance, color=Exposure3)) + facet_wrap(~ Gender) +
stat_summary(fun.y=mean, geom="point", size=3) + stat_summary(fun.y=mean, geom="line") +
stat_summary(fun.data=mean_se, geom="errorbar", width=0.2) +
labs(x="Age", y="Tolerance for deviant behavior") +
theme_bw() + legend_positioning(c(0,1))
Sometimes individual differences are not captured by an observed variable like Exposure. For such situations, random effects provide a way to quantify individual effect sizes in the context of a model of overall group performance.
A simple example: if \(\zeta_{sc}\) is participant \(s\)'s random effect in condition \(c\), then each participant's effect size is the difference between their condition random effects:
Participant A: \(\zeta_{A1} - \zeta_{A0} = 1 - (-1) = 2\)
Participant B: \(\zeta_{B1} - \zeta_{B0} = (-1) - 1 = -2\)
Data: Function and Thematic competition for 17 LH stroke patients (FunctThemePts).
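A summary of the data:
summary(FunctThemePts)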
## subj Condition Object Time
## 206 : 486 Function:4113 Target :2733 Min. :-1000
## 281 : 486 Thematic:4086 Competitor:2733 1st Qu.: 0
## 419 : 486 Unrelated :2733 Median : 1000
## 1088 : 486 Mean : 1000
## 1238 : 486 3rd Qu.: 2000
## 1392 : 486 Max. : 3000
## (Other):5283
## timeBin meanFix sumFix N
## Min. : 0 Min. :0.0000 Min. : 0.00 Min. :12.0
## 1st Qu.:20 1st Qu.:0.0312 1st Qu.: 1.00 1st Qu.:15.0
## Median :40 Median :0.1250 Median : 2.00 Median :16.0
## Mean :40 Mean :0.1777 Mean : 3.26 Mean :15.4
## 3rd Qu.:60 3rd Qu.:0.2500 3rd Qu.: 5.00 3rd Qu.:16.0
## Max. :80 Max. :1.0000 Max. :16.00 Max. :16.0
##
Question: is there a correlation between Function and Thematic competition across patients?
# prep data for GCA: code_poly() adds orthogonal polynomial time terms
# (poly1-poly4) for the 500-2000ms analysis window, excluding Target fixations
FunctThemePts.gca <- code_poly(subset(FunctThemePts, Time >= 500 & Time <= 2000 &
                                        Object != "Target"),
                               predictor="Time", poly.order=4, draw.poly=FALSE)
# function competition: full model
m.funct <- lmer(meanFix ~ (poly1+poly2+poly3+poly4)*Object +
(poly1+poly2+poly3+poly4 | subj) + (poly1+poly2 | subj:Object),
data=subset(FunctThemePts.gca, Condition == "Function"), REML=F)
# thematic competition: full model
m.theme <- lmer(meanFix ~ (poly1+poly2+poly3+poly4)*Object +
(poly1+poly2+poly3+poly4 | subj) + (poly1+poly2 | subj:Object),
data=subset(FunctThemePts.gca, Condition == "Thematic"), REML=F)
str(ranef(m.funct), vec.len=2)
## List of 2
## $ subj:Object:'data.frame': 34 obs. of 3 variables:
## ..$ (Intercept): num [1:34] -0.0327 0.0254 ...
## ..$ poly1 : num [1:34] 0.0339 0.0726 ...
## ..$ poly2 : num [1:34] -0.1538 0.0143 ...
## $ subj :'data.frame': 17 obs. of 5 variables:
## ..$ (Intercept): num [1:17] 0.025 -0.0134 ...
## ..$ poly1 : num [1:17] 0.0924 -0.0454 ...
## ..$ poly2 : num [1:17] -0.1915 0.0375 ...
## ..$ poly3 : num [1:17] 0.1124 -0.0677 ...
## ..$ poly4 : num [1:17] 0.0386 0.0474 ...
## - attr(*, "class")= chr "ranef.mer"
head(ranef(m.funct)$"subj:Object")
## (Intercept) poly1 poly2
## 206:Competitor -0.0327460 0.03386 -0.153809
## 206:Unrelated 0.0253878 0.07261 0.014314
## 281:Competitor 0.0066498 -0.04395 -0.059606
## 281:Unrelated -0.0200312 -0.05530 0.115819
## 419:Competitor -0.0006385 0.24667 -0.001246
## 419:Unrelated -0.0092935 -0.15294 0.056835
We can use these to calculate individual effect sizes, just like in the simple demo example. Use gazer::get_ranef(), which is a helper function for extracting random effects:
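If gazer is not installed, a minimal sketch of the extraction (an assumption based on the output below): pull the random effects for the grouping factor and split the subj:Object rownames into separate columns.
# hypothetical stand-in for gazer::get_ranef()
get_ranef_sketch <- function(model, grouping) {
  re <- ranef(model)[[grouping]]
  names(re)[1] <- "Intercept"  # rename "(Intercept)" for easier indexing
  labels <- strsplit(rownames(re), ":")
  re$subj <- sapply(labels, "[", 1)
  re$Object <- sapply(labels, "[", 2)
  re
}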
re.funct <- get_ranef(m.funct, "subj:Object")
head(re.funct)
## Intercept poly1 poly2 subj Object
## 206:Competitor -0.0327460 0.03386 -0.153809 206 Competitor
## 206:Unrelated 0.0253878 0.07261 0.014314 206 Unrelated
## 281:Competitor 0.0066498 -0.04395 -0.059606 281 Competitor
## 281:Unrelated -0.0200312 -0.05530 0.115819 281 Unrelated
## 419:Competitor -0.0006385 0.24667 -0.001246 419 Competitor
## 419:Unrelated -0.0092935 -0.15294 0.056835 419 Unrelated
Compute individual effect sizes as the difference between the Competitor and Unrelated random effects for each subject, for the Intercept and Linear time terms:
ES.funct <- re.funct %>% group_by(subj) %>%
summarize(Function_Intercept = Intercept[Object=="Competitor"] - Intercept[Object=="Unrelated"],
Function_Linear = poly1[Object=="Competitor"] - poly1[Object=="Unrelated"])
head(ES.funct)
## # A tibble: 6 x 3
## subj Function_Intercept Function_Linear
## <chr> <dbl> <dbl>
## 1 1088 -0.00328 -0.156
## 2 1238 -0.0133 -0.140
## 3 1392 -0.00320 0.191
## 4 1626 0.0281 0.278
## 5 1687 0.0565 -0.251
## 6 1846 -0.0788 0.399
Repeat those steps for the Thematic condition:
re.theme <- get_ranef(m.theme, "subj:Object")
ES.theme <- re.theme %>% group_by(subj) %>%
summarize(Thematic_Intercept = Intercept[Object=="Competitor"] - Intercept[Object=="Unrelated"],
Thematic_Linear = poly1[Object=="Competitor"] - poly1[Object=="Unrelated"])
head(ES.theme)
## # A tibble: 6 x 3
## subj Thematic_Intercept Thematic_Linear
## <chr> <dbl> <dbl>
## 1 1088 -0.0846 -0.0619
## 2 1238 -0.0221 -0.0156
## 3 1392 0.0615 -0.353
## 4 1626 0.0246 -0.469
## 5 1687 -0.0226 -0.0485
## 6 1846 0.00176 0.155
# combine condition effect sizes into one data frame
ES <- merge(ES.funct, ES.theme)
summary(ES)
## subj Function_Intercept Function_Linear
## Length:17 Min. :-0.0788 Min. :-0.3659
## Class :character 1st Qu.:-0.0508 1st Qu.:-0.2511
## Mode :character Median :-0.0027 Median :-0.0388
## Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.0267 3rd Qu.: 0.1912
## Max. : 0.1076 Max. : 0.4237
## Thematic_Intercept Thematic_Linear
## Min. :-0.08556 Min. :-0.4691
## 1st Qu.:-0.02205 1st Qu.:-0.1529
## Median :-0.00187 Median :-0.0485
## Mean : 0.00000 Mean : 0.0000
## 3rd Qu.: 0.02460 3rd Qu.: 0.1554
## Max. : 0.07799 Max. : 0.6714
# compute correlations between effect sizes
cor.test(ES$Function_Intercept, ES$Thematic_Intercept)
##
## Pearson's product-moment correlation
##
## data: ES$Function_Intercept and ES$Thematic_Intercept
## t = -2.4, df = 15, p-value = 0.03
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.8007 -0.0530
## sample estimates:
## cor
## -0.5204
cor.test(ES$Function_Linear, ES$Thematic_Linear)
##
## Pearson's product-moment correlation
##
## data: ES$Function_Linear and ES$Thematic_Linear
## t = -3.4, df = 15, p-value = 0.004
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.8637 -0.2545
## sample estimates:
## cor
## -0.655
We can make a two-panel scatterplot showing the correlations between Function and Thematic competition effect sizes for each term (Intercept, Linear).
# need to re-arrange the data
ES.p <- gather(ES, key = "key", value = "value", 2:5) %>% # "gather" into long form
separate(key, c("Condition", "Term"), sep = "_") %>% # separate the labels into two columns
spread(key = Condition, value = value) # "spread" so conditions are in separate columns
summary(ES.p)
## subj Term Function Thematic
## Length:34 Length:34 Min. :-0.3659 Min. :-0.4691
## Class :character Class :character 1st Qu.:-0.0736 1st Qu.:-0.0647
## Mode :character Mode :character Median :-0.0029 Median :-0.0077
## Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.0436 3rd Qu.: 0.0294
## Max. : 0.4237 Max. : 0.6714
# now can make the plot
ggplot(ES.p, aes(Function, Thematic)) +
facet_wrap(~ Term, scales="free") + geom_point() +
stat_smooth(method="lm", se=F) # add trendline
Why bother with the model-fitting when you could just take the difference of the observed data to get effect sizes? Because random effects are estimated in the context of the overall group model, individual estimates are pulled toward the group mean (shrinkage) and are less affected by noise than raw observed differences.
[Figure: individual participant Function competition effect sizes]
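One way to check (a sketch, not from the original materials): compute each participant's observed Function competition effect as the raw Competitor-minus-Unrelated difference in mean fixation proportion over the analysis window, then plot it against the model-based Intercept effect size.
# observed effect: raw Competitor-minus-Unrelated difference per subject
obs.funct <- subset(FunctThemePts.gca, Condition == "Function") %>%
  group_by(subj, Object) %>%
  summarize(meanFix = mean(meanFix)) %>%
  summarize(Observed = meanFix[Object=="Competitor"] - meanFix[Object=="Unrelated"])
# compare to the model-based (random effect) effect sizes
comp <- merge(ES.funct, obs.funct)
ggplot(comp, aes(Observed, Function_Intercept)) +
  geom_point() + stat_smooth(method="lm", se=F)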
The CohortRhyme data set contains data from an eye-tracking experiment (Mirman et al., 2011) that investigated phonological competition between cohorts (e.g., penny - pencil) and rhymes (e.g., carrot - parrot).
Three groups of participants were tested: 5 Broca’s aphasics, 3 Wernicke’s aphasics, and 12 control participants.
summary(CohortRhyme)
## subjID Group Time timeBin
## 101 : 80 Control :960 Min. : 50 Min. : 1.00
## 102 : 80 Broca :400 1st Qu.: 525 1st Qu.: 5.75
## 103 : 80 Wernicke:240 Median :1000 Median :10.50
## 104 : 80 Mean :1000 Mean :10.50
## 105 : 80 3rd Qu.:1475 3rd Qu.:15.25
## 106 : 80 Max. :1950 Max. :20.00
## (Other):1120
## Object Type FixProp
## Competitor:800 Cohort:800 Min. :0.0000
## Unrelated :800 Rhyme :800 1st Qu.:0.0000
## Target : 0 Median :0.0417
## Mean :0.0665
## 3rd Qu.:0.1000
## Max. :0.5833
##