Dan Mirman
14 March 2018
Time | Content |
---|---|
9am - 10am | Part 1: Introduction, challenges of time course data, GCA basics |
10am - 10:30am | Break, hands-on exercise |
10:30 - Noon | Part 2: Modeling nonlinear change, within-subject effects |
Noon - 1:30pm | Lunch break, Q&A, hands-on exercise |
1:30pm - 4:30pm | Hands-on analysis time |
Time | Content |
---|---|
9am - 10am | Part 3: Visualizing higher-order polynomial effects, contrast coding, multiple comparisons |
10am - 10:45am | Hands-on analysis time |
10:45am - 11:15am | Part 4: Logistic GCA |
11:15 - Noon | Hands-on analysis time |
Noon - 1:30pm | Lunch break, Q&A |
1:30pm - 2:30pm | Part 5: Quantifying and analyzing individual differences |
2:30pm - 4:30pm | Hands-on analysis time |
A problem with polynomials: it can be hard to interpret effects on the higher-order terms.
Example: function vs. thematic competition in the visual world paradigm (VWP)
ggplot(subset(FunctTheme, Object!="Target"), aes(Time, meanFix, color=Object, fill=Object)) +
facet_wrap(~ Condition) + stat_summary(fun.y=mean, geom="line") +
stat_summary(fun.data=mean_se, geom="ribbon", color=NA, alpha=0.3) +
theme_bw() + labs(x="Time Since Word Onset (ms)", y="Fixation Proportion") +
theme(legend.justification=c(1,1), legend.position=c(1,1)) # legend in top-right corner
FT.gca <- code_poly(subset(FunctTheme, Object!="Target"), predictor = "Time", poly.order = 4)
m.ft <- lmer(meanFix ~ (poly1+poly2+poly3+poly4)*Object*Condition +
(1+poly1+poly2+poly3+poly4 | Subject) +
(1+poly1+poly2+poly3+poly4 | Subject:Object:Condition),
data=FT.gca, REML=F)
FT.gca$GCA_Full <- fitted(m.ft) # add fitted data to original data frame
ggplot(FT.gca, aes(Time, meanFix, color=Object)) + facet_wrap(~ Condition) +
stat_summary(fun.data=mean_se, geom="pointrange") +
stat_summary(aes(y=GCA_Full), fun.y=mean, geom="line") +
theme_bw() + labs(x="Time Since Word Onset (ms)", y="Fixation Proportion") +
theme(legend.justification=c(1,1), legend.position=c(1,1)) # legend in top-right corner
get_pvalues(m.ft)[16:20, ]
## Estimate Std..Error t.value
## ObjectUnrelated:ConditionThematic -0.004164 0.01709 -0.2436
## poly1:ObjectUnrelated:ConditionThematic 0.065878 0.07745 0.8506
## poly2:ObjectUnrelated:ConditionThematic -0.047568 0.04362 -1.0906
## poly3:ObjectUnrelated:ConditionThematic -0.156184 0.05181 -3.0145
## poly4:ObjectUnrelated:ConditionThematic 0.075709 0.03308 2.2888
## p.normal p.normal.star
## ObjectUnrelated:ConditionThematic 0.807512
## poly1:ObjectUnrelated:ConditionThematic 0.394969
## poly2:ObjectUnrelated:ConditionThematic 0.275452
## poly3:ObjectUnrelated:ConditionThematic 0.002574 **
## poly4:ObjectUnrelated:ConditionThematic 0.022090 *
What do those significant effects on the 3rd- and 4th-order polynomial terms mean? Do they correspond to the early-vs-later competition time course difference?
m.reduced <- lmer(meanFix ~ (poly1+poly2+poly3+poly4)*Object + # full Object effects
(poly1+poly2+poly3+poly4)*Condition + # full Condition effects
(poly1+poly2)*Object*Condition + # reduced Obj-by-Cond interactions
(1+poly1+poly2+poly3+poly4 | Subject) +
(1+poly1+poly2 | Subject:Object:Condition), # reduced random effect
data=FT.gca, REML=F)
Note that both the fixed effects and the random effects for the higher-order interaction are removed from this reduced model. This is for a visual (qualitative) comparison: it shows what the fitted curves look like when the higher-order interaction terms are constrained to be the same for all conditions. For a statistical model comparison, we would reduce only the fixed effects, because those capture the group-level differences.
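As a sketch, that statistical comparison would keep the full random-effects structure and drop only the higher-order Object-by-Condition fixed-effect terms, then use a likelihood-ratio test (the model name `m.ft_fixonly` is hypothetical):

```r
# reduced model: full random effects, but poly3/poly4 Object-by-Condition
# fixed-effect interactions removed
m.ft_fixonly <- lmer(meanFix ~ (poly1+poly2+poly3+poly4)*Object +
                       (poly1+poly2+poly3+poly4)*Condition +
                       (poly1+poly2)*Object*Condition +
                       (1+poly1+poly2+poly3+poly4 | Subject) +
                       (1+poly1+poly2+poly3+poly4 | Subject:Object:Condition),
                     data=FT.gca, REML=F)
# likelihood-ratio test of the higher-order interaction terms
anova(m.ft_fixonly, m.ft)
```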
FT.gca$GCA_Reduced <- fitted(m.reduced) # add fitted data to original data frame
ggplot(FT.gca, aes(Time, meanFix, color=Object)) + facet_wrap(~ Condition) +
stat_summary(fun.data=mean_se, geom="pointrange") +
stat_summary(aes(y=GCA_Full), fun.y=mean, geom="line") +
stat_summary(aes(y=GCA_Reduced), fun.y=mean, geom="line", linetype="dashed") +
theme_bw() + labs(x="Time Since Word Onset (ms)", y="Fixation Proportion") +
theme(legend.justification=c(1,1), legend.position=c(1,1)) # legend in top-right corner
It sort of looks like the Function competition effect is earlier in the reduced model and the Thematic competition effect is later (i.e., a smaller competition time course difference), but it is hard to tell.
It might be easier if we compute competition effect sizes (Comp - Unrel), so we’ll have half as much to plot.
library(plyr)
ES <- ddply(FT.gca, .(Subject, Time, Condition), summarize,
Competition = meanFix[Object=="Competitor"] - meanFix[Object=="Unrelated"],
GCA_Full = GCA_Full[Object=="Competitor"] - GCA_Full[Object=="Unrelated"],
GCA_Reduced = GCA_Reduced[Object=="Competitor"] - GCA_Reduced[Object=="Unrelated"])
summary(ES)
## Subject Time Condition Competition
## 21 : 34 Min. : 500 Function:255 Min. :-0.2812
## 24 : 34 1st Qu.: 700 Thematic:255 1st Qu.:-0.0625
## 25 : 34 Median : 900 Median : 0.0000
## 27 : 34 Mean : 900 Mean : 0.0215
## 28 : 34 3rd Qu.:1100 3rd Qu.: 0.0702
## 40 : 34 Max. :1300 Max. : 0.4000
## (Other):306
## GCA_Full GCA_Reduced
## Min. :-0.2394 Min. :-0.3123
## 1st Qu.:-0.0375 1st Qu.:-0.0231
## Median : 0.0143 Median : 0.0209
## Mean : 0.0215 Mean : 0.0214
## 3rd Qu.: 0.0759 3rd Qu.: 0.0630
## Max. : 0.3320 Max. : 0.2610
##
ggplot(ES, aes(Time, Competition, color=Condition)) +
stat_summary(fun.y=mean, geom="point") +
stat_summary(aes(y=GCA_Full), fun.y=mean, geom="line") +
stat_summary(aes(y=GCA_Reduced), fun.y=mean, geom="line", linetype="dashed") +
theme_bw() + labs(x="Time Since Word Onset (ms)", y="Competition") +
theme(legend.justification=c(1,1), legend.position=c(1,1)) # legend in top-right corner
Now we can see that the solid lines pull apart the time course, but the dashed lines are almost the same. That is, the full model (with `poly3:Object:Condition` and `poly4:Object:Condition`) captured the time course difference, but the reduced model did not.
For categorical predictor variables, the default in R is to use treatment coding: the reference (first) level is set as the baseline (0) and parameters are estimated for each of the other levels (1). This is fine in many cases, but it can be confusing when you have multiple categorical predictors, because treatment coding produces simple effect estimates rather than main effect estimates.
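The difference is easy to see in the contrast matrices themselves; this is just base R:

```r
# Treatment coding (R's default for unordered factors): the first level is
# the baseline (all zeros), so each parameter is a simple effect.
contr.treatment(2)
##   2
## 1 0
## 2 1

# Sum coding: levels are coded +1/-1 around the grand mean, so each
# parameter is a main effect.
contr.sum(2)
##   [,1]
## 1    1
## 2   -1
```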
Example: `MotorLearning` - effects of task difficulty and impairment (alcohol) on motor learning.
ggplot(MotorLearning, aes(Trial, Accuracy, color=Difficulty)) +
facet_wrap(~ Condition) + stat_summary(fun.y=mean, geom="line") +
stat_summary(fun.data=mean_se, geom="pointrange") + theme_bw()
# prep data for GCA
MotorLearning.gca <- code_poly(df=MotorLearning, predictor="Trial", poly.order=3, draw.poly=FALSE)
# fit full model
m.ML <- lmer(Accuracy ~ (poly1+poly2+poly3)*Difficulty*Condition +
(poly1+poly2+poly3 | SubjID) +
(poly1+poly2+poly3 | SubjID:Difficulty:Condition),
data=MotorLearning.gca, REML=F)
# coefficients and p-values
get_pvalues(m.ML)[1:9, ]
## Estimate Std..Error t.value p.normal p.normal.star
## (Intercept) 0.62111 0.04583 13.5525 0.000e+00 ***
## poly1 1.10820 0.13543 8.1831 2.220e-16 ***
## poly2 -0.44104 0.08624 -5.1143 3.149e-07 ***
## poly3 0.04990 0.06881 0.7252 4.683e-01
## DifficultyLow 0.09014 0.05121 1.7600 7.840e-02 .
## ConditionImpaired -0.07986 0.05121 -1.5594 1.189e-01
## poly1:DifficultyLow -0.22495 0.16574 -1.3573 1.747e-01
## poly2:DifficultyLow -0.22908 0.10893 -2.1031 3.546e-02 *
## poly3:DifficultyLow 0.18467 0.08498 2.1732 2.977e-02 *
The `DifficultyLow` parameter is not significant, which seems strange because there is a very consistent task difficulty effect. This is because the parameter captures just the simple effect of Difficulty in the Control condition, not the main effect of Difficulty across both conditions.
m.ML_sum <- lmer(Accuracy ~ (poly1+poly2+poly3)*Difficulty*Condition +
(poly1+poly2+poly3 | SubjID) +
(poly1+poly2+poly3 | SubjID:Difficulty:Condition),
contrasts = list(Condition = "contr.sum"),
data=MotorLearning.gca, REML=F)
# coefficients and p-values
get_pvalues(m.ML_sum)[1:9, ]
## Estimate Std..Error t.value p.normal p.normal.star
## (Intercept) 0.58118 0.03801 15.2907 0.000e+00 ***
## poly1 1.08472 0.10711 10.1269 0.000e+00 ***
## poly2 -0.29988 0.06686 -4.4852 7.284e-06 ***
## poly3 0.05346 0.05412 0.9877 3.233e-01
## DifficultyLow 0.10535 0.03621 2.9090 3.626e-03 **
## Condition1 0.03993 0.02561 1.5594 1.189e-01
## poly1:DifficultyLow -0.15326 0.11719 -1.3077 1.910e-01
## poly2:DifficultyLow -0.34688 0.07702 -4.5036 6.682e-06 ***
## poly3:DifficultyLow 0.15038 0.06009 2.5026 1.233e-02 *
Because `Condition` was sum-coded (+1/-1), the `DifficultyLow` parameter now corresponds to the main effect of task difficulty across both conditions.
Example: change in aphasic picture naming error patterns during recovery (`NamingRecovery`)
ggplot(NamingRecovery, aes(TestTime, Semantic.error, color=Diagnosis)) +
stat_summary(fun.y=mean, geom="point", size=3) +
stat_summary(fun.data=mean_se, geom="errorbar", width=0.2) +
stat_summary(fun.y=mean, geom="line") +
labs(x="Test Number", y="Semantic Errors (Proportion)") +
theme_bw() + expand_limits(y=c(0, 0.1))
m.sem <- lmer(Semantic.error ~ TestTime*Diagnosis +
(TestTime | SubjectID), data=NamingRecovery, REML=F)
# plot model fit
ggplot(NamingRecovery, aes(TestTime, Semantic.error, color=Diagnosis)) +
stat_summary(fun.y=mean, geom="point", size=3) +
stat_summary(fun.data=mean_se, geom="errorbar", width=0.2) +
stat_summary(aes(y=fitted(m.sem)), fun.y=mean, geom="line") +
theme_bw() + expand_limits(y=0)
get_pvalues(m.sem)
## Estimate Std..Error t.value p.normal p.normal.star
## (Intercept) 0.045767 0.007776 5.8854 3.970e-09 ***
## TestTime -0.008685 0.003524 -2.4644 1.372e-02 *
## DiagnosisConduction -0.015149 0.010039 -1.5090 1.313e-01
## DiagnosisWernicke -0.004899 0.010287 -0.4762 6.339e-01
## TestTime:DiagnosisConduction 0.007308 0.004550 1.6063 1.082e-01
## TestTime:DiagnosisWernicke 0.012854 0.004662 2.7571 5.832e-03 **
The Anomic group is the reference group; the parameter estimates compare the other groups to it. There are no group differences at baseline, and the Anomic and Wernicke groups have different slopes. But what about the Wernicke vs. Conduction comparison?
NamingRecovery$DiagnosisW <- relevel(NamingRecovery$Diagnosis, "Wernicke")
# re-fit the model
m.semW <- lmer(Semantic.error ~ TestTime*DiagnosisW +
(TestTime | SubjectID), data=NamingRecovery, REML=F)
get_pvalues(m.semW)
## Estimate Std..Error t.value p.normal p.normal.star
## (Intercept) 0.040868 0.006734 6.0684 1.292e-09 ***
## TestTime 0.004169 0.003052 1.3659 1.720e-01
## DiagnosisWAnomic 0.004899 0.010287 0.4762 6.339e-01
## DiagnosisWConduction -0.010250 0.009256 -1.1074 2.681e-01
## TestTime:DiagnosisWAnomic -0.012854 0.004662 -2.7571 5.832e-03 **
## TestTime:DiagnosisWConduction -0.005545 0.004195 -1.3220 1.862e-01
We need to build a contrast matrix that defines the desired comparisons in terms of the model's parameter estimates.
coef(summary(m.sem))
## Estimate Std. Error t value
## (Intercept) 0.045767 0.007776 5.8854
## TestTime -0.008685 0.003524 -2.4644
## DiagnosisConduction -0.015149 0.010039 -1.5090
## DiagnosisWernicke -0.004899 0.010287 -0.4762
## TestTime:DiagnosisConduction 0.007308 0.004550 1.6063
## TestTime:DiagnosisWernicke 0.012854 0.004662 2.7571
# in the contrast matrix, each row defines one comparison
# each element is a weight on the corresponding parameter of the original model
contrast.matrix <- rbind(
"Conduction w.r.t. Anomic" = c(0, 0, 1, 0, 0, 0), #3rd param in model
"Wernicke w.r.t. Anomic" = c(0, 0, 0, 1, 0, 0), #4th param in model
"Wernicke w.r.t. Conduction" = c(0, 0, -1, 1, 0, 0), #3rd vs. 4th in model: Wernicke-Conduction baseline comparison treating Anomic as common baseline
"Slope: Conduction w.r.t. Anomic" = c(0, 0, 0, 0, 1, 0), #5th param in model
"Slope: Wernicke w.r.t. Anomic" = c(0, 0, 0, 0, 0, 1), #6th param in model
"Slope: Wernicke w.r.t. Conduction" = c(0, 0, 0, 0, -1, 1)) #5th vs. 6th in model: Wernicke-Conduction slope comparison treating Anomic as common baseline
contrast.matrix
## [,1] [,2] [,3] [,4] [,5] [,6]
## Conduction w.r.t. Anomic 0 0 1 0 0 0
## Wernicke w.r.t. Anomic 0 0 0 1 0 0
## Wernicke w.r.t. Conduction 0 0 -1 1 0 0
## Slope: Conduction w.r.t. Anomic 0 0 0 0 1 0
## Slope: Wernicke w.r.t. Anomic 0 0 0 0 0 1
## Slope: Wernicke w.r.t. Conduction 0 0 0 0 -1 1
Use the `glht()` function from the `multcomp` package to compute the comparisons defined by the contrast matrix.
library(multcomp)
comps <- glht(m.sem, contrast.matrix)
summary(comps, test = adjusted("none"))
##
## Simultaneous Tests for General Linear Hypotheses
##
## Fit: lmer(formula = Semantic.error ~ TestTime * Diagnosis + (TestTime |
## SubjectID), data = NamingRecovery, REML = F)
##
## Linear Hypotheses:
## Estimate Std. Error z value Pr(>|z|)
## Conduction w.r.t. Anomic == 0 -0.01515 0.01004 -1.51 0.1313
## Wernicke w.r.t. Anomic == 0 -0.00490 0.01029 -0.48 0.6339
## Wernicke w.r.t. Conduction == 0 0.01025 0.00926 1.11 0.2681
## Slope: Conduction w.r.t. Anomic == 0 0.00731 0.00455 1.61 0.1082
## Slope: Wernicke w.r.t. Anomic == 0 0.01285 0.00466 2.76 0.0058 **
## Slope: Wernicke w.r.t. Conduction == 0 0.00555 0.00419 1.32 0.1862
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- none method)
By default, `glht()` applies a correction for multiple comparisons. I don't know whether that correction is appropriate here, so I prefer to turn it off: with `test = adjusted("none")` it returns uncorrected normal-approximation p-values, which I can then correct for multiple comparisons as needed.
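As a sketch, the uncorrected p-values can be pulled out of the `glht` summary and corrected with `p.adjust()`, or `multcomp` can apply a named correction directly (this assumes the `comps` object from above):

```r
# extract the uncorrected normal-approximation p-values
ps <- summary(comps, test = adjusted("none"))$test$pvalues
# apply whatever correction is needed, e.g. false discovery rate
p.adjust(ps, method = "fdr")
# alternatively, let multcomp apply the correction itself:
# summary(comps, test = adjusted("bonferroni"))
```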
Exercise 4: Contrast coding in analysis of function vs. thematic competition

- Set the `Unrelated` object to be the reference level for the `Object` factor (use the `relevel()` function)
- Change the `Condition` factor from treatment coding to sum coding

Exercise 5: Use the full model of the WISQARS suicide data to test all pairwise comparisons between regions.
wisqars.suicide$Time <- wisqars.suicide$Year - 1999 # adjust Time so the intercept corresponds to 1999
ggplot(wisqars.suicide, aes(Time, Crude.Rate, color=Region)) +
stat_summary(fun.data=mean_se, geom="pointrange") +
stat_summary(fun.y=mean, geom="line")