Growth Curve Analysis, Part 4

Dan Mirman

28 November 2019

Contrast coding

For categorical predictor variables, the default in R is to use treatment coding: the reference (first) level is set as the baseline (0) and parameters for the other levels (1) are estimated.

This is fine in many cases, but can be confusing when you have multiple categorical predictors because treatment coding produces simple effect estimates rather than main effect estimates.
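The difference is easiest to see in the contrast matrices themselves (base R, no packages needed):

```r
# Treatment coding (R default): reference level = 0, other level = 1,
# so each parameter is a difference from the reference level
contr.treatment(2)
##   2
## 1 0
## 2 1
# Sum coding: levels coded 1 and -1, so parameters are deviations
# from the grand mean, i.e., main-effect-style estimates
contr.sum(2)
##   [,1]
## 1    1
## 2   -1
```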

Example: MotorLearning - effects of task difficulty and impairment (alcohol) on motor learning.

ggplot(MotorLearning, aes(Trial, Accuracy, color=Difficulty)) + 
  facet_wrap(~ Condition) + stat_summary(fun.y=mean, geom="line") +
  stat_summary(fun.data=mean_se, geom="pointrange") + theme_bw()

Motor Learning: fit GCA model

# prep data for GCA
MotorLearning.gca <- code_poly(df=MotorLearning, predictor="Trial", poly.order=3, draw.poly=FALSE)
# fit full model
m.ML <- lmer(Accuracy ~ (poly1+poly2+poly3)*Difficulty*Condition + 
               (poly1+poly2+poly3 | SubjID) + 
               (poly1+poly2+poly3 | SubjID:Difficulty:Condition), 
             data=MotorLearning.gca, REML=F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0423881
## (tol = 0.002, component 1)
# coefficients and p-values
get_pvalues(m.ML)[1:9, ]
##                     Estimate Std..Error t.value  p.normal p.normal.star
## (Intercept)          0.62111    0.04572 13.5865 0.000e+00           ***
## poly1                1.10820    0.13495  8.2120 2.220e-16           ***
## poly2               -0.44104    0.08602 -5.1273 2.939e-07           ***
## poly3                0.04990    0.06875  0.7258 4.680e-01              
## DifficultyLow        0.09014    0.05123  1.7594 7.851e-02             .
## ConditionImpaired   -0.07986    0.05123 -1.5588 1.190e-01              
## poly1:DifficultyLow -0.22495    0.16530 -1.3608 1.736e-01              
## poly2:DifficultyLow -0.22908    0.10889 -2.1038 3.539e-02             *
## poly3:DifficultyLow  0.18467    0.08510  2.1700 3.001e-02             *

The DifficultyLow parameter is not significant, which seems strange because the plot shows a very consistent task difficulty effect.

With treatment coding, this parameter estimates only the simple effect of Difficulty in the reference (Control) condition, not the main effect of Difficulty across both conditions.
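A toy example makes the distinction concrete. These cell means are hypothetical (not the real MotorLearning data), chosen so the Difficulty effect is larger in the Control condition:

```r
# Hypothetical mean accuracy by Condition (rows) and Difficulty (cols)
means <- matrix(c(0.60, 0.75,    # Control: High, Low
                  0.50, 0.55),   # Impaired: High, Low
                nrow = 2, byrow = TRUE,
                dimnames = list(c("Control", "Impaired"),
                                c("High", "Low")))
# Treatment coding: DifficultyLow = simple effect in the reference
# (Control) condition only
means["Control", "Low"] - means["Control", "High"]
## [1] 0.15
# Sum coding of Condition: DifficultyLow = Difficulty effect
# averaged across both conditions
mean(means[, "Low"] - means[, "High"])
## [1] 0.1
```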

Motor Learning: re-fit GCA model

m.ML_sum <- lmer(Accuracy ~ (poly1+poly2+poly3)*Difficulty*Condition +
                   (poly1+poly2+poly3 | SubjID) +
                   (poly1+poly2+poly3 | SubjID:Difficulty:Condition), 
               contrasts = list(Condition = "contr.sum"),
               data=MotorLearning.gca, REML=F)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.0282593
## (tol = 0.002, component 1)
# coefficients and p-values
get_pvalues(m.ML_sum)[1:9, ]
##                     Estimate Std..Error t.value  p.normal p.normal.star
## (Intercept)          0.58118    0.03794 15.3202 0.000e+00           ***
## poly1                1.08472    0.10705 10.1326 0.000e+00           ***
## poly2               -0.29988    0.06673 -4.4937 7.001e-06           ***
## poly3                0.05346    0.05424  0.9857 3.243e-01              
## DifficultyLow        0.10535    0.03622  2.9086 3.630e-03            **
## Condition1           0.03993    0.02561  1.5591 1.190e-01              
## poly1:DifficultyLow -0.15326    0.11725 -1.3070 1.912e-01              
## poly2:DifficultyLow -0.34688    0.07703 -4.5031 6.696e-06           ***
## poly3:DifficultyLow  0.15038    0.06014  2.5005 1.240e-02             *

Because Condition was sum-coded (1, -1), the DifficultyLow parameter now corresponds to the main effect of task difficulty averaged across both conditions.

Multiple comparisons

Example: change in aphasic picture naming error patterns during recovery (NamingRecovery)

ggplot(NamingRecovery, aes(TestTime, Semantic.error, color=Diagnosis)) + 
  stat_summary(fun.y=mean, geom="point", size=3) +
  stat_summary(fun.data=mean_se, geom="errorbar", width=0.2) + 
  stat_summary(fun.y=mean, geom="line") +
  labs(x="Test Number", y="Semantic Errors (Proportion)") + 
  theme_bw() + expand_limits(y=c(0, 0.1))

Semantic errors: fit a linear model

m.sem <- lmer(Semantic.error ~ TestTime*Diagnosis + 
                (TestTime | SubjectID), data=NamingRecovery, REML=F)
# plot model fit
ggplot(NamingRecovery, aes(TestTime, Semantic.error, color=Diagnosis)) + 
  stat_summary(fun.y=mean, geom="point", size=3) +
  stat_summary(fun.data=mean_se, geom="errorbar", width=0.2) + 
  stat_summary(aes(y=fitted(m.sem)), fun.y=mean, geom="line") +
  theme_bw() + expand_limits(y=0)

Semantic errors: p-values

get_pvalues(m.sem)
##                               Estimate Std..Error t.value  p.normal p.normal.star
## (Intercept)                   0.045767   0.007776  5.8853 3.974e-09           ***
## TestTime                     -0.008685   0.003524 -2.4646 1.372e-02             *
## DiagnosisConduction          -0.015149   0.010039 -1.5090 1.313e-01              
## DiagnosisWernicke            -0.004899   0.010287 -0.4762 6.339e-01              
## TestTime:DiagnosisConduction  0.007308   0.004549  1.6064 1.082e-01              
## TestTime:DiagnosisWernicke    0.012854   0.004662  2.7573 5.829e-03            **

The Anomic group is the reference group, so the parameter estimates compare the other groups to it.

Brute-force option: make Wernicke the reference group

NamingRecovery$DiagnosisW <- relevel(NamingRecovery$Diagnosis, "Wernicke")
# re-fit the model
m.semW <- lmer(Semantic.error ~ TestTime*DiagnosisW + 
                 (TestTime | SubjectID), data=NamingRecovery, REML=F)
get_pvalues(m.semW)
##                                Estimate Std..Error t.value  p.normal p.normal.star
## (Intercept)                    0.040867   0.006735  6.0683 1.293e-09           ***
## TestTime                       0.004169   0.003052  1.3660 1.719e-01              
## DiagnosisWAnomic               0.004899   0.010287  0.4762 6.339e-01              
## DiagnosisWConduction          -0.010250   0.009256 -1.1074 2.681e-01              
## TestTime:DiagnosisWAnomic     -0.012854   0.004662 -2.7573 5.829e-03            **
## TestTime:DiagnosisWConduction -0.005545   0.004194 -1.3221 1.861e-01
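As a quick illustration of what relevel() does: it simply moves the requested level to the front of the factor's level ordering, which makes that level the reference for treatment coding.

```r
f <- factor(c("Anomic", "Conduction", "Wernicke"))
levels(f)
## [1] "Anomic"     "Conduction" "Wernicke"
levels(relevel(f, "Wernicke"))
## [1] "Wernicke"   "Anomic"     "Conduction"
```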

Multiple comparisons from a single model

We need to build a contrast matrix that defines each desired comparison as a weighted combination of the model's parameter estimates.

coef(summary(m.sem))
##                               Estimate Std. Error t value
## (Intercept)                   0.045767   0.007776  5.8853
## TestTime                     -0.008685   0.003524 -2.4646
## DiagnosisConduction          -0.015149   0.010039 -1.5090
## DiagnosisWernicke            -0.004899   0.010287 -0.4762
## TestTime:DiagnosisConduction  0.007308   0.004549  1.6064
## TestTime:DiagnosisWernicke    0.012854   0.004662  2.7573
#in contrast matrix, each row is a comparison
#   each element is a weight for parameters from the original model 
contrast.matrix <- rbind(
  "Conduction w.r.t. Anomic" = c(0, 0, 1, 0, 0, 0), #3rd param in model
  "Wernicke w.r.t. Anomic" = c(0, 0, 0, 1, 0, 0), #4th param in model
  "Wernicke w.r.t. Conduction" = c(0, 0, -1, 1, 0, 0), #4th minus 3rd: shared Anomic baseline cancels out
  "Slope: Conduction w.r.t. Anomic" = c(0, 0, 0, 0, 1, 0), #5th param in model
  "Slope: Wernicke w.r.t. Anomic" = c(0, 0, 0, 0, 0, 1), #6th param in model
  "Slope: Wernicke w.r.t. Conduction" = c(0, 0, 0, 0, -1, 1)) #6th minus 5th: shared Anomic baseline cancels out
contrast.matrix
##                                   [,1] [,2] [,3] [,4] [,5] [,6]
## Conduction w.r.t. Anomic             0    0    1    0    0    0
## Wernicke w.r.t. Anomic               0    0    0    1    0    0
## Wernicke w.r.t. Conduction           0    0   -1    1    0    0
## Slope: Conduction w.r.t. Anomic      0    0    0    0    1    0
## Slope: Wernicke w.r.t. Anomic        0    0    0    0    0    1
## Slope: Wernicke w.r.t. Conduction    0    0    0    0   -1    1
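Each comparison is just a weighted sum of the fixed-effect estimates. We can check the "Wernicke w.r.t. Conduction" row by hand using the estimates printed by coef(summary(m.sem)) above:

```r
# Fixed-effect estimates copied from coef(summary(m.sem))
b <- c(0.045767, -0.008685, -0.015149, -0.004899, 0.007308, 0.012854)
w <- c(0, 0, -1, 1, 0, 0)  # Wernicke minus Conduction
sum(w * b)
## [1] 0.01025
```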

Multiple comparisons from a single model

Use the glht() function from the multcomp package to compute the comparisons defined by the contrast matrix.

library(multcomp)
comps <- glht(m.sem, contrast.matrix)
summary(comps, test = adjusted("none"))
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Fit: lmer(formula = Semantic.error ~ TestTime * Diagnosis + (TestTime | 
##     SubjectID), data = NamingRecovery, REML = F)
## 
## Linear Hypotheses:
##                                        Estimate Std. Error z value Pr(>|z|)   
## Conduction w.r.t. Anomic == 0          -0.01515    0.01004   -1.51   0.1313   
## Wernicke w.r.t. Anomic == 0            -0.00490    0.01029   -0.48   0.6339   
## Wernicke w.r.t. Conduction == 0         0.01025    0.00926    1.11   0.2681   
## Slope: Conduction w.r.t. Anomic == 0    0.00731    0.00455    1.61   0.1082   
## Slope: Wernicke w.r.t. Anomic == 0      0.01285    0.00466    2.76   0.0058 **
## Slope: Wernicke w.r.t. Conduction == 0  0.00555    0.00419    1.32   0.1861   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- none method)

By default, summary() for glht() objects applies a single-step correction for multiple comparisons. I don't know whether that correction is appropriate here, so I prefer to use test = adjusted("none"), which returns uncorrected normal-approximation p-values that I can correct for multiple comparisons as needed.
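For example, a correction can be applied afterwards with p.adjust() from base R, here using the six uncorrected p-values from the glht output above:

```r
# Uncorrected p-values from the glht output, in display order
p <- c(0.1313, 0.6339, 0.2681, 0.1082, 0.0058, 0.1861)
# Holm correction; only the Wernicke-Anomic slope comparison
# remains significant (adjusted p = 0.0058 * 6 = 0.0348)
p.adjust(p, method = "holm")
```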

Exercise 4

Use the full model of the WISQARS suicide data to test all pairwise comparisons between regions.

wisqars.suicide$Time <- wisqars.suicide$Year - 1999 # start Time at 0 so the intercept estimates the 1999 value
ggplot(wisqars.suicide, aes(Time, Crude.Rate, color=Region)) + 
  stat_summary(fun.data=mean_se, geom="pointrange") + 
  stat_summary(fun.y=mean, geom="line")