Growth Curve Analysis

Workshop Schedule

Day 1: GCA Basics
Time	Content
9am - 10am	Part 1: Introduction, challenges of time course data, GCA basics
10am - 10:30am	Break, hands-on exercise
10:30 - Noon	Part 2: Modeling nonlinear change, within-subject effects
Noon - 1:30pm	Lunch break, Q&A, hands-on exercise
1:30pm - 4:30pm	Hands-on analysis time

Day 2: Advanced Topics
Time	Content
9am - 10am	Part 3: Visualizing higher-order polynomial effects, contrast coding, multiple comparisons
10am - 10:45am	Hands-on analysis time
10:45am - 11:15am	Part 4: Logistic GCA
11:15 - Noon	Hands-on analysis time
Noon - 1:30pm	Lunch break, Q&A
1:30pm - 2:30pm	Part 5: Quantifying and analyzing individual differences
2:30pm - 4:30pm	Hands-on analysis time

What are time course data?

Longitudinal data
Repeated observations
Example: Height tracking
Specific case of “repeated measures” or “nested” data

Two key features

Nested data are not independent

A child that is taller-than-average at time t, is likely to be taller-than-average at time t+1
Non-independence is related to individual differences
Nesting can happen on multiple levels: children within families or hospitals

Two key features

Nested data are not independent

A child that is taller-than-average at time t, is likely to be taller-than-average at time t+1
Non-independence is related to individual differences
Nesting can happen on multiple levels: children within families or hospitals

Related by continuous variable (i.e., time, but could be [letter] size, number of distractors, etc.)

Ought to model this variable as continuous
Can quantify trajectories/shapes of change

Gradual change

Novel word learning is faster for high TP than low TP words
But, t-test on overall mean accuracy is marginal (p=0.096)
Repeated measures ANOVA shows main effect of Block, marginal effect of TP, and no interaction (F<1)
Block-by-block t-test significant only in block 4 and marginal in block 5

Replication crisis

5/20 replications had no significant effect in any block

Visualizing time course data: ggplot2

summary(Orange)

##  Tree       age       circumference  
##  3:7   Min.   : 118   Min.   : 30.0  
##  1:7   1st Qu.: 484   1st Qu.: 65.5  
##  5:7   Median :1004   Median :115.0  
##  2:7   Mean   : 922   Mean   :115.9  
##  4:7   3rd Qu.:1372   3rd Qu.:161.5  
##        Max.   :1582   Max.   :214.0

Basic line + points plot

ggplot(Orange, aes(x=age, y=circumference, color=Tree)) + 
  geom_point() + geom_line()

Black & White version

ggplot(Orange, aes(x=age, y=circumference, shape=Tree, linetype=Tree)) + 
  geom_point() + geom_line()

Compute summary on-the-fly

ggplot(Orange, aes(age, circumference)) + 
  geom_point() + stat_summary(fun.y=mean, geom="line")

Exclude cases on-the-fly

ggplot(subset(Orange, Tree!="5"), aes(age, circumference)) + 
  geom_point() + stat_summary(fun.y=mean, geom="line")

Other summary statistics

ggplot(Orange, aes(age, circumference)) + 
  stat_summary(fun.data=mean_se, geom="pointrange")

Publication-ready graphs

ggplot(Orange, aes(age, circumference)) + 
  stat_summary(fun.y=mean, geom="line") +
  stat_summary(fun.data=mean_se, geom="pointrange") +
  theme_bw(base_size=12) + labs(x="Age in days", y="Size")

ggsave("fig-Orange.pdf", width=3, height=3)

“Small Multiples” aka facets

ggplot(Orange, aes(age, circumference)) + facet_wrap(~ Tree) + 
  geom_line()

Linear regression: A brief review

\(Y = \beta_{0} + \beta_{1} \cdot Time\)
\(Y_{ij} = \beta_{0i} + \beta_{1i} \cdot Time_{j} + \epsilon_{ij}\)
Fixed effects: \(\beta_{0}\) (Intercept), \(\beta_{1}\) (Slope)
Random effects: \(\epsilon_{ij}\) (Residual error)

Multilevel regression: Fixed effects

Level 1: \(Y_{ij} = \beta_{0i} + \beta_{1i} \cdot Time_{j} + \epsilon_{ij}\)

Level 2: model of the Level 1 parameter(s)

\(\beta_{0i} = \gamma_{00} + \zeta_{0i}\)
- \(\gamma_{00}\) is the population mean
- \(\zeta_{0i}\) is individual deviation from the mean
\(\beta_{0i} = \gamma_{00} + \gamma_{0C} \cdot C + \zeta_{0i}\)
- \(\gamma_{0C}\) is the fixed effect of condition \(C\) on the intercept
\(\beta_{1i} = \gamma_{10} + \gamma_{1C} \cdot C + \zeta_{1i}\)
- \(\gamma_{1C}\) is the fixed effect of condition \(C\) on the slope

Multilevel regression: Random effects

Level 1: \(Y_{ij} = \beta_{0i} + \beta_{1i} \cdot Time_{j} + \epsilon_{ij}\)

Level 2:

\(\beta_{0i} = \gamma_{00} + \gamma_{0C} \cdot C + \zeta_{0i}\)

\(\beta_{1i} = \gamma_{10} + \gamma_{1C} \cdot C + \zeta_{1i}\)

Residual errors

\(\zeta_{0i}\) unexplained variance in intercept
\(\zeta_{1i}\) unexplained variance in slope
Unexplained variance reflects individual differences
Random effects require a lot of data to estimate

Fixed vs. Random effects

Fixed effects

Interesting in themselves
Reproducible fixed properties of the world (nouns vs. verbs, WM load, age, etc.)
Unique, unconstrained parameter estimate for each condition

Random effects

Randomly sampled observational units over which you intend to generalize (particular nouns/verbs, particular individuals, etc.)
Unexplained variance
Drawn from normal distribution with mean 0

Maximum Likelihood Estimation

Find an estimate of parameters that maximizes the likelihood of observing the actual data
Simple regression: OLS produces MLE parameters by solving an equation
Multilevel regression: use iterative algorithm to gradually converge to MLE estimates
Goodness of fit measure: log likelihood (LL)
- Not inherently meaningful (unlike \(R^2\))
- Change in LL indicates improvement of the fit of the model
- Changes in \(-2LL\) (aka “Likelihood Ratio”“) are distributed as \(\chi^2\)
- Requires models be nested (parameters added or removed)
- DF = number of parameters added

A simple (linear) GCA example

# the psy811 package includes helper functions and example data sets
# to install: devtools::install_github("dmirman/psy811")
library(psy811)
summary(VisualSearchEx)

##   Participant        Dx        Set.Size          RT       
##  0042   :  4   Aphasic:60   Min.   : 1.0   Min.   :  414  
##  0044   :  4   Control:72   1st Qu.: 4.0   1st Qu.: 1132  
##  0083   :  4                Median :10.0   Median : 1814  
##  0166   :  4                Mean   :12.8   Mean   : 2261  
##  0186   :  4                3rd Qu.:18.8   3rd Qu.: 2808  
##  0190   :  4                Max.   :30.0   Max.   :12201  
##  (Other):108

ggplot(VisualSearchEx, aes(Set.Size, RT, color=Dx)) + stat_summary(fun.data=mean_se, geom="pointrange")

A simple (linear) GCA example: Fit the models

library(lme4)
# a null, intercept-only model
vs.null <- lmer(RT ~ 1 + (Set.Size | Participant), data=VisualSearchEx, REML=FALSE)
# add effect of set size
vs <- lmer(RT ~ Set.Size + (Set.Size | Participant), data=VisualSearchEx, REML=F)
# add effect of diagnosis
vs.0 <- lmer(RT ~ Set.Size + Dx + (Set.Size | Participant), data=VisualSearchEx, REML=F)
# add set size by diagnosis interaction
vs.1 <- lmer(RT ~ Set.Size * Dx + (Set.Size | Participant), data=VisualSearchEx, REML=F)
# compare model fits
anova(vs.null, vs, vs.0, vs.1)

## Data: VisualSearchEx
## Models:
## vs.null: RT ~ 1 + (Set.Size | Participant)
## vs: RT ~ Set.Size + (Set.Size | Participant)
## vs.0: RT ~ Set.Size + Dx + (Set.Size | Participant)
## vs.1: RT ~ Set.Size * Dx + (Set.Size | Participant)
##         Df  AIC  BIC logLik deviance Chisq Chi Df Pr(>Chisq)    
## vs.null  5 2283 2297  -1136     2273                            
## vs       6 2248 2265  -1118     2236 36.90      1    1.2e-09 ***
## vs.0     7 2241 2261  -1114     2227  8.58      1     0.0034 ** 
## vs.1     8 2241 2264  -1113     2225  2.01      1     0.1567    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A simple (linear) GCA example: Interpet results

	Df	AIC	BIC	logLik	deviance	Chisq	Chi Df	Pr(>Chisq)
vs.null	5	2283	2297	-1136	2273	NA	NA	NA
vs	6	2248	2265	-1118	2236	36.902	1	0.0000
vs.0	7	2241	2261	-1114	2227	8.585	1	0.0034
vs.1	8	2241	2264	-1113	2225	2.006	1	0.1567

Compared to null model, adding set size (vs) substantially improves model fit: response times are affected by number of distractors
Adding effect of Diagnosis on intercept (vs.0) significantly improves model fit: stroke survivors respond more slowly than controls do
Adding interaction of set size and Diagnosis, i.e., effect of Diagnosis on slope (vs.1), does not significantly improve model fit: stroke survivors are not more affected by distractors than controls are

A simple (linear) GCA example: Inspect model

summary(vs.1)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: RT ~ Set.Size * Dx + (Set.Size | Participant)
##    Data: VisualSearchEx
## 
##      AIC      BIC   logLik deviance df.resid 
##     2241     2264    -1113     2225      124 
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -3.759 -0.317 -0.079  0.317  6.229 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev. Corr
##  Participant (Intercept) 613393   783.2        
##              Set.Size       380    19.5    1.00
##  Residual                756830   870.0        
## Number of obs: 132, groups:  Participant, 33
## 
## Fixed effects:
##                    Estimate Std. Error t value
## (Intercept)          2078.7      264.4    7.86
## Set.Size               73.5       11.2    6.54
## DxControl           -1106.1      357.9   -3.09
## Set.Size:DxControl    -21.7       15.2   -1.43
## 
## Correlation of Fixed Effects:
##             (Intr) Set.Sz DxCntr
## Set.Size    -0.090              
## DxControl   -0.739  0.066       
## St.Sz:DxCnt  0.066 -0.739 -0.090

A simple (linear) GCA example: Parameter p-values

get_pvalues(vs.1) #default: normal approximation

##                    Estimate Std..Error t.value  p.normal p.normal.star
## (Intercept)         2078.75     264.36   7.863 3.775e-15           ***
## Set.Size              73.49      11.23   6.545 5.955e-11           ***
## DxControl          -1106.05     357.95  -3.090 2.002e-03            **
## Set.Size:DxControl   -21.74      15.20  -1.430 1.528e-01

get_pvalues(vs.1, method = "KR") # alternative: Kenward-Roger approximation for DF

##                    Estimate Std..Error t.value df.KR      p.KR p.KR.star
## (Intercept)         2078.75     264.36   7.863    31 7.103e-09       ***
## Set.Size              73.49      11.23   6.545    31 2.630e-07       ***
## DxControl          -1106.05     357.95  -3.090    31 4.204e-03        **
## Set.Size:DxControl   -21.74      15.20  -1.430    31 1.628e-01

A simple (linear) GCA example: Plot model fit

ggplot(VisualSearchEx, aes(Set.Size, RT, color=Dx)) + 
  stat_summary(fun.data=mean_se, geom="pointrange") + 
  stat_summary(aes(y=fitted(vs.0)), fun.y=mean, geom="line")

A simple (linear) GCA example: Compare model fits

ggplot(VisualSearchEx, aes(Set.Size, RT, color=Dx)) + 
  stat_summary(fun.data=mean_se, geom="pointrange") + 
  stat_summary(aes(y=fitted(vs.0)), fun.y=mean, geom="line") +
  stat_summary(aes(y=fitted(vs.1)), fun.y=mean, geom="line", linetype="dashed")

BREAK

Exercise 1A: Analyze the state-level suicide rate data from the WISQARS (wisqars.suicide)

did the regions differ in their baseline (1999) suicide rates?
did the regions differ in their rates of change of suidice rate?
plot observed data and model fits

Exercise 1B: Analyze the recovery of object naming ability in aphasia (NamingRecovery)

did the patients recovery any naming ability?
did recovery differ across aphasia sub-types?