Schedule

Time	Topic	Data set(s)
9-9:30am	Introduction	Visual Search
9:30-10am	Exercise	WISQARS, Naming Recovery
10-10:45am	Non-linear change	Word Learning
10:45-11:30am	Exercise	CP
11:30-noon	Within-subject effects	Target Fixation
12-1pm	Lunch break; Exercise	Az
1-1:45pm	Logistic GCA; Exercise	Target Fixation; Word Learning
1:45-2:30pm	Individual Differences	Deviant Behavior; School Mental Health
2:30-3pm	Exercise; Open Q&A	NA

Preliminaries

Packages

tidyverse, patchwork: data management and graphing data and model fits
lme4, lmerTest: fitting growth curve models and estimating p-values
psy811: example data sets and helper functions. To install: devtools::install_github("dmirman/psy811")
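
A minimal setup sketch for checking that the packages listed above are available. It assumes the four non-GitHub packages come from CRAN; the variable names (`workshop_pkgs`, `missing_pkgs`) are illustrative only, and the install lines are left commented out so nothing is installed by accident.

```r
# Packages named on this slide; psy811 comes from GitHub, the rest are assumed to be on CRAN
workshop_pkgs <- c("tidyverse", "patchwork", "lme4", "lmerTest", "psy811")

# Check which ones are already installed (this does not attach them)
is_installed <- vapply(workshop_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- workshop_pkgs[!is_installed]
print(missing_pkgs)

# To install whatever is missing, uncomment as needed:
# install.packages(setdiff(missing_pkgs, "psy811"))
# devtools::install_github("dmirman/psy811")   # requires the devtools package
```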

Pre-processing eye-tracking data

I recommend our gazeR package: http://github.com/dmirman/gazer. GazeR also contains the GCA helper functions (psy811 has more example data sets so we’ll use that one for this workshop).

https://rdcu.be/b3z6q

What are time course data?

Longitudinal data
Repeated observations
Example: Height tracking
Specific case of “repeated measures” or “nested” data

Two key features

Nested data are not independent

A child who is taller than average at time t is likely to be taller than average at time t+1
Non-independence is related to individual differences
Nesting can happen on multiple levels: children within families or hospitals

Observations are related by a continuous variable (e.g., time, but it could also be [letter] size, number of distractors, etc.)

Ought to model this variable as continuous
Can quantify trajectories/shapes of change
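
The non-independence point can be illustrated with simulated data. This is a rough sketch using the height-tracking example: all numbers (average growth, spread between children, noise) are made up for illustration and do not come from any real data set.

```r
set.seed(1)
n_child <- 50
ages <- 2:10                       # measurement occasions (years)

# Each child gets a stable deviation from the average height (an individual difference)
child_offset <- rnorm(n_child, mean = 0, sd = 5)

# One row per child per age; age varies fastest within child
heights <- expand.grid(age = ages, child = seq_len(n_child))
heights$height <- 75 + 6 * heights$age +          # average growth trend (made-up values)
  child_offset[heights$child] +                   # child-specific offset
  rnorm(nrow(heights), mean = 0, sd = 2)          # occasion-specific noise

# Ordinary regression treats all rows as independent
pooled <- lm(height ~ age, data = heights)
heights$resid <- resid(pooled)

# Compare each child's residual at age t with the same child's residual at age t+1
res_wide <- matrix(heights$resid, nrow = length(ages))   # rows = ages, columns = children
r_adjacent <- cor(as.vector(res_wide[-nrow(res_wide), ]),
                  as.vector(res_wide[-1, ]))
print(r_adjacent)
```

If the observations were independent, this correlation would be near zero. A clearly positive value means the pooled model's residuals carry information about which child they came from.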

Gradual change

Novel word learning is faster for high transitional probability (TP) words than for low TP words
But, t-test on overall mean accuracy is marginal (p=0.096)
Repeated measures ANOVA shows main effect of Block, marginal effect of TP, and no interaction (F<1)
Block-by-block t-test significant only in block 4 and marginal in block 5

Replication crisis

5/20 replications had no significant effect in any block
Solution: Use a regression-based approach to analyze the full trajectory

Linear regression: A brief review

\(Y = \beta_{0} + \beta_{1} \cdot Time\)
\(Y_{ij} = \beta_{0i} + \beta_{1i} \cdot Time_{j} + \epsilon_{ij}\)
Fixed effects: \(\beta_{0}\) (Intercept), \(\beta_{1}\) (Slope)
Random effects: \(\epsilon_{ij}\) (Residual error)
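
As a quick reminder of how this looks in R, here is a minimal sketch with simulated data. The generating values (intercept 10, slope 2, residual SD 3) are arbitrary choices for illustration.

```r
set.seed(2)
time <- rep(0:9, times = 20)
y <- 10 + 2 * time + rnorm(length(time), sd = 3)   # true intercept 10, slope 2 (made up)

fit <- lm(y ~ time)
coef(fit)     # estimates of beta_0 (Intercept) and beta_1 (time)
sigma(fit)    # estimated SD of the residual error epsilon
```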

Multilevel regression: Fixed effects

Level 1: \(Y_{ij} = \beta_{0i} + \beta_{1i} \cdot Time_{j} + \epsilon_{ij}\)

Level 2: model of the Level 1 parameter(s)

\(\beta_{0i} = \gamma_{00} + \zeta_{0i}\)
- \(\gamma_{00}\) is the population mean
- \(\zeta_{0i}\) is individual deviation from the mean
\(\beta_{0i} = \gamma_{00} + \gamma_{0C} \cdot C + \zeta_{0i}\)
- \(\gamma_{0C}\) is the fixed effect of condition \(C\) on the intercept
\(\beta_{1i} = \gamma_{10} + \gamma_{1C} \cdot C + \zeta_{1i}\)
- \(\gamma_{1C}\) is the fixed effect of condition \(C\) on the slope
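
Substituting the Level 2 equations into the Level 1 equation gives a single composite equation, which makes the fixed and random parts easier to see. This is just algebra on the equations above, using the same symbols.

```latex
\begin{align*}
Y_{ij} &= (\gamma_{00} + \gamma_{0C} \cdot C + \zeta_{0i})
        + (\gamma_{10} + \gamma_{1C} \cdot C + \zeta_{1i}) \cdot Time_{j} + \epsilon_{ij} \\
       &= \underbrace{\gamma_{00} + \gamma_{0C} \cdot C + \gamma_{10} \cdot Time_{j}
        + \gamma_{1C} \cdot C \cdot Time_{j}}_{\text{fixed effects}}
        + \underbrace{\zeta_{0i} + \zeta_{1i} \cdot Time_{j}}_{\text{random effects}}
        + \epsilon_{ij}
\end{align*}
```

In lme4 formula syntax this structure would be written roughly as `Y ~ Time * C + (Time | Subject)`, where `Y`, `Time`, `C`, and `Subject` are placeholder names.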

Multilevel regression: Random effects

Level 1: \(Y_{ij} = \beta_{0i} + \beta_{1i} \cdot Time_{j} + \epsilon_{ij}\)

Level 2:

\(\beta_{0i} = \gamma_{00} + \gamma_{0C} \cdot C + \zeta_{0i}\)

\(\beta_{1i} = \gamma_{10} + \gamma_{1C} \cdot C + \zeta_{1i}\)

Residual errors

\(\zeta_{0i}\) unexplained variance in intercept
\(\zeta_{1i}\) unexplained variance in slope
Unexplained variance reflects individual differences
Random effects require a lot of data to estimate
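
A small simulation can make the random effects concrete. This sketch assumes lme4 is installed, leaves out the condition effect \(C\) for brevity, and uses made-up generating values; the variable names are illustrative.

```r
library(lme4)

set.seed(3)
n_subj <- 40
d <- expand.grid(Time = 0:5, Subject = factor(seq_len(n_subj)))

# Individual deviations zeta_0i (intercept) and zeta_1i (slope), drawn from normal(0, sd)
zeta0 <- rnorm(n_subj, mean = 0, sd = 2)
zeta1 <- rnorm(n_subj, mean = 0, sd = 0.5)

# Population intercept 5 and slope 1 (made up), plus each subject's own deviations
s <- as.integer(d$Subject)
d$Y <- (5 + zeta0[s]) + (1 + zeta1[s]) * d$Time + rnorm(nrow(d), sd = 1)

m <- lmer(Y ~ Time + (Time | Subject), data = d, REML = FALSE)
fixef(m)                  # estimates of gamma_00 and gamma_10
VarCorr(m)                # estimated SDs of zeta_0i, zeta_1i, and the residual
head(ranef(m)$Subject)    # predicted deviations for the first few subjects
```

Rerunning this with fewer subjects or fewer time points is one way to see how quickly the variance estimates become unstable.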

Fixed vs. Random effects

Fixed effects

Interesting in themselves
Reproducible fixed properties of the world (nouns vs. verbs, WM load, age, etc.)
Unique, unconstrained parameter estimate for each condition

Random effects

Randomly sampled observational units over which you intend to generalize (particular nouns/verbs, particular individuals, etc.)
Unexplained variance
Drawn from normal distribution with mean 0

Maximum Likelihood Estimation

Find an estimate of parameters that maximizes the likelihood of observing the actual data
Simple regression: OLS produces MLE parameters by solving an equation
Multilevel regression: use iterative algorithm to gradually converge to MLE estimates
Goodness of fit measure: log likelihood (LL)
- Not inherently meaningful (unlike \(R^2\))
- Change in LL indicates improvement of the fit of the model
- Changes in \(-2LL\) (aka “Likelihood Ratio”) are distributed as \(\chi^2\)
- Requires models be nested (parameters added or removed)
- DF = number of parameters added
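
The points above can be sketched in base R with simulated data. The generating values are arbitrary, and `neg_ll` is a hypothetical helper written for this illustration, not a function from any package.

```r
set.seed(4)
n <- 100
time <- runif(n, 0, 10)
group <- rep(c(0, 1), each = n / 2)
y <- 3 + 1.5 * time + 2 * group + rnorm(n, sd = 2)   # made-up generating values

# OLS solves for the regression coefficients directly
ols <- lm(y ~ time)

# The same coefficients found by numerically maximizing the normal log-likelihood
neg_ll <- function(par) {
  mu <- par[1] + par[2] * time
  -sum(dnorm(y, mean = mu, sd = exp(par[3]), log = TRUE))   # sd kept positive via exp()
}
mle <- optim(c(mean(y), 0, log(sd(y))), neg_ll, method = "BFGS")
rbind(OLS = coef(ols), MLE = mle$par[1:2])

# Likelihood ratio test: does adding one parameter (group) improve the fit?
bigger <- lm(y ~ time + group)
lr <- as.numeric(-2 * (logLik(ols) - logLik(bigger)))   # change in -2LL
p  <- pchisq(lr, df = 1, lower.tail = FALSE)            # df = number of parameters added
c(LR = lr, p = p)
```

The two rows of coefficients should agree up to the optimizer's tolerance. The test is only valid because `ols` is nested within `bigger`.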

A simple (linear) GCA example

# the psy811 package includes helper functions and example data sets
# to install: devtools::install_github("dmirman/psy811")
library(tidyverse) # provides ggplot2 for the plots
library(psy811)
summary(VisualSearchEx)

##   Participant        Dx        Set.Size          RT       
##  0042   :  4   Aphasic:60   Min.   : 1.0   Min.   :  414  
##  0044   :  4   Control:72   1st Qu.: 4.0   1st Qu.: 1132  
##  0083   :  4                Median :10.0   Median : 1814  
##  0166   :  4                Mean   :12.8   Mean   : 2261  
##  0186   :  4                3rd Qu.:18.8   3rd Qu.: 2808  
##  0190   :  4                Max.   :30.0   Max.   :12201  
##  (Other):108

ggplot(VisualSearchEx, aes(Set.Size, RT, color=Dx)) +
  stat_summary(fun.data=mean_se, geom="pointrange")

A simple (linear) GCA example: Fit the models

library(lme4)
library(lmerTest) #for estimated df and p-values
# a null, intercept-only model
vs.null <- lmer(RT ~ 1 + (Set.Size | Participant), data=VisualSearchEx, REML=FALSE)
# add effect of set size
vs <- lmer(RT ~ Set.Size + (Set.Size | Participant), data=VisualSearchEx, REML=FALSE)
# add effect of diagnosis
vs.0 <- lmer(RT ~ Set.Size + Dx + (Set.Size | Participant), data=VisualSearchEx, REML=FALSE)
# add set size by diagnosis interaction
vs.1 <- lmer(RT ~ Set.Size * Dx + (Set.Size | Participant), data=VisualSearchEx, REML=FALSE)
# compare model fits
anova(vs.null, vs, vs.0, vs.1)

## Data: VisualSearchEx
## Models:
## vs.null: RT ~ 1 + (Set.Size | Participant)
## vs: RT ~ Set.Size + (Set.Size | Participant)
## vs.0: RT ~ Set.Size + Dx + (Set.Size | Participant)
## vs.1: RT ~ Set.Size * Dx + (Set.Size | Participant)
##         npar  AIC  BIC logLik deviance Chisq Df Pr(>Chisq)    
## vs.null    5 2283 2297  -1136     2273                        
## vs         6 2248 2265  -1118     2236 36.90  1    1.2e-09 ***
## vs.0       7 2241 2261  -1114     2227  8.58  1     0.0034 ** 
## vs.1       8 2241 2264  -1113     2225  2.01  1     0.1566    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A simple (linear) GCA example: Interpret results

Model comparisons

Model	npar	AIC	BIC	logLik	deviance	Chisq	Df	Pr(>Chisq)
vs.null	5	2283	2297	-1136	2273	NA	NA	NA
vs	6	2248	2265	-1118	2236	36.902	1	<0.001
vs.0	7	2241	2261	-1114	2227	8.585	1	0.003
vs.1	8	2241	2264	-1113	2225	2.006	1	0.157

Compared to the null model, adding set size (vs) substantially improves model fit: response times are affected by the number of distractors
Adding the effect of diagnosis on the intercept (vs.0) significantly improves model fit: stroke survivors respond more slowly than controls do
Adding the interaction of set size and diagnosis, i.e., the effect of diagnosis on the slope (vs.1), does not significantly improve model fit: there is no evidence that stroke survivors are more affected by distractors than controls are
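
The p-values in the comparison table follow directly from the \(\chi^2\) statistics and degrees of freedom. A short sketch, using only the values reported in the table:

```r
# Chi-square statistics and df as reported in the model-comparison table
chisq <- c(vs = 36.902, vs.0 = 8.585, vs.1 = 2.006)
df    <- c(1, 1, 1)   # each step adds one fixed-effect parameter

# Upper-tail probability of each statistic under the chi-square distribution
p <- pchisq(chisq, df = df, lower.tail = FALSE)
signif(p, 2)
```

The results should match the Pr(>Chisq) column up to rounding.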

A simple (linear) GCA example: Inspect model

summary(vs.1)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: RT ~ Set.Size * Dx + (Set.Size | Participant)
##    Data: VisualSearchEx
## 
##      AIC      BIC   logLik deviance df.resid 
##     2241     2264    -1113     2225      124 
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -3.759 -0.317 -0.079  0.317  6.229 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev. Corr
##  Participant (Intercept) 613397   783.2        
##              Set.Size       380    19.5    1.00
##  Residual                756827   870.0        
## Number of obs: 132, groups:  Participant, 33
## 
## Fixed effects:
##                    Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)          2078.7      264.4    35.7    7.86  2.6e-09 ***
## Set.Size               73.5       11.2    54.9    6.54  2.1e-08 ***
## DxControl           -1106.1      357.9    35.7   -3.09   0.0039 ** 
## Set.Size:DxControl    -21.7       15.2    54.9   -1.43   0.1585    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) Set.Sz DxCntr
## Set.Size    -0.090              
## DxControl   -0.739  0.066       
## St.Sz:DxCnt  0.066 -0.739 -0.090
## convergence code: 0
## boundary (singular) fit: see ?isSingular
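
To read the fixed effects table, it can help to turn the estimates into one line per group. The sketch below copies the estimates from the output above; the set sizes used for the predictions are illustrative values chosen within the observed range (1 to 30).

```r
# Fixed-effect estimates copied from the summary(vs.1) output
b <- c(intercept = 2078.7, set_size = 73.5, dx_control = -1106.1, interaction = -21.7)

# Aphasic is the reference level of Dx, so its line uses the first two estimates directly
aphasic <- c(intercept = b[["intercept"]], slope = b[["set_size"]])
# The Control line adds the DxControl and Set.Size:DxControl adjustments
control <- c(intercept = b[["intercept"]] + b[["dx_control"]],
             slope     = b[["set_size"]] + b[["interaction"]])

rbind(Aphasic = aphasic, Control = control)

# Model-implied mean RT at a few set sizes (the set sizes here are illustrative)
set_size <- c(1, 5, 15, 30)
predicted <- rbind(Aphasic = aphasic[["intercept"]] + aphasic[["slope"]] * set_size,
                   Control = control[["intercept"]] + control[["slope"]] * set_size)
colnames(predicted) <- set_size
predicted
```

Note that the intercepts refer to a set size of 0, which is outside the observed range, so the group difference at the intercept is an extrapolation.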

A simple (linear) GCA example: Plot model fit

ggplot(VisualSearchEx, aes(Set.Size, RT, color=Dx)) + 
  stat_summary(fun.data=mean_se, geom="pointrange") + 
  stat_summary(aes(y=fitted(vs.0)), fun=mean, geom="line")

A simple (linear) GCA example: Compare model fits

ggplot(VisualSearchEx, aes(Set.Size, RT, color=Dx)) + 
  stat_summary(fun.data=mean_se, geom="pointrange") + 
  stat_summary(aes(y=fitted(vs.0)), fun=mean, geom="line") +
  stat_summary(aes(y=fitted(vs.1)), fun=mean, geom="line", linetype="dashed")

Practice Exercises

Exercise 1A: Analyze the US state-level suicide rate data from WISQARS (wisqars.suicide)

Did the regions differ in their baseline (1999) suicide rates?
Did the regions differ in their rates of change of suicide rate?
Plot observed data and model fits

Exercise 1B: Analyze the recovery of object naming ability in aphasia (NamingRecovery)

Did the patients recover any naming ability?
Did recovery differ across aphasia sub-types?
