Growth Curve Analysis, Part 4

Dan Mirman

22 May 2020

Schedule
| Time          | Topic                  | Data set(s)                            |
|---------------|------------------------|----------------------------------------|
| 9-9:30am      | Introduction           | Visual Search                          |
| 9:30-10am     | Exercise               | WISQARS, Naming Recovery               |
| 10-10:45am    | Non-linear change      | Word Learning                          |
| 10:45-11:30am | Exercise               | CP                                     |
| 11:30-noon    | Within-subject effects | Target Fixation                        |
| 12-1pm        | Lunch break; Exercise  | Az                                     |
| 1-1:45pm      | Logistic GCA; Exercise | Target Fixation; Word Learning         |
| 1:45-2:30pm   | Individual Differences | Deviant Behavior; School Mental Health |
| 2:30-3pm      | Exercise; Open Q&A     | NA                                     |

Why logistic regression? (A brief review)

Aggregated binary outcomes (e.g., accuracy, fixation proportion) can look approximately continuous, but they are bounded between 0 and 1 and their variance depends on their mean. These properties violate linear regression's assumptions of an unbounded outcome with constant error variance, and can produce incorrect results in linear regression.
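To see the non-constant variance concretely: a binomial proportion with true probability \(p\) has variance proportional to \(p(1-p)\), which is largest at \(p = 0.5\) and shrinks toward 0 at the boundaries. A quick illustration in R:

p <- seq(0.1, 0.9, by=0.2)
p * (1 - p)  # 0.09 0.21 0.25 0.21 0.09 -- variance peaks at p=0.5, vanishes near 0 and 1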

Logistic regression (A brief review)

Outcome is log-odds (also called “logit”): \(\text{logit}(Yes, No) = \log \left(\frac{Yes}{No}\right)\)

Compare to proportions: \(p(Yes, No) = \frac{Yes}{Yes+No}\)

Note: the logit is undefined (\(\pm\infty\)) when \(p=0\) or \(p=1\), which makes it hard to fit logistic models to data with such extreme values (e.g., fixations on objects that are very rarely fixated).
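A quick illustration using base R's qlogis() (logit) and plogis() (inverse logit) to convert between proportions and log-odds:

qlogis(0.5)    # 0: even odds, log(0.5/0.5)
qlogis(0.9)    # 2.197: log(0.9/0.1)
qlogis(1.0)    # Inf -- why p=1 is a problem for logistic models
plogis(2.197)  # back to (approximately) 0.9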

Logistic GCA: Data
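The TargetFix data contain, for each subject, condition, and time bin, the number of target fixations (sumFix) out of the number of trials (N), along with the fixation proportion (meanFix). The output below is from a call like summary(TargetFix):

summary(TargetFix)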

##     Subject         Time         timeBin   Condition     meanFix      
##  708    : 30   Min.   : 300   Min.   : 1   High:150   Min.   :0.0286  
##  712    : 30   1st Qu.: 450   1st Qu.: 4   Low :150   1st Qu.:0.2778  
##  715    : 30   Median : 650   Median : 8              Median :0.4558  
##  720    : 30   Mean   : 650   Mean   : 8              Mean   :0.4483  
##  722    : 30   3rd Qu.: 850   3rd Qu.:12              3rd Qu.:0.6111  
##  725    : 30   Max.   :1000   Max.   :15              Max.   :0.8286  
##  (Other):120                                                          
##      sumFix           N       
##  Min.   : 1.0   Min.   :33.0  
##  1st Qu.:10.0   1st Qu.:35.8  
##  Median :16.0   Median :36.0  
##  Mean   :15.9   Mean   :35.5  
##  3rd Qu.:21.2   3rd Qu.:36.0  
##  Max.   :29.0   Max.   :36.0  
## 

Logistic GCA: Fit the model

Logistic model code is almost the same as linear model code. Three differences:

  1. glmer() instead of lmer()
  2. the outcome is a binary (1/0) variable or, for aggregated data, a two-column matrix of the numbers of 1s and 0s built with cbind()
  3. add family=binomial
library(lme4)  # for glmer()
# make 3rd-order orthogonal polynomial predictors from timeBin
# (code_poly() is a helper function from the workshop materials)
TargetFix <- code_poly(TargetFix, predictor="timeBin", poly.order=3, draw.poly=FALSE)
# fit logistic GCA model
m.log <- glmer(cbind(sumFix, N-sumFix) ~ (poly1+poly2+poly3)*Condition +
                 (poly1+poly2+poly3 | Subject) +
                 (poly1+poly2 | Subject:Condition),
               data=TargetFix, family=binomial)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00859796 (tol = 0.002, component 1)

Logistic GCA models are much slower to fit and are prone to convergence failures, like the warning above.
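A common first remedy (a sketch, with a hypothetical model name m.log_b; not part of the analysis shown here) is to refit with a different optimizer and a higher iteration limit via glmerControl():

# refit with the bobyqa optimizer and more iterations
m.log_b <- update(m.log, control=glmerControl(optimizer="bobyqa",
                                              optCtrl=list(maxfun=2e5)))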

Logistic GCA: Model summary

summary(m.log)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: cbind(sumFix, N - sumFix) ~ (poly1 + poly2 + poly3) * Condition +  
##     (poly1 + poly2 + poly3 | Subject) + (poly1 + poly2 | Subject:Condition)
##    Data: TargetFix
## 
##      AIC      BIC   logLik deviance df.resid 
##   1419.1   1508.0   -685.6   1371.1      276 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.7544 -0.4098 -0.0031  0.3786  2.0623 
## 
## Random effects:
##  Groups            Name        Variance Std.Dev. Corr             
##  Subject:Condition (Intercept) 0.03233  0.1798                    
##                    poly1       0.40183  0.6339   -0.68            
##                    poly2       0.14804  0.3848   -0.23  0.73      
##  Subject           (Intercept) 0.00175  0.0419                    
##                    poly1       0.34347  0.5861    1.00            
##                    poly2       0.00200  0.0447   -1.00 -1.00      
##                    poly3       0.02747  0.1657   -1.00 -1.00  1.00
## Number of obs: 300, groups:  Subject:Condition, 20; Subject, 10
## 
## Fixed effects:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -0.1168     0.0655   -1.78  0.07449 .  
## poly1                2.8186     0.2983    9.45  < 2e-16 ***
## poly2               -0.5591     0.1695   -3.30  0.00097 ***
## poly3               -0.3208     0.1277   -2.51  0.01200 *  
## ConditionLow        -0.2615     0.0909   -2.88  0.00404 ** 
## poly1:ConditionLow   0.0638     0.3313    0.19  0.84723    
## poly2:ConditionLow   0.6951     0.2398    2.90  0.00375 ** 
## poly3:ConditionLow  -0.0707     0.1662   -0.43  0.67060    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) poly1  poly2  poly3  CndtnL pl1:CL pl2:CL
## poly1       -0.288                                          
## poly2       -0.128  0.272                                   
## poly3       -0.100 -0.228 -0.015                            
## ConditionLw -0.690  0.297  0.081  0.012                     
## ply1:CndtnL  0.372 -0.552 -0.292 -0.024 -0.541              
## ply2:CndtnL  0.080 -0.230 -0.701  0.034 -0.116  0.415       
## ply3:CndtnL  0.013 -0.020  0.037 -0.637 -0.003  0.031 -0.056
## convergence code: 0
## Model failed to converge with max|grad| = 0.00859796 (tol = 0.002, component 1)

Logistic GCA: Simplify random effects

In the original model, the Subject random effects correlations were all 1.0 or -1.0. That suggests that the random effects structure is over-parameterized, which might be causing the convergence warning.

Try removing those correlations by using || in the random-effects specification; this tells glmer() not to estimate the correlations (NB: this only works if there are no factors on the left side of the double-pipe):

m.log_zc <- glmer(cbind(sumFix, N-sumFix) ~ (poly1+poly2+poly3)*Condition +
                    (poly1+poly2+poly3 || Subject) +
                    (poly1+poly2 | Subject:Condition),
                  data=TargetFix, family=binomial)
## boundary (singular) fit: see ?isSingular
summary(m.log_zc)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: cbind(sumFix, N - sumFix) ~ (poly1 + poly2 + poly3) * Condition +  
##     (poly1 + poly2 + poly3 || Subject) + (poly1 + poly2 | Subject:Condition)
##    Data: TargetFix
## 
##      AIC      BIC   logLik deviance df.resid 
##   1411.6   1478.3   -687.8   1375.6      282 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.6960 -0.4150 -0.0014  0.3369  2.0756 
## 
## Random effects:
##  Groups            Name        Variance Std.Dev. Corr       
##  Subject.Condition (Intercept) 3.40e-02 1.84e-01            
##                    poly1       4.23e-01 6.51e-01 -0.63      
##                    poly2       1.53e-01 3.91e-01 -0.25  0.70
##  Subject           poly3       5.54e-09 7.44e-05            
##  Subject.1         poly2       8.25e-09 9.08e-05            
##  Subject.2         poly1       4.45e-01 6.67e-01            
##  Subject.3         (Intercept) 2.08e-07 4.56e-04            
## Number of obs: 300, groups:  Subject:Condition, 20; Subject, 10
## 
## Fixed effects:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -0.1177     0.0654   -1.80   0.0721 .  
## poly1                2.8216     0.3182    8.87   <2e-16 ***
## poly2               -0.5589     0.1705   -3.28   0.0010 ** 
## poly3               -0.3134     0.1165   -2.69   0.0071 ** 
## ConditionLow        -0.2606     0.0928   -2.81   0.0050 ** 
## poly1:ConditionLow   0.0659     0.3379    0.19   0.8455    
## poly2:ConditionLow   0.6905     0.2421    2.85   0.0043 ** 
## poly3:ConditionLow  -0.0666     0.1663   -0.40   0.6890    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) poly1  poly2  poly3  CndtnL pl1:CL pl2:CL
## poly1       -0.379                                          
## poly2       -0.129  0.301                                   
## poly3       -0.018  0.029 -0.054                            
## ConditionLw -0.705  0.267  0.092  0.012                     
## ply1:CndtnL  0.357 -0.528 -0.284 -0.027 -0.509              
## ply2:CndtnL  0.092 -0.212 -0.703  0.038 -0.131  0.402       
## ply3:CndtnL  0.012 -0.020  0.037 -0.699 -0.003  0.033 -0.056
## convergence code: 0
## boundary (singular) fit: see ?isSingular
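The singular-fit message comes from the near-zero Subject-level variance estimates, and the fixed-effect estimates are nearly identical to the original model's. One way to confirm that removing the correlations did not meaningfully change the fit is a direct model comparison:

# both models were fit by ML, so a likelihood-ratio comparison applies directly
anova(m.log, m.log_zc)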

Logistic GCA: Plot model fit

The fitted() function conveniently returns predicted proportions (response-scale values) from a logistic model, so plotting the model fit is easy (solid lines: full model; dashed lines: zero-correlation model):

ggplot(TargetFix, aes(Time, meanFix, color=Condition)) +
  stat_summary(fun.data=mean_se, geom="pointrange") +
  stat_summary(aes(y=fitted(m.log)), fun=mean, geom="line") +
  stat_summary(aes(y=fitted(m.log_zc)), fun=mean, geom="line", linetype="dashed") +
  theme_bw() + expand_limits(y=c(0,1)) + 
  labs(y="Fixation Proportion", x="Time since word onset (ms)")

Exercise 4

The word learning accuracy data in WordLearnEx are proportions computed from a binary response variable (correct/incorrect). Re-analyze these data using logistic GCA and compare the results with the linear GCA we used before.

You’ll need to convert the accuracy proportions to counts of correct and incorrect responses. To do that you need to know that there were 6 trials per block.
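A first look at the data (the output below is from a call like summary(WordLearnEx)):

summary(WordLearnEx)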

##     Subject       TP          Block         Accuracy    
##  244    : 10   Low :280   Min.   : 1.0   Min.   :0.000  
##  253    : 10   High:280   1st Qu.: 3.0   1st Qu.:0.667  
##  302    : 10              Median : 5.5   Median :0.833  
##  303    : 10              Mean   : 5.5   Mean   :0.805  
##  305    : 10              3rd Qu.: 8.0   3rd Qu.:1.000  
##  306    : 10              Max.   :10.0   Max.   :1.000  
##  (Other):500

Calculate number correct and incorrect

library(dplyr)  # for mutate()
# base R equivalent:
# WordLearnEx$Correct <- round(WordLearnEx$Accuracy * 6)
# WordLearnEx$Error <- 6 - WordLearnEx$Correct
WordLearnEx <- mutate(WordLearnEx,
                      Correct = round(Accuracy*6),  # 6 trials per block
                      Error = 6 - Correct)
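From here, the model follows the same pattern as the TargetFix example. A minimal sketch of one possible starting point, assuming a second-order orthogonal polynomial over Block as in the earlier linear GCA (the model name m.wl_log is hypothetical):

# make 2nd-order orthogonal polynomial predictors from Block
WordLearnEx <- code_poly(WordLearnEx, predictor="Block", poly.order=2, draw.poly=FALSE)
# logistic GCA with counts of correct vs. incorrect responses
m.wl_log <- glmer(cbind(Correct, Error) ~ (poly1+poly2)*TP +
                    (poly1+poly2 | Subject),
                  data=WordLearnEx, family=binomial)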