Logistic GCA

Dan Mirman

24 January 2019

Why logistic regression? (A brief review)

Aggregated binary outcomes (e.g., accuracy, fixation proportion) can look approximately continuous, but they

Why logistic regression? (A brief review)

These properties can produce incorrect results in linear regression.

Logistic regression (A brief review)

Outcome is log-odds (also called “logit”): \(logit(Yes, No) = \log \left(\frac{Yes}{No}\right)\)

Compare to proportions: \(p(Yes, No) = \frac{Yes}{Yes+No}\)

Note: logit is undefined (Inf) when \(p=0\) or \(p=1\), this makes it hard to fit logistic models to data with such extreme values (e.g., fixations on objects that are very rarely fixated).

Logistic GCA: Data

##     Subject         Time         timeBin   Condition     meanFix      
##  708    : 30   Min.   : 300   Min.   : 1   High:150   Min.   :0.0286  
##  712    : 30   1st Qu.: 450   1st Qu.: 4   Low :150   1st Qu.:0.2778  
##  715    : 30   Median : 650   Median : 8              Median :0.4558  
##  720    : 30   Mean   : 650   Mean   : 8              Mean   :0.4483  
##  722    : 30   3rd Qu.: 850   3rd Qu.:12              3rd Qu.:0.6111  
##  725    : 30   Max.   :1000   Max.   :15              Max.   :0.8286  
##  (Other):120                                                          
##      sumFix           N       
##  Min.   : 1.0   Min.   :33.0  
##  1st Qu.:10.0   1st Qu.:35.8  
##  Median :16.0   Median :36.0  
##  Mean   :15.9   Mean   :35.5  
##  3rd Qu.:21.2   3rd Qu.:36.0  
##  Max.   :29.0   Max.   :36.0  
## 

Logistic GCA: Fit the model

Logistic model code is almost the same as linear model code. Three differences:

  1. glmer() instead of lmer()
  2. outcome is 1/0 or aggregated number of 1s, 0s
  3. add family=binomial
#make 3rd-order orth poly
TargetFix <- code_poly(TargetFix, predictor="timeBin", poly.order=3, draw.poly=F)
# fit logisitc GCA model
m.log <- glmer(cbind(sumFix, N-sumFix) ~ (poly1+poly2+poly3)*Condition +
                 (poly1+poly2+poly3 | Subject) +
                 (poly1+poly2 | Subject:Condition),
               data=TargetFix, family=binomial)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00859796
## (tol = 0.001, component 1)

Logistic GCA models are much slower to fit and are prone to convergence failure

Logistic GCA: Model summary

summary(m.log)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## cbind(sumFix, N - sumFix) ~ (poly1 + poly2 + poly3) * Condition +  
##     (poly1 + poly2 + poly3 | Subject) + (poly1 + poly2 | Subject:Condition)
##    Data: TargetFix
## 
##      AIC      BIC   logLik deviance df.resid 
##   1419.1   1508.0   -685.6   1371.1      276 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.7544 -0.4098 -0.0031  0.3786  2.0623 
## 
## Random effects:
##  Groups            Name        Variance Std.Dev. Corr             
##  Subject:Condition (Intercept) 0.03233  0.1798                    
##                    poly1       0.40183  0.6339   -0.68            
##                    poly2       0.14804  0.3848   -0.23  0.73      
##  Subject           (Intercept) 0.00175  0.0419                    
##                    poly1       0.34347  0.5861    1.00            
##                    poly2       0.00200  0.0447   -1.00 -1.00      
##                    poly3       0.02747  0.1657   -1.00 -1.00  1.00
## Number of obs: 300, groups:  Subject:Condition, 20; Subject, 10
## 
## Fixed effects:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -0.1168     0.0655   -1.78  0.07449 .  
## poly1                2.8186     0.2983    9.45  < 2e-16 ***
## poly2               -0.5591     0.1695   -3.30  0.00097 ***
## poly3               -0.3208     0.1277   -2.51  0.01200 *  
## ConditionLow        -0.2615     0.0909   -2.88  0.00404 ** 
## poly1:ConditionLow   0.0638     0.3313    0.19  0.84723    
## poly2:ConditionLow   0.6951     0.2398    2.90  0.00375 ** 
## poly3:ConditionLow  -0.0707     0.1662   -0.43  0.67060    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) poly1  poly2  poly3  CndtnL pl1:CL pl2:CL
## poly1       -0.288                                          
## poly2       -0.128  0.272                                   
## poly3       -0.100 -0.228 -0.015                            
## ConditionLw -0.690  0.297  0.081  0.012                     
## ply1:CndtnL  0.372 -0.552 -0.292 -0.024 -0.541              
## ply2:CndtnL  0.080 -0.230 -0.701  0.034 -0.116  0.415       
## ply3:CndtnL  0.013 -0.020  0.037 -0.637 -0.003  0.031 -0.056
## convergence code: 0
## Model failed to converge with max|grad| = 0.00859796 (tol = 0.001, component 1)

Logistic GCA: Simplify random effects

In the original model, the Subject random effects correlations were all 1.0 or -1.0. That suggests that the random effects structure is over-parameterized, which might be causing the convergence warning.

Try removing those correlations by using || in the random effects specification – this tells glmer() not to estimate the correlations (NB: this only works if there are no factors on the left side of the double-pipe):

m.log_zc <- glmer(cbind(sumFix, N-sumFix) ~ (poly1+poly2+poly3)*Condition +
                 (poly1+poly2+poly3 || Subject) +
                 (poly1+poly2 | Subject:Condition),
               data=TargetFix, family=binomial)
## singular fit
summary(m.log_zc)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## cbind(sumFix, N - sumFix) ~ (poly1 + poly2 + poly3) * Condition +  
##     (poly1 + poly2 + poly3 || Subject) + (poly1 + poly2 | Subject:Condition)
##    Data: TargetFix
## 
##      AIC      BIC   logLik deviance df.resid 
##   1411.6   1478.3   -687.8   1375.6      282 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.6960 -0.4150 -0.0014  0.3369  2.0756 
## 
## Random effects:
##  Groups            Name        Variance Std.Dev. Corr       
##  Subject.Condition (Intercept) 3.40e-02 1.84e-01            
##                    poly1       4.23e-01 6.51e-01 -0.63      
##                    poly2       1.53e-01 3.91e-01 -0.25  0.70
##  Subject           poly3       5.54e-09 7.44e-05            
##  Subject.1         poly2       8.25e-09 9.08e-05            
##  Subject.2         poly1       4.45e-01 6.67e-01            
##  Subject.3         (Intercept) 2.08e-07 4.56e-04            
## Number of obs: 300, groups:  Subject:Condition, 20; Subject, 10
## 
## Fixed effects:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -0.1177     0.0654   -1.80   0.0721 .  
## poly1                2.8216     0.3182    8.87   <2e-16 ***
## poly2               -0.5589     0.1705   -3.28   0.0010 ** 
## poly3               -0.3134     0.1165   -2.69   0.0071 ** 
## ConditionLow        -0.2606     0.0928   -2.81   0.0050 ** 
## poly1:ConditionLow   0.0659     0.3379    0.19   0.8455    
## poly2:ConditionLow   0.6905     0.2421    2.85   0.0043 ** 
## poly3:ConditionLow  -0.0666     0.1663   -0.40   0.6890    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) poly1  poly2  poly3  CndtnL pl1:CL pl2:CL
## poly1       -0.379                                          
## poly2       -0.129  0.301                                   
## poly3       -0.018  0.029 -0.054                            
## ConditionLw -0.705  0.267  0.092  0.012                     
## ply1:CndtnL  0.357 -0.528 -0.284 -0.027 -0.509              
## ply2:CndtnL  0.092 -0.212 -0.703  0.038 -0.131  0.402       
## ply3:CndtnL  0.012 -0.020  0.037 -0.699 -0.003  0.033 -0.056
## convergence code: 0
## singular fit

Logistic GCA: Plot model fit

The fitted function conveniently returns proportions from a logistic model, so plotting the model fit is easy:

ggplot(TargetFix, aes(Time, meanFix, color=Condition)) +
  stat_summary(fun.data=mean_se, geom="pointrange") +
  stat_summary(aes(y=fitted(m.log)), fun.y=mean, geom="line") +
  stat_summary(aes(y=fitted(m.log_zc)), fun.y=mean, geom="line", linetype="dashed") +
  theme_bw() + expand_limits(y=c(0,1)) + 
  labs(y="Fixation Proportion", x="Time since word onset (ms)")

Exercise 3

The word learning accuracy data in WordLearnEx are proportions from a binary response variable (correct/incorrect). Re-analyze these data using logistic GCA and compare with the linear GCA we used before.

You’ll need to convert the accuracy proportions to counts of correct and incorrect responses. To do that you need to know that there were 6 trials per block.

summary(WordLearnEx)
##     Subject       TP          Block         Accuracy    
##  244    : 10   Low :280   Min.   : 1.0   Min.   :0.000  
##  253    : 10   High:280   1st Qu.: 3.0   1st Qu.:0.667  
##  302    : 10              Median : 5.5   Median :0.833  
##  303    : 10              Mean   : 5.5   Mean   :0.805  
##  305    : 10              3rd Qu.: 8.0   3rd Qu.:1.000  
##  306    : 10              Max.   :10.0   Max.   :1.000  
##  (Other):500
# calculate number correct and incorrect
WordLearnEx <- WordLearnEx %>% 
  mutate(Correct = round(Accuracy * 6), 
         Error = 6 - Correct)

Exercise 3 Solution

Re-analyze the word learning accuracy data in WordLearnEx using logistic GCA and compare with the linear GCA we used before.

# prep for GCA
WordLearnEx <- code_poly(WordLearnEx, predictor="Block", poly.order=2, draw.poly=F)
# fit logistic model
wl.logit <- glmer(cbind(Correct, Error) ~ (poly1+poly2)*TP + 
                    (poly1+poly2 | Subject),
                  data=WordLearnEx, family=binomial)
## singular fit
coef(summary(wl.logit))
##              Estimate Std. Error z value  Pr(>|z|)
## (Intercept)   1.58261     0.1982  7.9861 1.392e-15
## poly1         2.21221     0.3234  6.8403 7.904e-12
## poly2        -0.08209     0.2318 -0.3542 7.232e-01
## TPHigh        0.56565     0.2842  1.9904 4.654e-02
## poly1:TPHigh  0.34726     0.4474  0.7762 4.376e-01
## poly2:TPHigh -0.98955     0.3254 -3.0407 2.360e-03