Dan Mirman
24 January 2019
Aggregated binary outcomes (e.g., accuracy, fixation proportion) can look approximately continuous, but they
These properties can produce incorrect results in linear regression.
Outcome is log-odds (also called “logit”): \(logit(Yes, No) = \log \left(\frac{Yes}{No}\right)\)
Compare to proportions: \(p(Yes, No) = \frac{Yes}{Yes+No}\)
Note: logit is undefined (Inf
) when \(p=0\) or \(p=1\), this makes it hard to fit logistic models to data with such extreme values (e.g., fixations on objects that are very rarely fixated).
## Subject Time timeBin Condition meanFix
## 708 : 30 Min. : 300 Min. : 1 High:150 Min. :0.0286
## 712 : 30 1st Qu.: 450 1st Qu.: 4 Low :150 1st Qu.:0.2778
## 715 : 30 Median : 650 Median : 8 Median :0.4558
## 720 : 30 Mean : 650 Mean : 8 Mean :0.4483
## 722 : 30 3rd Qu.: 850 3rd Qu.:12 3rd Qu.:0.6111
## 725 : 30 Max. :1000 Max. :15 Max. :0.8286
## (Other):120
## sumFix N
## Min. : 1.0 Min. :33.0
## 1st Qu.:10.0 1st Qu.:35.8
## Median :16.0 Median :36.0
## Mean :15.9 Mean :35.5
## 3rd Qu.:21.2 3rd Qu.:36.0
## Max. :29.0 Max. :36.0
##
Logistic model code is almost the same as linear model code. Three differences:
glmer()
instead of lmer()
family=binomial
#make 3rd-order orth poly
TargetFix <- code_poly(TargetFix, predictor="timeBin", poly.order=3, draw.poly=F)
# fit logisitc GCA model
m.log <- glmer(cbind(sumFix, N-sumFix) ~ (poly1+poly2+poly3)*Condition +
(poly1+poly2+poly3 | Subject) +
(poly1+poly2 | Subject:Condition),
data=TargetFix, family=binomial)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 0.00859796
## (tol = 0.001, component 1)
Logistic GCA models are much slower to fit and are prone to convergence failure
summary(m.log)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula:
## cbind(sumFix, N - sumFix) ~ (poly1 + poly2 + poly3) * Condition +
## (poly1 + poly2 + poly3 | Subject) + (poly1 + poly2 | Subject:Condition)
## Data: TargetFix
##
## AIC BIC logLik deviance df.resid
## 1419.1 1508.0 -685.6 1371.1 276
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.7544 -0.4098 -0.0031 0.3786 2.0623
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Subject:Condition (Intercept) 0.03233 0.1798
## poly1 0.40183 0.6339 -0.68
## poly2 0.14804 0.3848 -0.23 0.73
## Subject (Intercept) 0.00175 0.0419
## poly1 0.34347 0.5861 1.00
## poly2 0.00200 0.0447 -1.00 -1.00
## poly3 0.02747 0.1657 -1.00 -1.00 1.00
## Number of obs: 300, groups: Subject:Condition, 20; Subject, 10
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.1168 0.0655 -1.78 0.07449 .
## poly1 2.8186 0.2983 9.45 < 2e-16 ***
## poly2 -0.5591 0.1695 -3.30 0.00097 ***
## poly3 -0.3208 0.1277 -2.51 0.01200 *
## ConditionLow -0.2615 0.0909 -2.88 0.00404 **
## poly1:ConditionLow 0.0638 0.3313 0.19 0.84723
## poly2:ConditionLow 0.6951 0.2398 2.90 0.00375 **
## poly3:ConditionLow -0.0707 0.1662 -0.43 0.67060
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) poly1 poly2 poly3 CndtnL pl1:CL pl2:CL
## poly1 -0.288
## poly2 -0.128 0.272
## poly3 -0.100 -0.228 -0.015
## ConditionLw -0.690 0.297 0.081 0.012
## ply1:CndtnL 0.372 -0.552 -0.292 -0.024 -0.541
## ply2:CndtnL 0.080 -0.230 -0.701 0.034 -0.116 0.415
## ply3:CndtnL 0.013 -0.020 0.037 -0.637 -0.003 0.031 -0.056
## convergence code: 0
## Model failed to converge with max|grad| = 0.00859796 (tol = 0.001, component 1)
In the original model, the Subject
random effects correlations were all 1.0 or -1.0. That suggests that the random effects structure is over-parameterized, which might be causing the convergence warning.
Try removing those correlations by using ||
in the random effects specification – this tells glmer()
not to estimate the correlations (NB: this only works if there are no factors on the left side of the double-pipe):
m.log_zc <- glmer(cbind(sumFix, N-sumFix) ~ (poly1+poly2+poly3)*Condition +
(poly1+poly2+poly3 || Subject) +
(poly1+poly2 | Subject:Condition),
data=TargetFix, family=binomial)
## singular fit
summary(m.log_zc)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula:
## cbind(sumFix, N - sumFix) ~ (poly1 + poly2 + poly3) * Condition +
## (poly1 + poly2 + poly3 || Subject) + (poly1 + poly2 | Subject:Condition)
## Data: TargetFix
##
## AIC BIC logLik deviance df.resid
## 1411.6 1478.3 -687.8 1375.6 282
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.6960 -0.4150 -0.0014 0.3369 2.0756
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Subject.Condition (Intercept) 3.40e-02 1.84e-01
## poly1 4.23e-01 6.51e-01 -0.63
## poly2 1.53e-01 3.91e-01 -0.25 0.70
## Subject poly3 5.54e-09 7.44e-05
## Subject.1 poly2 8.25e-09 9.08e-05
## Subject.2 poly1 4.45e-01 6.67e-01
## Subject.3 (Intercept) 2.08e-07 4.56e-04
## Number of obs: 300, groups: Subject:Condition, 20; Subject, 10
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.1177 0.0654 -1.80 0.0721 .
## poly1 2.8216 0.3182 8.87 <2e-16 ***
## poly2 -0.5589 0.1705 -3.28 0.0010 **
## poly3 -0.3134 0.1165 -2.69 0.0071 **
## ConditionLow -0.2606 0.0928 -2.81 0.0050 **
## poly1:ConditionLow 0.0659 0.3379 0.19 0.8455
## poly2:ConditionLow 0.6905 0.2421 2.85 0.0043 **
## poly3:ConditionLow -0.0666 0.1663 -0.40 0.6890
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) poly1 poly2 poly3 CndtnL pl1:CL pl2:CL
## poly1 -0.379
## poly2 -0.129 0.301
## poly3 -0.018 0.029 -0.054
## ConditionLw -0.705 0.267 0.092 0.012
## ply1:CndtnL 0.357 -0.528 -0.284 -0.027 -0.509
## ply2:CndtnL 0.092 -0.212 -0.703 0.038 -0.131 0.402
## ply3:CndtnL 0.012 -0.020 0.037 -0.699 -0.003 0.033 -0.056
## convergence code: 0
## singular fit
The fitted
function conveniently returns proportions from a logistic model, so plotting the model fit is easy:
ggplot(TargetFix, aes(Time, meanFix, color=Condition)) +
stat_summary(fun.data=mean_se, geom="pointrange") +
stat_summary(aes(y=fitted(m.log)), fun.y=mean, geom="line") +
stat_summary(aes(y=fitted(m.log_zc)), fun.y=mean, geom="line", linetype="dashed") +
theme_bw() + expand_limits(y=c(0,1)) +
labs(y="Fixation Proportion", x="Time since word onset (ms)")
The word learning accuracy data in WordLearnEx
are proportions from a binary response variable (correct/incorrect). Re-analyze these data using logistic GCA and compare with the linear GCA we used before.
You’ll need to convert the accuracy proportions to counts of correct and incorrect responses. To do that you need to know that there were 6 trials per block.
summary(WordLearnEx)
## Subject TP Block Accuracy
## 244 : 10 Low :280 Min. : 1.0 Min. :0.000
## 253 : 10 High:280 1st Qu.: 3.0 1st Qu.:0.667
## 302 : 10 Median : 5.5 Median :0.833
## 303 : 10 Mean : 5.5 Mean :0.805
## 305 : 10 3rd Qu.: 8.0 3rd Qu.:1.000
## 306 : 10 Max. :10.0 Max. :1.000
## (Other):500
# calculate number correct and incorrect
WordLearnEx <- WordLearnEx %>%
mutate(Correct = round(Accuracy * 6),
Error = 6 - Correct)
Re-analyze the word learning accuracy data in WordLearnEx
using logistic GCA and compare with the linear GCA we used before.
# prep for GCA
WordLearnEx <- code_poly(WordLearnEx, predictor="Block", poly.order=2, draw.poly=F)
# fit logistic model
wl.logit <- glmer(cbind(Correct, Error) ~ (poly1+poly2)*TP +
(poly1+poly2 | Subject),
data=WordLearnEx, family=binomial)
## singular fit
coef(summary(wl.logit))
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.58261 0.1982 7.9861 1.392e-15
## poly1 2.21221 0.3234 6.8403 7.904e-12
## poly2 -0.08209 0.2318 -0.3542 7.232e-01
## TPHigh 0.56565 0.2842 1.9904 4.654e-02
## poly1:TPHigh 0.34726 0.4474 0.7762 4.376e-01
## poly2:TPHigh -0.98955 0.3254 -3.0407 2.360e-03