Dan Mirman
14 March 2018
Individual differences provide an additional level of analysis for understanding phenomena
Example: National Youth Survey longitudinal data on tolerance for deviant behavior, exposure to deviant behavior, and gender. Subset of 16 subjects printed in Willett (1997, Table 11.1) (DeviantBehavior).
Tolerance was measured by asking whether it is wrong for someone their age to: cheat on tests, purposely destroy property of others, use marijuana, steal something worth less than $5, hit or threaten someone without reason, use alcohol, break into a building or vehicle to steal, sell hard drugs, or steal something worth more than $50.
## SubjID Gender Exposure Age Tolerance
## S0009 : 5 Female:45 Min. :0.810 Min. :11 Min. :1.00
## S0045 : 5 Male :35 1st Qu.:0.922 1st Qu.:12 1st Qu.:1.22
## S0268 : 5 Median :1.145 Median :13 Median :1.50
## S0314 : 5 Mean :1.191 Mean :13 Mean :1.62
## S0442 : 5 3rd Qu.:1.395 3rd Qu.:14 3rd Qu.:1.99
## S0514 : 5 Max. :1.990 Max. :15 Max. :3.46
## (Other):50
m.base <- lmer(Tolerance ~ Age*Gender + (Age | SubjID),
contrasts=list(Gender="contr.sum"), data=DeviantBehavior)
get_pvalues(m.base)
## Estimate Std..Error t.value p.normal p.normal.star
## (Intercept) -0.12563 0.52513 -0.2392 0.810928
## Age 0.13488 0.04409 3.0593 0.002218 **
## Gender1 0.35552 0.52513 0.6770 0.498406
## Age:Gender1 -0.03255 0.04409 -0.7382 0.460371
There is a significant increase in Tolerance with age, and no significant effects of gender.
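`get_pvalues` is a helper from the psy811 package; its `p.normal` column is just the normal approximation applied to the model's t-values, which can be sketched in base R (the t-value below is copied from the output above):

```r
# normal approximation: with enough observations, treat the t-value as a z-score
t.age <- 3.0593                           # t-value for Age from the model output
p.normal <- 2 * (1 - pnorm(abs(t.age)))   # two-tailed p-value
round(p.normal, 6)                        # ~0.00222, matching the table
```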
The Exposure scale is not clear, and the other effects would be estimated at Exposure=0, which is not an attested value. If we center Exposure, its effects and the effects of the other predictors will be easier to interpret.
# center Exposure, but do not re-scale it
DeviantBehavior$Exposure.center <- scale(DeviantBehavior$Exposure, scale=FALSE)
# full model
m.exp1 <- lmer(Tolerance ~ Age*Gender*Exposure.center + (Age | SubjID),
data=DeviantBehavior, contrasts=list(Gender="contr.sum"))
get_pvalues(m.exp1)
## Estimate Std..Error t.value p.normal p.normal.star
## (Intercept) -0.38880 0.43008 -0.904 3.660e-01
## Age 0.15603 0.03492 4.468 7.881e-06 ***
## Gender1 0.65192 0.43008 1.516 1.296e-01
## Exposure.center -4.22668 1.47541 -2.865 4.173e-03 **
## Age:Gender1 -0.05921 0.03492 -1.696 8.993e-02 .
## Age:Exposure.center 0.38532 0.11979 3.217 1.297e-03 **
## Gender1:Exposure.center 3.62705 1.47541 2.458 1.396e-02 *
## Age:Gender1:Exposure.center -0.28576 0.11979 -2.385 1.706e-02 *
The three-way Age-by-Gender-by-Exposure interaction is a relationship among four variables (Tolerance for deviant behavior, Exposure to deviant behavior, Age, and Gender), three of which are continuous variables. This is hard to visualize.
To make it easier to visualize, we can split Exposure into discrete levels.
# a simple median split
DeviantBehavior$ExposureMedSpl <- ifelse(
DeviantBehavior$Exposure >= median(DeviantBehavior$Exposure), "High", "Low")
# plot the interaction
ggplot(DeviantBehavior, aes(Age, Tolerance, color=ExposureMedSpl)) + facet_wrap(~ Gender) +
stat_summary(fun.y=mean, geom="point", size=3) + stat_summary(fun.y=mean, geom="line") +
stat_summary(fun.data=mean_se, geom="errorbar", width=0.2) +
labs(x="Age", y="Tolerance for deviant behavior") +
theme_bw() + legend_positioning(c(0,1))
Adolescents with higher Exposure to deviant behavior tend to have increased Tolerance for deviant behavior as they get older, and this is more true for males than females.
Another way to make it easier to visualize: split Exposure into three levels using a tertile split.
# define break points
b <- quantile(DeviantBehavior$Exposure, probs=seq(0, 1, by=1/3))
# split continuous predictor and provide level labels
DeviantBehavior$Exposure3 <- cut(DeviantBehavior$Exposure,
breaks=b, include.lowest=TRUE,
labels=c("Low", "Medium", "High"))
# make the plot
ggplot(DeviantBehavior, aes(Age, Tolerance, color=Exposure3)) + facet_wrap(~ Gender) +
stat_summary(fun.y=mean, geom="point", size=3) + stat_summary(fun.y=mean, geom="line") +
stat_summary(fun.data=mean_se, geom="errorbar", width=0.2) +
labs(x="Age", y="Tolerance for deviant behavior") +
theme_bw() + legend_positioning(c(0,1))
For such situations, random effects provide a way to quantify individual effect sizes in the context of a model of overall group performance.
A simple example: if \(\zeta\) is a participant's random-effect estimate in each of two conditions (0 and 1), then that participant's individual effect size is the difference between the two estimates:
Participant A: \(\zeta_{A1} - \zeta_{A0} = 1 - (-1) = 2\)
Participant B: \(\zeta_{B1} - \zeta_{B0} = (-1) - 1 = -2\)
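On toy numbers matching the example above, this difference-of-random-effects computation is a one-liner in R (the data frame here is hypothetical, standing in for real random-effect estimates):

```r
# toy random-effect (zeta) estimates: two participants x two conditions
zeta <- data.frame(subj = c("A", "A", "B", "B"),
                   cond = c(0, 1, 0, 1),
                   zeta = c(-1, 1, 1, -1))
# individual effect size: zeta at condition 1 minus zeta at condition 0
es <- with(zeta, tapply(zeta, subj, diff))
es   # A: 2, B: -2
```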
Data: Function and Thematic competition for 17 LH stroke patients (FunctThemePts).
## subj Condition Object Time
## 206 : 486 Function:4113 Target :2733 Min. :-1000
## 281 : 486 Thematic:4086 Competitor:2733 1st Qu.: 0
## 419 : 486 Unrelated :2733 Median : 1000
## 1088 : 486 Mean : 1000
## 1238 : 486 3rd Qu.: 2000
## 1392 : 486 Max. : 3000
## (Other):5283
## timeBin meanFix sumFix N
## Min. : 0 Min. :0.0000 Min. : 0.00 Min. :12.0
## 1st Qu.:20 1st Qu.:0.0312 1st Qu.: 1.00 1st Qu.:15.0
## Median :40 Median :0.1250 Median : 2.00 Median :16.0
## Mean :40 Mean :0.1777 Mean : 3.26 Mean :15.4
## 3rd Qu.:60 3rd Qu.:0.2500 3rd Qu.: 5.00 3rd Qu.:16.0
## Max. :80 Max. :1.0000 Max. :16.00 Max. :16.0
##
Question: is there a correlation between Function and Thematic competition across patients?
#prep data for GCA
FunctThemePts.gca <- code_poly(subset(FunctThemePts, Time >= 500 & Time <= 2000 &
Object != "Target"),
predictor="Time", poly.order=4, draw.poly = FALSE)
# function competition: full model
m.funct <- lmer(meanFix ~ (poly1+poly2+poly3+poly4)*Object +
(poly1+poly2+poly3+poly4 | subj) + (poly1+poly2 | subj:Object),
data=subset(FunctThemePts.gca, Condition == "Function"), REML=FALSE)
## Warning in optwrap(optimizer, devfun, getStart(start, rho$lower, rho$pp), :
## convergence code 1 from bobyqa: bobyqa -- maximum number of function
## evaluations exceeded
# thematic competition: full model
m.theme <- lmer(meanFix ~ (poly1+poly2+poly3+poly4)*Object +
(poly1+poly2+poly3+poly4 | subj) + (poly1+poly2 | subj:Object),
data=subset(FunctThemePts.gca, Condition == "Thematic"), REML=FALSE)
str(ranef(m.funct), vec.len=2)
## List of 2
## $ subj:Object:'data.frame': 34 obs. of 3 variables:
## ..$ (Intercept): num [1:34] -0.0327 0.0254 ...
## ..$ poly1 : num [1:34] 0.0339 0.0726 ...
## ..$ poly2 : num [1:34] -0.1538 0.0143 ...
## $ subj :'data.frame': 17 obs. of 5 variables:
## ..$ (Intercept): num [1:17] 0.025 -0.0134 ...
## ..$ poly1 : num [1:17] 0.0924 -0.0454 ...
## ..$ poly2 : num [1:17] -0.1915 0.0375 ...
## ..$ poly3 : num [1:17] 0.1124 -0.0677 ...
## ..$ poly4 : num [1:17] 0.0386 0.0474 ...
## - attr(*, "class")= chr "ranef.mer"
head(ranef(m.funct)$"subj:Object")
## (Intercept) poly1 poly2
## 206:Competitor -0.0327460 0.03386 -0.153809
## 206:Unrelated 0.0253878 0.07261 0.014314
## 281:Competitor 0.0066498 -0.04395 -0.059606
## 281:Unrelated -0.0200312 -0.05530 0.115819
## 419:Competitor -0.0006385 0.24667 -0.001246
## 419:Unrelated -0.0092935 -0.15294 0.056835
We can use these to calculate individual effect sizes, just like in the simple demo example.
Use psy811::get_ranef(), a helper function that extracts the random effects into a data frame:
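If the psy811 package is unavailable, the same reshaping can be sketched in base R: split the "subj:Object" rownames of the random-effects data frame into separate columns (the values below are toy stand-ins, and get_ranef's actual implementation may differ):

```r
# toy stand-in for ranef(model)$"subj:Object"
re <- data.frame("(Intercept)" = c(-0.0327, 0.0254),
                 poly1 = c(0.0339, 0.0726),
                 check.names = FALSE,
                 row.names = c("206:Competitor", "206:Unrelated"))
# split rownames like "206:Competitor" into subj and Object columns
parts <- strsplit(rownames(re), ":", fixed = TRUE)
re$subj <- sapply(parts, `[`, 1)
re$Object <- sapply(parts, `[`, 2)
names(re)[names(re) == "(Intercept)"] <- "Intercept"
re
```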
re.funct <- get_ranef(m.funct, "subj:Object")
head(re.funct)
## subj Object Intercept poly1 poly2
## 206:Competitor 206 Competitor -0.0327460 0.03386 -0.153809
## 206:Unrelated 206 Unrelated 0.0253878 0.07261 0.014314
## 281:Competitor 281 Competitor 0.0066498 -0.04395 -0.059606
## 281:Unrelated 281 Unrelated -0.0200312 -0.05530 0.115819
## 419:Competitor 419 Competitor -0.0006385 0.24667 -0.001246
## 419:Unrelated 419 Unrelated -0.0092935 -0.15294 0.056835
Use plyr::ddply to compute each subject's individual effect size as the difference in random effects between the Competitor and Unrelated objects, for the Intercept and Linear time terms:
ES.funct <- ddply(re.funct, c("subj"), summarize,
Function_Intercept = Intercept[Object=="Competitor"] - Intercept[Object=="Unrelated"],
Function_Linear = poly1[Object=="Competitor"] - poly1[Object=="Unrelated"])
head(ES.funct)
## subj Function_Intercept Function_Linear
## 1 206 -0.058134 -0.03875
## 2 281 0.026681 0.01135
## 3 419 0.008655 0.39961
## 4 1088 -0.003283 -0.15628
## 5 1238 -0.013349 -0.13986
## 6 1392 -0.003196 0.19123
Repeat those steps for the Thematic condition:
re.theme <- get_ranef(m.theme, "subj:Object")
ES.theme <- ddply(re.theme, c("subj"), summarize,
Thematic_Intercept = Intercept[Object=="Competitor"] - Intercept[Object=="Unrelated"],
Thematic_Linear = poly1[Object=="Competitor"] - poly1[Object=="Unrelated"])
head(ES.theme)
## subj Thematic_Intercept Thematic_Linear
## 1 206 0.030963 -0.152879
## 2 281 0.015367 0.003994
## 3 419 -0.001865 -0.145997
## 4 1088 -0.084598 -0.061909
## 5 1238 -0.022051 -0.015554
## 6 1392 0.061526 -0.353137
# combine condition effect sizes into one data frame
ES <- merge(ES.funct, ES.theme)
# compute correlations between effect sizes
cor.test(ES$Function_Intercept, ES$Thematic_Intercept)
##
## Pearson's product-moment correlation
##
## data: ES$Function_Intercept and ES$Thematic_Intercept
## t = -2.4, df = 15, p-value = 0.03
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.8007 -0.0530
## sample estimates:
## cor
## -0.5204
cor.test(ES$Function_Linear, ES$Thematic_Linear)
##
## Pearson's product-moment correlation
##
## data: ES$Function_Linear and ES$Thematic_Linear
## t = -3.4, df = 15, p-value = 0.004
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.8637 -0.2545
## sample estimates:
## cor
## -0.655
A two-panel scatterplot shows the correlation between Function and Thematic competition effect sizes for each term (Intercept, Linear):
# need to re-arrange the data
library(reshape2)
ES.m <- melt(ES, id="subj") # "melt" into long form
ES.m <- cbind(ES.m, colsplit(ES.m$variable, "_", c("Condition", "Term"))) # split labels
ES.c <- dcast(ES.m, subj + Term ~ Condition) # "cast" so conditions are in separate columns
# now can make the plot
ggplot(ES.c, aes(Function, Thematic)) +
facet_wrap(~ Term, scales="free") + geom_point() +
stat_smooth(method="lm", se=FALSE) # add trendline
Why bother with the model-fitting when you could just take the difference of the observed data to get effect sizes?
(Figure: individual participant Function competition effect sizes.)
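An observed (model-free) effect size would simply be the per-participant difference of condition means, sketched here on simulated toy data (not the real FunctThemePts values). The catch is that these raw differences receive no shrinkage toward the group estimate, so they are noisier than the model-based random effects:

```r
# simulated toy fixation data: 2 subjects x 2 object types x 10 trials
set.seed(1)
d <- expand.grid(subj = c("S1", "S2"),
                 Object = c("Competitor", "Unrelated"),
                 trial = 1:10)
d$meanFix <- rnorm(nrow(d),
                   mean = ifelse(d$Object == "Competitor", 0.20, 0.15),
                   sd = 0.05)
# observed effect size: mean Competitor minus mean Unrelated, per subject
m <- tapply(d$meanFix, list(d$subj, d$Object), mean)
es.obs <- m[, "Competitor"] - m[, "Unrelated"]
es.obs
```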
The CohortRhyme data set contains data from an eye-tracking experiment (Mirman et al., 2011) that investigated phonological competition between cohorts (e.g., penny - pencil) and rhymes (e.g., carrot - parrot).
Three groups of participants were tested: 5 Broca’s aphasics, 3 Wernicke’s aphasics, and 12 control participants.
summary(CohortRhyme)
## subjID Group Time timeBin
## 101 : 80 Control :960 Min. : 50 Min. : 1.00
## 102 : 80 Broca :400 1st Qu.: 525 1st Qu.: 5.75
## 103 : 80 Wernicke:240 Median :1000 Median :10.50
## 104 : 80 Mean :1000 Mean :10.50
## 105 : 80 3rd Qu.:1475 3rd Qu.:15.25
## 106 : 80 Max. :1950 Max. :20.00
## (Other):1120
## Object Type FixProp
## Competitor:800 Cohort:800 Min. :0.0000
## Unrelated :800 Rhyme :800 1st Qu.:0.0000
## Target : 0 Median :0.0417
## Mean :0.0665
## 3rd Qu.:0.1000
## Max. :0.5833
##