Dan Mirman
24 January 2019
Individual differences provide an additional level of analysis for understanding phenomena
Example: National Youth Survey longitudinal data on tolerance for deviant behavior, exposure to deviant behavior, and gender. Subset of 16 subjects printed in Willett (1997, Table 11.1). (DeviantBehavior: http://dmirman.github.io/ET2019/DeviantBehavior.rda)
Tolerance was measured by asking whether it is wrong for someone their age to: cheat on tests, purposely destroy property of others, use marijuana, steal something worth less than $5, hit or threaten someone without reason, use alcohol, break into a building or vehicle to steal, sell hard drugs, or steal something worth more than $50.
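The data can be loaded directly from the URL above. A minimal sketch (it assumes the .rda file creates the DeviantBehavior data frame in the workspace):
load(url("http://dmirman.github.io/ET2019/DeviantBehavior.rda"))
summary(DeviantBehavior)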
## SubjID Gender Exposure Age Tolerance
## S0009 : 5 Female:45 Min. :0.810 Min. :11 Min. :1.00
## S0045 : 5 Male :35 1st Qu.:0.922 1st Qu.:12 1st Qu.:1.22
## S0268 : 5 Median :1.145 Median :13 Median :1.50
## S0314 : 5 Mean :1.191 Mean :13 Mean :1.62
## S0442 : 5 3rd Qu.:1.395 3rd Qu.:14 3rd Qu.:1.99
## S0514 : 5 Max. :1.990 Max. :15 Max. :3.46
## (Other):50
m.base <- lmer(Tolerance ~ Age*Gender + (Age | SubjID),
contrasts=list(Gender="contr.sum"), data=DeviantBehavior)
get_pvalues(m.base)
## Estimate Std..Error t.value p.normal p.normal.star
## (Intercept) -0.12563 0.52513 -0.2392 0.810928
## Age 0.13488 0.04409 3.0593 0.002218 **
## Gender1 0.35552 0.52513 0.6770 0.498406
## Age:Gender1 -0.03255 0.04409 -0.7382 0.460371
Tolerance significantly increases with age; there are no significant effects of gender.
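get_pvalues() is a course helper function. If it is unavailable, a minimal sketch of what the output columns above suggest it computes (an assumption, using the normal approximation that treats t-values as z-scores):
# normal-approximation p-values for lmer fixed effects (hypothetical stand-in)
get_pvalues_sketch <- function(model) {
  coefs <- data.frame(coef(summary(model)))
  coefs$p.normal <- 2 * (1 - pnorm(abs(coefs$t.value)))
  coefs
}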
The Exposure scale is not clear, and other effects would be estimated at Exposure=0, which is not an attested value.
If we center Exposure, its effects and the effects of other predictors will be easier to interpret.
# center Exposure, but do not re-scale it
DeviantBehavior$Exposure.center <- scale(DeviantBehavior$Exposure, scale=FALSE)
# full model
m.exp1 <- lmer(Tolerance ~ Age*Gender*Exposure.center + (Age | SubjID),
data=DeviantBehavior, contrasts=list(Gender="contr.sum"))
get_pvalues(m.exp1)
## Estimate Std..Error t.value p.normal p.normal.star
## (Intercept) -0.38880 0.43008 -0.904 3.660e-01
## Age 0.15603 0.03492 4.468 7.881e-06 ***
## Gender1 0.65192 0.43008 1.516 1.296e-01
## Exposure.center -4.22668 1.47541 -2.865 4.173e-03 **
## Age:Gender1 -0.05921 0.03492 -1.696 8.993e-02 .
## Age:Exposure.center 0.38532 0.11979 3.217 1.297e-03 **
## Gender1:Exposure.center 3.62705 1.47541 2.458 1.396e-02 *
## Age:Gender1:Exposure.center -0.28576 0.11979 -2.385 1.706e-02 *
The three-way Age-by-Gender-by-Exposure interaction is a relationship among four variables (Tolerance for deviant behavior, Exposure to deviant behavior, Age, and Gender), three of which are continuous variables. This is hard to visualize.
To make it easier to visualize, we can split Exposure into levels.
# a simple median split
DeviantBehavior$ExposureMedSpl <- ifelse(
DeviantBehavior$Exposure >= median(DeviantBehavior$Exposure), "High", "Low")
# plot the interaction
ggplot(DeviantBehavior, aes(Age, Tolerance, color=ExposureMedSpl)) + facet_wrap(~ Gender) +
stat_summary(fun.y=mean, geom="point", size=3) + stat_summary(fun.y=mean, geom="line") +
stat_summary(fun.data=mean_se, geom="errorbar", width=0.2) +
labs(x="Age", y="Tolerance for deviant behavior") +
theme_bw() + legend_positioning(c(0,1))
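legend_positioning() is another course helper, not a ggplot2 function. A minimal equivalent (an assumption about its behavior) places the legend inside the plot panel, anchored at the given relative coordinates:
# place legend inside the panel at relative coordinates pos (hypothetical helper)
legend_positioning <- function(pos) {
  theme(legend.position = pos, legend.justification = pos)
}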
Adolescents with higher Exposure to deviant behavior tend to have increased Tolerance for deviant behavior as they get older, and this is more true for males than females.
To make it easier to visualize, we can split Exposure into levels, this time using a tertile split.
# define break points
b <- quantile(DeviantBehavior$Exposure, probs=seq(0, 1, by=1/3))
# split continuous predictor and provide level labels
DeviantBehavior$Exposure3 <- cut(DeviantBehavior$Exposure,
breaks=b, include.lowest=T,
labels=c("Low", "Medium", "High"))
# make the plot
ggplot(DeviantBehavior, aes(Age, Tolerance, color=Exposure3)) + facet_wrap(~ Gender) +
stat_summary(fun.y=mean, geom="point", size=3) + stat_summary(fun.y=mean, geom="line") +
stat_summary(fun.data=mean_se, geom="errorbar", width=0.2) +
labs(x="Age", y="Tolerance for deviant behavior") +
theme_bw() + legend_positioning(c(0,1))
Sometimes individual differences are not captured by an observed variable like Exposure. For such situations, random effects provide a way to quantify individual effect sizes in the context of a model of overall group performance.
A simple example: if \(\zeta_{sc}\) is participant \(s\)'s random effect in condition \(c\), then each participant's effect size is the difference between their condition random effects:
Participant A: \(\zeta_{A1} - \zeta_{A0} = 1 - (-1) = 2\)
Participant B: \(\zeta_{B1} - \zeta_{B0} = (-1) - 1 = -2\)
Data: Function and Thematic competition for 17 LH stroke patients (FunctThemePts).
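A summary of the data:
summary(FunctThemePts)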
## subj Condition Object Time
## 206 : 486 Function:4113 Target :2733 Min. :-1000
## 281 : 486 Thematic:4086 Competitor:2733 1st Qu.: 0
## 419 : 486 Unrelated :2733 Median : 1000
## 1088 : 486 Mean : 1000
## 1238 : 486 3rd Qu.: 2000
## 1392 : 486 Max. : 3000
## (Other):5283
## timeBin meanFix sumFix N
## Min. : 0 Min. :0.0000 Min. : 0.00 Min. :12.0
## 1st Qu.:20 1st Qu.:0.0312 1st Qu.: 1.00 1st Qu.:15.0
## Median :40 Median :0.1250 Median : 2.00 Median :16.0
## Mean :40 Mean :0.1777 Mean : 3.26 Mean :15.4
## 3rd Qu.:60 3rd Qu.:0.2500 3rd Qu.: 5.00 3rd Qu.:16.0
## Max. :80 Max. :1.0000 Max. :16.00 Max. :16.0
##
Question: is there a correlation between Function and Thematic competition across patients?
# prep data for GCA: code_poly() adds orthogonal polynomial time terms
# (poly1-poly4) for the 500-2000ms analysis window, excluding Target fixations
FunctThemePts.gca <- code_poly(subset(FunctThemePts, Time >= 500 & Time <= 2000 &
                                        Object != "Target"),
                               predictor="Time", poly.order=4, draw.poly=FALSE)
# function competition: full model
m.funct <- lmer(meanFix ~ (poly1+poly2+poly3+poly4)*Object +
(poly1+poly2+poly3+poly4 | subj) + (poly1+poly2 | subj:Object),
data=subset(FunctThemePts.gca, Condition == "Function"), REML=F)
# thematic competition: full model
m.theme <- lmer(meanFix ~ (poly1+poly2+poly3+poly4)*Object +
(poly1+poly2+poly3+poly4 | subj) + (poly1+poly2 | subj:Object),
data=subset(FunctThemePts.gca, Condition == "Thematic"), REML=F)
str(ranef(m.funct), vec.len=2)
## List of 2
## $ subj:Object:'data.frame': 34 obs. of 3 variables:
## ..$ (Intercept): num [1:34] -0.0327 0.0254 ...
## ..$ poly1 : num [1:34] 0.0339 0.0726 ...
## ..$ poly2 : num [1:34] -0.1538 0.0143 ...
## $ subj :'data.frame': 17 obs. of 5 variables:
## ..$ (Intercept): num [1:17] 0.025 -0.0134 ...
## ..$ poly1 : num [1:17] 0.0924 -0.0454 ...
## ..$ poly2 : num [1:17] -0.1915 0.0375 ...
## ..$ poly3 : num [1:17] 0.1124 -0.0677 ...
## ..$ poly4 : num [1:17] 0.0386 0.0474 ...
## - attr(*, "class")= chr "ranef.mer"
head(ranef(m.funct)$"subj:Object")
## (Intercept) poly1 poly2
## 206:Competitor -0.0327460 0.03386 -0.153809
## 206:Unrelated 0.0253878 0.07261 0.014314
## 281:Competitor 0.0066498 -0.04395 -0.059606
## 281:Unrelated -0.0200312 -0.05530 0.115819
## 419:Competitor -0.0006385 0.24667 -0.001246
## 419:Unrelated -0.0092935 -0.15294 0.056835
We can use these to calculate individual effect sizes, just like in the simple demo example. Use gazer::get_ranef(), which is a helper function for extracting random effects:
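If gazer is not installed, a minimal sketch of the extraction (an assumption based on the output below): pull the random effects for the grouping factor and split the subj:Object rownames into separate columns.
# hypothetical stand-in for gazer::get_ranef()
get_ranef_sketch <- function(model, grouping) {
  re <- ranef(model)[[grouping]]
  names(re)[1] <- "Intercept"  # rename "(Intercept)" for easier indexing
  labels <- strsplit(rownames(re), ":")
  re$subj <- sapply(labels, "[", 1)
  re$Object <- sapply(labels, "[", 2)
  re
}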
re.funct <- get_ranef(m.funct, "subj:Object")
head(re.funct)
## Intercept poly1 poly2 subj Object
## 206:Competitor -0.0327460 0.03386 -0.153809 206 Competitor
## 206:Unrelated 0.0253878 0.07261 0.014314 206 Unrelated
## 281:Competitor 0.0066498 -0.04395 -0.059606 281 Competitor
## 281:Unrelated -0.0200312 -0.05530 0.115819 281 Unrelated
## 419:Competitor -0.0006385 0.24667 -0.001246 419 Competitor
## 419:Unrelated -0.0092935 -0.15294 0.056835 419 Unrelated
Compute individual effect sizes as the difference between the Competitor and Unrelated random effects for each subject, for the Intercept and Linear time terms:
ES.funct <- re.funct %>% group_by(subj) %>%
summarize(Function_Intercept = Intercept[Object=="Competitor"] - Intercept[Object=="Unrelated"],
Function_Linear = poly1[Object=="Competitor"] - poly1[Object=="Unrelated"])
head(ES.funct)
## # A tibble: 6 x 3
## subj Function_Intercept Function_Linear
## <chr> <dbl> <dbl>
## 1 1088 -0.00328 -0.156
## 2 1238 -0.0133 -0.140
## 3 1392 -0.00320 0.191
## 4 1626 0.0281 0.278
## 5 1687 0.0565 -0.251
## 6 1846 -0.0788 0.399
Repeat those steps for the Thematic condition:
re.theme <- get_ranef(m.theme, "subj:Object")
ES.theme <- re.theme %>% group_by(subj) %>%
summarize(Thematic_Intercept = Intercept[Object=="Competitor"] - Intercept[Object=="Unrelated"],
Thematic_Linear = poly1[Object=="Competitor"] - poly1[Object=="Unrelated"])
head(ES.theme)
## # A tibble: 6 x 3
## subj Thematic_Intercept Thematic_Linear
## <chr> <dbl> <dbl>
## 1 1088 -0.0846 -0.0619
## 2 1238 -0.0221 -0.0156
## 3 1392 0.0615 -0.353
## 4 1626 0.0246 -0.469
## 5 1687 -0.0226 -0.0485
## 6 1846 0.00176 0.155
# combine condition effect sizes into one data frame
ES <- merge(ES.funct, ES.theme)
summary(ES)
## subj Function_Intercept Function_Linear
## Length:17 Min. :-0.0788 Min. :-0.3659
## Class :character 1st Qu.:-0.0508 1st Qu.:-0.2511
## Mode :character Median :-0.0027 Median :-0.0388
## Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.0267 3rd Qu.: 0.1912
## Max. : 0.1076 Max. : 0.4237
## Thematic_Intercept Thematic_Linear
## Min. :-0.08556 Min. :-0.4691
## 1st Qu.:-0.02205 1st Qu.:-0.1529
## Median :-0.00187 Median :-0.0485
## Mean : 0.00000 Mean : 0.0000
## 3rd Qu.: 0.02460 3rd Qu.: 0.1554
## Max. : 0.07799 Max. : 0.6714
# compute correlations between effect sizes
cor.test(ES$Function_Intercept, ES$Thematic_Intercept)
##
## Pearson's product-moment correlation
##
## data: ES$Function_Intercept and ES$Thematic_Intercept
## t = -2.4, df = 15, p-value = 0.03
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.8007 -0.0530
## sample estimates:
## cor
## -0.5204
cor.test(ES$Function_Linear, ES$Thematic_Linear)
##
## Pearson's product-moment correlation
##
## data: ES$Function_Linear and ES$Thematic_Linear
## t = -3.4, df = 15, p-value = 0.004
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.8637 -0.2545
## sample estimates:
## cor
## -0.655
We can make a two-panel scatterplot showing the correlations between Function and Thematic competition effect sizes for each term (Intercept, Linear).
# need to re-arrange the data
ES.p <- gather(ES, key = "key", value = "value", 2:5) %>% # "gather" into long form
separate(key, c("Condition", "Term"), sep = "_") %>% # separate the labels into two columns
spread(key = Condition, value = value) # "spread" so conditions are in separate columns
summary(ES.p)
## subj Term Function Thematic
## Length:34 Length:34 Min. :-0.3659 Min. :-0.4691
## Class :character Class :character 1st Qu.:-0.0736 1st Qu.:-0.0647
## Mode :character Mode :character Median :-0.0029 Median :-0.0077
## Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.0436 3rd Qu.: 0.0294
## Max. : 0.4237 Max. : 0.6714
# now can make the plot
ggplot(ES.p, aes(Function, Thematic)) +
facet_wrap(~ Term, scales="free") + geom_point() +
stat_smooth(method="lm", se=F) # add trendline
Why bother with the model-fitting when you could just take the difference of the observed data to get effect sizes? Because random effects are estimated in the context of the overall group model, individual estimates are pulled toward the group mean (shrinkage) and are less affected by noise than raw observed differences.
[Figure: individual participant Function competition effect sizes]
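One way to check (a sketch, not from the original materials): compute each participant's observed Function competition effect as the raw Competitor-minus-Unrelated difference in mean fixation proportion over the analysis window, then plot it against the model-based Intercept effect size.
# observed effect: raw Competitor-minus-Unrelated difference per subject
obs.funct <- subset(FunctThemePts.gca, Condition == "Function") %>%
  group_by(subj, Object) %>%
  summarize(meanFix = mean(meanFix)) %>%
  summarize(Observed = meanFix[Object=="Competitor"] - meanFix[Object=="Unrelated"])
# compare to the model-based (random effect) effect sizes
comp <- merge(ES.funct, obs.funct)
ggplot(comp, aes(Observed, Function_Intercept)) +
  geom_point() + stat_smooth(method="lm", se=F)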
The CohortRhyme data set contains data from an eye-tracking experiment (Mirman et al., 2011) that investigated phonological competition between cohorts (e.g., penny - pencil) and rhymes (e.g., carrot - parrot).
Three groups of participants were tested: 5 Broca’s aphasics, 3 Wernicke’s aphasics, and 12 control participants.
summary(CohortRhyme)
## subjID Group Time timeBin
## 101 : 80 Control :960 Min. : 50 Min. : 1.00
## 102 : 80 Broca :400 1st Qu.: 525 1st Qu.: 5.75
## 103 : 80 Wernicke:240 Median :1000 Median :10.50
## 104 : 80 Mean :1000 Mean :10.50
## 105 : 80 3rd Qu.:1475 3rd Qu.:15.25
## 106 : 80 Max. :1950 Max. :20.00
## (Other):1120
## Object Type FixProp
## Competitor:800 Cohort:800 Min. :0.0000
## Unrelated :800 Rhyme :800 1st Qu.:0.0000
## Target : 0 Median :0.0417
## Mean :0.0665
## 3rd Qu.:0.1000
## Max. :0.5833
##