Pre-Processing Eye-Tracking Data

Dan Mirman

23 January 2019

Workshop Schedule

Day 1: Data wrangling, Pre-processing eye-tracking data


Time	Content
9am - Noon	Part 1: Data wrangling
Noon - 1:30pm	Lunch break, Q&A, own data analysis time
1:30pm - 4:30pm	Part 2: Pre-processing eye-tracking data

Day 2: Statistical analysis of eye-tracking data (GCA)

Pre-processing eye-tracking data

Approaches

Data wrangling tools (tidyverse): very general, require customization expertise
gazeR
In-house scripts: already customized, only work for a very specific experiment

LCDL Eye-Tracking Helper Functions

https://github.com/dmirman/gazer

To install

devtools::install_github("dmirman/gazer")

Load package

library(gazer)

Visual World Paradigm

Fixation report

We use an SR EyeLink 1000+ desktop-mounted (remote) eye-tracker and record monocular eye position at 250 Hz.

It comes with the EyeLink Data Viewer application, which has built-in algorithms for detecting fixations, saccades, blinks, etc. This application can output a “fixation report”, a tab-delimited text file that can include data from multiple participants and looks like this:

Columns: RECORDING_SESSION_LABEL, CURRENT_FIX_PUPIL, CURRENT_FIX_DURATION, CURRENT_FIX_END, CURRENT_FIX_START, CURRENT_FIX_X, CURRENT_FIX_Y, CompPort, Condition, CorrectPort, StimSlide.ACC, StimSlide.RT, Target

(You can select which columns to include in the report)
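
Because the fixation report is plain tab-delimited text, it can also be inspected with base R before handing it to gazer. The following is a minimal sketch, not part of gazer: the report contents are made up, and treating "." as the missing-value code is an assumption that should be checked against your own export settings.

```r
# Build a tiny, made-up fixation report (tab-delimited) in a temporary file.
report_lines <- c(
  "RECORDING_SESSION_LABEL\tCURRENT_FIX_START\tCURRENT_FIX_END\tCURRENT_FIX_X\tCURRENT_FIX_Y",
  "s01\t4\t210\t512.3\t384.1",
  "s01\t250\t600\t.\t.",
  "s02\t10\t330\t130.0\t120.5"
)
toy_path <- tempfile(fileext = ".xls")  # a text file despite the extension
writeLines(report_lines, toy_path)

# Read it as tab-delimited text; "." is treated as missing (assumption).
toy_report <- read.delim(toy_path, na.strings = ".", stringsAsFactors = FALSE)
str(toy_report)
```

Checking the column types with str() is a quick way to catch coordinates that were read as text because of an unexpected missing-value code.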

Read the fixation report

# Use a file installed with the package 
gaze_path <- system.file("extdata", "FixData_v1_N15.xls", package = "gazer")
gaze <- readFixationReport(gaze_path, plot_fix_scatter = FALSE)
summary(gaze)

##     Subject     CURRENT_FIX_PUPIL CURRENT_FIX_DURATION CURRENT_FIX_END
##  9160   :1109   Min.   :  36      Min.   :   2         Min.   :   22  
##  9196   : 897   1st Qu.: 122      1st Qu.: 140         1st Qu.:  920  
##  9115   : 882   Median : 165      Median : 210         Median : 1886  
##  9187   : 839   Mean   : 176      Mean   : 280         Mean   : 1958  
##  9061   : 787   3rd Qu.: 201      3rd Qu.: 328         3rd Qu.: 2614  
##  9171   : 786   Max.   :9144      Max.   :2660         Max.   :26184  
##  (Other):5616                                                         
##  CURRENT_FIX_START CURRENT_FIX_X   CURRENT_FIX_Y     CompPort   
##  Min.   :    4     Min.   :-3270   Min.   :-3270   image1:2794  
##  1st Qu.:  650     1st Qu.:  234   1st Qu.:  174   image2:2762  
##  Median : 1562     Median :  511   Median :  363   image3:2716  
##  Mean   : 1680     Mean   :  511   Mean   :  354   image4:2644  
##  3rd Qu.: 2334     3rd Qu.:  800   3rd Qu.:  523                
##  Max.   :25848     Max.   : 3270   Max.   : 3270                
##                                                                 
##      Condition     TargetLoc         ACC             RT       
##  associate:3059   image1:2769   Min.   :0.00   Min.   : 2236  
##  filler   :3010   image2:2891   1st Qu.:1.00   1st Qu.: 2957  
##  practice :1702   image3:2611   Median :1.00   Median : 3237  
##  taxonomic:3145   image4:2645   Mean   :0.99   Mean   : 3631  
##                                 3rd Qu.:1.00   3rd Qu.: 3687  
##                                 Max.   :1.00   Max.   :26105  
##                                                               
##      Target     TargetLocation
##  barn   : 213   1:2769        
##  walker : 194   2:2891        
##  acorn  : 184   3:2611        
##  bandaid: 184   4:2645        
##  pillow : 181                 
##  falcon : 180                 
##  (Other):9780

Get some calibration diagnostics, including a figure

cg <- get_gaze_diagnostics(gaze)

Non-Fixation time: 1 - ((total fixation time) / (total task time))
- Should be non-zero (saccades, blinks, etc.), but a high value suggests poor track quality
Out of bounds proportion: (fixation time outside the screen boundaries) / (total fixation time)
- Can be non-zero, but should be very low
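
To make the two formulas concrete, here is a small worked example in base R. It is only a sketch of the definitions above, not necessarily how get_gaze_diagnostics() computes them internally; the fixation values, the 2000 ms trial duration, and the 1024 x 768 screen size are all made up for illustration.

```r
# Toy fixations for one trial (made-up values; times in ms, positions in pixels).
fix <- data.frame(
  start = c(0, 300, 700, 1200),
  end   = c(250, 650, 1100, 1800),
  x     = c(512, 100, 1500, 900),   # 1500 is off a 1024-pixel-wide screen
  y     = c(384, 100, 400, 700)
)
trial_duration <- 2000             # total task time in ms (assumed)
screen_w <- 1024; screen_h <- 768  # hypothetical screen size

fix$dur <- fix$end - fix$start
total_fix_time <- sum(fix$dur)

# Non-fixation time: 1 - (total fixation time / total task time)
non_fixation <- 1 - total_fix_time / trial_duration

# Out-of-bounds proportion: off-screen fixation time / total fixation time
off_screen <- fix$x < 0 | fix$x > screen_w | fix$y < 0 | fix$y > screen_h
out_of_bounds <- sum(fix$dur[off_screen]) / total_fix_time
```

With these made-up numbers, non_fixation works out to 0.2 and out_of_bounds to 0.25; an out-of-bounds value that high in real data would be a warning sign.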

A single call

A single call will read in the data and generate two diagnostic figures

gaze <- readFixationReport(gaze_path, plot_fix_scatter=TRUE)

Parse areas of interest

For this experiment, the objects were always presented in the four corners of the screen and the gaze position was recorded in terms of (x,y) coordinates. So we need to:

Identify target and competitor image locations (image1, image2, etc.)
Convert gaze coordinates into image locations
Compare gaze location to target and competitor locations.

[If your data are already coded in terms of which image is being fixated or the images are not in fixed locations, then this step is not necessary.]

# (1) extract the numbered location of the target and competitor
gaze$TargetLocation <- as.numeric(substr(gaze$TargetLoc, 6, 6))
gaze$CompLocation <- as.numeric(substr(gaze$CompPort, 6, 6))
# (2) use fixation coordinates to determine which image location was fixated
gaze_aoi <- assignAOI(gaze)
summary(gaze_aoi)

##     Subject     CURRENT_FIX_PUPIL CURRENT_FIX_DURATION CURRENT_FIX_END
##  9160   :1109   Min.   :  36      Min.   :   2         Min.   :   22  
##  9196   : 897   1st Qu.: 122      1st Qu.: 140         1st Qu.:  920  
##  9115   : 882   Median : 165      Median : 210         Median : 1886  
##  9187   : 839   Mean   : 176      Mean   : 280         Mean   : 1958  
##  9061   : 787   3rd Qu.: 201      3rd Qu.: 328         3rd Qu.: 2614  
##  9171   : 786   Max.   :9144      Max.   :2660         Max.   :26184  
##  (Other):5616                                                         
##  CURRENT_FIX_START CURRENT_FIX_X   CURRENT_FIX_Y     CompPort   
##  Min.   :    4     Min.   :-3270   Min.   :-3270   image1:2794  
##  1st Qu.:  650     1st Qu.:  234   1st Qu.:  174   image2:2762  
##  Median : 1562     Median :  511   Median :  363   image3:2716  
##  Mean   : 1680     Mean   :  511   Mean   :  354   image4:2644  
##  3rd Qu.: 2334     3rd Qu.:  800   3rd Qu.:  523                
##  Max.   :25848     Max.   : 3270   Max.   : 3270                
##                                                                 
##      Condition     TargetLoc         ACC             RT       
##  associate:3059   image1:2769   Min.   :0.00   Min.   : 2236  
##  filler   :3010   image2:2891   1st Qu.:1.00   1st Qu.: 2957  
##  practice :1702   image3:2611   Median :1.00   Median : 3237  
##  taxonomic:3145   image4:2645   Mean   :0.99   Mean   : 3631  
##                                 3rd Qu.:1.00   3rd Qu.: 3687  
##                                 Max.   :1.00   Max.   :26105  
##                                                               
##      Target     TargetLocation  CompLocation       AOI      
##  barn   : 213   Min.   :1.00   Min.   :1.00   Min.   :0.0   
##  walker : 194   1st Qu.:1.00   1st Qu.:1.00   1st Qu.:0.0   
##  acorn  : 184   Median :2.00   Median :2.00   Median :2.0   
##  bandaid: 184   Mean   :2.47   Mean   :2.48   Mean   :1.7   
##  pillow : 181   3rd Qu.:3.00   3rd Qu.:3.00   3rd Qu.:3.0   
##  falcon : 180   Max.   :4.00   Max.   :4.00   Max.   :4.0   
##  (Other):9780                                 NA's   :1040

# (3) match AOI codes with target and competitor locations to determine which object was being fixated
gaze_aoi$Targ <- gaze_aoi$AOI == gaze_aoi$TargetLocation
gaze_aoi$Comp <- gaze_aoi$AOI == gaze_aoi$CompLocation
gaze_aoi$Unrelated <- ((gaze_aoi$AOI != as.numeric(gaze_aoi$TargetLocation)) &
                         (gaze_aoi$AOI != as.numeric(gaze_aoi$CompLocation)) &
                         (gaze_aoi$AOI != 0) & !is.na(gaze_aoi$AOI))
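
For readers whose layout differs, steps (2) and (3) can be sketched in base R. This is an illustration under stated assumptions, not the rule used by assignAOI(): the 1024 x 768 screen, the 400 x 300 corner boxes, the numbering of the corners, and the helper name assign_aoi_sketch() are all hypothetical.

```r
# Hypothetical layout: 1024 x 768 screen, one 400 x 300 box in each corner.
aoi_boxes <- data.frame(
  AOI  = 1:4,
  xmin = c(0, 624, 0, 624),
  xmax = c(400, 1024, 400, 1024),
  ymin = c(0, 0, 468, 468),
  ymax = c(300, 300, 768, 768)
)

# Return 1-4 for a fixation inside a box, 0 for elsewhere on screen,
# and NA for off-screen or missing coordinates.
assign_aoi_sketch <- function(x, y, boxes = aoi_boxes,
                              screen_w = 1024, screen_h = 768) {
  aoi <- rep(0, length(x))
  for (i in seq_len(nrow(boxes))) {
    inside <- !is.na(x) & !is.na(y) &
      x >= boxes$xmin[i] & x <= boxes$xmax[i] &
      y >= boxes$ymin[i] & y <= boxes$ymax[i]
    aoi[inside] <- boxes$AOI[i]
  }
  off <- is.na(x) | is.na(y) | x < 0 | x > screen_w | y < 0 | y > screen_h
  aoi[off] <- NA
  aoi
}

# Made-up fixations with made-up target and competitor locations.
toy <- data.frame(x = c(100, 900, 512, 200, -50),
                  y = c(100, 100, 384, 600, 100),
                  TargetLocation = c(1, 1, 1, 4, 2),
                  CompLocation   = c(2, 2, 2, 1, 3))
toy$AOI  <- assign_aoi_sketch(toy$x, toy$y)
toy$Targ <- toy$AOI == toy$TargetLocation
toy$Comp <- toy$AOI == toy$CompLocation
toy$Unrelated <- !is.na(toy$AOI) & toy$AOI != 0 & !toy$Targ & !toy$Comp
```

As in the workshop code, Targ and Comp are NA when the AOI is NA (the last toy fixation is off screen), whereas Unrelated is FALSE in that case.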

Fixations to bins

Fixations start and end at different times. For easy plotting and data analysis, these need to be converted to aligned time bins.

If your critical time windows are not aligned to trial onset (for example, if you used natural sentences where the critical word occurs at slightly different times), you should re-align the fixation start and end times with those key time points, which can be trial-specific.
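
One way to do this re-alignment is to subtract each trial's critical-word onset from the fixation start and end times. The sketch below uses base R and made-up data; the Trial and WordOnset columns are hypothetical names, so substitute whatever your experiment files provide.

```r
# Toy fixations from two trials (made-up values, times in ms from trial onset).
fix <- data.frame(
  Trial = c(1, 1, 2, 2),
  CURRENT_FIX_START = c(100, 900, 200, 1500),
  CURRENT_FIX_END   = c(850, 1600, 1400, 2100)
)
# Hypothetical trial-specific onsets of the critical word (ms from trial onset).
onsets <- data.frame(Trial = c(1, 2), WordOnset = c(800, 1250))

# Look up each fixation's trial onset, then shift times so that 0 = word onset.
aligned <- fix
aligned$WordOnset <- onsets$WordOnset[match(aligned$Trial, onsets$Trial)]
aligned$CURRENT_FIX_START <- aligned$CURRENT_FIX_START - aligned$WordOnset
aligned$CURRENT_FIX_END   <- aligned$CURRENT_FIX_END   - aligned$WordOnset
```

After the shift, negative times mark fixations that began before the critical word, so decide whether to keep or drop them before binning.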

This is also a good time to drop columns that are no longer needed. Most of the work is done by the binify_fixations() function; it just needs a list of the columns that should be kept after the binning is done. You can optionally specify a bin size (default is 20 ms). Note: this step is slow.

gaze_bins <- binify_fixations(gaze = gaze_aoi, 
                              keepCols = c("Subject", "Target", "Condition", "ACC",
                                           "RT", "Targ", "Comp", "Unrelated"))
summary(gaze_bins)

##    FixationID       timeBin          Subject             Target      
##  Min.   :    1   Min.   :   1.0   9115   :14560   barn      :  3184  
##  1st Qu.: 2732   1st Qu.:  45.0   9160   :12851   walker    :  2761  
##  Median : 5295   Median :  88.0   9061   :12215   bandaid   :  2752  
##  Mean   : 5458   Mean   :  95.1   9156   :11734   acorn     :  2673  
##  3rd Qu.: 8293   3rd Qu.: 130.0   9171   :10931   soda      :  2642  
##  Max.   :10916   Max.   :1310.0   9092   :10763   paintbrush:  2613  
##                                   (Other):88597   (Other)   :145026  
##      Condition          ACC             RT           Targ        
##  associate:45169   Min.   :0.00   Min.   : 2236   Mode :logical  
##  filler   :45125   1st Qu.:1.00   1st Qu.: 2947   FALSE:87494    
##  practice :25206   Median :1.00   Median : 3229   TRUE :63465    
##  taxonomic:46151   Mean   :0.99   Mean   : 3641   NA's :10692    
##                    3rd Qu.:1.00   3rd Qu.: 3673                  
##                    Max.   :1.00   Max.   :26105                  
##                                                                  
##     Comp         Unrelated            Time      
##  Mode :logical   Mode :logical   Min.   :   20  
##  FALSE:135512    FALSE:134895    1st Qu.:  900  
##  TRUE :15447     TRUE :26756     Median : 1760  
##  NA's :10692                     Mean   : 1902  
##                                  3rd Qu.: 2600  
##                                  Max.   :26200  
##
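
To show what converting fixations to bins means, here is a base R sketch that expands each fixation into one row per 20 ms bin. The binning rule (bin index = ceiling(time / bin size)) is an assumption chosen for illustration and may differ from what binify_fixations() does internally; the data are made up.

```r
# Toy fixations (made-up values, times in ms).
fix <- data.frame(
  FixationID = 1:2,
  CURRENT_FIX_START = c(4, 90),
  CURRENT_FIX_END   = c(62, 118),
  Targ = c(TRUE, FALSE)
)
bin_size <- 20  # ms

# Expand each fixation into one row per time bin it touches.
# Assumed rule: bin index = ceiling(time / bin_size).
bins <- do.call(rbind, lapply(seq_len(nrow(fix)), function(i) {
  first_bin <- ceiling(fix$CURRENT_FIX_START[i] / bin_size)
  last_bin  <- ceiling(fix$CURRENT_FIX_END[i] / bin_size)
  data.frame(FixationID = fix$FixationID[i],
             timeBin = first_bin:last_bin,
             Targ = fix$Targ[i])
}))
bins$Time <- bins$timeBin * bin_size  # bin label in ms
```

The first toy fixation (4 to 62 ms) becomes four rows (bins 1 to 4) and the second becomes two (bins 5 and 6), which is why the binned data set has many more rows than the fixation report.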

Gather

The fixation locations are in separate columns, so we need to gather() them into a single column:

library(tidyverse)  # gather() comes from tidyr; also loads dplyr and ggplot2
gaze_obj <- gather(gaze_bins, key = "Object", value = "Fix", 
                   Targ, Comp, Unrelated, factor_key = TRUE)
# recode NA as not-fixating
gaze_obj$Fix <- replace(gaze_obj$Fix, is.na(gaze_obj$Fix), FALSE) 
summary(gaze_obj)

##    FixationID       timeBin          Subject              Target      
##  Min.   :    1   Min.   :   1.0   9115   : 43680   barn      :  9552  
##  1st Qu.: 2732   1st Qu.:  45.0   9160   : 38553   walker    :  8283  
##  Median : 5295   Median :  88.0   9061   : 36645   bandaid   :  8256  
##  Mean   : 5458   Mean   :  95.1   9156   : 35202   acorn     :  8019  
##  3rd Qu.: 8293   3rd Qu.: 130.0   9171   : 32793   soda      :  7926  
##  Max.   :10916   Max.   :1310.0   9092   : 32289   paintbrush:  7839  
##                                   (Other):265791   (Other)   :435078  
##      Condition           ACC             RT             Time      
##  associate:135507   Min.   :0.00   Min.   : 2236   Min.   :   20  
##  filler   :135375   1st Qu.:1.00   1st Qu.: 2947   1st Qu.:  900  
##  practice : 75618   Median :1.00   Median : 3229   Median : 1760  
##  taxonomic:138453   Mean   :0.99   Mean   : 3641   Mean   : 1902  
##                     3rd Qu.:1.00   3rd Qu.: 3673   3rd Qu.: 2600  
##                     Max.   :1.00   Max.   :26105   Max.   :26200  
##                                                                   
##        Object          Fix         
##  Targ     :161651   Mode :logical  
##  Comp     :161651   FALSE:379285   
##  Unrelated:161651   TRUE :105668   
##                                    
##                                    
##                                    
##
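
In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). The same reshaping step might look like the sketch below, shown on made-up data rather than on gaze_bins; the toy column values are invented for illustration.

```r
library(tidyr)

# Toy binned data (made-up values): one row per time bin,
# with one logical column per object type.
toy_bins <- data.frame(
  Subject = c("s01", "s01"),
  Time = c(20, 40),
  Targ = c(TRUE, NA),
  Comp = c(FALSE, NA),
  Unrelated = c(FALSE, FALSE)
)

# pivot_longer() is tidyr's newer interface for the same wide-to-long reshape.
toy_obj <- pivot_longer(toy_bins, cols = c(Targ, Comp, Unrelated),
                        names_to = "Object", values_to = "Fix")
# Keep the object order as factor levels (like factor_key = TRUE in gather()).
toy_obj$Object <- factor(toy_obj$Object, levels = c("Targ", "Comp", "Unrelated"))
# Recode NA as not-fixating.
toy_obj$Fix <- replace(toy_obj$Fix, is.na(toy_obj$Fix), FALSE)
```

The two functions order the output rows differently: pivot_longer() keeps the three object rows of each original row together, whereas gather() stacks all Targ rows first. That does not matter for the grouped summaries that follow.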

Compute fixation proportions

To calculate subject-by-condition time courses that will go into an analysis:

Filter out error and practice trials, focus on relevant time window
Group by Subject, Condition, and Object type; calculate number of valid trials in each cell
Group by Subject, Condition, Object type, and time bin
Aggregate within time bins to calculate time course

gaze_subj <- gaze_obj %>% 
  filter(ACC == 1, Condition != "practice", Time < 3500) %>% 
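  # calculate number of valid trials for each subject-condition
  # (counts unique Target words, so it assumes one trial per target word)
  group_by(Subject, Condition, Object) %>% 
  mutate(nTrials = length(unique(Target))) %>% ungroup() %>%
  # count fixations in each time bin and convert to a proportion of trials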
  group_by(Subject, Condition, Object, Time) %>%
  summarize(sumFix = sum(Fix), nTrials = unique(nTrials), 
            meanFix = sum(Fix)/unique(nTrials))

# there were two unrelated objects, so divide those proportions by 2
gaze_subj$meanFix[gaze_subj$Object == "Unrelated"] <- 
  gaze_subj$meanFix[gaze_subj$Object == "Unrelated"] / 2

summary(gaze_subj)

##     Subject          Condition          Object          Time     
##  9061   : 1566   associate:7800   Targ     :7790   Min.   :  20  
##  9062   : 1566   filler   :7758   Comp     :7790   1st Qu.: 880  
##  9092   : 1566   practice :   0   Unrelated:7790   Median :1740  
##  9115   : 1566   taxonomic:7812                    Mean   :1742  
##  9146   : 1566                                     3rd Qu.:2600  
##  9153   : 1566                                     Max.   :3480  
##  (Other):13974                                                   
##      sumFix        nTrials        meanFix      
##  Min.   : 0.0   Min.   :19.0   Min.   :0.0000  
##  1st Qu.: 0.0   1st Qu.:20.0   1st Qu.:0.0000  
##  Median : 2.0   Median :20.0   Median :0.0789  
##  Mean   : 3.5   Mean   :19.9   Mean   :0.1519  
##  3rd Qu.: 5.0   3rd Qu.:20.0   3rd Qu.:0.2000  
##  Max.   :20.0   Max.   :20.0   Max.   :1.0000  
##
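
The arithmetic behind meanFix can be checked by hand on a tiny example. This base R sketch covers one subject, one condition, and one time bin with made-up data; it mirrors the logic of the dplyr pipeline above but is not a replacement for it.

```r
# Toy long-format data (made-up values): one subject, one condition,
# 4 valid trials (one per Target word), a single 20 ms time bin.
toy <- data.frame(
  Subject = "s01", Condition = "associate", Time = 1400,
  Target = rep(c("barn", "acorn", "pillow", "falcon"), each = 3),
  Object = rep(c("Targ", "Comp", "Unrelated"), times = 4),
  Fix = c(TRUE, FALSE, FALSE,    # barn: looking at the target
          TRUE, FALSE, FALSE,    # acorn: target
          TRUE, FALSE, FALSE,    # pillow: target
          FALSE, FALSE, TRUE)    # falcon: an unrelated object
)

n_trials <- length(unique(toy$Target))   # valid trials in this cell

# Count fixations per object in the time bin, then divide by trial count.
toy_subj <- aggregate(Fix ~ Subject + Condition + Object + Time,
                      data = toy, FUN = sum)
names(toy_subj)[names(toy_subj) == "Fix"] <- "sumFix"
toy_subj$meanFix <- toy_subj$sumFix / n_trials

# Two unrelated objects share one code, so halve that proportion.
is_unrel <- toy_subj$Object == "Unrelated"
toy_subj$meanFix[is_unrel] <- toy_subj$meanFix[is_unrel] / 2
```

Here the target is fixated on 3 of 4 trials (meanFix = 0.75), the competitor on none (0), and the unrelated objects on 1 of 4 trials, which becomes 0.125 per unrelated object after halving.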

Plot fixation time course

ggplot(gaze_subj, aes(Time, meanFix, color = Object)) + 
  facet_wrap(~ Condition) +
  stat_summary(fun = mean, geom = "line") +
  geom_vline(xintercept = 1300) +
  annotate("text", x=1300, y=0.9, label="Word onset", hjust=0)

Pre-processing pipeline summary

Read in data
Check data quality
(If necessary) Convert fixation locations (x,y) into objects (target, competitor, etc.)
Convert fixations (or samples) into time bins
Aggregate data in time bins (by subjects or by items)
Plot, analyze

Exercise

In a second version of the same experiment, participants just listened to the words and did not make any response. This means there is no accuracy or RT, but everything else about the data is the same. Read and pre-process these data (FixData_v2_N15.xls) and make a graph.
Try reading and pre-processing your own data.