Zoom link: https://harvard.zoom.us/j/812428264
Here we investigate the relationship between contemporary neural models’
inductive bias, perplexity, and ability to predict human reading times. We
train three model architectures---5-grams, LSTMs, and Recurrent Neural
Network Grammars (which have explicit syntactic representations; Dyer et
al., 2016)---on four datasets derived from the BLLIP corpus (Charniak et
al., 2000). For each model we investigate the relationship between word
surprisal and human reading time on three online processing datasets: the
Dundee eye-tracking corpus (Kennedy et al., 2003), self-paced reading from
selections of the Brown corpus (Smith & Levy, 2013), and the Natural
Stories corpus (Futrell et al., 2017), using multiple regression with standard
control predictors (Goodkind & Bicknell, 2018). In line with previous
findings, we find a strong linear relationship between surprisal and
reading times for all models. Comparing across neural models trained on the
same dataset, we find a negative relationship between test perplexity and
predictive power. However, we find that n-gram models demonstrate
predictive power comparable with the neural models despite much worse
test-set perplexity. Addressing the issue of syntactic structure, we
compare each model’s predictive power against a syntactic generalization
score, which is derived from a battery of targeted syntactic evaluation
tests following Futrell et al. (2019) and Marvin and Linzen (2018). We find
a positive correlation for our eye-tracking dataset, and either no
correlation or a negative correlation for our self-paced reading datasets.
These results are consistent with the hypothesis that structural bias does
not lead to better reading time predictions.
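
To make the regression-based notion of predictive power concrete, below is a
minimal sketch in the spirit of Goodkind and Bicknell (2018): surprisal is
added to a baseline regression of reading times on control predictors, and
predictive power is measured as the per-word gain in log-likelihood. The data
frame, column names, toy data, and plain least-squares setup are illustrative
assumptions, not the exact analysis reported above, which uses the full set of
standard control predictors.

```python
# Minimal sketch: predictive power as the per-word log-likelihood gain from
# adding surprisal to a baseline regression with control predictors.
# The column names and toy data are hypothetical placeholders.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def delta_loglik(df: pd.DataFrame) -> float:
    """Mean per-word log-likelihood improvement from adding surprisal."""
    controls = df[["word_length", "log_frequency"]]   # illustrative controls
    y = df["reading_time"]                            # e.g., ms per word

    baseline = sm.OLS(y, sm.add_constant(controls)).fit()
    full = sm.OLS(
        y, sm.add_constant(pd.concat([controls, df[["surprisal"]]], axis=1))
    ).fit()
    return (full.llf - baseline.llf) / len(df)

# Toy usage with synthetic data (purely illustrative):
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "word_length": rng.integers(1, 12, n).astype(float),
    "log_frequency": rng.normal(-8.0, 2.0, n),
    "surprisal": rng.gamma(2.0, 2.0, n),
})
df["reading_time"] = (200.0 + 3.0 * df["word_length"]
                      + 5.0 * df["surprisal"] + rng.normal(0.0, 20.0, n))
print(delta_loglik(df))
```

Ordinary least squares is used here only to keep the sketch self-contained;
the same per-word log-likelihood gain can be computed for any regression that
includes the control predictors with and without surprisal.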
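
Similarly, the syntactic generalization score can be sketched as a pass rate
over paired test items: a model passes an item if it assigns lower surprisal
to the grammatical continuation than to the minimally different ungrammatical
one. The two items and the dummy surprisal function below are hypothetical
placeholders, not materials from the Futrell et al. (2019) or Marvin and
Linzen (2018) test suites.

```python
# Minimal sketch of a syntactic generalization (SG) score: for each item, the
# model passes if the grammatical continuation is less surprising than the
# ungrammatical one; the SG score is the pass rate over the battery.
from typing import Callable, Dict, List

# Each item pairs a grammatical ("good") and ungrammatical ("bad") continuation
# of a shared prefix (here, subject-verb agreement with an intervening phrase).
ITEMS: List[Dict[str, str]] = [
    {"prefix": "The keys to the cabinet", "good": "are", "bad": "is"},
    {"prefix": "The authors that the critic praised", "good": "were", "bad": "was"},
]

def sg_score(surprisal: Callable[[str], float], items: List[Dict[str, str]]) -> float:
    """Fraction of items where the grammatical continuation is less surprising."""
    passed = 0
    for item in items:
        s_good = surprisal(f"{item['prefix']} {item['good']}")
        s_bad = surprisal(f"{item['prefix']} {item['bad']}")
        passed += int(s_good < s_bad)
    return passed / len(items)

# Toy usage with a dummy surprisal function; a real one would come from a
# trained language model (e.g., -log2 p(sentence) summed over tokens).
if __name__ == "__main__":
    def dummy_surprisal(sentence: str) -> float:
        # Placeholder only: string length stands in for model surprisal.
        return float(len(sentence))
    print(sg_score(dummy_surprisal, ITEMS))
```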