While previous research on readability has typically focused on document-level measures, recent work in areas such as natural language generation has pointed out the need for sentence-level readability measures. Psycholinguistics has long focused on processing measures that provide difficulty estimates on a word-by-word basis. However, these psycholinguistic measures have not yet been tested on sentence readability ranking tasks. In this paper, we use four psycholinguistic measures (idea density, surprisal, integration cost, and embedding depth) to test whether these features are predictive of readability levels. We find that psycholinguistic features significantly improve performance, by up to 3 percentage points, over a standard document-level readability metric baseline.
Psycholinguistic theories of online (human) sentence processing have identified a number of features that correlate with reading times, suggesting that these features explain at least some of the processing difficulty humans experience when reading. This paper explores the extent to which these features are also helpful for ranking individual sentences by text difficulty. Using a large corpus drawn from English and Simple English Wikipedia and a smaller corpus edited by professional language instructors, we compare surprisal, embedding depth and difference, idea density, and integration cost features for this task. We find that adding these psycholinguistic features improves performance by up to 3 percentage points over a simple baseline.
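Of the measures named above, surprisal is perhaps the most widely used: a word's surprisal is its negative log-probability given the preceding context, so less predictable words score higher. The following is a minimal illustrative sketch with hand-picked unigram probabilities, not the language models or corpora used in this work:

```python
import math

def surprisal(sentence, probs):
    # Surprisal of each word: -log2 P(word); higher values
    # indicate less predictable, harder-to-process words.
    # A real study would condition on context and estimate
    # probabilities from a trained language model.
    return [-math.log2(probs[w]) for w in sentence]

# Toy unigram probabilities (illustrative only).
probs = {"the": 0.5, "cat": 0.25, "sat": 0.25}
values = surprisal(["the", "cat", "sat"], probs)
# "the" (p = 0.5) yields 1 bit; "cat" and "sat" (p = 0.25) yield 2 bits each.
```

Sentence-level features are then typically derived by aggregating such word-level scores, for example taking the mean or maximum surprisal over the sentence.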