Selected Publications

Developing conventional natural language generation systems requires extensive attention from human experts to craft complex sets of sentence planning rules. We propose a Bayesian nonparametric approach to learning sentence planning rules by inducing synchronous tree substitution grammars for pairs of text plans and morphosyntactically specified dependency trees. After training on small datasets, our system learns rules that can be used to generate novel texts.
INLG, 2018

Natural language generation (NLG) systems rely on corpora both for hand-crafted approaches in a traditional NLG architecture and for statistical end-to-end (learned) generation systems. Limitations in existing resources, however, make it difficult to develop systems that can vary the linguistic properties of an utterance as needed. For example, when users' attention is split between a linguistic task and a secondary task such as driving, a generation system may need to reduce the information density of an utterance to compensate for the reduction in user attention.

We introduce a new corpus in the restaurant recommendation and comparison domain, collected in a paraphrasing paradigm in which subjects wrote texts targeting either a general audience or an elderly family member. This design resulted in a corpus of more than 5,000 texts that exhibit a variety of lexical and syntactic choices and differ in average word length, average sentence length, and surprisal. The corpus includes two levels of meaning representation: flat 'semantic stacks' for propositional content and Rhetorical Structure Theory (RST) relations between these propositions.


While previous research on readability has typically focused on document-level measures, recent work in areas such as natural language generation has pointed out the need for sentence-level readability measures. For many years, psycholinguistics has focused largely on processing measures that estimate difficulty on a word-by-word basis. However, these psycholinguistic measures have not yet been tested on sentence readability ranking tasks. In this paper, we use four psycholinguistic measures (idea density, surprisal, integration cost, and embedding depth) to test whether these features are predictive of readability levels. We find that psycholinguistic features significantly improve performance, by up to 3 percentage points, over a standard document-level readability metric baseline.
In EACL, 2017

Recent & Upcoming Talks

Learning Sentence Planning Rules with Bayesian Methods
Wed, Oct 24, 2018 11:00
G-TUNA: a corpus of referring expressions in German, including duration information
Wed, Sep 6, 2017 11:00
Psycholinguistic Models of Sentence Processing Improve Sentence Readability Ranking
Wed, Apr 12, 2017 00:00
German morphosyntactic change is consistent with an optimal encoding hypothesis
Fri, Mar 10, 2017 13:00

Recent Posts

Installing Treex can be tricky, especially if you've never worked with Perl before. Here I've worked through the official Dockerfile and turned it into step-by-step instructions to spare you the headaches I had.


OpenCCG is a Java library that can handle both parsing and generation. I've mostly used it for surface realization, converting fairly syntactic meaning representations into natural language text, but you can also use it for parsing or for generation from higher-level semantic representations if you'd like. This tutorial is intended to help you start exploring OpenCCG with the tccg utility. If you haven't installed OpenCCG yet, start with the first post in this series, Installing OpenCCG.


Installing OpenCCG can seem intimidating, but it's not so bad, really. Here I've tried to reduce the README to the necessary details while adding a little extra explanation where it seemed helpful.


My website was stagnant. I love Markdown. This looks like a good way to motivate myself to update regularly. And I even found a responsive theme!



Adapting generation to users under cognitive load

This is the natural language generation side of SFB 1102 project A4 ‘Language Comprehension and Cognitive Control Demands: Adapting Information Density to Changing Situations and Individual Users’.

GerMorphIT: Exploring German Morphology with Information Theory

Work with Cynthia A. Johnson and Rory Turnbull on understanding changes in the German adjectival system in terms of information theory.


Heriot-Watt University, Earl Mountbatten Building, Room 1.56, Riccarton, Edinburgh EH14 4AS, United Kingdom