Reach me via Gmail: dave.howcroft
I research the factors that make a text more or less difficult to read and ways to incorporate this knowledge into natural language generation systems. On the complexity side of things, I am interested in the role played by factors like surprisal, embedding depth, dependency length, and idea density in reading comprehension. In addition to factoring these features into models of generation, I have worked on grammar induction for microplanning and the influence of discourse markers on fluency judgements.
For the last year and a half I have been focusing on making it easier to start new projects in NLG and to port existing systems to new domains. The emphasis in this work is on inducing 'grammars' for microplanning, originally via a template-derived approach (White & Howcroft 2015) and now using Bayesian non-parametric methods.
My Master's thesis (Howcroft 2015) evaluated the discriminative power of psycholinguistic metrics in ranking sentences according to their complexity. Using the English and Simple English Wikipedia Corpus (Hwang et al. 2015; ESEW) and the One Stop English Corpus (Vajjala 2015; OSE), I trained an averaged perceptron model using both traditional features (like word and sentence length) and psycholinguistically-motivated features (like surprisal and embedding depth). The psycholinguistic features resulted in a small but significant improvement in accuracy.
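To give a flavour of the ranking setup, here is a minimal sketch of pairwise sentence ranking with an averaged perceptron. The feature names and data are purely illustrative, not the thesis's actual feature set or corpora:

```python
from collections import defaultdict

def train_ranker(pairs, epochs=10):
    """Averaged perceptron for pairwise complexity ranking.

    Each pair is (feats_complex, feats_simple): dicts mapping feature
    names (e.g. 'sent_length', 'surprisal') to values. Learns weights
    w such that w . feats_complex > w . feats_simple.
    """
    w = defaultdict(float)       # current weights
    total = defaultdict(float)   # running sum of weights, for averaging
    n = 0
    for _ in range(epochs):
        for harder, easier in pairs:
            # Feature-difference vector between the two sentences.
            diff = {f: harder.get(f, 0.0) - easier.get(f, 0.0)
                    for f in set(harder) | set(easier)}
            margin = sum(w[f] * v for f, v in diff.items())
            if margin <= 0:      # ranked the pair wrongly: update
                for f, v in diff.items():
                    w[f] += v
            for f in w:          # accumulate for the averaged weights
                total[f] += w[f]
            n += 1
    return {f: total[f] / n for f in total}

def score(weights, feats):
    """Complexity score of a sentence under the learned weights."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())
```

Averaging the weights over all updates (rather than keeping only the final vector) is the standard trick that makes the perceptron far less sensitive to the order of the training pairs.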
Cynthia A. Johnson, Rachel Steindel Burdin, Rory Turnbull, and I are examining adjectival paradigms in Middle and New High German using expected relative entropy. For an overview of the project, you can check out an old handout.
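The core quantity underlying that measure is relative entropy (KL divergence) between distributions over paradigm forms. The project's full "expected" version is more involved; the sketch below just shows the basic computation, with made-up ending distributions rather than our actual German data:

```python
import math

def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits.

    p and q are dicts mapping outcomes (e.g. adjective endings like
    '-e', '-en') to probabilities. Assumes q is nonzero wherever p is.
    """
    return sum(p[x] * math.log2(p[x] / q[x]) for x in p if p[x] > 0)

def expected_kl(pairs_with_weights):
    """Weighted average of KL divergences over a set of distribution
    pairs, e.g. pairs of paradigm cells weighted by how often one is
    used to predict the other (weights here are hypothetical)."""
    total_weight = sum(wt for _, _, wt in pairs_with_weights)
    return sum(wt * kl_divergence(p, q)
               for p, q, wt in pairs_with_weights) / total_weight
```

Intuitively, a low divergence between two cells means one cell's ending distribution is a good predictor of the other's, i.e. the paradigm is more uniform at that point.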
In 2012 and 2013 I worked with Michael White on the generation of contrastive expressions and presented our work at ENLG.
Unfortunately there's no video, but you should read the paper if you're interested:
David M. Howcroft, Crystal Nakatsu, and Michael White. 2013. "Enhancing the Expression of Contrast in the SPaRKy Restaurant Corpus". In Proceedings of the 14th European Workshop on Natural Language Generation. [PDF]