Natural Language Generation

Disentangling 20 years of confusion: the need for standards in human evaluation

Human assessment remains the most trusted form of evaluation in natural language generation, as in other areas of NLP, but there is huge variation in both what is assessed and how it is assessed. We recently surveyed 20 years of publications in the NLG community to better understand this variation, and conclude that we need to work together to develop clear standards for human evaluations.

Natural Language Generation and Human Language Production: a history and an opportunity

Disentangling 20 years of confusion: quo vadis, human evaluation?

Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions

Human assessment remains the most trusted form of evaluation in NLG, but highly diverse approaches and a proliferation of different quality criteria used by researchers make it difficult to compare results and draw conclusions across papers, with …

Disentangling the Properties of Human Evaluation Methods: A Classification System to Support Comparability, Meta-Evaluation and Reproducibility Testing

Current standards for designing and reporting human evaluations in NLP mean it is generally unclear which evaluations are comparable and can be expected to yield similar results when applied to the same system outputs. This has serious implications …

Semantic Noise Matters for Neural Natural Language Generation

Neural natural language generation (NNLG) systems are known for their pathological outputs, i.e., generating text which is unrelated to the input specification. In this paper, we show the impact of semantic noise on state-of-the-art NNLG models which …

Noise and Neural Natural Language Generation: Rubbish in, Rubbish out?

At this workshop we highlighted several sources of noise for neural NLG (semantic, typographic, and grammatical) before presenting the impact of semantic noise on the quality of NNLG (in a preview of our INLG paper) and how these different kinds of errors impact human evaluations of perceived text quality.

Arguing for consistency in the human evaluation of natural language generation systems

Learning Sentence Planning Rules with Bayesian Methods

Data-driven natural language generation (NLG) is not a new concept. For decades, researchers have been studying corpora to inform their development of NLG systems. More recently, this interest has shifted away from rule-based systems to fully …

Getting Started With Openccg

OpenCCG is a Java library which can handle both parsing and generation. I've mostly used it for surface realization, converting fairly syntactic meaning representations into natural language text, but you can use it for parsing or for generation from higher-level semantic representations if you'd like. This tutorial is intended to help you start exploring OpenCCG with the tccg utility. If you haven't installed OpenCCG yet, see the first post, Installing OpenCCG.

From OpenCCG to AI Planning: Detecting Infeasible Edges in Sentence Generation

The search space in grammar-based natural language generation tasks can get very large, which is particularly problematic when generating long utterances or paragraphs. Using surface realization with OpenCCG as an example, we show that we can …

Search Challenges in Natural Language Generation with Complex Optimization Objectives

Automatic natural language generation (NLG) is a difficult problem even when merely trying to come up with natural-sounding utterances. Ubiquitous applications, in particular companion technologies, pose the additional challenge of flexible …