
Disentangling 20 years of confusion: the need for standards in human evaluation

Human assessment remains the most trusted form of evaluation in natural language generation (NLG), as in other areas of NLP, but there is huge variation in both what is assessed and how it is assessed. We recently surveyed 20 years of publications in the NLG community to better understand this variation, and we conclude that the community needs to work together to develop clear standards for human evaluation.

Disentangling 20 years of confusion: quo vadis, human evaluation?

Crowdsourcing and evaluating text quality

Over the last decade, crowdsourcing has become a standard method for collecting training data for NLP tasks and for evaluating NLP systems on aspects such as text quality. Many of these evaluations, however, are still ill-defined. In the practical portion of this …