Evaluation

Crowdsourcing and evaluating text quality

Over the last decade, crowdsourcing has become a standard method both for collecting training data for NLP tasks and for evaluating NLP systems on properties such as text quality. Many such evaluations, however, remain ill-defined. In the practical portion of this …

Arguing for consistency in the human evaluation of natural language generation systems