David M. Howcroft
Disentangling 20 years of confusion: quo vadis, human evaluation?
Crowdsourcing and evaluating text quality
Over the last decade, crowdsourcing has become a standard method both for collecting training data for NLP tasks and for evaluating NLP systems on criteria such as text quality. Many evaluations, however, are still ill-defined. In the practical portion of this …