Over the last decade, crowdsourcing has become a standard method for collecting training data for NLP tasks and for evaluating NLP systems on dimensions such as text quality. Many of these evaluations, however, remain ill-defined.
In the practical portion of this talk, I present an overview of tasks currently addressed with crowdsourcing in computational linguistics, along with tools for implementing them. This overview is meant to be interactive: I will share some of the best and most interesting tasks I am aware of, but I would also like us to have a conversation about how you are using crowdsourcing.
After this discussion of tasks, tools, and best practices, I introduce a new research program from the Heriot-Watt NLP Lab on human and automatic evaluation of natural language generation. This includes foundational work to make our evaluations better defined, experimental work developing new reading-time measures to assess readability, and modeling work seeking new methods of quality estimation that improve upon metrics such as BLEU and BERTScore.
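For context, the automatic metrics mentioned above can be computed with off-the-shelf libraries. The following is a minimal sketch, assuming the third-party sacrebleu and bert-score packages are installed; the example sentences are purely illustrative and not drawn from the work described in this talk.

```python
# Illustrative only: scoring a toy system output against a reference
# with BLEU (sacrebleu) and BERTScore (bert-score).
import sacrebleu
from bert_score import score as bert_score

hypotheses = ["The cat sat on the mat."]        # hypothetical system outputs
references = ["A cat was sitting on the mat."]  # hypothetical human references

# Corpus-level BLEU; sacrebleu expects a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}")

# BERTScore precision, recall, and F1 from contextual embeddings.
P, R, F1 = bert_score(hypotheses, references, lang="en")
print(f"BERTScore F1: {F1.mean().item():.3f}")
```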