Low-resource NLG: developing tools and building a Scottish Gaelic dataset

Most NLG is low-resource, even in high-resource languages (Howcroft & Gkatzia 2022). In this talk I will highlight our efforts to develop a data collection paradigm for low-resource languages that focuses on question-answering conversations and (dialogue) summarisation. We are applying this paradigm to build a new Scottish Gaelic dataset of conversations about artefacts held by the National Museum of Scotland. In parallel with these efforts, we are developing a neural NLG library that implements multiple models in a common codebase, facilitating model comparison and our own future work on neural pipelines and multitask learning for NLG.

Research Fellow in Natural Language Generation

Dave Howcroft is a computational linguist working at Edinburgh Napier University.