PhD defence by Ana Valeria González
Title
Towards Human-Centered Natural Language Processing
Summary
With recent advances in deep learning and the infrastructure to support training models on large amounts of data, there has been an increasing emphasis on developing data-driven Natural Language Processing (NLP) systems that contain billions of parameters and are optimized for language understanding benchmarks. While many of these systems now exceed human performance on such benchmarks, this progress has come at the expense of other desirable system qualities such as user satisfaction, fairness, and transparency.
Due to their black-box nature, the full extent of these systems' capabilities remains unclear; yet there is increasing evidence that they learn undesirable and socially unacceptable patterns and can make correct predictions for the wrong reasons.
These challenges make the adoption of such systems controversial, erode user trust, and make it unethical to deploy systems in the wild without understanding their impact on society.
In response to these developments, the studies in this dissertation adopt an interdisciplinary, human-centered approach to studying and improving NLP systems. This perspective emphasizes that NLP technology must be built with an understanding of humans, society, and the impact it has on both.
Specifically, this dissertation investigates ways of (1) improving the performance of NLP systems by leveraging user interactions and (2) ensuring fairness and transparency in NLP systems.
The first part of the thesis demonstrates how to incorporate user interactions and user feedback signals that better align with human expectations in the real world, improving both the predictive performance of dialogue systems and their ability to adapt to new domains.
As ethical concerns have emerged in recent years, the second part of this dissertation shifts focus, acknowledging the need for better evaluation.
I show that by incorporating knowledge from NLP, Human-Computer Interaction, linguistics, and cognitive science, more meaningful evaluation protocols can be created to assess the fairness and transparency of NLP models.
Assessment Committee
- Chair: Professor, Christina Lioma, Department of Computer Science, UCPH
- Associate Professor, Thamar Solorio, University of Houston
- Research Scientist, Adina Williams
Academic supervisor
Professor, Anders Søgaard, Department of Computer Science, UCPH
The moderator at this defence will be
Assistant Professor, Daniel Hershcovich, Department of Computer Science, UCPH
The defence will be held online via Zoom.
For a digital copy of the thesis, please go to our PhD page.