DIKU Bits by Dustin Brandon Wright
Speaker
Dustin Brandon Wright, Postdoc in the NLP (Natural Language Processing) section
Title
LLM-Tropes: Revealing Fine-Grained Values and Opinions in Large Language Models
Abstract
Large language models such as ChatGPT are biased, but how do we know in what ways? Many people approach this by giving LLMs biased statements and seeing how they respond. For political bias, people rely on the political compass test (PCT), a set of 62 politically charged statements which can be used to plot survey participants based on their responses. However, the responses generated by LLMs can vary greatly depending on how they are prompted, and there are many ways to argue for or against a given statement. In this talk, I will discuss how we have addressed this by analyzing 156k LLM responses to the PCT using 420 prompt variations. We propose to identify tropes: phrases that are repeated across many prompts, revealing patterns in the text that a given LLM is prone to produce. We find that even when the model changes or we use extremely different prompts (e.g. prompting for “far-left” vs. “far-right” politically), LLMs produce many of the same justifications for their political stances, regardless of what the stance actually is.