Measuring Harmful Sentence Completion in Language Models for LGBTQIA+ Individuals

Examples of hurtful sentence completions


Current language technology is ubiquitous and directly influences individuals' lives worldwide. Given the recent trend in AI on training and constantly releasing new and powerful large language models (LLMs), there is a need to assess their biases and potential concrete consequences. While some studies have highlighted the shortcomings of these models, there is only little on the negative impact of LLMs on LGBTQIA+ individuals. In this paper, we investigated a state-of-the-art template-based approach for measuring the harmfulness of English LLMs sentence completion when the subjects belong to the LGBTQIA+ community. Our findings show that, on average, the most likely LLM-generated completion is an identity attack 13% of the time. Our results raise serious concerns about the applicability of these models in production environments.

Federico Bianchi
Federico Bianchi
Postdoctoral Researcher at Stanford University

My research interests include developing and understanding large language (and vision) models and recommender systems.