Measuring Harmful Sentence Completion in Language Models for LGBTQIA+ Individuals

Debora Nozza, Federico Bianchi, Dirk Hovy, Anne Lauscher

April 2022

Examples of hurtful sentence completions

Abstract

Current language technology is ubiquitous and directly influences individuals' lives worldwide. Given the recent trend in AI on training and constantly releasing new and powerful large language models (LLMs), there is a need to assess their biases and potential concrete consequences. While some studies have highlighted the shortcomings of these models, there is only little on the negative impact of LLMs on LGBTQIA+ individuals. In this paper, we investigated a state-of-the-art template-based approach for measuring the harmfulness of English LLMs sentence completion when the subjects belong to the LGBTQIA+ community. Our findings show that, on average, the most likely LLM-generated completion is an identity attack 13% of the time. Our results raise serious concerns about the applicability of these models in production environments.

Type

Conference paper

Publication

Second Workshop on Language Technology for Equality, Diversity and Inclusion at ACL 2022