Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Assessing Racial and Ethnic Bias in Text Generation by Large Language Models for Health Care–Related Tasks: Cross-Sectional Study

2025·12 Zitationen·Journal of Medical Internet ResearchOpen Access

Volltext beim Verlag öffnen

Zitationen

Autoren

2025

Jahr

Abstract

BACKGROUND: Racial and ethnic bias in large language models (LLMs) used for health care tasks is a growing concern, as it may contribute to health disparities. In response, LLM operators implemented safeguards against prompts that are overtly seeking certain biases. OBJECTIVE: This study aims to investigate a potential racial and ethnic bias among 4 popular LLMs: GPT-3.5-turbo (OpenAI), GPT-4 (OpenAI), Gemini-1.0-pro (Google), and Llama3-70b (Meta) in generating health care consumer-directed text in the absence of overtly biased queries. METHODS: In this cross-sectional study, the 4 LLMs were prompted to generate discharge instructions for patients with HIV. Each patient's encounter deidentified metadata including race/ethnicity as a variable was passed over in a table format through a prompt 4 times, altering only the race/ethnicity information (African American, Asian, Hispanic White, and non-Hispanic White) each time, while keeping all other information constant. The prompt requested the model to write discharge instructions for each encounter without explicitly mentioning race or ethnicity. The LLM-generated instructions were analyzed for sentiment, subjectivity, reading ease, and word frequency by race/ethnicity. RESULTS: The only observed statistically significant difference between race/ethnicity groups was found in entity count (GPT-4, df=42, P=.047). However, post hoc chi-square analysis for GPT-4's entity counts showed no significant pairwise differences among race/ethnicity categories after Bonferroni correction. CONCLUSIONS: A total of 4 LLMs were relatively invariant to race/ethnicity in terms of linguistic and readability measures. While our study used proxy linguistic and readability measures to investigate racial and ethnic bias among 4 LLM responses in a health care-related task, there is an urgent need to establish universally accepted standards for measuring bias in LLM-generated responses. Further studies are needed to validate these results and assess their implications.

Autoren

Institutionen

Themen

Artificial Intelligence in Healthcare and EducationHealth Literacy and Information AccessibilityTopic Modeling

Volltext beim Verlag öffnen

Assessing Racial and Ethnic Bias in Text Generation by Large Language Models for Health Care–Related Tasks: Cross-Sectional Study

Abstract

Ähnliche Arbeiten

Autoren

Institutionen

Themen