This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Assessing ChatGPT's capability in understanding and reporting antiretroviral therapy drug–drug interaction effects: Quantitative and qualitative results from the ACCURATE‐DDI study
Citations: 0
Authors: 7
Year: 2025
Abstract
Background: Artificial intelligence platforms such as ChatGPT are transforming access to medication information. However, their ability to accurately identify, describe and provide clinically relevant management guidance for antiretroviral (ARV) drug–drug interactions (DDIs) remains unclear. This study evaluates the accuracy of ChatGPT's analysis of ARV-related DDIs compared to established HIV-specific DDI checkers.

Materials and methods: Using ChatGPT-4o mini in November 2024, we tested 94 ARV DDI pairs. Pairs included prescription and non-prescription medications, oral and non-oral routes of administration and pharmacokinetic and pharmacodynamic interactions, with an even distribution of non-interacting, potential, and serious interacting combinations. ChatGPT was queried using the following prompt: ‘Is there a drug interaction between Drug A and Drug B? (a) no interaction, (b) potential interaction and (c) serious interaction.’ Using the HIV/HCV Drug Therapy Guide and the University of Liverpool HIV Drug Interactions Checker as references, we calculated accuracy (overall and stratified by DDI severity), sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). Qualitative analysis was conducted on responses for a subset of 25 pairs tested with the ChatGPT prompt, ‘Can I take Drug A with Drug B?’ Severity, mechanism, clinical effects and management were scored (0 = incorrect, 1 = mixed, 2 = correct), yielding a composite score (maximum = 8). Responses were independently assessed by HIV pharmacist/pharmacologist reviewers; score discrepancies of 2 between two reviewers were resolved by consensus involving a third reviewer.

Results: ChatGPT correctly classified 40.4% (38/94) of DDI pairs, with errors primarily due to false negatives (34/56 errors; 60.7%). Among 31 non-interacting pairs, ChatGPT correctly classified 9 (29%) as true negatives; the remaining 22 (71%) were false positives. Among 32 potential DDI pairs, ChatGPT categorized 29 (91%) correctly; 3 (9%) were false negatives. All 31 serious interaction pairs were incorrectly classified (false negatives). Overall sensitivity was 46.0%, specificity 29.0%, PPV 56.9% and NPV 20.9%. In qualitative analysis, the mean composite score across all pairs was 3.9/8. Among non-interacting pairs (mean composite score 2.7/8), ChatGPT incorrectly warned about non-existent DDIs (mean severity 1/2). Mechanism (0.6/2), clinical effects (0.7/2) and management (0.4/2) were often incorrect. Across potential interaction pairs (mean composite score 4.2/8), ChatGPT categorized severity more accurately (1.6/2) than mechanism (0.7/2), clinical effects (0.9/2) and management (1/2). For serious DDI pairs (mean composite score 4.3/8), ChatGPT showed limited accuracy in identifying severity (0.4/2), despite often correctly identifying mechanism (1.4/2) and clinical effects (1.6/2). Serious DDI management was often incorrect (0.9/2). ChatGPT failed to identify ritonavir-mediated DDIs involving non-CYP3A4 pathways, incorrectly claimed that non-CYP3A4 substrates would be boosted when co-administered with cobicistat and failed to identify pharmacodynamic interactions.

Conclusion: ChatGPT demonstrated limited accuracy and specificity, often generating responses that combined correct and incorrect information. Notably, ChatGPT tended to issue unfounded warnings about non-existent or clinically insignificant drug interactions while underreporting the severity of serious DDIs. ChatGPT is not yet a reliable tool for ARV DDI information, but performance may improve with future updates.
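The reported diagnostic metrics follow from the per-severity counts in the abstract, assuming the 2×2 mapping that any interaction (potential or serious) counts as a positive and a non-interacting pair counts as a negative. A minimal sketch of that arithmetic (variable names are illustrative, not from the study):

```python
# Confusion-matrix cells reconstructed from the abstract's counts,
# assuming "positive" = any interaction (potential or serious).
tp = 29          # potential-interaction pairs correctly flagged
fn = 3 + 31      # missed potential (3) plus missed serious (31) pairs
tn = 9           # non-interacting pairs correctly cleared
fp = 22          # non-interacting pairs incorrectly flagged

total = tp + fn + tn + fp  # 94 tested DDI pairs

sensitivity = tp / (tp + fn)        # 29/63
specificity = tn / (tn + fp)        # 9/31
ppv = tp / (tp + fp)                # 29/51
npv = tn / (tn + fn)                # 9/43
accuracy = (tp + tn) / total        # 38/94

print(f"sensitivity={sensitivity:.1%}  specificity={specificity:.1%}")
print(f"PPV={ppv:.1%}  NPV={npv:.1%}  accuracy={accuracy:.1%}")
# → sensitivity=46.0%  specificity=29.0%
# → PPV=56.9%  NPV=20.9%  accuracy=40.4%
```

Each value matches the figure reported in the abstract, which confirms the counts and metrics are internally consistent.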
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,312 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,169 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,564 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,466 citations