This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
DeepSeek-R1 vs. OpenAI o1 in colorectal cancer screening: a binational evaluation
Citations: 0
Authors: 10
Year: 2026
Abstract
Large language models (LLMs), such as OpenAI o1 and DeepSeek-R1, demonstrate promising applications in healthcare through structured reasoning and decision support. This study evaluates the responses and chain-of-thought (CoT) outputs of OpenAI o1 and DeepSeek-R1 in answering questions about colorectal cancer (CRC) screening. Fifteen questions about CRC screening were posed to OpenAI o1 and DeepSeek-R1. Four experts rated the responses for accuracy and comprehensiveness, and three further experts evaluated the CoT reasoning output for logical coherence and for error types and handling, using the National Comprehensive Cancer Network (NCCN) guidelines as the primary reference standard. Both LLMs demonstrated high accuracy, with no significant difference between them (median accuracy scores: OpenAI o1 = 4.5, DeepSeek-R1 = 5; p = 0.5243). However, DeepSeek-R1 significantly outperformed OpenAI o1 in comprehensiveness (p < 0.0001), logical coherence (p = 0.0001), and error types and handling (p = 0.0149). DeepSeek-R1 generated more detailed responses (word count: 110 ± 40 vs. 57 ± 24, p = 0.0001) but took longer to respond (25 ± 10 s vs. 7 ± 4 s, p < 0.0001). DeepSeek-R1 and OpenAI o1 both offer high accuracy for CRC screening guidance; compared with OpenAI o1, DeepSeek-R1 provides more comprehensive responses and a reasoning process that is more logically coherent and handles errors more robustly. Context-specific evaluation is critical for practical clinical integration.