OpenAlex · Updated hourly · Last updated: 12 May 2026, 19:57

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

DeepSeek-R1 vs. OpenAI o1 in colorectal cancer screening: a binational evaluation

2026 · 0 citations · BMC Gastroenterology · Open Access

Citations: 0 · Authors: 10 · Year: 2026

Abstract

Large language models (LLMs), such as OpenAI o1 and DeepSeek-R1, demonstrate promising applications in healthcare through structured reasoning and decision support. This study evaluates the responses and chain-of-thought (CoT) outputs of OpenAI o1 and DeepSeek-R1 in answering questions about colorectal cancer (CRC) screening. Fifteen questions about CRC screening were posed to OpenAI o1 and DeepSeek-R1. Four experts rated the responses for accuracy and comprehensiveness, and three further experts evaluated the CoT reasoning output for logical coherence and for error types and handling, using the National Comprehensive Cancer Network (NCCN) guidelines as the primary reference standard. Both LLMs demonstrated high accuracy, with no significant difference between them (median accuracy scores: OpenAI o1 = 4.5, DeepSeek-R1 = 5; p = 0.5243). However, DeepSeek-R1 significantly outperformed OpenAI o1 in comprehensiveness (p < 0.0001), logical coherence (p = 0.0001), and error types and handling (p = 0.0149). DeepSeek-R1 generated more detailed responses (word count: 110 ± 40 vs. 57 ± 24, p = 0.0001), with longer response times (25 ± 10 s vs. 7 ± 4 s, p < 0.0001). DeepSeek-R1 and OpenAI o1 both offer high accuracy for CRC screening guidance, but DeepSeek-R1 provides more comprehensive responses and a more logically coherent reasoning process with more robust error handling than OpenAI o1. Context-specific evaluation is critical for practical clinical integration.

Similar works