Journal: Chinese Journal of Gastrointestinal Surgery

A comparative study on the application of DeepSeek-R1 and ChatGPT in multidisciplinary treatment decision-making for advanced gastric cancer

Abstract

Objective: To compare the accuracy and completeness of DeepSeek-R1 and ChatGPT-4o in generating treatment recommendations for advanced gastric cancer.

Methods: The study comprised three steps: (1) evaluating the answers to ten key clinical questions; (2) analyzing clinical cases discussed by the multidisciplinary team (MDT) at our center; and (3) reviewing rare gastric cancer cases reported in PubMed. The case material consisted of MDT records for 95 patients with advanced gastric cancer treated at the Second Affiliated Hospital of Nanchang University between November 2022 and July 2024, plus 14 rare cases retrieved from PubMed. Prompts derived from these advanced gastric cancer cases were submitted to DeepSeek-R1 and ChatGPT-4o in a standardized format. The outputs were rated for accuracy and completeness on a structured 4-point Likert scale, and inter-rater agreement was calculated to assess the objectivity of the ratings.

Results: DeepSeek-R1 outperformed ChatGPT-4o in both accuracy and completeness on the ten key clinical questions, the MDT cases from our center, and the rare PubMed cases. Stratified analysis showed that DeepSeek-R1 gave stronger answers on surgical recommendations, chemotherapy recommendations, and chemotherapy regimens. Inter-rater reliability was high (key clinical questions: W = 0.696 for accuracy and 0.632 for completeness; MDT cases from our center: W = 0.657 for accuracy and 0.634 for completeness; rare PubMed cases: W = 0.683 for accuracy; all P < 0.001).

Conclusion: DeepSeek-R1 performed slightly better than ChatGPT-4o in generating treatment recommendations for advanced gastric cancer cases.
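The abstract reports inter-rater agreement as "W" coefficients without naming the statistic. We assume here that W is Kendall's coefficient of concordance, the usual choice when several raters score the same items on an ordinal scale. The sketch below shows how such a coefficient can be computed from Likert scores, including the standard correction for tied scores (Likert data are heavily tied). The function names and example ratings are hypothetical; this is an illustration, not the authors' analysis code.

```python
from collections import Counter

# Assumption: the "W" reported in the abstract is Kendall's coefficient of concordance.


def average_ranks(scores: list[float]) -> list[float]:
    """Rank one rater's scores (1 = lowest), giving tied scores their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items that share the same score.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kendalls_w(ratings: list[list[float]]) -> float:
    """Tie-corrected Kendall's W; ratings[r][i] is rater r's score for item i."""
    m = len(ratings)     # number of raters
    n = len(ratings[0])  # number of rated answers
    rank_rows = [average_ranks(row) for row in ratings]
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_sum = m * (n + 1) / 2  # each rater's ranks sum to n(n+1)/2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    # Tie correction: for every rater, add t^3 - t for each group of t tied scores.
    ties = sum(t ** 3 - t for row in ratings for t in Counter(row).values())
    denom = m ** 2 * (n ** 3 - n) - m * ties
    if denom == 0:
        raise ValueError("every rater gave every item the same score; W is undefined")
    return 12 * s / denom


# Hypothetical example: three raters score five model answers on a 1-4 Likert scale.
example = [
    [4, 3, 4, 2, 1],
    [4, 3, 3, 2, 1],
    [3, 3, 4, 2, 1],
]
print(round(kendalls_w(example), 3))  # → 0.912
```

W ranges from 0 (no agreement) to 1 (identical rankings). A significance test is commonly based on m(n − 1)W, which approximately follows a χ² distribution with n − 1 degrees of freedom for reasonably large n; the abstract does not state which variant or test the authors used.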

Authors: Huang Jiaobao[1], Li Huizi[1], Zong Zhen[1], Liu Queling[2], Gu Taifu[3], Liang Bo[4], Lu Jun[5], Liu Jibiao[1], Liu Kuntan[1], Mao Shengxun[1]

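Affiliations:
[1] Department of Gastrointestinal Surgery, The Second Affiliated Hospital of Nanchang University, Nanchang 330008, China
[2] Department of Oncology, The Second Affiliated Hospital of Nanchang University, Nanchang 330008, China
[3] Department of Radiology, The Second Affiliated Hospital of Nanchang University, Nanchang 330008, China
[4] Department of Hepatobiliary Surgery, The Second Affiliated Hospital of Nanchang University, Nanchang 330008, China
[5] Department of Gastric Surgery II, Fudan University Shanghai Cancer Center, Shanghai 200032, China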

Keywords: Stomach neoplasms, advanced; Large language model; DeepSeek-R1; ChatGPT-4o; Multidisciplinary team

Section: Academic Discussion

DOI: 10.3760/cma.j.cn441530-20250409-00149

Published online: 2026-01-25 (date first posted on the Wanfang platform; not the paper's official publication date)