Establishment and performance validation of a diagnostic model for endometrial cancer based on machine learning algorithms

Abstract

Objective: To construct a diagnostic model for endometrial cancer (EC) using machine learning algorithms and to evaluate and validate its performance.

Methods: Data were retrospectively collected from 465 patients with EC and 182 non-EC patients seen during 2022-2023 at the Department of Oncology, Henan Provincial People's Hospital, and the Department of Gynecology, First Affiliated Hospital of Zhengzhou University. Using a random number table, the patients were divided into a training set (363 cases: 272 EC and 91 non-EC) and an internal validation set (284 cases: 193 EC and 91 non-EC). A further 188 EC patients and 93 non-EC patients seen in the same departments between January 2024 and February 2025 formed the external validation set. Clinical characteristics (age, body mass index, etc.), imaging characteristics (abnormal endometrial echo), and laboratory indicators (tumor markers, etc.) were collected as predictor variables. Least absolute shrinkage and selection operator (LASSO) regression was used for feature selection. Three machine learning algorithms, logistic regression (LR), random forest (RF), and support vector machine (SVM), were used to build EC diagnostic models. The optimal algorithm was selected on the basis of specificity or sensitivity, and internal and external validation were performed separately. Receiver operating characteristic (ROC) curves and calibration curves were plotted to evaluate the diagnostic performance of the three models.

Results: The mean age was (41.4±13.2) years in the training set, (41.1±13.4) years in the internal validation set, and (41.0±13.1) years in the external validation set. In the training, internal validation, and external validation sets, the EC group had higher levels of transcription factor-like protein 5 (TCFL5), carcinoembryonic antigen (CEA), carbohydrate antigen 153 (CA153), carbohydrate antigen 72-4 (CA72-4), neutrophil-to-lymphocyte ratio (NLR), the adipokine apelin, and human epididymis protein 4 (HE4), and higher proportions of patients with diabetes and with abnormal endometrial echo, than the non-EC group (all P<0.001). Shapley additive explanations (SHAP) analysis showed that the three most important features in the LR model were, in order, age, abnormal endometrial echo, and TCFL5. In internal validation, the area under the curve (AUC) (95% CI) of the LR, RF, and SVM models was 0.890 (0.849-0.923), 0.885 (0.843-0.919), and 0.845 (0.799-0.884), respectively, and the Brier score of the LR model was lower than those of the RF and SVM models (both P<0.05). In external validation, the AUC (95% CI) of the LR model was 0.932 (0.894-0.961), higher than that of the RF model [0.918 (0.877-0.949)] and the SVM model [0.887 (0.841-0.924)] (both P<0.05).

Conclusions: The EC diagnostic model built on multidimensional variables, including age, diabetes, abnormal endometrial echo, TCFL5, CEA, CA153, CA72-4, NLR, apelin, and HE4, shows good diagnostic performance, with the LR algorithm performing best. It can serve as a scientific reference for the early clinical identification and diagnosis of EC.
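The two headline metrics in the abstract, AUC and the Brier score, can be illustrated with a minimal sketch. This is not the authors' code: the function names and the eight-patient toy data are assumptions made up for illustration, and the study's own software and decision thresholds are not stated in the abstract. AUC is computed here as the probability that a randomly chosen EC case receives a higher predicted probability than a randomly chosen non-EC case; the Brier score is the mean squared difference between predicted probability and outcome (lower is better).

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], probs: Sequence[float]) -> float:
    """AUC via pairwise comparison: P(score of a positive > score of a negative).

    Ties between a positive and a negative count as half a win.
    """
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def sensitivity_specificity(
    labels: Sequence[int], probs: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity and specificity when predicting EC for probs >= threshold."""
    tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(labels, probs) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = EC, 0 = non-EC, with made-up model probabilities.
toy_labels = [1, 1, 1, 0, 0, 1, 0, 0]
toy_probs = [0.9, 0.8, 0.4, 0.3, 0.2, 0.7, 0.6, 0.1]

print(f"AUC   = {roc_auc(toy_labels, toy_probs):.4f}")      # AUC   = 0.9375
print(f"Brier = {brier_score(toy_labels, toy_probs):.3f}")  # Brier = 0.125
print(sensitivity_specificity(toy_labels, toy_probs))       # (0.75, 0.75)
```

The pairwise AUC is quadratic in sample size, which is fine for a few hundred patients per validation set as in this study. In practice a library routine (for example scikit-learn's `roc_auc_score` and `brier_score_loss`) would normally be used, together with bootstrap or DeLong-type methods for the 95% confidence intervals and between-model comparisons.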

中华医学杂志 (National Medical Journal of China), 2026, Vol. 106, No. 14, pp. 1344-1350

Indexed in: MEDLINE, ISTIC, PKU, CSCD, CA
