Application of routine laboratory data in combination with machine learning in the differential diagnosis of lung tuberculosis
Abstract

Objective: To investigate the value of a differential diagnosis model for pulmonary tuberculosis built from routine laboratory data.

Methods: This was a retrospective study. Routine laboratory data were collected from patients initially diagnosed with pulmonary tuberculosis or other pulmonary diseases at Beijing Jishuitan Hospital and Beijing Hepingli Hospital between May 2015 and November 2021. Data from 11 516 patients were included and divided into a training set and a test set at a ratio of 9∶1 using computer-generated random numbers. Four machine learning algorithms (Support Vector Machine, Random Forest, K-Nearest Neighbor and Logistic Regression) were used for model testing and feature selection. Diagnostic accuracy was verified by 10-fold cross-validation, and diagnostic performance was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC).

Results: Random Forest was selected as the optimal machine learning algorithm for building the best feature model for the differential diagnosis of pulmonary tuberculosis. Based on the ranking of feature importance, 37 non-specific laboratory test items were selected to form the model. In the validation set and the test set, respectively, the AUC was 0.747 and 0.736, the sensitivity was 68.03% and 68.75%, the specificity was 70.91% and 67.90%, and the accuracy was 70.30% and 68.12%.

Conclusion: Routine laboratory data combined with machine learning algorithms is an effective tool for the differential diagnosis of pulmonary tuberculosis, but its value still needs to be verified with data from more medical institutions.
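The evaluation steps named in the abstract (a random 9∶1 split, 10-fold cross-validation, AUC, sensitivity, specificity and accuracy) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the random seed and the way folds are formed are hypothetical, and the classifier itself is omitted.

```python
import random


def split_train_test(n_samples, test_fraction=0.1, seed=0):
    """Shuffle sample indices with a seeded RNG and hold out a test fraction.

    The seed and the rounding rule (truncation) are assumptions for this sketch.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)


def k_fold(indices, k=10):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as 0.5. This pairwise form is O(n_pos * n_neg), which is fine
    for a sketch but slow for large data sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity_accuracy(labels, predictions):
    """Compute the three threshold-dependent metrics from binary predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(labels)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # 11516 is the cohort size reported in the abstract.
    train, test = split_train_test(11516)
    print(len(train), len(test))
    for fold_train, fold_val in k_fold(train, k=10):
        pass  # fit a model on fold_train and score it on fold_val here
```

In this sketch each training-set index lands in exactly one validation fold, so every sample is used for validation once across the ten rounds. The held-out test indices are never touched during cross-validation, which is what allows the test-set figures to be read as an independent check.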