Chinese Journal of Laboratory Medicine (中华检验医学杂志)

Comparison of the diagnostic performance of different MALDI-TOF MS-based machine learning models for predicting the imipenem susceptibility of Klebsiella pneumoniae


Abstract

Objective: Machine learning is an important branch of artificial intelligence and a supporting technology for bioinformatics analysis. This study compared the diagnostic performance of four machine learning models that predict the imipenem susceptibility of Klebsiella pneumoniae from matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) data.

Methods: This was a retrospective study. MALDI-TOF MS peak data and imipenem susceptibility results were collected for 684 Klebsiella pneumoniae isolates recovered from clinical specimens in the microbiology laboratory of the Department of Clinical Laboratory, Tianjin Haihe Hospital, between January 2019 and December 2020. By simple random sampling, 70 imipenem-sensitive and 70 imipenem-resistant isolates were selected as the training set, and 30 sensitive and 30 resistant isolates as the test set. The peak data of these 200 isolates were normalized and analyzed by orthogonal partial least squares discriminant analysis (OPLS-DA). Training-set models were then built with four algorithms: least absolute shrinkage and selection operator (LASSO), logistic regression (LR), support vector machine (SVM), and neural network (NN). Grid search with 10-fold cross-validation was used to select the best model for each algorithm. The area under the curve (AUC) and the confusion matrix were calculated for the training and test sets, and the test-set confusion matrix was used to verify the accuracy of each prediction model.

Results: The R2Y and Q2 of the OPLS-DA were 0.5463 and 0.0178, respectively. For the best LASSO, LR, SVM, and NN models, the training-set and test-set AUCs were 1.0000 and 0.8581, 1.0000 and 0.8201, 0.9408 and 0.7561, and 1.0000 and 0.6972, respectively. In the training set, the accuracy for predicting resistance was 99% (69/70), 100% (70/70), 91% (64/70), and 100% (70/70); the accuracy for predicting sensitivity was 100% (70/70), 100% (70/70), 90% (63/70), and 100% (70/70); and the overall accuracy was 99% (139/140), 100% (140/140), 91% (127/140), and 100% (140/140), respectively. In the test set, the accuracy for predicting resistance was 93% (28/30), 87% (26/30), 60% (18/30), and 60% (18/30); the accuracy for predicting sensitivity was 100% (30/30), 80% (24/30), 93% (28/30), and 67% (20/30); and the overall accuracy was 97% (58/60), 83% (50/60), 77% (46/60), and 63% (38/60), respectively.

Conclusion: The LASSO-based model for predicting the imipenem susceptibility of Klebsiella pneumoniae showed high diagnostic performance and has potential value for clinical decision support.
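The test-set percentages reported in the abstract follow directly from the counts given in parentheses. The following is a minimal sketch of that arithmetic, not the authors' code: the dictionary layout and the helper names `summarize` and `percent` are illustrative assumptions, and only the counts themselves come from the abstract.

```python
# Test-set counts reported in the abstract: correctly classified isolates
# out of 30 imipenem-resistant and 30 imipenem-sensitive isolates per model.
# The data layout and helper names are illustrative, not the authors' code.
TEST_COUNTS = {
    "LASSO": {"resistant_correct": 28, "sensitive_correct": 30},
    "LR": {"resistant_correct": 26, "sensitive_correct": 24},
    "SVM": {"resistant_correct": 18, "sensitive_correct": 28},
    "NN": {"resistant_correct": 18, "sensitive_correct": 20},
}
N_PER_CLASS = 30  # isolates per class in the test set


def summarize(counts, n_per_class=N_PER_CLASS):
    """Return per-class and overall accuracy (as fractions) for one model."""
    r = counts["resistant_correct"]
    s = counts["sensitive_correct"]
    return {
        "resistant_accuracy": r / n_per_class,
        "sensitive_accuracy": s / n_per_class,
        "overall_accuracy": (r + s) / (2 * n_per_class),
    }


def percent(fraction):
    """Round a fraction to a whole percentage (half up)."""
    return int(fraction * 100 + 0.5)


if __name__ == "__main__":
    for model, counts in TEST_COUNTS.items():
        summary = summarize(counts)
        print(model, {name: f"{percent(value)}%" for name, value in summary.items()})
```

If resistance is treated as the positive class, the per-class accuracies computed here correspond to the sensitivity and specificity of each model on the test set.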