A journal in the People's Medical Publishing House journal series
ISSN 2096-2738 CN 11-9370/R

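Chinese Science and Technology Core Journal (statistical source journal for Chinese scientific and technical papers)
Source journal of the Chinese Science Citation Database (CSCD)
Statistical source journal of the Annual Report for Chinese Academic Journal Impact Factors
Indexed in the Chemical Abstracts Service (CAS) database, USA
Indexed in the Japan Science and Technology Agency (JST) database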

Electronic Journal of Emerging Infectious Diseases ›› 2026, Vol. 11 ›› Issue (1): 44-49. doi: 10.19871/j.cnki.xfcrbzz.2026.01.007

• Original Article •

Construction and validation of a predictive model for treatment failure in young and middle-aged patients with initial treatment of pulmonary tuberculosis based on interpretable machine learning

Liu Huimei, Zhang Shuo, Xu Changwei, Han Xixi, Xi Xiangyu

  1. The Third Department of Tuberculosis, Beijing Ditan Hospital, Xuzhou Hospital, Capital Medical University/Xuzhou Seventh People's Hospital, Xuzhou, Jiangsu 221000, China
  • Received: 2025-08-16 Online: 2026-02-28 Published: 2026-03-16
  • Corresponding author: Xi Xiangyu, Email: 8988731@qq.com
  • Funding:
    1. China Public Health Alliance Project (GWLM202011); 2. Xuzhou Science and Technology Program Project (KC23205); 3. Xuzhou Seventh People's Hospital Disease-Specific Cohort Project (KCDL202401)

Abstract: Objective To construct a machine learning-based prediction model for treatment failure in young and middle-aged patients with initially treated pulmonary tuberculosis, and to provide a reference for the early clinical identification of patients at high risk of treatment failure and for the formulation of individualized intervention strategies. Methods A total of 760 young and middle-aged patients with initially treated pulmonary tuberculosis admitted to Beijing Ditan Hospital, Xuzhou Hospital, Capital Medical University/Xuzhou Seventh People's Hospital from January 2022 to June 2024 were enrolled. Using random numbers generated in SPSS, the patients were randomly divided at a 7:3 ratio into a modeling group (n=532) and a validation group (n=228). Based on the modeling group data, least absolute shrinkage and selection operator (LASSO) regression was used to screen feature variables for treatment failure. Six machine learning algorithms, namely random forest (RF), support vector machine (SVM), logistic regression (LR), naive Bayes (NB), artificial neural network (ANN), and gradient boosting machine (GBM), were used to construct treatment failure prediction models. Model performance was evaluated with the receiver operating characteristic (ROC) curve, the area under the curve (AUC), the calibration curve, and the decision curve. The Shapley additive explanations (SHAP) method was used to explain the contribution of each variable to the output of the optimal model. Results LASSO regression selected 5 variables with non-zero coefficients: place of residence, smoking history, sputum positivity, pulmonary cavity, and controlling nutritional status (CONUT) score. Among the 6 machine learning models built on these variables, the GBM model had the highest AUC and F1 score. ROC curve analysis showed that the AUC of the GBM model was 0.881 in the modeling group and 0.865 in the validation group. The calibration curves showed good calibration of the GBM model in the modeling group and the validation group (χ²=8.638, P=0.374; χ²=4.786, P=0.780). The decision curves showed that the GBM model yielded a clinical net benefit over a wide range in both groups. SHAP analysis showed that the variables, ranked from highest to lowest contribution to the GBM model's predictions, were CONUT score, place of residence, pulmonary cavity, sputum positivity, and smoking history; a high CONUT score, rural residence, pulmonary cavities, sputum positivity, and a history of smoking increased the risk of treatment failure. Conclusion This study constructed and validated prediction models for treatment failure in young and middle-aged patients with initially treated pulmonary tuberculosis based on six machine learning algorithms, among which the GBM model showed the best predictive performance and clinical application value. A high CONUT score, rural residence, pulmonary cavities, sputum positivity, and smoking history were key risk factors for treatment failure. The model can provide a basis for clinically predicting and identifying patients at high risk of treatment failure and for developing intervention plans.
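
The 7:3 random allocation described in the Methods can be illustrated with a minimal sketch. The authors generated their random numbers in SPSS; the Python version below is only an illustration of the same idea, and the helper name `split_7_3`, the seed, and the use of integer patient IDs are assumptions introduced here, not details from the study.

```python
import random


def split_7_3(patient_ids, seed=2024):
    """Shuffle IDs with a fixed seed and cut at 70% (hypothetical helper).

    The seed value is arbitrary; the study's SPSS random numbers are not
    reproduced by this sketch.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)   # seeded shuffle, reproducible
    cut = round(len(ids) * 0.7)        # 70% of the cohort goes to modeling
    return ids[:cut], ids[cut:]


# 760 placeholder IDs standing in for the enrolled patients
modeling, validation = split_7_3(range(760))
print(len(modeling), len(validation))  # 532 228
```

With 760 patients, a 70% cut gives 532 and 228, which matches the group sizes reported in the abstract.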

Key words: Young and middle-aged, Pulmonary tuberculosis, Initially treated, Treatment failure, Risk factor, Machine learning
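
The calibration results in the abstract are reported as a χ² statistic with a P value. The abstract does not name the test or its degrees of freedom; the sketch below assumes a Hosmer-Lemeshow-type goodness-of-fit test with 10 risk groups, hence 8 degrees of freedom. Both are assumptions made here for illustration. For an even number of degrees of freedom the upper-tail probability of the chi-square distribution has a closed form, so the P values can be recomputed with the standard library alone. The function name `chi2_sf_even_df` is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{i<df/2} (x/2)^i / i!,
    which is valid only for a positive even df.
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i      # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


DF_ASSUMED = 8  # assumption: 10 risk groups minus 2; not stated in the abstract
for label, stat in [("modeling", 8.638), ("validation", 4.786)]:
    print(label, f"{chi2_sf_even_df(stat, DF_ASSUMED):.3f}")
```

Under the 8-degrees-of-freedom assumption this prints 0.374 and 0.780, the same P values given in the abstract. Both exceed 0.05, which is the usual reading of "no significant lack of fit".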

CLC number:
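
The decision curve mentioned in the abstract plots net benefit against a threshold probability. A minimal sketch of the standard net benefit formula, TP/n − (FP/n) × pt/(1 − pt), is given below. The function names and the ten-patient toy data are hypothetical and are not taken from the study; they only show how a model's curve is compared with the "treat all" reference strategy.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: every patient is classified as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (invented for illustration): 1 = treatment failure
y_true = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.6, 0.2, 0.1, 0.4, 0.3, 0.05, 0.15, 0.25]

for pt in (0.2, 0.5):
    print(pt, net_benefit(y_true, y_prob, pt),
          net_benefit_treat_all(y_true, pt))
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the "treat all" value and zero (the "treat none" strategy). Repeating the calculation over a grid of thresholds traces out the decision curve.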