People's Health Press
ISSN 2096-2738 CN 11-9370/R

Source Journal for Chinese Scientific and Technical Papers and Citations
Source Journal for Annual Report for Chinese Academic Journal Impact Factors(2022)
Indexed Journals in the Database of the Chemical Abstracts Service (CAS), USA
Indexed Journals in the Database of the Japan Science and Technology Agency (JST)

  • Electronic Journal of Emerging Infectious Diseases ›› 2026, Vol. 11 ›› Issue (1): 44-49. doi: 10.19871/j.cnki.xfcrbzz.2026.01.007

    • Original Articles •

    Construction and validation of a predictive model for treatment failure in young and middle-aged patients with initial treatment of pulmonary tuberculosis based on interpretable machine learning

    Liu Huimei, Zhang Shuo, Xu Changwei, Han Xixi, Xi Xiangyu   

    1. The Third Department of Tuberculosis, Beijing Ditan Hospital, Xuzhou Hospital, Capital Medical University, Xuzhou Seventh People's Hospital, Jiangsu Xuzhou 221000, China
    • Received: 2025-08-16 Online: 2026-02-28 Published: 2026-03-16

    Abstract: Objective: To construct a machine learning-based prediction model for treatment failure in young and middle-aged patients with initially treated pulmonary tuberculosis, and to provide a reference for the early clinical identification of patients at high risk of treatment failure and for the formulation of personalized intervention strategies. Methods: A total of 760 young and middle-aged patients with initially treated pulmonary tuberculosis admitted to Beijing Ditan Hospital, Xuzhou Hospital, Capital Medical University, Xuzhou Seventh People's Hospital from January 2022 to June 2024 were enrolled. Using random numbers generated in SPSS, the patients were randomly divided at a 7:3 ratio into a modeling group (n=532) and a validation group (n=228). Based on the modeling group data, least absolute shrinkage and selection operator (LASSO) regression was used to screen the characteristic variables for treatment failure. Six machine learning methods, namely random forest (RF), support vector machine (SVM), logistic regression (LR), naive Bayes (NB), artificial neural network (ANN) and gradient boosting machine (GBM), were used to construct treatment failure prediction models. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), the calibration curve and the decision curve. In addition, the Shapley additive explanations (SHAP) method was used to explain the contribution of each variable to the predictions. Results: LASSO regression selected 5 indicators with non-zero coefficients: residence, smoking history, sputum positivity, pulmonary cavity and controlling nutritional status (CONUT) score. Among the 6 machine learning models, the GBM model had the highest AUC and F1 score. The ROC curve showed that the AUC of the GBM model was 0.881 in the modeling group and 0.865 in the validation group. The calibration curve showed that the GBM model was well calibrated in both groups (χ²=8.638, P=0.374; χ²=4.786, P=0.780). The decision curve showed that the GBM model provided clinical net benefit over a wide range of threshold probabilities in both groups. SHAP analysis ranked the contributions of the variables to the GBM model's predictions, in descending order of importance, as CONUT score, residence, pulmonary cavity, sputum positivity and smoking history. Among them, a high CONUT score, rural residence, pulmonary cavities, positive sputum results and a history of smoking increased the risk of treatment failure. Conclusion: In this study, predictive models for treatment failure in young and middle-aged patients with initially treated pulmonary tuberculosis were constructed and validated using six machine learning algorithms, among which the GBM model showed the best predictive performance and clinical application value. The key risk factors for treatment failure identified in this study were a high CONUT score, rural residence, presence of pulmonary cavities, sputum positivity and smoking history. The model can provide a basis for the clinical prediction and identification of patients at high risk of treatment failure, as well as for the development of intervention plans.
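
    The evaluation steps named in the abstract (a 7:3 random split, the AUC of the ROC curve, and the net benefit plotted in a decision curve) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' code: the study generated its random numbers in SPSS and does not state which software produced the other results. The function names split_7_3, auc and net_benefit, the seed, and the four-patient toy scores are assumptions made purely for illustration.

```python
import random


def split_7_3(n_patients, seed=0):
    """Randomly split patient indices 7:3 into modeling and validation groups.

    The seed is an arbitrary illustrative choice, not a value from the study.
    """
    idx = list(range(n_patients))
    random.Random(seed).shuffle(idx)
    cut = round(n_patients * 0.7)
    return idx[:cut], idx[cut:]


def auc(y_true, y_score):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_score, threshold):
    """Decision-curve net benefit at threshold probability pt:
    TP/N - FP/N * pt / (1 - pt), treating score >= pt as 'high risk'."""
    n = len(y_true)
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for y, s in pairs if s >= threshold and y == 1)
    fp = sum(1 for y, s in pairs if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# 760 patients split 7:3, matching the group sizes reported in the abstract.
modeling, validation = split_7_3(760)
print(len(modeling), len(validation))  # 532 228

# Hypothetical toy data: 1 = treatment failure, scores = predicted risk.
toy_outcome = [0, 0, 1, 1]
toy_risk = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_outcome, toy_risk))               # 0.75
print(net_benefit(toy_outcome, toy_risk, 0.5))  # 0.25
```

    A decision curve is obtained by evaluating net_benefit over a grid of threshold probabilities and comparing the result with the "treat all" and "treat none" strategies; the pairwise AUC shown here is quadratic in sample size and is meant for clarity rather than efficiency.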

    Key words: Young and middle-aged, Pulmonary tuberculosis, Initially treated, Treatment failure, Risk factor, Machine learning
