Construction of a Risk Prediction Model for Childhood Anti-tuberculosis Drug-Induced Liver Injury Based on Machine Learning

    • Summary:
      OBJECTIVE To construct a risk prediction model for anti-tuberculosis drug-induced liver injury (ATB-DILI) in children using machine learning algorithms.
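      METHODS Children diagnosed with tuberculosis at the Children's Hospital of Chongqing Medical University between January 2013 and December 2022 were enrolled. Feature variables were screened by univariate analysis and LASSO regression, and prediction models were built with four machine learning algorithms: extreme gradient boosting (XGBoost), adaptive boosting, light gradient boosting machine, and random forest. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1 score. The Shapley additive explanation (SHAP) algorithm was applied to interpret the best-performing model, and a nomogram was constructed to visualise the predictions.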
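The evaluation metrics named in the methods can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the study's code; the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for the example.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 score from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities, invented purely for illustration.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

AUC is computed from the predicted scores and does not depend on the threshold, whereas accuracy, precision, recall and F1 all change when the threshold moves.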
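      RESULTS A total of 2 796 tuberculosis patients were included, and the incidence of ATB-DILI was 5.47%. The XGBoost model had the best overall predictive performance, with an AUC of 0.881, accuracy of 0.951, precision of 0.981, recall of 0.956, and F1 score of 0.968. SHAP-based interpretation of the feature variables in the XGBoost model showed that tuberculosis retreatment; high baseline values of alanine aminotransferase, gamma-glutamyl transferase, and direct bilirubin; prolonged hospital stay; isoniazid treatment of ≤45 d; a high cumulative total dose of isoniazid; and a long duration of ethambutol treatment were risk factors for ATB-DILI in children.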
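Because the F1 score is the harmonic mean of precision and recall, the reported figures can be checked against one another. The short sketch below does this for the XGBoost values quoted above; the helper name and the 0.001 tolerance (chosen to allow for rounding to three decimals) are assumptions.

```python
def f1_from(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values as reported for the XGBoost model.
reported_precision = 0.981
reported_recall = 0.956
reported_f1 = 0.968

computed_f1 = f1_from(reported_precision, reported_recall)

# Tolerance is an assumption that allows for three-decimal rounding.
consistent = abs(computed_f1 - reported_f1) < 0.001
```

With precision 0.981 and recall 0.956 the computed value rounds to the reported 0.968, so the three figures are mutually consistent.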
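      CONCLUSION The risk prediction model for ATB-DILI in children built with the XGBoost algorithm performed best. The SHAP algorithm gives the model a clear interpretation, and the nomogram makes the predictions easy to understand and apply, which helps clinicians identify and prevent ATB-DILI in children at an early stage.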

       

      Abstract:
      OBJECTIVE To construct a risk prediction model for anti-tuberculosis drug-induced liver injury (ATB-DILI) in children using machine learning algorithms, providing a new method for accurately predicting ATB-DILI in pediatric tuberculosis patients.
      METHODS Pediatric patients diagnosed with tuberculosis at the Children's Hospital of Chongqing Medical University from January 2013 to December 2022 were selected for the study. Univariate analysis and LASSO regression were used to screen the feature variables. The patients were randomly divided into a training set (1 957 cases) and a test set (839 cases) at a ratio of 7∶3. The training set was used to construct the risk prediction models and tune their parameters, and the test set was used to validate model performance. Four machine learning algorithms were used to construct the prediction models: extreme gradient boosting (XGBoost), adaptive boosting, light gradient boosting machine, and random forest. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1 score. The Shapley additive explanation (SHAP) algorithm was used to interpret the optimal model, quantifying and visualising the complex relationship between the risk factors and the predictions and thereby increasing the interpretability of the model. A nomogram was constructed to make the predictions more intuitive.
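The 7∶3 random split described above can be reproduced in outline as follows. This is a minimal sketch using only the Python standard library; the function name, the fixed seed, and the rounding rule are assumptions, and the abstract does not state whether the split was stratified by outcome.

```python
import random


def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle row indices and cut them into training and test index lists.

    The seed and the use of round() for the cut point are assumptions.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


# 2 796 patients in total, as reported in the abstract.
train_idx, test_idx = split_indices(2796)
n_train, n_test = len(train_idx), len(test_idx)
```

With 2 796 patients and a training fraction of 0.7, this cut point gives 1 957 training and 839 test cases, matching the set sizes reported in the abstract.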
      RESULTS A total of 2 796 tuberculosis patients were included, and the incidence of ATB-DILI was 5.47%. The XGBoost model had the best overall predictive performance, with an AUC of 0.881, accuracy of 0.951, precision of 0.981, recall of 0.956, and F1 score of 0.968. Ranked by SHAP-based importance, the clinical features in the XGBoost model were, in descending order: treatment history, baseline alanine aminotransferase, baseline gamma-glutamyl transferase, days of hospitalisation, days of isoniazid treatment, total cumulative dose of isoniazid, baseline direct bilirubin, and days of ethambutol treatment. Analysis of how the feature variables in the XGBoost model affected the predictions showed that tuberculosis retreatment, high baseline alanine aminotransferase, high baseline gamma-glutamyl transferase, high baseline direct bilirubin, prolonged hospitalisation, isoniazid treatment of ≤45 days, a high cumulative total dose of isoniazid, and a long duration of ethambutol treatment had a positive impact on the predicted outcome, that is, they favoured the occurrence of ATB-DILI.
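To make the idea of a feature's "positive impact" concrete, the sketch below computes exact Shapley values for a two-feature toy scoring function by averaging each feature's marginal contribution over every ordering. It is illustrative only: the scoring function, its coefficients, the feature names, and the single reference patient are all hypothetical and are not taken from the study. For tree ensembles the SHAP library relies on much faster dedicated algorithms rather than this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    Features not yet "switched on" keep their baseline value. A single
    reference point stands in for a background dataset (a simplification).
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(x):
    # Hypothetical score with an interaction term; NOT the study's model.
    return 0.3 * x["retreatment"] + 0.02 * x["alt"] + 0.01 * x["retreatment"] * x["alt"]


baseline = {"retreatment": 0, "alt": 20}  # hypothetical reference patient
patient = {"retreatment": 1, "alt": 60}   # hypothetical patient to explain

phi = shapley_values(toy_risk, patient, baseline)
gap = toy_risk(patient) - toy_risk(baseline)
```

The values in `phi` sum to `gap`, the difference between the patient's score and the reference score, and a positive value means the feature pushed the prediction towards higher risk for that patient.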
      CONCLUSION The risk prediction model for ATB-DILI in children constructed with the XGBoost algorithm performed best. The SHAP algorithm provides an explicit interpretation of the model and indicates that tuberculosis retreatment, high baseline alanine aminotransferase, high baseline gamma-glutamyl transferase, high baseline direct bilirubin, prolonged hospitalisation, isoniazid treatment of ≤45 days, a high cumulative total dose of isoniazid, and a long duration of ethambutol treatment are risk factors for ATB-DILI in children with tuberculosis. The nomogram makes the predictions easy to understand and apply, and helps in the early clinical identification and prevention of ATB-DILI in children.

       
