Interpretable Machine Learning Algorithm for Constructing a Prediction Model of Kawasaki Disease in Febrile Children

    • Abstract:
      OBJECTIVE To construct a prediction model for Kawasaki disease in febrile children using machine learning algorithms.
      METHODS Diagnosis and treatment data were collected for febrile children seen in the Department of Cardiology, Children's Hospital of Soochow University, in 2023. Eight machine learning methods were used to construct prediction models: logistic regression, logistic regression with the least absolute shrinkage and selection operator (LASSO), decision tree, random forest, extreme gradient boosting, gradient boosting, CatBoost, and K-nearest neighbors. The area under the receiver operating characteristic curve (AUC), accuracy, recall, precision, and F1 score were used to compare the models.
      RESULTS A total of 3043 febrile children were included, of whom 260 were diagnosed with Kawasaki disease. In both the internal and external validation cohorts, the CatBoost model showed the best predictive performance for Kawasaki disease, with an AUC of 0.954 and an F1 score of 0.696 in the internal validation cohort, and an AUC of 0.967 and an F1 score of 0.642 in the external validation cohort. SHAP-based interpretation of the feature variables of the CatBoost model identified albumin, complement C3, hemoglobin, and high-sensitivity C-reactive protein as the four most important predictors of Kawasaki disease in febrile children.
      CONCLUSION This study successfully developed a machine learning-based prediction model that can effectively predict the risk of Kawasaki disease in febrile children. The model has high accuracy and good generalization ability, and it can help clinicians identify high-risk children early so that timely treatment can be given and the occurrence of complications reduced.
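
The five comparison metrics named in the abstract can be illustrated with a short sketch. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy data are assumptions made for illustration only. The reason to report several metrics follows from the cohort itself: with 260 of 3043 children positive (about 8.5%), a model that labels every child as negative would still reach roughly 91% accuracy, so recall, precision, F1, and AUC carry most of the information.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, precision, recall, F1 and AUC for binary labels (1 = positive).

    The threshold of 0.5 is an assumption; the abstract does not state one.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": roc_auc(y_true, y_score),
    }


if __name__ == "__main__":
    # Hypothetical toy data: 3 positive and 7 negative cases with model scores.
    labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]
    for name, value in classification_metrics(labels, scores).items():
        print(f"{name}: {value:.3f}")
```

AUC is computed from the raw scores and does not depend on the threshold, whereas accuracy, precision, recall, and F1 all change when the threshold moves. This is one way a model can show a high AUC alongside a more modest F1 score on an imbalanced cohort.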

       
