Abstract:
OBJECTIVE To construct a predictive model of Kawasaki disease in febrile children using machine learning algorithms.
METHODS Diagnosis and treatment data were collected for children with fever seen in the Department of Cardiology, Children’s Hospital of Soochow University, in 2023. Eight machine learning algorithms were used to construct prediction models: logistic regression, logistic regression with least absolute shrinkage and selection operator (LASSO), decision tree, random forest, extreme gradient boosting (XGBoost), gradient boosting, CatBoost, and K-nearest neighbor. The models were compared on the area under the receiver operating characteristic curve (AUC), accuracy, recall, precision, and F1 score.
RESULTS A total of 3043 children with fever were included, of whom 260 were diagnosed with Kawasaki disease. In both the internal and external validation cohorts, the CatBoost model showed the best predictive performance for Kawasaki disease, with an AUC of 0.954 and an F1 score of 0.696 in the internal cohort, and an AUC of 0.967 and an F1 score of 0.642 in the external cohort. SHapley Additive exPlanations (SHAP) analysis of the CatBoost model's feature variables identified albumin, C3, hemoglobin, and high-sensitivity C-reactive protein as the 4 most important predictors of Kawasaki disease.
CONCLUSION This study developed a machine learning-based model that can effectively predict the risk of Kawasaki disease in children with fever. The model showed high discrimination (AUC above 0.95 in both the internal and external cohorts) and good generalization ability, which may help clinicians identify high-risk children at an early stage so that timely treatment can be given to reduce the occurrence of complications.
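
The comparison metrics named in METHODS can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the AUC is computed with the standard pairwise-ranking (Mann-Whitney) formulation.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = disease)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case outranks a
    random negative case; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.8, 0.4, 0.2, 0.1, 0.05]
THRESHOLD = 0.5  # assumed cut-off for turning scores into class labels
y_pred = [1 if s >= THRESHOLD else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Because only about 9% of the cohort (260 of 3043) had Kawasaki disease, accuracy alone can look high for a model that rarely predicts the positive class, which is one reason to report precision, recall, and F1 score alongside AUC.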