Abstract: Objective To compare the performance of 3 machine learning algorithms (support vector machine [SVM], random forest, and extreme gradient boosting [XGBoost]) with that of logistic regression in predicting 30-d mortality in patients with severe ischemic stroke. Methods Data from 2 358 patients with severe ischemic stroke who met the inclusion criteria in the Medical Information Mart for Intensive Care Ⅳ (MIMIC-Ⅳ) database from 2008 to 2019 were used. SVM, random forest, XGBoost, and logistic regression, each combined with the synthetic minority oversampling technique (SMOTE), were used to build early mortality prediction models. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), accuracy, F1-score, and Brier score. Results With the original imbalanced data, the AUC values of the SVM, random forest, XGBoost, and logistic regression models were 0.78, 0.81, 0.84, and 0.83, respectively. After SMOTE-based oversampling, the AUC values of the SVM, random forest, XGBoost, and logistic regression models were 0.72, 0.84, 0.83, and 0.83, respectively. Unlike SVM, random forest and XGBoost had predictive ability similar to that of logistic regression, but their accuracy and Brier scores were better, as was their overall classification performance. Conclusion Machine learning algorithms such as random forest and XGBoost perform better than traditional logistic regression in predicting early mortality in patients with ischemic stroke.
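The four evaluation metrics named in the Methods can be illustrated with a minimal sketch. The abstract does not state the software, the classification threshold, or any patient-level data, so everything below is an assumption made for illustration: the function names are hypothetical, the 0.5 threshold is an arbitrary default, and the eight toy labels and probabilities are invented rather than taken from MIMIC-Ⅳ.

```python
from typing import Sequence, Tuple


def auc_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher predicted probability (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome
    (lower is better; 0 is a perfect probabilistic prediction)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def accuracy_and_f1(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Accuracy and F1-score after thresholding probabilities.
    The 0.5 threshold is an assumption, not a value from the abstract."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, yp in pairs if y == 1 and yp == 1)
    fp = sum(1 for y, yp in pairs if y == 0 and yp == 1)
    fn = sum(1 for y, yp in pairs if y == 1 and yp == 0)
    tn = sum(1 for y, yp in pairs if y == 0 and yp == 0)
    accuracy = (tp + tn) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Invented toy data: 1 = died within 30 d, 0 = survived (imbalanced on purpose).
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.6, 0.2, 0.4, 0.7, 0.45]

auc = auc_score(y_true, y_prob)
brier = brier_score(y_true, y_prob)
accuracy, f1 = accuracy_and_f1(y_true, y_prob)
print(f"AUC={auc:.3f} accuracy={accuracy:.3f} F1={f1:.3f} Brier={brier:.3f}")
```

On imbalanced toy data like this, AUC depends only on how cases are ranked, whereas accuracy and F1-score depend on the chosen threshold and the Brier score on how well the probabilities are calibrated. This is one reason models with similar AUC values can still differ on the other metrics.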