Abstract: Objective To construct a risk prediction model for essential hypertension complicated with cerebral infarction based on machine learning algorithms, and to explore the associated risk factors. Methods Data on 42 clinical indexes were collected from 1 478 patients with essential hypertension complicated with cerebral infarction and 2 826 patients with essential hypertension without cerebral infarction at 7 hospitals in Chongqing between Jan. 1, 2015 and Dec. 31, 2019. Univariate analysis was used to screen the input indexes. The 4 304 patients were randomly divided into a training set (n=3 012) and a test set (n=1 292) at a ratio of 7:3. The training set was used to construct logistic regression, decision tree, random forest and XGBoost models, and the test set was used for internal validation. The relative importance score of each input index was calculated in each of the 4 models. Positive predictive value, negative predictive value, accuracy, F1 score, the area under the receiver operating characteristic (ROC) curve (AUC) and the DeLong test were used to evaluate the predictive performance of the 4 models for essential hypertension complicated with cerebral infarction. Results A total of 29 statistically significant indexes were selected by univariate analysis. The AUC values of the logistic regression, decision tree, random forest and XGBoost models for predicting essential hypertension complicated with cerebral infarction were all relatively high. The DeLong test showed that the random forest and XGBoost models performed better than the logistic regression and decision tree models. The XGBoost model had the highest negative predictive value, accuracy, F1 score and AUC, at 0.780 (95% confidence interval [CI] 0.778-0.782), 0.766 (95% CI 0.764-0.768), 0.603 (95% CI 0.599-0.607) and 0.808 (95% CI 0.804-0.811), respectively.
The relative importance scores from all 4 models (logistic regression, decision tree, random forest and XGBoost) identified hematocrit, albumin, age, white blood cell count, cholinesterase and apolipoprotein A1 as important factors influencing essential hypertension complicated with cerebral infarction. Conclusion Machine learning-based risk prediction models of essential hypertension complicated with cerebral infarction, including logistic regression, decision tree, random forest and XGBoost models, have high diagnostic value, and the XGBoost model shows the best overall diagnostic performance. Hematocrit, albumin, age, white blood cell count, cholinesterase and apolipoprotein A1 can be used to predict the risk of cerebral infarction in patients with essential hypertension.
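
For readers unfamiliar with the evaluation measures named in the Methods, the following is a minimal illustrative sketch of how positive predictive value, negative predictive value, accuracy, F1 score and AUC can be computed from binary labels and model scores. It is not the authors' code: the function names, the 0.5 decision threshold and the toy data are assumptions introduced here for illustration only, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation rather than any specific library routine.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = cerebral infarction."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute PPV, NPV, accuracy and F1 from predicted class labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    ppv = tp / (tp + fp) if (tp + fp) else 0.0      # positive predictive value
    npv = tn / (tn + fn) if (tn + fn) else 0.0      # negative predictive value
    recall = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity, needed for F1
    accuracy = (tp + tn) / len(y_true)
    f1 = 2 * ppv * recall / (ppv + recall) if (ppv + recall) else 0.0
    return {"ppv": ppv, "npv": npv, "accuracy": accuracy, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case; tied scores count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical, not from the study): 3 cases, 5 controls.
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
    preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(classification_metrics(labels, preds))
    print(roc_auc(labels, scores))
```

The pairwise AUC shown here is quadratic in the number of patients, which is acceptable for a sketch; a rank-based implementation would be preferable for a test set of the size reported above.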