Risk prediction models of essential hypertension complicated with cerebral infarction based on machine learning algorithm
LIU Ting1,2, ZHU Qin1, XU Lin1, DU Zhi-yin1,2*
(1. Department of Health Information Management and Decision Making, School of Medical Informatics, Chongqing Medical University, Chongqing 400016, China;
2. Medical Data Science Academy, Chongqing Medical University, Chongqing 400016, China
*Corresponding author)
Abstract:
Objective To construct risk prediction models for essential hypertension complicated with cerebral infarction using machine learning algorithms, and to explore the risk factors for cerebral infarction in patients with essential hypertension.
Methods Data on 42 clinical indexes were collected from 1,478 patients with essential hypertension complicated with cerebral infarction and 2,826 patients with essential hypertension without cerebral infarction, all diagnosed in 7 hospitals in Chongqing between Jan. 1, 2015 and Dec. 31, 2019. Univariate analysis was used to screen the input indexes. The 4,304 patients were randomly divided into a training set (n=3,012) and a test set (n=1,292) at a ratio of 7:3. The training set was used to construct logistic regression, decision tree, random forest and XGBoost models, and the test set was used for internal validation. The relative importance score of each input index was calculated in each of the 4 models. The positive predictive value, negative predictive value, accuracy, F1 value, area under curve (AUC) of the receiver operating characteristic (ROC) curve and the DeLong test were used to evaluate the value of the 4 models for predicting essential hypertension complicated with cerebral infarction.
Results Univariate analysis identified 29 indexes with statistically significant differences. The logistic regression, decision tree, random forest and XGBoost models built on these indexes all had relatively high AUC values for predicting essential hypertension complicated with cerebral infarction. The DeLong test showed that the random forest and XGBoost models both outperformed the logistic regression and decision tree models. The XGBoost model had the highest negative predictive value, accuracy, F1 value and AUC, being 0.780 (95% confidence interval [CI] 0.778-0.782), 0.766 (95% CI 0.764-0.768), 0.603 (95% CI 0.599-0.607) and 0.808 (95% CI 0.804-0.811), respectively. The relative importance scores from all 4 models indicated that hematocrit, albumin, age at visit, white blood cell count, cholinesterase and apolipoprotein A1 were important influencing factors for essential hypertension complicated with cerebral infarction.
Conclusion The machine learning-based logistic regression, decision tree, random forest and XGBoost models for predicting the risk of essential hypertension complicated with cerebral infarction all have high diagnostic value, and the XGBoost model has the best overall diagnostic performance. Hematocrit, albumin, age at visit, white blood cell count, cholinesterase and apolipoprotein A1 can be used to predict the risk of cerebral infarction in patients with essential hypertension.
Key words:  essential hypertension  cerebral infarction  machine learning  risk factors  prediction model
DOI: 10.16781/j.CN31-2187/R.20211053
Received: 2021-10-20  Revised: 2021-11-16
Funding: Supported by the Chongqing Medical University Special Research Project in Philosophy and Social Sciences (201725) and the Chongqing Medical University Intelligent Medicine Research Project (YJSZHYX202002).
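The abstract names a 7:3 split and five evaluation measures without defining them. The following is a minimal Python sketch of how such quantities are commonly computed, not the authors' code. The function names, the 0.5 decision threshold, the floor rounding in `split_sizes` and the toy labels and scores are all assumptions made for illustration; none of them come from the study data. The AUC is computed by its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case), which is a standard equivalent of the area under the ROC curve.

```python
from typing import Dict, Sequence, Tuple


def split_sizes(n_total: int, train_parts: int = 7, test_parts: int = 3) -> Tuple[int, int]:
    """Training/test sizes for a train_parts:test_parts split.

    Flooring the training size is an assumption; the abstract does not say
    how the authors rounded.
    """
    n_train = n_total * train_parts // (train_parts + test_parts)
    return n_train, n_total - n_train


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """PPV, NPV, accuracy and F1 for binary labels (1 = cerebral infarction)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    nan = float("nan")
    return {
        "ppv": tp / (tp + fp) if tp + fp else nan,       # positive predictive value
        "npv": tn / (tn + fn) if tn + fn else nan,       # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else nan,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities, invented purely for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
sizes = split_sizes(4304)

print(sizes)
print(metrics)
print(round(auc, 3))
```

Under the flooring assumption, `split_sizes(4304)` gives 3012 and 1292, which agrees with the training and test set sizes reported in the abstract. The pairwise AUC loop is quadratic in the number of cases and is meant only to make the definition explicit; a sort-based rank computation would be the usual choice for cohorts of this size.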