Abstract:
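Objective To analyze the factors affecting the prognosis of invasive breast cancer with machine learning algorithms and to construct a prognostic model.
Methods Clinical and pathological data of 24 584 patients with invasive breast cancer from 2010 to 2015 were collected from the US Surveillance, Epidemiology, and End Results (SEER) database. Prognostic variables were screened by univariate analysis and logistic regression analysis. Five machine learning classification algorithms (logistic regression, decision tree, support vector machine, random forest and artificial neural network) were used to build survival prognosis prediction models. The predictive ability of each modeling method was evaluated with sensitivity, specificity, accuracy and the area under the ROC curve (AUC) as evaluation indexes.
Results Among the 21 model input variables, histological grade, T stage, N stage, M stage, brain metastasis, human epidermal growth factor receptor 2 expression status and surgical treatment had a considerable impact on the survival prognosis of patients with invasive breast cancer. Of the prognostic models built with the 5 machine learning algorithms, the random forest and artificial neural network models performed better.
Conclusion Prognostic models for invasive breast cancer built with machine learning algorithms predict well and can help physicians assess the prognosis and treatment outcomes of patients with invasive breast cancer.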
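The abstract does not say which univariate test was used to screen variables. As a minimal sketch, assuming a Pearson chi-square test (without continuity correction) for a binary candidate factor, such as brain metastasis yes/no, against a binary survival outcome, the screening step could look like the following. The function names, the 0.05 cut-off and the counts are hypothetical. Multi-level factors such as T stage would need an r×k table with more degrees of freedom, which this 1-df sketch does not handle.

```python
import math
from typing import Dict, List, Tuple


# Hypothetical helper: Pearson chi-square for a 2x2 table (1 degree of freedom).
#               event   no event
#   factor=1      a        b
#   factor=0      c        d
def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    n = row1 + row2
    # Shortcut form of the Pearson statistic for a 2x2 table.
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def screen_binary_factors(tables: Dict[str, Tuple[int, int, int, int]],
                          alpha: float = 0.05) -> List[str]:
    """Keep factors whose univariate p-value is below alpha (hypothetical cut-off)."""
    kept = []
    for name, (a, b, c, d) in tables.items():
        _, p = chi_square_2x2(a, b, c, d)
        if p < alpha:
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Made-up counts for illustration only; not SEER data.
    toy_tables = {
        "factor_a": (30, 10, 10, 30),  # strongly associated with the event
        "factor_b": (10, 10, 10, 10),  # no association at all
    }
    print(screen_binary_factors(toy_tables))  # ['factor_a']
```

Factors that pass such a filter would then enter the multivariable logistic regression mentioned in the abstract; that step is not shown here.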
Key words: SEER database; invasive breast cancer; machine learning; prognosis; prediction model
DOI: 10.16781/j.CN31-2187/R.20230255
Received: 2023-05-07; Revised: 2023-10-08
Construction of a prognostic model for invasive breast cancer using machine learning algorithms: based on the SEER database
LU Chunwei1, MA Jun*2
(1. Department of Integrative Medicine, Zhongshan Hospital, Fudan University, Shanghai 200032, China; 2. Department of Integrative Chinese and Western Medicine, Xiamen Branch of Zhongshan Hospital, Fudan University, Xiamen 361000, Fujian, China *Corresponding author) |
Abstract: |
Objective To analyze the factors influencing the prognosis of invasive breast cancer using machine learning algorithms and to construct a prognostic model.
Methods The clinical and pathological data of 24 584 patients with invasive breast cancer from 2010 to 2015 were collected from the Surveillance, Epidemiology, and End Results (SEER) database. Prognostic variables were screened by univariate analysis and logistic regression analysis. Five machine learning classification algorithms (logistic regression, decision tree, support vector machine, random forest and artificial neural network) were used to build prediction models of survival prognosis, and the predictive ability of each modeling method was evaluated. Sensitivity, specificity, accuracy and the area under the receiver operating characteristic (ROC) curve (AUC) were used as the evaluation indexes.
Results Among the 21 model input variables, histological grade, T stage, N stage, M stage, brain metastasis, human epidermal growth factor receptor 2 expression status and surgical treatment had a considerable impact on the survival prognosis of patients with invasive breast cancer. Among the prognostic models constructed with the 5 machine learning algorithms, the random forest and artificial neural network models showed better predictive performance.
Conclusion Prognostic models for invasive breast cancer constructed with machine learning algorithms show good predictive performance and can assist physicians in judging the prognosis and treatment outcomes of patients with invasive breast cancer.
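The four evaluation indexes named in the abstract can be computed from binary labels and predicted probabilities as in the minimal pure-Python sketch below. It assumes the outcome is coded 1 for the event of interest (for example, death) and 0 otherwise, and that a 0.5 probability threshold turns scores into class predictions; neither the coding nor the threshold is stated in the abstract, and the function names and data are hypothetical. AUC is computed through its pairwise interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Sequence, Tuple


def threshold_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                      threshold: float = 0.5) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) at a probability threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for t, s in zip(y_true, y_prob) if t == 1]
    neg = [s for t, s in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and model scores for eight patients.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_prob = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.3, 0.8]
    print(threshold_metrics(y_true, y_prob))  # (0.75, 0.75, 0.75)
    print(roc_auc(y_true, y_prob))            # 0.9375
```

The pairwise loop costs O(n_pos × n_neg) and is written for readability. For a cohort of tens of thousands of patients, a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.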
Key words: SEER database; invasive breast cancer; machine learning; prognosis; prediction model