Abstract: Objective: To analyze the factors influencing the prognosis of invasive breast cancer using machine learning algorithms and to construct a prognostic model. Methods: The clinical and pathological data of 24 584 patients with invasive breast cancer from 2010 to 2015 were collected from the Surveillance, Epidemiology, and End Results (SEER) database. Univariate analysis and logistic regression analysis were used to screen prognostic variables. Five machine learning classification algorithms (logistic regression, decision tree, support vector machine, random forest, and artificial neural network) were used to build survival prognosis prediction models. The predictive performance of each modeling method was evaluated using sensitivity, specificity, accuracy, and the area under the receiver operating characteristic curve (AUC). Results: Among the 21 model input variables, histological grade, T stage, N stage, M stage, brain metastasis, human epidermal growth factor receptor 2 (HER2) expression status, and surgical treatment had a strong influence on the survival prognosis of patients with invasive breast cancer. Among the prognostic models built with the 5 machine learning algorithms, the random forest and artificial neural network models showed the best predictive performance. Conclusion: The prognostic model for invasive breast cancer constructed with machine learning algorithms has good predictive performance and can assist clinicians in assessing the prognosis and treatment outcomes of patients with invasive breast cancer.
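The four evaluation indexes named in the Methods can be illustrated with a minimal, dependency-free Python sketch. It assumes a binary outcome coded 1 for the event of interest (for example, death during follow-up; the abstract does not say how the outcome was coded), predicted probabilities from any one of the five classifiers, and a 0.5 decision threshold. The function names `confusion_counts`, `auc_score`, and `evaluate`, and the toy labels and scores, are hypothetical; they are not SEER data or the authors' code. AUC is computed through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def auc_score(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and AUC for one model's predicted probabilities."""
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    # Assumed decision rule: predict the event when the score reaches the threshold.
    y_pred: List[int] = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both outcome classes must be present")
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / len(y_true),
        "auc": auc_score(y_true, scores),
    }


if __name__ == "__main__":
    # Hypothetical toy labels (1 = event of interest) and model scores, not SEER data.
    y = [1, 1, 0, 0, 1, 0]
    s = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
    metrics = evaluate(y, s)
    print({k: round(v, 3) for k, v in metrics.items()})
    # {'sensitivity': 0.667, 'specificity': 0.667, 'accuracy': 0.667, 'auc': 0.778}
```

The pairwise AUC loop costs O(n_pos × n_neg) comparisons, which is fine for illustration; for a cohort of roughly 24 000 patients, a sort-based rank computation or a library routine such as scikit-learn's `roc_auc_score` would be the practical choice. Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUC summarizes ranking performance across all thresholds, which is why it is commonly reported alongside them when comparing classifiers.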