Covariate extraction method based on discharge summaries of patients with ischemic stroke

doi:10.16781/j.0258-879x.2021.11.1273

2021, Vol. 42, No. 11, pp. 1273-1278

CLC number: R197.324

Covariate extraction method based on discharge summary of stroke patients

Fund Project:

Supported by the Major Logistics Research Project of PLA (AWS14R013-1) and the Outstanding Talent Training Plan of the Shanghai 3-Year Action Plan (2020-2022) for Public Health System (GWV-10.1-XD05).

Abstract:

Objective To mine text data from the discharge summaries of patients with ischemic stroke, a disease with high incidence and poor prognosis, using natural language processing, and to convert the unstructured text into a structured database for subsequent statistical analysis using Python. Methods Using discharge summaries of patients with ischemic stroke, a named entity recognition model combining enhanced representation through knowledge integration (ERNIE), a neural network, and a conditional random field was built to recognize 5 types of medical named entities: disease, drug, surgery, imaging examination, and symptom. The extracted entities were used to build a semi-structured database. To further extract structured data from the semi-structured database, an ERNIE-based Siamese text similarity matching model was built and evaluated by accuracy, and the best-performing model was used to build the covariate extractor. Results The overall F1 score of the named entity recognition model was 90.27%; by entity type, F1 was 88.41% for disease, 91.03% for drug, 87.71% for imaging examination, 87.07% for surgery, and 96.59% for symptom. The overall accuracy of the text similarity matching model was 99.11%. Conclusion Natural language processing enabled a pipeline from fully unstructured data, through semi-structured data, to structured data. Compared with manually reading medical records and extracting information by hand, this greatly improved the efficiency of database construction.
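The two-stage flow described in the abstract (tagged text to semi-structured entities, then entities to structured covariates) can be illustrated with a minimal sketch. This is not the authors' implementation: the BIO tag scheme, the label names, the function names, the covariate dictionary, and the 0.8 threshold are all assumptions made for illustration, and the standard library's `difflib.SequenceMatcher` stands in for the ERNIE-based Siamese similarity model, which the abstract does not specify in detail.

```python
from difflib import SequenceMatcher


def decode_bio(tokens, tags):
    """Collect (entity_type, text) spans from BIO tags (assumed tag scheme)."""
    entities = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that was still open.
            if current_type is not None:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" or a stray I- tag: close the open entity, if any.
            if current_type is not None:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type is not None:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


def similarity(a, b):
    """Stand-in for the Siamese model score: a string ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def extract_covariates(entities, covariate_terms, threshold=0.8):
    """Map extracted entities to a row of binary covariates.

    covariate_terms: {covariate_name: (entity_type, reference_term)},
    a hypothetical lookup table; threshold is an arbitrary cut-off.
    """
    row = {name: 0 for name in covariate_terms}
    for entity_type, text in entities:
        for name, (wanted_type, reference) in covariate_terms.items():
            if entity_type == wanted_type and similarity(text, reference) >= threshold:
                row[name] = 1
    return row


# Hypothetical tagged sentence, as a named entity recognition model might output.
tokens = ["Patient", "with", "hypertension", "received", "aspirin",
          "after", "cranial", "MRI"]
tags = ["O", "O", "B-DISEASE", "O", "B-DRUG", "O", "B-IMAGING", "I-IMAGING"]

entities = decode_bio(tokens, tags)  # semi-structured stage
covariate_terms = {
    "hypertension": ("DISEASE", "hypertension"),
    "aspirin_use": ("DRUG", "aspirin"),
    "diabetes": ("DISEASE", "diabetes mellitus"),
}
row = extract_covariates(entities, covariate_terms)  # structured stage
print(entities)
print(row)
```

In this sketch each discharge summary yields one dictionary row, so a list of such rows could be loaded directly into a table for statistical analysis. A learned similarity model would replace `similarity` to handle synonyms and abbreviations that a character-level ratio cannot match.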

History

Received: 2021-05-21
Revised: 2021-06-28
Published online: 2021-12-18
