Chinese electronic medical record named entity recognition based on sentence-level Lattice-long short-term memory neural network

Affiliation:

1. School of Medicine, Nantong University; 2. Harbin Institute of Technology, Xili University City, Nanshan District; 3. Department of Rheumatology and Immunology, Changzheng Hospital, Second Military Medical University; 4. School of Information Science and Technology, Nantong University

Fund Project:

National Key Research and Development Program of China (2018YFC0116902), National Natural Science Foundation of China (81873915), Postgraduate Research and Practice Innovation Program of Jiangsu Province (KYCX17-1932).

    Abstract:

    Objective To propose a conditional random field (CRF) model based on Re-entity, a new word segmentation method, and to compare it with the bi-directional long short-term memory neural network (BiLSTM)-CRF and the Lattice-long short-term memory neural network (LSTM). Methods After comparing existing entity recognition methods and models, we proposed CRF, BiLSTM-CRF and Lattice-LSTM methods based on Re-entity for task one, "Chinese clinical named entity recognition", of the 2018 China Conference on Knowledge Graph and Semantic Computing (CCKS2018), and trained character vector sets at different parameter levels on different corpora. Each method was introduced into the neural network models for comparative experiments on model performance. Finally, a comparative study was carried out on input length at the sentence level and at the document level. Results With the best feature engineering, the performance of the CRF model improved after the Re-entity method was introduced. The sentence-level Lattice-LSTM model achieved a strict F1-measure of 89.75% on this task, higher than the best result of CCKS2018 task one (89.25%). Conclusion The CRF model based on Re-entity can use a Chinese clinical drug knowledge base to effectively improve the recognition rate of drugs in electronic medical records. The Re-entity method can reduce the error accumulation caused by word segmentation during data preprocessing. The Lattice structure can better combine the latent semantic information of character and word sequences. In addition, sentence-level input can effectively improve the recognition accuracy of neural network models.
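
    The strict F1-measure reported above can be illustrated with a minimal Python sketch. It assumes that "strict" means a predicted entity counts as correct only when its start offset, end offset and entity type all match a gold annotation exactly, and that scores are micro-averaged over documents; the abstract does not spell out either point. The function name `strict_prf`, the `Entity` tuple layout and the entity type labels are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical layout: (start offset, end offset, entity type).
Entity = Tuple[int, int, str]


def strict_prf(gold: List[Set[Entity]],
               pred: List[Set[Entity]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 under exact-match scoring.

    Assumption: an entity is a true positive only if its boundaries
    and its type are identical in the gold and predicted sets.
    """
    # Set intersection keeps only exact (start, end, type) matches.
    true_pos = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)

    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Two toy documents with made-up annotations (not data from the paper).
gold = [{(0, 2, "symptom"), (5, 9, "drug")}, {(3, 6, "anatomy")}]
# The "drug" span ends one character early, so it is not a strict match.
pred = [{(0, 2, "symptom"), (5, 8, "drug")}, {(3, 6, "anatomy")}]

p, r, f = strict_prf(gold, pred)
print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

    Under this criterion a prediction with the right type but a boundary that is off by one character earns no credit, which is why segmentation errors made during preprocessing can directly lower the score.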

History
  • Received: 2019-02-23
  • Last revised: 2019-04-12
  • Accepted: 2019-05-14
  • Published online: 2019-06-11