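An attention-based method for converting drugs in electronic medical records into word vectors |
WANG Qing-hua1, SHAO Jin-song2, ZHANG Yuan-peng1, JIANG Lei3,4, BAI He-ming5, HUANG Xun6, XU Hu-ji3, WANG Li1,4,5* |
|
(1. Department of Medical Informatics, School of Medicine, Nantong University, Nantong 226001; 2. Department of Public Health and Preventive Medicine, School of Public Health, Nantong University, Nantong 226001; 3. Department of Rheumatology and Immunology, Changzheng Hospital, Naval Medical University (Second Military Medical University), Shanghai 200003; 4. Institute of Precision Medicine, Tsinghua University, Beijing 100084; 5. Intelligence Information Technology Center, Nantong University, Nantong 226001; 6. Department of Communication Engineering, School of Information Science and Technology, Nantong University, Nantong 226001 *Corresponding author) |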
|
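Abstract (translated from the Chinese): |
Objective To propose Drug2vec, an attention-based model for generating drug word vectors that represents drug information in vector form, and to compare its vector conversion performance with that of the Word2vec and Med2vec models. Methods An attention mechanism was used to capture the effect of medical entities on the central word, and the Drug2vec model was proposed to convert medical entities in unstructured electronic medical records into vectors. A dataset of 14 219 patients with systemic lupus erythematosus (SLE) and 963 drug entities was used to test the word vectors generated by Drug2vec, and the results were compared with those of Word2vec and Med2vec, two widely used language concept space vector conversion models. Results On the SLE patient dataset, the drug word vectors produced by Drug2vec were more accurate than those produced by Word2vec and Med2vec. Similarity rankings of the drug word vectors showed that the vectors produced by Drug2vec were consistent with the order in which clinicians use medications. Conclusion The Drug2vec model can use surrounding entities to adjust the central drug entity more precisely, and thereby produces more accurate drug vectors. |
Keywords: language concept word vector; attention mechanism; systemic lupus erythematosus; electronic health records |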
DOI:10.16781/j.0258-879x.2020.10.1129 |
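Received: 2019-10-30 Revised: 2020-02-21 |
Funding: National Natural Science Foundation of China (81873915), National Key Research and Development Program of China (2018YFC0116902), Nantong Science and Technology Project (CP12016003). |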
|
A drug word vector conversion method in electronic medical record based on attention mechanism |
WANG Qing-hua1,SHAO Jin-song2,ZHANG Yuan-peng1,JIANG Lei3,4,BAI He-ming5,HUANG Xun6,XU Hu-ji3,WANG Li1,4,5* |
(1. Department of Medical Informatics, School of Medicine, Nantong University, Nantong 226001, Jiangsu, China; 2. Department of Public Health and Preventive Medicine, School of Public Health, Nantong University, Nantong 226001, Jiangsu, China; 3. Department of Rheumatology and Immunology, Changzheng Hospital, Naval Medical University(Second Military Medical University), Shanghai 200003, China; 4. Institute of Precision Medicine, Tsinghua University, Beijing 100084, China; 5. Intelligence Information Technology Center, Nantong University, Nantong 226001, Jiangsu, China; 6. Department of Communication Engineering, School of Information Science and Technology, Nantong University, Nantong 226001, Jiangsu, China *Corresponding author) |
Abstract: |
Objective To propose Drug2vec, an attention-based drug word vector conversion model that generates vectorized representations of drug information, and to compare its vector conversion performance with that of Word2vec and Med2vec. Methods We used an attention mechanism to capture the influence of medical entities on the central word and proposed the Drug2vec model to convert medical entities in unstructured electronic medical records into vectors. Using a systemic lupus erythematosus (SLE) dataset of 14 219 patients and 963 drug entities, we evaluated the drug vectors generated by Drug2vec and compared them with those from the widely used language concept space vector conversion models Word2vec and Med2vec. Results In the SLE dataset, the drug vectors generated by Drug2vec were more accurate than those generated by Word2vec and Med2vec. Similarity rankings of the drug vectors showed that the vectors generated by Drug2vec were consistent with clinicians' medication order. Conclusion The Drug2vec model uses contextual entities to adjust the central drug entity more precisely, producing more accurate drug vectors. |
Key words: language concept word vector; attention mechanism; systemic lupus erythematosus; electronic health records |
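The abstract does not give Drug2vec's equations, so the following is only a minimal sketch of the two ideas it mentions: attention weights that let context entities contribute unequally to a central drug, and ranking drugs by vector similarity. The dot-product scoring, the function names, and the toy two-dimensional embeddings are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(center, context):
    """Pool context vectors with attention weights relative to the center.

    Assumption: relevance is scored by a plain dot product; the paper may
    use a different (learned) scoring function.
    """
    weights = softmax([dot(center, c) for c in context])
    pooled = [
        sum(w * c[i] for w, c in zip(weights, context))
        for i in range(len(center))
    ]
    return weights, pooled


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def rank_by_similarity(query, vectors):
    """Return (name, similarity) pairs sorted from most to least similar."""
    pairs = [
        (name, cosine(vectors[query], vec))
        for name, vec in vectors.items()
        if name != query
    ]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


# Hypothetical toy embeddings; real drug vectors would be learned from records.
toy = {
    "drug_a": [1.0, 0.0],
    "drug_b": [0.9, 0.1],
    "drug_c": [0.0, 1.0],
}

weights, pooled = attention_context(toy["drug_a"], [toy["drug_b"], toy["drug_c"]])
ranking = rank_by_similarity("drug_a", toy)
```

In this toy setup the context entity that points in nearly the same direction as the central drug receives the larger attention weight, and the same entity comes first in the similarity ranking. A trained model would learn both the embeddings and the scoring function from the medical records rather than fixing them by hand.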