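Abstract:
Objective To identify candidate genes associated with preeclampsia by mining public databases with bioinformatics analysis and machine learning models, so as to improve the accuracy of early diagnosis of preeclampsia and to provide targets for research on its pathogenesis, diagnosis, and treatment. Methods RNA-seq datasets of placental tissue samples from preeclampsia patients and normal pregnant women were retrieved from the Gene Expression Omnibus; a gene expression matrix was obtained after data download, quality control, alignment, and quantification with bioinformatics tools. Differentially expressed genes were screened with DESeq2 1.38.3, enriched pathways were determined using the Gene Ontology and Kyoto Encyclopedia of Genes and Genomes databases, a co-expression network was constructed by weighted gene co-expression network analysis (WGCNA), and a machine learning prediction model was built with the random forest algorithm. Results A total of 49 shared differentially expressed genes were screened from placental tissue samples of 156 pregnant women (70 preeclampsia patients and 86 normal pregnant women) across 4 datasets; these genes were significantly enriched in the extracellular region, positive regulation of follicle-stimulating hormone secretion, hormone activity, and cytokine-cytokine receptor interaction pathways. WGCNA grouped the 49 differentially expressed genes into 7 co-expression modules, the key module highly correlated with preeclampsia was identified, and 6 candidate key genes were screened: fms-related receptor tyrosine kinase 1 (FLT1), pappalysin 2 (PAPPA2), protein phosphatase 1 regulatory inhibitor subunit 1C (PPP1R1C), myosin ⅦB (MYO7B), long intergenic non-protein coding RNA 2009 (LINC02009), and inhibin subunit α (INHA). The random forest model built on these 6 key genes showed good predictive value for preeclampsia (AUC=0.978). Conclusion Preeclampsia may be related to hormone secretion, immune response, angiogenic factors, pregnancy-associated plasma proteins, and inhibin; the related genes may serve as candidate markers for the diagnosis of preeclampsia.
Key words: preeclampsia; biomarkers; weighted gene co-expression network analysis; random forest model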
DOI:10.16781/j.CN31-2187/R.20240049
Received: 2024-01-19; Revised: 2024-08-26
Funding: General Program of the National Natural Science Foundation of China (81971402).
|
Mining diagnostic markers of preeclampsia based on weighted gene co-expression network analysis |
YAO Ruiqian1,2, YU Dong3,4*, XUE Geng2*
(1. School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; 2. Department of Medical Genetics, College of Basic Medical Sciences, Naval Medical University (Second Military Medical University), Shanghai 200433, China; 3. Department of Precision Medicine, Center of Translational Medicine, Naval Medical University (Second Military Medical University), Shanghai 200433, China; 4. Shanghai Key Laboratory of Cell Engineering, Shanghai 200433, China; *Corresponding authors)
Abstract:
Objective To mine public databases using bioinformatics analysis and machine learning models and to identify candidate genes related to preeclampsia, so as to improve the accuracy of early diagnosis and provide targets for research on its pathogenesis, diagnosis, and treatment. Methods RNA-seq datasets of placental tissue samples from preeclampsia patients and healthy pregnant women were retrieved from the Gene Expression Omnibus, and a gene expression matrix was obtained after data download, quality control, alignment, and quantification with bioinformatics tools. Differentially expressed genes were screened with DESeq2 1.38.3, enriched pathways were identified using Gene Ontology and Kyoto Encyclopedia of Genes and Genomes, a co-expression network was constructed using weighted gene co-expression network analysis (WGCNA), and a machine learning prediction model was established with the random forest algorithm. Results A total of 49 common differentially expressed genes were screened from placental tissue samples of 156 pregnant women (70 preeclampsia patients and 86 healthy pregnant women) in 4 datasets, and they were significantly enriched in the extracellular region, positive regulation of follicle-stimulating hormone secretion, hormone activity, and cytokine-cytokine receptor interaction pathways. The 49 differentially expressed genes were grouped into 7 co-expression modules by WGCNA, and key modules highly related to preeclampsia were identified. Six candidate key genes (fms-related receptor tyrosine kinase 1 [FLT1], pappalysin 2 [PAPPA2], protein phosphatase 1 regulatory inhibitor subunit 1C [PPP1R1C], myosin ⅦB [MYO7B], long intergenic non-protein coding RNA 2009 [LINC02009], and inhibin subunit α [INHA]) were screened. The random forest model based on these 6 key genes had good predictive value for preeclampsia (area under the curve [AUC]=0.978).
Conclusion Preeclampsia may be associated with genes involved in hormone secretion, immune response, angiogenic factors, pregnancy-associated plasma proteins, and inhibin, and these genes may serve as candidate diagnostic markers of preeclampsia.
Key words: preeclampsia; biomarkers; weighted gene co-expression network analysis; random forest model
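
The abstract names two quantitative steps without giving implementation details: building a weighted co-expression network and scoring a classifier by AUC. The sketch below is a minimal illustration of those two ideas only, not the authors' pipeline. It assumes an unsigned network in which the adjacency between two genes is the absolute Pearson correlation raised to a soft-threshold power, and it computes AUC as the fraction of case-control pairs ranked correctly. The function names, the power `beta=6`, the gene labels `gene_a` to `gene_c`, and all numeric values are hypothetical placeholders rather than data or settings from the study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency |cor|^beta for every gene pair.

    expr maps a gene label to its expression values across samples.
    beta is an assumed soft-threshold power, not a value from the study.
    """
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g, h in combinations(expr, 2)
    }


def roc_auc(scores, labels):
    """AUC as the share of (case, control) pairs where the case scores higher.

    Ties count as half a correct pair; labels are 1 (case) or 0 (control).
    """
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    correct = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return correct / (len(cases) * len(controls))


# Invented toy expression values for three hypothetical genes in four samples.
toy_expr = {
    "gene_a": [1, 2, 3, 4],
    "gene_b": [2, 4, 6, 8],
    "gene_c": [1, 3, 2, 4],
}
adj = soft_threshold_adjacency(toy_expr, beta=6)
for pair, weight in adj.items():
    print(pair, round(weight, 4))

# Invented classifier scores and labels, purely to exercise roc_auc.
auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 1])
print("AUC:", round(auc, 3))
```

Raising the correlation to a power keeps strong co-expression links while pushing weak ones toward zero, which is the idea behind the "weighted" network; in practice the power is chosen from the data, and module detection and model training would be done with dedicated packages.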