Construction of an Artificial Intelligence Application Model for Chronic Kidney Disease Based on Large Language Models Combined with RAG Technology

    • Abstract:
      Objective To construct a medication education knowledge base and a multimodal medication guidance system for patients with chronic kidney disease (CKD) by combining large language models (LLM) with retrieval-augmented generation (RAG) technology, in order to improve the safety of and adherence to patients' medication use and to provide supporting assistance for medical professionals.
      Methods The model was built through data collection and preprocessing, model construction and training, technology integration, knowledge base construction and maintenance, and system evaluation and optimization. Thirty CKD-related questions were designed, and three major Chinese LLMs, Kimi, iFLYTEK Spark, and Zhipu, were selected for comparative evaluation. Ten nephrology clinical pharmacists scored the responses on five dimensions (accuracy, completeness, relevance, logic, and professionalism), with emphasis on the models' clinical logical consistency, completeness of evidence traceability, and accuracy of contraindication identification in CKD scenarios. Each pharmacist scored the answers under three processing conditions (base model, with prompts, with knowledge base), yielding 30 score sheets in total. Time-investment data across four stages (requirement analysis, rule design, system training and testing, deployment and optimization) were also collected from five software development companies to compare the time consumed by the traditional development mode and the LLM+RAG mode. Two-way and one-way analysis of variance were used to evaluate score differences, and paired t-tests were used to analyze development-time differences (P<0.05 was considered statistically significant).
      Results The interaction effect between processing condition and model on scores was statistically significant (P<0.001). After adding prompts, Kimi scored significantly higher than Spark and Zhipu; after adding the knowledge base, Kimi scored highest, with no significant difference from Zhipu but significantly higher than Spark; among the base models, Kimi also scored highest. Within the same model, Kimi's score with the knowledge base was significantly higher than with prompts alone but did not differ from the base model; the scores of Spark and Zhipu both improved significantly after adding the knowledge base. The LLM+RAG mode significantly shortened development time compared with the traditional mode (P=0.017), with an 80% efficiency gain in the rule design stage, an average saving of 2.125 weeks per stage, and an overall efficiency improvement of 45.9%.
      Conclusion LLM combined with RAG technology can significantly improve development efficiency and shorten the development cycle, and optimizing prompts and the knowledge base can maximize model performance. Different models can be chosen according to cost and speed requirements. This study verifies the application potential of LLM+RAG in the medical field, but knowledge-base coverage, model generalization, and long-term maintenance still need optimization. Future work will expand the knowledge base and raise the level of intelligence to provide more precise medical-assistance tools.

       

      Abstract:
      OBJECTIVE To construct a medication education knowledge base and a multimodal medication guidance system for patients with chronic kidney disease (CKD) by integrating the contextual understanding capability of large language models (LLM) with the dynamic knowledge retrieval mechanism of retrieval-augmented generation (RAG), so as to enhance the safety of and adherence to patients' medication use and to assist medical professionals.
      METHODS The model was built through data collection and preprocessing, model construction and training, technology integration, knowledge base construction and maintenance, and system evaluation and optimization. Thirty CKD-related questions were designed, and three major Chinese LLMs, namely Kimi, iFLYTEK Spark, and Zhipu, were selected for comparative evaluation. Ten clinical pharmacists from the nephrology department rated the models on five dimensions: accuracy, completeness, relevance, logic, and professionalism, with a focus on the models' clinical logical consistency, completeness of evidence traceability, and accuracy of contraindication identification in CKD scenarios. Each pharmacist rated the responses under three processing conditions (base model, with prompts, with knowledge base), and a total of 30 rating forms were collected. In parallel, time-investment data for four stages (requirement analysis, rule design, system training and testing, deployment and optimization) were collected from five software development companies to compare the time consumed by the traditional development mode and the LLM+RAG mode. Two-way and one-way analysis of variance were used to evaluate differences in model scores, and paired t-tests were used to analyze differences in development time (P<0.05 was considered statistically significant).
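The retrieve-then-generate step underlying the RAG workflow described above can be sketched minimally as follows. The knowledge-base entries, the keyword-overlap scoring, and the prompt template are illustrative assumptions for exposition only; the abstract does not specify the system's actual retrieval method or prompts.

```python
# Minimal retrieve-then-generate sketch (illustrative, not the authors' code).

def retrieve(query, knowledge_base, top_k=1):
    """Rank knowledge-base entries by simple keyword overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(
        knowledge_base,
        key=lambda doc: len(q_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query, knowledge_base):
    """Prepend retrieved evidence to the question before calling the LLM."""
    evidence = "\n".join(retrieve(query, knowledge_base))
    return (
        "You are a CKD medication-education assistant.\n"
        f"Evidence:\n{evidence}\n"
        f"Question: {query}\n"
        "Answer with citations to the evidence."
    )

# Hypothetical knowledge-base entries for demonstration.
kb = [
    "NSAIDs should generally be avoided in CKD stages 3-5.",
    "Metformin dose must be reviewed when eGFR falls below 45.",
]
prompt = build_prompt("Can a CKD patient take NSAIDs for pain?", kb)
```

The prompt assembled this way is then passed to the chosen LLM (Kimi, Spark, or Zhipu), so the model answers from retrieved evidence rather than from its parameters alone.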
      RESULTS The interaction effect between processing condition and model on the scores was statistically significant (P<0.001). After adding prompts, Kimi scored significantly higher than Spark and Zhipu; after adding the knowledge base, Kimi scored highest, with no significant difference from Zhipu but significantly higher than Spark; among the base models, Kimi also scored highest. Within the same model, Kimi's score with the knowledge base was significantly higher than with prompts alone, but did not differ from the base model; the scores of Spark and Zhipu both improved significantly after adding the knowledge base. The LLM+RAG mode significantly shortened development time compared with the traditional mode (P=0.017), with an 80% efficiency gain in the rule design stage, an average saving of 2.125 weeks per stage, and an overall efficiency improvement of 45.9%.
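The reported time figures are internally consistent: an average saving of 2.125 weeks over four stages is 8.5 weeks in total, which is a 45.9% reduction if the traditional mode totals about 18.5 weeks. The sketch below verifies this arithmetic and shows how the paired t statistic is formed; the per-stage times are illustrative values chosen to match the reported aggregates, not the companies' actual data.

```python
from math import sqrt
from statistics import mean, stdev

# Illustrative per-stage development times in weeks (4 stages), chosen to
# reproduce the reported aggregates; NOT the surveyed companies' real data.
traditional = [3.0, 5.0, 4.0, 6.5]   # traditional development mode
llm_rag     = [1.0, 2.5, 2.0, 4.5]   # LLM+RAG development mode

diffs = [t - r for t, r in zip(traditional, llm_rag)]
avg_saving = mean(diffs)                    # weeks saved per stage
efficiency = sum(diffs) / sum(traditional)  # overall time reduction

# Paired t statistic: mean difference divided by its standard error (df = n-1).
t_stat = avg_saving / (stdev(diffs) / sqrt(len(diffs)))

print(f"avg saving {avg_saving} weeks, efficiency {efficiency:.1%}")
```

With these numbers the average saving is 2.125 weeks per stage and the overall reduction is 45.9%, matching the figures in the text.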
      CONCLUSION The combination of LLM and RAG technology can significantly enhance development efficiency and shorten the development cycle, and optimizing prompts and the knowledge base can maximize model performance. Different models can be selected according to cost and speed requirements. This study verifies the application potential of LLM+RAG in the medical field, but knowledge-base coverage, model generalization, and long-term maintenance still need optimization. Future work will expand the knowledge base and raise the level of intelligence to provide more precise medical-assistance tools.

       
