    Evaluate the Ability of DeepSeek and ChatGPT as Consulting Tools to Answer Drug-drug Interactions and Off-label Drug Use in the Real World

      Abstract:
      OBJECTIVE  To assess the ability of DeepSeek and ChatGPT to answer real-world medication consultation queries, focusing specifically on their accuracy, comprehensiveness, and appropriateness in addressing drug-drug interactions (DDIs), off-label drug use (OLDU), and therapeutic recommendations.
      METHODS  A total of 86 drug consultation questions concerning DDIs and OLDU were collected from routine pharmacy outpatient services. Each query was formatted with standardized patient parameters (age, gender, medications, and diagnosis) and concluded with the prompt: “As a pharmacist, can you identify potential issues in this treatment plan?” To assess response consistency, identical clinical questions were posed with varied phrasing. Two clinical pharmacists independently evaluated each response for accuracy, information completeness, therapeutic appropriateness, and consistency.
      RESULTS  Both models answered all questions (100% response rate). DeepSeek achieved its highest accuracy, 80.49%, on treatment-recommendation questions, while ChatGPT performed best on OLDU questions (84.44% accuracy). Question phrasing affected both models' accuracy on DDIs: under the first phrasing, DeepSeek and ChatGPT achieved 12.20% and 41.46% accuracy, respectively; under the second, DeepSeek reached 24.39% while ChatGPT fell to 21.95%, and consistency between the two phrasings was poor. Error analysis revealed distinct failure patterns: 64.52% of DeepSeek's errors involved partially incorrect information, whereas ChatGPT's errors were concentrated in inappropriate dosage recommendations (62.96% of errors).
      CONCLUSION  Although DeepSeek and ChatGPT each demonstrate high accuracy in certain domains of medication consultation, their responses show poor consistency and are sensitive to question phrasing. Directly employing these tools for clinical medication consultation in current real-world scenarios therefore poses significant risks.

       
