NIU Lulu, LIU Taotao, HUANG Tianmin, LIU Yongjun, LUO Yilin, CHEN Xin, LIAO Yu, HE Jiansong, ZHU Donglan. Evaluate the Ability of DeepSeek and ChatGPT as Consulting Tools to Answer Drug-drug Interactions and Off-label Drug Use in the Real World[J]. Chinese Journal of Modern Applied Pharmacy, 2025, 42(17): 2929-2935. DOI: 10.13748/j.cnki.issn1007-7693.20251406

    Evaluate the Ability of DeepSeek and ChatGPT as Consulting Tools to Answer Drug-drug Interactions and Off-label Drug Use in the Real World

    • OBJECTIVE  To assess the responsiveness of DeepSeek and ChatGPT to real-world medication consultation queries, focusing specifically on their accuracy, comprehensiveness, and appropriateness in addressing drug-drug interactions (DDIs), off-label drug use (OLDU), and therapeutic recommendations.
      METHODS  A total of 86 clinically validated drug consultation questions were collected from routine pharmacy outpatient services, encompassing DDIs and OLDU scenarios. Each query was formatted with standardized patient parameters (age, gender, medications, diagnosis) and concluded with the prompt: “As a pharmacist, can you identify potential issues in this treatment plan?”. To assess response consistency, this study employed varied phrasing for identical clinical questions. Two independent clinical pharmacists evaluated answer accuracy, information completeness, therapeutic appropriateness, and inter-rater consistency.
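      The query protocol above can be sketched as follows. This is a hypothetical illustration, not the study's actual code: the `build_query` formatting and the example patient are assumptions, and Cohen's kappa is shown as one common inter-rater agreement statistic, since the paper does not state which measure the two pharmacist reviewers' consistency was computed with.

      ```python
      def build_query(age, gender, medications, diagnosis):
          """Format one consultation case with the standardized patient
          parameters and close with the fixed pharmacist prompt."""
          return (
              f"Patient: {age}-year-old {gender}. Diagnosis: {diagnosis}. "
              f"Medications: {', '.join(medications)}. "
              "As a pharmacist, can you identify potential issues in this treatment plan?"
          )

      def cohens_kappa(ratings_a, ratings_b):
          """Cohen's kappa for two raters labeling the same items
          (chance-corrected agreement)."""
          assert len(ratings_a) == len(ratings_b)
          n = len(ratings_a)
          labels = set(ratings_a) | set(ratings_b)
          observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
          expected = sum(
              (ratings_a.count(lab) / n) * (ratings_b.count(lab) / n)
              for lab in labels
          )
          return (observed - expected) / (1 - expected)

      # Hypothetical case and ratings, purely for illustration.
      query = build_query(68, "male", ["warfarin", "amiodarone"], "atrial fibrillation")
      kappa = cohens_kappa(["correct", "wrong", "correct", "correct"],
                           ["correct", "wrong", "wrong", "correct"])
      # Here observed agreement is 0.75, chance agreement 0.5, so kappa = 0.5.
      ```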
      RESULTS  Both models demonstrated a 100% response rate. DeepSeek achieved 80.49% accuracy in treatment recommendations, while ChatGPT showed superior performance in OLDU (84.44% accuracy). The questioning method affected both models' DDI accuracy: under the first questioning method, the correct rates of DeepSeek and ChatGPT were 12.20% and 41.46%, respectively; under the second, DeepSeek reached 24.39% versus 21.95% for ChatGPT, and consistency between the two questioning methods was poor. Error analysis revealed distinct failure patterns: 64.52% of DeepSeek errors involved factual inaccuracies, whereas ChatGPT primarily erred in dosage recommendations (62.96% of errors).
      CONCLUSION  In medication consultations, although DeepSeek and ChatGPT demonstrate high accuracy in certain domains, their responses exhibit poor consistency and high sensitivity to question phrasing. Consequently, directly employing these tools as clinical medication consultation aids in current real-world scenarios poses significant risks.