Presentation Description
Institution: Monash Medical Centre - Victoria, Australia
AIMS
Artificial intelligence (AI) scribes now generate clinical letters directly from transcribed clinical interactions, reducing clinician workload and boosting efficiency. However, no tool currently exists to assess the quality of documentation generated this way. We developed and validated a new tool for this purpose and examined the reliability of using an unsupervised Large Language Model (LLM) to apply the tool for document evaluation.
METHODOLOGY
We created the Generative AI Medical Letter Evaluation Tool (GAIMLET), a 10-domain instrument for assessing the quality of AI-generated medical letters, based on a physician referrer survey and published criteria for letter quality. Using an Australian AI scribe platform (i-scribe®), we generated outpatient letters from de-identified transcripts of ENT consultations (n=20). Demographic and letter variables were collected, and four experienced otolaryngologists scored these letters using GAIMLET. We also engineered a GPT-4o prompt to conduct the same evaluation in an unsupervised manner, re-testing it 7 days later. Human and LLM scores were compared for inter-rater reliability, and the LLM's test-retest scores were assessed for intra-rater reliability.
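The abstract does not publish the GAIMLET prompt, its domain names, or its scoring scale, so the following is a minimal illustrative sketch of how an unsupervised GPT-4o evaluation of this kind could be wired up with the OpenAI Python SDK. The prompt wording, the 1-5 scale, and the function name are assumptions, not the study's actual implementation.

```python
# Hypothetical sketch of an unsupervised GAIMLET-style evaluation step.
# The real GAIMLET prompt, domains, and scale are not published in this
# abstract; everything below is illustrative only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

GAIMLET_PROMPT = """You are evaluating an AI-generated outpatient letter.
Score each of the 10 GAIMLET domains from 1 (poor) to 5 (excellent) and
return one line per domain in the form: <domain>: <score>."""

def score_letter(letter_text: str) -> str:
    """Ask GPT-4o to score a single letter against the GAIMLET domains."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,  # deterministic settings aid test-retest comparison
        messages=[
            {"role": "system", "content": GAIMLET_PROMPT},
            {"role": "user", "content": letter_text},
        ],
    )
    return response.choices[0].message.content
```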
RESULTS
Results will be presented at the conference. Letter variables, demographics, and overall scores will be reported. Internal consistency will be assessed using Cronbach's alpha. Overall scores will be compared between human and LLM raters. LLM test-retest reliability will be assessed using Spearman correlation, and inter-rater reliability using Kendall's W.
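For readers unfamiliar with these statistics, the sketch below shows how the planned analyses could be computed in Python. The data are synthetic placeholders and the variable names hypothetical; Cronbach's alpha and Kendall's W are implemented from their standard formulas (tie correction for W omitted for brevity), and the test-retest correlation uses scipy.stats.spearmanr.

```python
# Illustrative sketch of the planned reliability analyses; all data below
# are synthetic placeholders, not study results.
import numpy as np
from scipy.stats import rankdata, spearmanr

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (observations x items) score matrix."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

def kendalls_w(ratings: np.ndarray) -> float:
    """Kendall's W for an (raters x items) matrix (no tie correction)."""
    m, n = ratings.shape
    ranks = np.apply_along_axis(rankdata, 1, ratings)  # rank items per rater
    rank_sums = ranks.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical example: 20 letters scored by 4 human raters on a 1-5 scale.
rng = np.random.default_rng(0)
human = rng.integers(1, 6, size=(20, 4)).astype(float)
print("Cronbach's alpha:", round(cronbach_alpha(human), 3))
print("Kendall's W:", round(kendalls_w(human.T), 3))

# LLM test-retest reliability across the two evaluation runs, 7 days apart.
llm_day0 = rng.integers(1, 6, size=20)
llm_day7 = rng.integers(1, 6, size=20)
rho, p = spearmanr(llm_day0, llm_day7)
print("Test-retest Spearman rho:", round(float(rho), 3))
```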
CONCLUSION
This study aims to validate GAIMLET, a tool for assessing AI-generated medical letters, which has the potential for broad application in AI-driven medical documentation. By examining the feasibility of unsupervised LLM evaluation, it may also give clinicians a reliable way to assess the quality of AI-generated scribe letters. More evidence is needed to support LLM-guided appraisals.
Authors
Dr Jared Panario, A/Prof Paul Paddle