GAIMLET: a new tool to assess AI-generated outpatient ENT letters, and its use in a large language model

Poster

Presentation Description

Institution: Monash Medical Centre - Victoria, Australia

AIMS Artificial intelligence (AI) scribes now generate clinical letters directly from transcribed clinical interactions, reducing clinical workload and boosting efficiency. However, no tool currently exists to assess the quality of this documentation method. We developed and validated a new tool for this purpose and examined the reliability of using an unsupervised Large Language Model (LLM) to implement this tool for document evaluation.

METHODOLOGY We created the Generative AI Medical Letter Evaluation Tool (GAIMLET), a 10-domain tool to assess the quality of AI-generated medical letters. The tool was based on a physician referrer survey and published criteria for letter quality. Using an Australian AI scribe platform (i-scribe®), we generated outpatient letters from de-identified transcripts of ENT consults (n=20). Demographics and letter variables were collected, and 4 experienced otolaryngologists scored the letters using GAIMLET. We also engineered a GPT-4o LLM prompt to conduct the same evaluation in an unsupervised manner and re-tested it 7 days later. Scores from human raters and the LLM were compared for inter-rater reliability, and LLM test-retest scores were assessed for intra-rater reliability.

RESULTS Results will be presented at the conference. Letter variables, demographics and overall scores will be shown. Internal consistency of the tool will be assessed using Cronbach's alpha as part of its validation. Overall scores will be compared between human and LLM raters. LLM test-retest reliability will be assessed using Spearman's rank correlation. Inter-rater reliability will be measured by Kendall's W.

CONCLUSION This study aims to validate GAIMLET, a tool for assessing AI-generated medical letters. GAIMLET has the potential for broad application in AI-driven medical documentation. If unsupervised LLM evaluation proves feasible, clinicians may gain a reliable way to assess the quality of AI-generated scribe letters. More evidence is needed to support LLM-guided appraisals.
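
The three statistics named in the abstract (Cronbach's alpha, Spearman's rank correlation and Kendall's W) can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the abstract does not state which software was used, and the function names, data layout (letters as rows, GAIMLET domains or raters as indicated) and tie handling below are assumptions made for illustration only. It uses the standard textbook formulas and only the Python standard library.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def cronbach_alpha(rows):
    """Internal consistency. rows: one list of domain scores per letter."""
    k = len(rows[0])  # number of domains (10 for GAIMLET)
    item_vars = [variance([row[d] for row in rows]) for d in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def kendalls_w(ratings):
    """Kendall's W with tie correction. ratings: one list of letter scores per rater."""
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(r) for r in ratings]
    rank_sums = [sum(r[i] for r in ranked) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((total - mean_sum) ** 2 for total in rank_sums)
    # Tie correction: sum of (t^3 - t) over each group of t tied scores per rater.
    ties = sum(t ** 3 - t for r in ratings for t in Counter(r).values())
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)
```

Under these assumptions, `cronbach_alpha` would take a 20 x 10 table of domain scores, `spearman_rho` the LLM's overall scores from the first run and the 7-day re-test, and `kendalls_w` one row of overall letter scores per rater. Whether the LLM is included as an additional rater in Kendall's W is not specified in the abstract.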

Speakers

Jared Panario

Peninsula Health - Australia

Authors

Dr Jared Panario, A/Prof Paul Paddle