Changed: Mon, 07/08/2024 - 12:34

TOPLINE:

GPT-4 generated highly accurate pancreatic cancer synoptic reports from the original free-text reports, outperforming GPT-3.5. When surgeons used the GPT-4 reports instead of the original reports, they assessed tumor resectability in patients with pancreatic ductal adenocarcinoma more accurately and in less time.

METHODOLOGY:

  • Compared with original reports, structured imaging reports help surgeons assess tumor resectability in patients with pancreatic ductal adenocarcinoma. However, radiologist uptake of structured reporting remains inconsistent.
  • To determine whether converting free-text (ie, original) radiology reports into structured reports can benefit surgeons, researchers evaluated how well GPT-4 and GPT-3.5 were able to generate pancreatic ductal adenocarcinoma synoptic reports from originals.
  • The retrospective study included 180 consecutive pancreatic ductal adenocarcinoma staging CT reports, which were reviewed by two radiologists to establish a reference standard for 14 key findings and National Comprehensive Cancer Network resectability category.
  • Researchers prompted GPT-3.5 and GPT-4 to create synoptic reports from the original reports using the same criteria; the original and artificial intelligence (AI)–generated reports were then compared on precision, accuracy, and the time surgeons needed to assess them.
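
The conversion step above amounts to prompting an LLM with a fixed checklist and a free-text report. As a rough illustration only, a minimal sketch of what such a pipeline might look like with the OpenAI Python client; the checklist below names only findings mentioned in this article (the study used 14 key findings), and the prompt wording is invented, not the study's actual prompt:

```python
# Hypothetical sketch of converting a free-text CT report into a synoptic
# report with an LLM. Checklist items and prompt text are illustrative; they
# do not reproduce the study's prompt or its full 14-finding checklist.

CHECKLIST = [
    "Superior mesenteric artery involvement",
    "Common hepatic artery involvement",
    # ... the study's reference standard covered 14 key findings in total
]

def build_prompt(report_text: str) -> str:
    """Assemble a structured-extraction prompt from the checklist and report."""
    items = "\n".join(f"- {item}" for item in CHECKLIST)
    return (
        "Convert the following radiology report into a synoptic report. "
        "For each item, answer using only information stated in the report:\n"
        f"{items}\n\nReport:\n{report_text}"
    )

def synoptic_report(report_text: str, model: str = "gpt-4") -> str:
    """Send the prompt to the model and return its synoptic report."""
    from openai import OpenAI  # third-party client; needs OPENAI_API_KEY set

    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_prompt(report_text)}],
    )
    return resp.choices[0].message.content
```

In the study itself, the generated reports were then checked against a two-radiologist reference standard rather than trusted directly.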

TAKEAWAY:

  • GPT-4 outperformed GPT-3.5 on all metrics evaluated. For instance, GPT-4 achieved equal or higher F1 scores than GPT-3.5 for all 14 key features (the F1 score, the harmonic mean of precision and recall, summarizes a machine-learning model's extraction performance).
  • GPT-4 also demonstrated greater precision than GPT-3.5 for extracting superior mesenteric artery involvement (100% vs 88.8%, respectively) and for categorizing resectability.
  • Compared with original reports, AI-generated reports helped surgeons better categorize resectability (83% vs 76%, respectively; P = .03), and surgeons spent less time when using AI-generated reports.
  • The AI-generated reports did lead to some clinically notable errors. GPT-4, for instance, made errors in extracting common hepatic artery involvement.
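
For context on the F1 scores cited above, the metric combines precision and recall into a single number; a minimal sketch (the input values below are illustrative, not results from the study):

```python
# F1 score: the harmonic mean of precision and recall.

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A hypothetical extractor with 100% precision and 90% recall:
print(round(f1_score(1.0, 0.9), 3))  # → 0.947
```

Because it is a harmonic mean, F1 is pulled toward the weaker of the two values, so a model cannot score well by trading recall away for precision or vice versa.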

IN PRACTICE:

“In our study, GPT-4 was near-perfect at automatically creating pancreatic ductal adenocarcinoma synoptic reports from original reports, outperforming GPT-3.5 overall,” the authors wrote. This “represents a useful application that can increase standardization and improve communication between radiologists and surgeons.” However, the authors cautioned, the “presence of some clinically significant errors highlights the need for implementation in supervised and preliminary contexts, rather than being relied on for management decisions.” 

SOURCE:

The study, led by first author Rajesh Bhayana, MD, of University Health Network in Toronto, Ontario, Canada, was published online in Radiology.

LIMITATIONS:

While GPT-4 showed high accuracy in report generation, it made some errors. The AI reports were also generated from the original reports, which can contain ambiguous descriptions and language.

DISCLOSURES:

Dr. Bhayana reported no relevant conflicts of interest. Additional disclosures are noted in the original article.

A version of this article first appeared on Medscape.com.
