TOPLINE:

ChatGPT (version 3.5) provides relatively poor and inconsistent responses when asked about appropriate colorectal cancer (CRC) screening and surveillance, a new study showed.

METHODOLOGY:

  • Three board-certified gastroenterologists, each with more than 10 years of clinical experience, developed five CRC screening and five CRC surveillance clinical vignettes with multiple-choice answers, which were submitted to ChatGPT version 3.5.
  • ChatGPT’s responses were recorded over four separate sessions and checked for accuracy to assess the tool’s reliability.
  • ChatGPT’s average number of correct answers was compared with that of 238 gastroenterologists and colorectal surgeons who answered the same questions with and without the help of a previously validated CRC screening mobile app.

TAKEAWAY:

  • ChatGPT’s average overall performance was 45%, with an average of 2.75 correct answers for the screening vignettes and 1.75 for the surveillance vignettes.
  • ChatGPT’s responses were also inconsistent: For 4 of the 10 questions, the tool gave different answers across the four sessions.
  • ChatGPT’s average number of total correct answers was significantly lower (P < .001) than that of physicians answering with the mobile app (7.71 correct answers) and without it (5.62).

IN PRACTICE:

“The use of validated mobile apps with decision-making algorithms could serve as more reliable assistants until large language models developed with AI are further refined,” the authors concluded.

SOURCE:

The study, led by first author Lisandro Pereyra, MD, of the Department of Gastroenterology, Hospital Alemán of Buenos Aires, Argentina, was published online on February 7, 2024, in the Journal of Clinical Gastroenterology.

LIMITATIONS:

The 10 clinical vignettes represented a relatively small sample for assessing accuracy. The study did not use the latest version of ChatGPT. No “fine-tuning” with diverse prompts, instructions, or relevant data was attempted, which could potentially have improved the chatbot’s performance.

DISCLOSURES:

The study had no specific funding. The authors declared no conflicts of interest.
 

A version of this article appeared on Medscape.com.
