Category: Project
Description: Large language models (LLMs) such as ChatGPT are now highly popular chatbots capable of producing original, human-like responses to user prompts using supervised and reinforcement machine learning techniques. LLMs can access vast amounts of information, including scientific data. They have extensive potential as science communicators, providing laypeople, governments, and policymakers with easily understandable explanations of scientific findings and thereby helping to increase science literacy worldwide. However, this potential may come with significant risks. Specifically, it remains unclear whether LLM summaries of scientific texts capture the uncertainties, limitations, and nuances of research, or whether they oversimplify, omitting qualifiers or quantifiers present in the original texts. Omitting qualifiers may generalize scientific findings far more broadly than the original research warrants, potentially leading human users to misinterpret those findings. However, the scope and accuracy of the generalizations that LLMs produce in their science communication have not yet been systematically explored. We aim to fill this gap by testing and comparing the generalizations found in human-written summaries (e.g., abstracts) of scientific texts published in the highest-ranking journals with the summaries produced by four currently leading LLMs (ChatGPT-4o, DeepSeek, Claude, and Llama).