Contributors:

Date created: | Last Updated:

DOI | ARK

Category: Project

Description: In the realm of natural language processing research, Large Language Models (LLMs) have emerged as powerful tools that offer novel, surprisingly profound, but also questionable insights into language usage across diverse domains and registers. These models are built using vast amounts of language data and are readily accessible through web interfaces and APIs. The challenge lies in understanding and evaluating LLM outputs, given their 'black box' design and their tendency to generate 'hallucinations' (inaccuracies in model output), especially when they lack representative data in a target domain. This paper addresses these challenges by integrating LLMs seamlessly into a conventional corpus analysis toolkit and establishing a direct connection between the LLM and user-defined corpora. This integration enables users to perform targeted LLM-based queries about individual corpus files or the entire corpus, as well as to prompt the LLM for insights on results generated by traditional corpus tools, such as KWIC concordancers and collocate tools. The integration also allows for strategic prompt engineering that significantly mitigates the risk of 'hallucinations'. Moreover, the accuracy of LLM-derived insights can be easily validated through direct links to the original corpus data, thereby enhancing the credibility and utility of LLMs in corpus research.
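As a rough illustration of the grounding strategy described above, the sketch below shows how KWIC concordance lines drawn from a user's own corpus might be embedded directly in an LLM prompt, so that the model reasons over verified corpus evidence rather than unsupported recall. The function names (kwic_search, build_grounded_prompt, query_llm) and the workflow are illustrative assumptions for a plain-text corpus, not the toolkit's actual interface.

# Minimal sketch, assuming a folder of plain-text corpus files.
# All names here are hypothetical placeholders, not the toolkit's API.

from pathlib import Path
import re

def kwic_search(corpus_dir: str, node: str, window: int = 40) -> list[str]:
    """Collect simple KWIC (keyword-in-context) lines for `node` from .txt files."""
    lines = []
    pattern = re.compile(re.escape(node), re.IGNORECASE)
    for path in Path(corpus_dir).glob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - window):m.start()]
            right = text[m.end():m.end() + window]
            lines.append(f"{path.name}: ...{left}[{m.group(0)}]{right}...")
    return lines

def build_grounded_prompt(node: str, kwic_lines: list[str], max_lines: int = 20) -> str:
    """Embed corpus evidence in the prompt so answers can be traced back to the data."""
    evidence = "\n".join(kwic_lines[:max_lines])
    return (
        f"Using ONLY the concordance lines below, describe how '{node}' is used in this corpus. "
        f"Cite the file name of each line you rely on; if the evidence is insufficient, say so.\n\n"
        f"{evidence}"
    )

# query_llm(prompt) would be supplied by whichever LLM API the toolkit connects to.

Keeping the source file name attached to each concordance line is what makes the validation step described above possible: any claim in the model's answer can be traced back to a specific file in the original corpus.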

Files

Citation

Recent Activity
