Enhancing Corpus Analysis through the Integration of Large Language Models (LLMs)

Laurence Anthony

Category: Project

Description: In natural language processing research, Large Language Models (LLMs) have emerged as powerful tools that offer novel, surprising, and profound, but also questionable, insights into language usage across diverse domains and registers. These models are built from vast amounts of language data and are readily accessible through web interfaces and APIs. The challenge lies in understanding and evaluating LLM outputs, given their 'black box' design and their tendency to generate 'hallucinations' (inaccuracies in model output), especially when they lack representative data in a target domain. This paper addresses these challenges by integrating LLMs into a conventional corpus analysis toolkit and establishing a direct connection between the LLM and user-defined corpora. This integration lets users run targeted LLM-based queries about individual corpus files or the entire corpus, and prompt the LLM for insights on results generated by traditional corpus tools, such as KWIC concordancers and collocate tools. It also allows for strategic prompt engineering that significantly mitigates the risk of 'hallucinations'. Moreover, the accuracy of LLM-derived insights can be easily validated using direct links to the original corpus data, thereby enhancing the credibility and utility of LLMs in corpus research.
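The description does not give implementation details, so the following is only a minimal sketch of the general idea, not the toolkit's actual code. It assumes a corpus held as a mapping from file name to text, generates KWIC lines that each carry a link back to their source file and token position, builds a prompt that restricts the LLM to those numbered lines, and then checks which line numbers cited in an answer really exist. All names here (`KwicLine`, `kwic`, `build_prompt`, `check_citations`) are hypothetical, and no real LLM API is called: the `answer` string is a hand-written stand-in for a model response.

```python
import re
from dataclasses import dataclass


@dataclass
class KwicLine:
    line_id: int      # number the LLM is asked to cite
    file: str         # corpus file the hit came from
    token_index: int  # position of the keyword in that file's token list
    left: str
    keyword: str
    right: str


def kwic(corpus, keyword, window=4):
    """Return keyword-in-context lines for every hit of `keyword`."""
    lines = []
    for file, text in sorted(corpus.items()):
        tokens = re.findall(r"\w+", text)  # naive tokenizer (assumption)
        for i, tok in enumerate(tokens):
            if tok.lower() == keyword.lower():
                lines.append(KwicLine(
                    line_id=len(lines) + 1,
                    file=file,
                    token_index=i,
                    left=" ".join(tokens[max(0, i - window):i]),
                    keyword=tok,
                    right=" ".join(tokens[i + 1:i + 1 + window]),
                ))
    return lines


def build_prompt(lines, question):
    """Build a prompt that confines the model to the numbered KWIC lines."""
    rows = [f"[{l.line_id}] {l.left} <{l.keyword}> {l.right}" for l in lines]
    return (
        "Answer using ONLY the numbered concordance lines below. "
        "Cite the line numbers that support each claim, e.g. [2]. "
        "If the lines do not contain the answer, say so.\n\n"
        + "\n".join(rows)
        + f"\n\nQuestion: {question}"
    )


def check_citations(answer, lines):
    """Split cited line numbers into resolvable lines and unknown numbers."""
    known = {l.line_id: l for l in lines}
    cited = [int(n) for n in re.findall(r"\[(\d+)\]", answer)]
    valid = [known[n] for n in cited if n in known]
    invalid = [n for n in cited if n not in known]
    return valid, invalid


# Toy corpus and a mock model answer, both invented for illustration.
corpus = {
    "a.txt": "The strong tea was served. She likes strong coffee.",
    "b.txt": "A strong argument was made.",
}
lines = kwic(corpus, "strong")
prompt = build_prompt(lines, "Which nouns follow 'strong'?")
answer = "It precedes 'tea' [1] and 'argument' [3]; also 'wind' [7]."
valid, invalid = check_citations(answer, lines)
for hit in valid:
    print(f"[{hit.line_id}] -> {hit.file}, token {hit.token_index}")
print("Unverifiable citations:", invalid)
```

In this sketch, a citation that does not resolve to a KWIC line (here `[7]`) is flagged for the user rather than trusted, which is one simple way the "direct links to the original corpus data" could support validation. A real integration would replace the mock `answer` with a call to whichever LLM service is in use.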
