Corpus AI: Integrating Large Language Models (LLMs) into a Corpus Analysis Toolkit

Laurence Anthony

Category: Project

Description: Large Language Models (LLMs) have the potential to play a pivotal role in corpus linguistics research, providing a deep and nuanced view of language use across a variety of domains and registers. Current LLMs, such as ChatGPT, are built using massive amounts of language data and can be easily accessed through web interfaces and application programming interfaces (APIs). However, when these models are prompted for insights on language, it can be difficult to determine what specific data informs the responses, or even how accurate the responses are, given the risk of 'hallucinations' (i.e., errors in the model output). Here, we show how LLMs can be integrated into a traditional corpus analysis toolkit, allowing them to be linked directly to target corpora. When used in this way, LLMs can be prompted to query individual corpus files or the entire corpus. They can also be prompted for insights on the results of traditional corpus tools, such as KWIC concordancers, offering a completely new view of the data. The risk of 'hallucinations' remains, but it can be greatly reduced through prompt engineering, and the insights offered by the LLM can be checked using direct links back to the target corpus.
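
As a rough illustration of the idea of grounding an LLM prompt in KWIC results, the following minimal sketch builds a concordance from a toy in-memory corpus and formats the hits into a prompt in which every line carries a reference back to its source file and token position. This is not the toolkit's actual implementation: the names `KwicHit`, `tokenize`, `kwic`, and `build_prompt`, the sample corpus, and the prompt wording are all assumptions made for illustration, and the step of sending the prompt to a model is deliberately left out.

```python
import re
from dataclasses import dataclass


@dataclass
class KwicHit:
    """One concordance line (hypothetical structure, for illustration only)."""
    file_id: str   # corpus file the hit came from
    position: int  # token index of the node word within that file
    left: str      # left context
    node: str      # the matched word as it appears in the text
    right: str     # right context


def tokenize(text: str) -> list[str]:
    # Very simple word tokenizer; a real toolkit would be more careful.
    return re.findall(r"\w+(?:'\w+)?", text)


def kwic(corpus: dict[str, str], term: str, window: int = 4) -> list[KwicHit]:
    """Return case-insensitive KWIC hits for `term` across all corpus files."""
    hits = []
    for file_id, text in corpus.items():
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok.lower() == term.lower():
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append(KwicHit(file_id, i, left, tok, right))
    return hits


def build_prompt(hits: list[KwicHit], question: str) -> str:
    """Format hits so each line has a [file:position] reference to cite."""
    lines = [
        f"[{h.file_id}:{h.position}] {h.left} <{h.node}> {h.right}"
        for h in hits
    ]
    return (
        "Answer using ONLY the concordance lines below. "
        "Cite the bracketed reference for every claim. "
        "If the lines do not support an answer, say so.\n\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}"
    )


# Toy corpus (assumed data, not from the project).
corpus = {
    "doc1.txt": "The results strongly suggest a link. We suggest further work.",
    "doc2.txt": "Critics suggest that the data are incomplete.",
}
hits = kwic(corpus, "suggest")
prompt = build_prompt(hits, "What kinds of subjects precede 'suggest'?")
print(prompt)
```

Because each line in the prompt is tagged with a file name and token position, any claim the model makes can in principle be traced back to the exact place in the corpus it cites, which is the kind of check the description refers to.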
