Bot or Not: Semantic Comparisons of Speech by Humans vs Generative Large Language Models
Category: Project
Description: Generative large language models could lead to the creation of an Artificial General Intelligence (AGI), potentially with greater cognitive abilities than any human could achieve. Some researchers say that such an AGI could upgrade itself to become autonomous and unaligned with human interests. This postulated event is variously called Superintelligence, the Singularity, or Foom, among other terms. Other researchers argue that this is highly unlikely. Eliezer Yudkowsky has proposed that, given the physical vulnerabilities of any AGI, nuclear weapons could kill it if less drastic measures fail. We could contribute to this debate by identifying a technique that makes binary classifications (bot or not) between human and generative AI semantic structures, up to and including a postulated superintelligence, and that achieves a high area under the curve (AUC) in a receiver operating characteristic (ROC) analysis. Our hypothesis: this distinction could be achieved by asking an entity for predictions of readily scoreable events, combined with a rationale from each forecaster stating how each prediction was produced. We hypothesize that eliciting text that must address the entity’s chosen likelihood of a readily scoreable event, stated in specific terms (i.e., a percentage), might create a usable “Bot or Not” classifier. A data source that meets these criteria is currently in operation at the Q3 AI Forecasting Benchmark Tournament, scheduled from July 1, 2024 through June 30, 2025. We would compare these results with our results from thousands of human forecasters, reported in the paper “What do forecasting rationales reveal about thinking patterns of top geopolitical forecasters?” We have also archived all the Amazon Mechanical Turk forecasting data from IARPA’s Geopolitical Forecasting Competition II and could use this raw data for this research.
Our experiments seek to determine the conditions under which this sort of classifier would be effective.
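
As an illustration of the evaluation metric named above, the following is a minimal sketch of how an AUC could be computed for a "Bot or Not" classifier. It is not the project's actual pipeline: the function name `roc_auc`, the label convention (1 = bot, 0 = human), and the toy scores are all assumptions made for this example. It uses the standard equivalence between the area under the ROC curve and the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for bot, 0 for human (a convention assumed for this sketch).
    scores: classifier outputs, where higher means "more bot-like".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # bot ranked above human: correct ordering
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three bot rationales and three human rationales.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(round(roc_auc(toy_labels, toy_scores), 3))
```

An AUC of 1.0 would mean every bot rationale is scored above every human rationale, while 0.5 corresponds to chance. The pairwise form is quadratic in the number of examples, which is fine for a sketch; a rank-based computation or an established statistics library would be the more practical choice for thousands of forecasts.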