Title: Examining corpus prototypicality and keyness beyond the lexical level: Experiments with ProtAnt
Authors: Nicholas Smith (University of Leicester), Laurence Anthony (Waseda University), Sebastian Hoffmann (University of Trier) and Paul Rayson (Lancaster University)
Keywords: text prototypicality, keywords, parts of speech, semantic domains, speech acts

Abstract: For linguists working with corpora, a common difficulty after quantitative analysis is deciding which texts to select for follow-up close analysis without arousing suspicion of ‘cherry-picking’. The ProtAnt tool (Anthony & Baker 2015) provides considerable help in this respect. Building on the now well-established tradition of corpus keyword analysis (since Scott 1997) and on the association between prototypes and frequency of instantiation (e.g., Rosch 1975, Gries 2003), ProtAnt ranks the texts in a target corpus from most to least prototypical according to the number of keywords they contain. ProtAnt’s capabilities have been increasingly exploited in text and discourse analysis (e.g., Levon 2016, Bednarek & Caple 2017, Price 2022), but to date all such studies have been confined to traditional lexical keywords rather than keywords generated at other linguistic levels, such as parts of speech (POS), semantic domains and speech acts. The current paper addresses this research gap by asking: How successfully can ProtAnt identify prototypical and outlier texts in corpora at various non-lexical linguistic levels? We address this question through a series of experiments. Results show that ProtAnt can use key POS tags to identify stylistically prototypical texts in registers of the AmE06 corpus of American English, and can also flag outlier texts that have been artificially included from another register. Using semantic tags, outliers are identified with still higher success. Speech act tags (in SPICE-Ireland), by contrast, yield more mixed results.
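The ranking idea described above (keyness computed against a reference corpus, then texts ordered by how many keywords they contain) can be sketched as follows. This is a minimal illustration, not ProtAnt's implementation: it assumes keyness is measured with log-likelihood (G2) at a p < 0.05 cut-off (3.84, 1 d.f.), that only positive keywords count, and that each text is scored by keyword tokens per token of text; ProtAnt's own statistics, thresholds and normalization options may differ. Because items are arbitrary strings, the same function applies unchanged to words, POS tags or semantic tags. The function names and the toy CLAWS-style tag sequences are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 for an item seen a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference
    g2 = 0.0
    if a > 0:  # zero-frequency terms contribute nothing to G2
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(target: dict[str, list[str]], reference: list[list[str]],
             threshold: float = 3.84) -> set[str]:
    """Items overused in the target relative to the reference whose G2 meets
    the (assumed) threshold; 3.84 is the chi-square critical value for p < 0.05."""
    tgt = Counter(tok for text in target.values() for tok in text)
    ref = Counter(tok for text in reference for tok in text)
    c, d = sum(tgt.values()), sum(ref.values())
    return {
        item for item, a in tgt.items()
        if a / c > ref[item] / d  # positive keyness only
        and log_likelihood(a, ref[item], c, d) >= threshold
    }


def rank_texts(target: dict[str, list[str]], reference: list[list[str]],
               threshold: float = 3.84) -> list[tuple[str, float]]:
    """Rank target texts from most to least prototypical by keyword tokens
    per token of text (a hypothetical normalization choice)."""
    keys = keywords(target, reference, threshold)
    scores = {
        name: sum(tok in keys for tok in text) / len(text)
        for name, text in target.items() if text
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data: two "news" texts plus one planted outlier,
# tagged with CLAWS-style POS tags, against a small reference corpus.
TARGET = {
    "news_a": ["NP1", "NN1", "VVD", "NP1", "NN1", "NP1"],
    "news_b": ["NP1", "NN1", "NP1", "VVD", "NN1", "NN1"],
    "outlier": ["PPHS1", "VVD", "PPHS1", "VVD", "RR", "VVD"],
}
REFERENCE = [["PPHS1", "VVD", "RR", "PPHS1", "VVD", "NN1"] * 5]

if __name__ == "__main__":
    for name, score in rank_texts(TARGET, REFERENCE):
        print(f"{name}\t{score:.3f}")
    # news_a 0.500, news_b 0.333, outlier 0.000
```

In this toy run only NP1 passes the keyness threshold, so the planted outlier, which contains no NP1 tokens, sinks to the bottom of the ranking. Scoring by relative rather than raw keyword counts keeps long texts from ranking high simply because of their length.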
On the whole, non-lexical key items can complement those at the lexical level in profiling texts, with success apparently affected by the granularity of the tags, the accuracy of the linguistic annotation and the degree of specialization of the register. We discuss the theoretical and practical implications of our work for areas such as grammar, stylistics, discourse analysis and data-driven learning (DDL).

References
Anthony, L. & Baker, P. (2015). ProtAnt: A tool for analysing the prototypicality of texts. International Journal of Corpus Linguistics 20(3): 273–292.
Bednarek, M. & Caple, H. (2017). The Discourse of News Values: How News Organizations Create Newsworthiness. Oxford: Oxford University Press.
Gries, S. (2003). Towards a corpus-based identification of prototypical instances of constructions. Annual Review of Cognitive Linguistics 1: 1–27. DOI: 10.1075/arcl.1.02gri
Levon, E. (2016). Qualitative analysis of stance. In P. Baker & J. Egbert (eds.), Triangulating Methodological Approaches in Corpus Linguistic Research. London/New York: Routledge.
Price, H. (2022). The Language of Mental Illness: Corpus Linguistics and the Construction of Mental Illness in the Press. Cambridge: Cambridge University Press.
Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology: General 104(3): 192–233. DOI: 10.1037/0096-3445.104.3.192
Scott, M. (1997). PC analysis of key words - and key key words. System 25(2): 233–245.

APA Citation: Smith, N., Anthony, L., Hoffmann, S., & Rayson, P. (2023, July 3). Examining corpus prototypicality and keyness beyond the lexical level: Experiments with ProtAnt [Conference presentation]. CL 2023, Lancaster, UK. https://osf.io/qksev