MaltLex: A database of visual lexical decision responses to 11,000 Maltese words


*Introduction:* In a lexical decision “megastudy”, researchers collect responses to a wide range of words and non-words (e.g. differing in morphological complexity) to build a large database that others can later use to test novel hypotheses by analyzing subsets of the data. Megastudies circumvent many of the shortcomings of traditional experiments and have grown in popularity in recent years (Keuleers and Balota 2015): visual lexical decision megastudies have been conducted on English (Balota et al. 2007), French (Ferrand et al. 2010), Malay (Yap et al. 2010), Dutch (Brysbaert et al. 2016), and Cantonese (Tse et al. 2017). To date, no megastudy has focused on a language that productively uses nonconcatenative morphology, which poses novel challenges for theories of word recognition (e.g. Frost et al. 1997). Maltese, a Semitic language, not only uses nonconcatenative morphology; its speakers have also borrowed extensively from Indo-European languages, so that roughly half the lexicon comprises Sicilian, Italian, and English loanwords which largely use concatenative morphology (Bovingdon and Dalli 2006). This creates further challenges for theories of lexical processing. We report the creation of a database of Maltese visual lexical decision responses and demonstrate its use in replicating an analysis of the effect of etymology (Semitic vs. non-Semitic borrowings) on lexical decision speed.

*Methods:* In total, we have collected approximately 237,000 lexical decision responses from 104 native or near-native speakers of Maltese (*M* Age = 24.0 years, range = 18−77 years; 53 participants identified as female, 51 as male; 87 identified as right-handed, 17 as left-handed) to 11,000 real Maltese words and 11,000 non-words. The real-word targets were randomly selected from Korpus Malti v3.0 (Gatt and Čéplö 2013), a 250-million-token corpus of written Maltese that we trimmed to remove non-Maltese texts and non-words (e.g. URLs), and supplemented by other written sources. The selected words were also checked against Ġabra, a Maltese lexical database containing 16,593 lemma-based entries (Camilleri 2013), and vetted by a native speaker. The final set of real-word targets consisted of 6,451 Semitic Maltese words, 4,439 non-Semitic words, and 110 words of uncertain etymology (Aquilina 1987-1990), and included both uninflected and inflected forms. Real-word targets ranged in frequency from 0−20,385.4 occurrences per million words in Korpus Malti (*M* = 36.1 occurrences per million words) and in length from 2−21 letters (*M* = 7.1 letters). For each real-word target, we constructed a non-word matched in length and in frequency-weighted neighborhood density (*M* Real = 92.9, *M* Nonce = 88.5 occurrences per million; Welch’s *t*-test: *t*(21,998) = −0.47, *n.s.*). A native speaker vetted all potential non-word targets. Participants completed 1−35 total sessions (*M* = 5.8 sessions), with up to three sessions per day. In each session they judged the lexicality of 200 visually presented real words and 200 non-words, yielding a total of 9−13 lexical decisions per item (*M* = 10.7 decisions). We excluded data from participants whose average RT exceeded 1,500 ms or whose accuracy rate fell below 80%.

*Analysis of lexical stratum:* In a visual masked priming lexical decision study, Geary and Ussishkin (2018) found that Maltese readers responded faster to Semitic-origin words than to non-Semitic borrowings, independent of frequency, neighborhood density, and word length. We replicate this finding with a larger dataset (10,890 versus 96 distinct words) by analyzing log RTs to real-word targets on trials where participants responded correctly. We used the lme4 package (Bates et al. 2015) in R (R Core Team 2019) to fit a linear mixed-effects regression (LMER) model and assessed significance using the lmerTest package (Kuznetsova et al. 2016) to obtain Satterthwaite approximations for degrees of freedom. The model included lexical stratum (Semitic vs. Non-Semitic; reference: Non-Semitic), log frequency, log frequency-weighted neighborhood density, age, trial number, session number, and same-day session number as fixed effects; subjects and targets as random effects; and by-subjects random slopes for lexical stratum. The effect of lexical stratum was significant (*t*(191.2) = −7.13, *p* < 0.001), with participants reliably faster to judge Semitic words (*M* = 847 ms) than non-Semitic words (*M* = 852 ms). Although the effect is considerably smaller than in the original study (5 versus 30 ms), its replication motivates further analyses which will assess whether, for instance, the “lexical stratum” effect may in fact reflect targets’ overall morphological complexity.
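The data preparation described above (excluding participants by mean RT and accuracy, then keeping log RTs for correctly answered real-word trials) might be sketched as follows. This is a minimal illustration in Python, not the authors' pipeline, which was run in R. The field names (`participant`, `item`, `is_word`, `correct`, `rt_ms`) and the function name are hypothetical, and computing each participant's mean RT and accuracy over all of their trials is an assumption the abstract does not confirm.

```python
import math
from collections import defaultdict

# Thresholds taken from the abstract; everything else here is illustrative.
RT_CUTOFF_MS = 1500.0  # participants with a higher mean RT are excluded
MIN_ACCURACY = 0.80    # participants with lower accuracy are excluded


def prepare_trials(trials):
    """Return (participant, item, log_rt) for analyzable trials.

    `trials` is an iterable of dicts with the hypothetical keys
    participant, item, is_word (bool), correct (bool), rt_ms (float).
    Assumption: per-participant mean RT and accuracy use all trials.
    """
    trials = list(trials)
    rts = defaultdict(list)
    hits = defaultdict(list)
    for t in trials:
        rts[t["participant"]].append(t["rt_ms"])
        hits[t["participant"]].append(1 if t["correct"] else 0)

    # Retain participants who meet both criteria.
    keep = {
        p
        for p in rts
        if sum(rts[p]) / len(rts[p]) <= RT_CUTOFF_MS
        and sum(hits[p]) / len(hits[p]) >= MIN_ACCURACY
    }

    # Keep only correct responses to real words, on the log scale.
    return [
        (t["participant"], t["item"], math.log(t["rt_ms"]))
        for t in trials
        if t["participant"] in keep and t["is_word"] and t["correct"]
    ]
```

The resulting rows, joined with item-level predictors such as lexical stratum and log frequency, would be the kind of input a mixed-effects model like the one described here takes.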
