Moving to continuous classifications of bilingualism through machine learning trained on language production
Category: Project
Description: Recent conceptualisations of bilingualism are moving away from strict categorisations and towards continuous approaches. This study supports this trend by combining empirical psycholinguistic data with machine learning classification. We trained support vector classifiers on two datasets of linguistic productions by Italian speakers, coded for the type of elicited description uttered, to predict the class each speaker belonged to (i.e., “monolingual”, “attriter”, or “heritage”). All classes can be predicted above chance (greater than 33%), although classifier performance varies substantially: monolinguals are identified much better (f-score greater than 70%) than attriters (f-score greater than 50%), who are the most confusable class. Further analysis of the classification errors in the confusion matrices shows that attriters are identified as heritage speakers nearly as often as they are correctly classified, suggesting that this class sits in the middle of the monolingual-heritage continuum. When examining the importance of the type of production for classification performance, we found that cluster clitics are the most discriminating features. Overall, this study supports a conceptualisation of bilingualism as a continuum of linguistic behaviours rather than a set of a priori established classes.
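The per-class f-scores and the confusion between classes mentioned in the description can be read off a confusion matrix. The following is a minimal sketch of that arithmetic, not the project's own analysis code: the counts are invented for illustration only and are not the study's results, and the names `CLASSES`, `confusion`, `per_class_f_scores`, and `row_rates` are hypothetical. It assumes the f-score is the usual F1 (harmonic mean of precision and recall) and that chance for three classes is 1/3.

```python
# Illustrative only: these counts are made up and are NOT the study's data.
CLASSES = ["monolingual", "attriter", "heritage"]

# confusion[i][j] = number of speakers whose true class is CLASSES[i]
# and whose predicted class is CLASSES[j].
confusion = [
    [40, 6, 4],    # true monolingual
    [8, 24, 18],   # true attriter
    [5, 12, 33],   # true heritage
]


def per_class_f_scores(matrix):
    """Return the F1 score of each class from a square confusion matrix."""
    n = len(matrix)
    scores = []
    for k in range(n):
        true_pos = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column total
        actual_k = sum(matrix[k])                          # row total
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores


def row_rates(matrix, k):
    """Return the share of true class k assigned to each predicted class."""
    total = sum(matrix[k])
    return [count / total for count in matrix[k]]


chance = 1 / len(CLASSES)  # uniform guessing over three classes
f_scores = per_class_f_scores(confusion)
attriter_rates = row_rates(confusion, CLASSES.index("attriter"))

for name, f in zip(CLASSES, f_scores):
    print(f"{name}: f-score = {f:.2f} (chance = {chance:.2f})")
print("attriters predicted as:", dict(zip(CLASSES, attriter_rates)))
```

In this invented matrix the attriter row splits almost evenly between "attriter" and "heritage", which is the kind of pattern the description interprets as attriters sitting between the other two classes.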