Corpus linguistics offers scholars the possibility of processing and computing data
on a large scale. Data-driven approaches allow bottom-up analyses of
semiotic, societal and cultural phenomena (). The investigation of gender
in language is one of the most recent applications of computational and
corpus-based methods (cf. Baker, 2014; Fragaki & Goutsos, 2015; Busso & Vignozzi, 2017).
In light of this recent body of work, this study aims to explore the
pragmatic, semiotic and cultural implications of the Italian media's
language and representation of gender violence.
The topic proposed here is of particular relevance to contemporary
society. Indeed, according to the European Institute for Gender Equality (EIGE), no
statistics on "gender violence" were available until 2015, due to a lack
of reliable data.
More accurately gathered information on gender-based violence can raise
awareness and help counteract this phenomenon, and corpora may represent
valuable resources to this end. However, to the best of our knowledge,
studies on the representation of women and gender violence in media corpora
are still scarce.
In an attempt to fill this gap, we focus on the representation of gender violence
in Italian media. Our purpose is to untangle the complexity of linguistic
representations and communicative intentions regarding gender-induced violence,
adopting an integrated methodology: a data-driven approach, consisting of
quantitative and qualitative investigations, and a “cross-stylistic”/“cross-modal”
approach. For this purpose, we built a multi-media and multi-modal corpus consisting
of two sub-corpora. Specifically:
- a 300,000-word corpus consisting of a nine-month collection of crime
articles (WItNECS, Women in Italian News Crime Sections). To guarantee the
representativeness and balance of the data collected, we surveyed four
national newspapers: Corriere della Sera, La Stampa, Il Fatto Quotidiano and
Repubblica (national and regional editions of Milan, Florence, Naples, and […]);
- a video database of the Italian docu-fiction series Amore Criminale (AC)
(2015/2016), built with a layered/tiered methodology.
On these data, we performed different types of analyses to explore and compare
journalistic and television expressive and narrative strategies:
1. Computational analyses:
a) After pre-processing the raw text in WItNECS, we performed a structural topic
model analysis (STM; Roberts et al., 2013). STM is an innovative method that
exploits complex algorithms and Bayesian statistics to automatically extract
thematic information from texts; although this approach is fairly common in the
social sciences, it has only recently been employed in Linguistics, and rarely to […].
b) Using the toolchain of the Open Polarity Enhanced Named Entity Recognition
project (OpeNER; Agerri et al., 2013), we detected named entities (Yadav & Bethard, 2018)
in WItNECS.
2. Lexical analyses:
a) For AC television language, we performed a video-to-text alignment in ELAN (2016),
with speech and gesture tagging following the coding norms proposed in Kong et al. (2015).
Accordingly, we identified: iconic gestures (modeling the shape of an object or the
motion of an action); metaphoric gestures (conveying abstract/symbolic meaning);
deictic gestures (such as pointing at objects in conversational space); and emblematic
gestures (with standard properties and language-like features). The same procedure
was applied for tagging speech acts.
b) For WItNECS, corpus-based frequentist analyses (i.e., n-grams, collocations,
keywords in context, association measures) were performed in Sketch Engine
(Kilgarriff et al., 2014).
3. Pragmatic analysis:
Along with the lexical analyses, an investigation of metaphorical language in WItNECS
and AC was performed, based on Conceptual Metaphor Theory (CMT; Lakoff, 2014), using
MetaNet’s online repository () as a resource. Indeed, recent corpus-based studies have
shown the importance of metaphors in highlighting implicatures and […]
in societal and cultural discourse (cf. Dodge, 2016; David, Lakoff & Stickles, 2016).
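The topic-modelling step in 1a relies on STM, which is usually estimated with the R package stm; its covariate-aware estimation is not reproduced here. As a simpler stand-in, the sketch below fits plain latent Dirichlet allocation (LDA) by collapsed Gibbs sampling. It illustrates the shared idea: every token is assigned to a latent topic, and every document receives a vector θ of topic proportions. The function name `lda_gibbs`, the hyperparameters and the toy documents are hypothetical, not the settings used for WItNECS.

```python
import random


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (a stand-in, NOT STM:
    no document covariates, no topic correlations).
    `docs` is a list of token lists; returns (theta, top words per topic)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    v = len(vocab)
    ids = [[index[w] for w in doc] for doc in docs]

    ndk = [[0] * k for _ in ids]        # topic counts per document
    nkw = [[0] * v for _ in range(k)]   # word counts per topic
    nk = [0] * k                        # total tokens per topic
    z = []                              # topic assignment of every token
    for d, doc in enumerate(ids):
        zd = []
        for w in doc:
            t = rng.randrange(k)        # random initial assignment
            zd.append(t)
            ndk[d][t] += 1
            nkw[t][w] += 1
            nk[t] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(ids):
            for i, w in enumerate(doc):
                t = z[d][i]
                # remove the current token from all counts
                ndk[d][t] -= 1
                nkw[t][w] -= 1
                nk[t] -= 1
                # full conditional p(z = j | all other assignments), unnormalized
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][w] += 1
                nk[t] += 1

    # posterior-mean estimate of each document's topic proportions
    theta = [
        [(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
        for d, doc in enumerate(ids)
    ]
    # three most frequent words assigned to each topic
    top = [
        [vocab[w] for w in sorted(range(v), key=lambda w: -nkw[j][w])[:3]]
        for j in range(k)
    ]
    return theta, top


# Invented toy documents built from words in the topic examples below.
docs = [
    "pugni calci botte maltrattamenti".split(),
    "ergastolo assise aula condanna".split(),
    "pugni botte percosse calci".split(),
    "aula ergastolo condanna assise".split(),
]
theta, top_words = lda_gibbs(docs, k=2, iters=300)
for j, words in enumerate(top_words):
    print(f"topic {j}: {words}")
```

Unlike this sketch, STM lets document metadata (for instance the newspaper or the publication date) shift topic prevalence and topic content, which is what makes it attractive for comparing sources.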
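The frequentist measures listed in 2b were computed in Sketch Engine, whose implementation is not reproduced here. A minimal sketch, assuming a simple lower-case tokenizer and a symmetric co-occurrence window, shows what a keyword-in-context (KWIC) concordance and two window-based association scores compute: pointwise mutual information (PMI) and logDice, the score that, to our knowledge, Sketch Engine uses for collocations. The function names, window size and toy text are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lower-cased word characters, no lemmatization.
    return re.findall(r"\w+", text.lower())


def kwic(tokens, node, window=3):
    """Keyword-in-context lines as (left context, node, right context)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def collocations(tokens, node, window=3):
    """Score words occurring within +/- `window` tokens of `node`.

    Returns {collocate: (co-occurrence count, PMI, logDice)}.
    Overlapping windows may count the same token more than once.
    """
    n = len(tokens)
    freq = Counter(tokens)
    co = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i and tokens[j] != node:
                co[tokens[j]] += 1
    f_node = freq[node]
    scores = {}
    for word, f_xy in co.items():
        f_word = freq[word]
        pmi = math.log2(f_xy * n / (f_node * f_word))
        log_dice = 14 + math.log2(2 * f_xy / (f_node + f_word))
        scores[word] = (f_xy, pmi, log_dice)
    return scores


# Invented toy text (not taken from WItNECS).
text = ("La donna è stata uccisa dal marito. Il marito ha confessato. "
        "La donna aveva denunciato il marito.")
tokens = tokenize(text)
for left, node, right in kwic(tokens, "marito", window=2):
    print(f"{left:>20} [{node}] {right}")
ranked = sorted(collocations(tokens, "marito", window=2).items(),
                key=lambda kv: -kv[1][2])
for word, (f, pmi, ld) in ranked:
    print(f"{word:12} f={f} PMI={pmi:.2f} logDice={ld:.2f}")
```

Unlike PMI, logDice does not contain the corpus size n, so its values stay comparable across sub-corpora of different sizes; PMI, in turn, tends to reward rare collocates that co-occur only once.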
Some interesting patterns emerge from these multi-level investigations.
- Text mining and STM provided a corpus-level representation of the major topics
(and, indirectly, of the journalistic communicative intentions and narrative
strategies) regarding gender violence news. It seems that, among the ten topics,
those delineating the crime are estimated to represent high proportions of the corpus.
Here we present examples from four topics that together account for
60% of the “message” conveyed in WItNECS: Topic A: maltrattamenti (abuse), minacce
(threats), pugni (punches), botte (beatings), calci (kicks), percosse (blows); Topic B:
stazione (station), treno (train), turista (tourist), telecamere (cameras); Topic C:
donna (woman), carabinieri (military police), omicidio (homicide), delitto (murder),
uccisa (killed), trovata (found); Topic D: ergastolo (life sentence), assise (assize
court), aula (courtroom). Other topics that emerged concerned sexism, social media, and […].
- The automatically classified named entities perfectly reflect the […]
(detailed deictic information and scarce descriptions) and the […]
and technicality of gender violence phenomena (e.g., carabinieri
[Org.government_agency], omicidio [Event.crime], moglie […],
Facebook [Org.business]). Interestingly, besides the typical entities of
crime news (i.e., people and organizations involved, types of weapons, […]),
social media entities emerge as well, presumably suggesting the intertwining of
gender violence and new technologies.
- The coding procedure in ELAN revealed that, out of 103 observations, […]
of communicative acts (both speech and gestures) were tagged as metaphoric:
within AC, metaphors were purposely used to describe both women’s and men’s
socio-psychological behaviour. Iconic acts (26%), on the other hand, were used
to describe and/or mime motions of violence.
- Differences in communicative strategies and conceptual representation emerge from
our study: for example, while newspapers mostly report physical violence, AC
investigates psychological violence; television offers a more complex portrayal of
the episodes of violence narrated, which nevertheless often results in
spectacularizing the crime. However, some striking similarities emerge from the
cross-modal and corpus-based frequentist analyses. For instance, both WItNECS
and AC tend to portray women stereotypically, as “in-relation-to” a man
([…] mothers) and not as individuals. Moreover, the wide range of metaphors identified
was further subdivided into conventional (i.e., present in MetaNet) and novel (i.e.,
absent from MetaNet). Several interesting observations can be made on the metaphor
networks in the corpus. Notably, both media use (almost) the same types of
metaphors. In WItNECS, metaphors are conveyed especially by verbs (), while
in AC they are expressed through speech and gestures, and they serve as pivots for
representing the experience of abuse. The rich presence of metaphorical language
affects the reader/viewer and elicits a heightened emotional response (Citron & Goldberg, 2014).
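The statement that four topics together contribute 60% of the “message” in WItNECS can be read as a sum of corpus-level topic proportions. One common convention, and to our understanding the one behind the “expected topic proportions” summary in the R stm package, averages each topic's document-level proportion θ over all documents; whether the figure above was obtained exactly this way is an assumption. The function names and the 3 × 4 matrix below are hypothetical, with invented numbers.

```python
def expected_topic_proportions(theta, weights=None):
    """Average each topic's share over documents.

    theta: document-topic matrix, one row per document, each row summing to 1.
    weights: optional per-document weights (e.g. token counts); an
    unweighted mean over documents is used when omitted.
    """
    if weights is None:
        weights = [1.0] * len(theta)
    total = sum(weights)
    k = len(theta[0])
    return [
        sum(w * row[t] for w, row in zip(weights, theta)) / total
        for t in range(k)
    ]


def combined_share(proportions, topics):
    """Corpus share attributed jointly to a subset of topic indices."""
    return sum(proportions[t] for t in topics)


# Hypothetical 3-document x 4-topic matrix (invented numbers).
theta = [
    [0.50, 0.20, 0.20, 0.10],
    [0.10, 0.60, 0.20, 0.10],
    [0.30, 0.10, 0.40, 0.20],
]
props = expected_topic_proportions(theta)
print([round(p, 3) for p in props])               # [0.3, 0.3, 0.267, 0.133]
print(round(combined_share(props, [0, 1]), 3))    # 0.6
```

Passing token counts as `weights` lets longer articles weigh more; the unweighted and length-weighted conventions can yield different shares for the same model.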
In conclusion, our study employs up-to-date empirical approaches that combine
bottom-up, quantitative, and qualitative analyses to reveal pragmatic, […]
and psychological aspects of the representation of gender violence in Italian media.
Petruck, M. R. L. (Ed.) (2018). MetaNet. Amsterdam/Philadelphia: John Benjamins.
Baker, P. (2014). Using Corpora to Analyze Gender. London: Bloomsbury.
Fragaki, G., & Goutsos, D. (2015). Women and Men Talking About Men and […] in Greek. In J. Romero-Trillo (Ed.), Yearbook of Corpus Linguistics and […] Current Approaches to Discourse and Translation Studies. New York: Springer.
Busso, L., & Vignozzi, G. (2017). Gender Stereotypes in Film Language: a […] Analysis. In R. Basili, M. Nissim, & G. Satta (Eds.), Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017), Rome, […] 11-13, 2017. CEUR Workshop Proceedings 2006. CEUR-WS.org.
Abis, S., & Orrù, P. (2015). Il femminicidio nella stampa italiana: […] linguistica. gender/sexuality/italy, 3, 18-33.
Roberts, M. E., Stewart, B. M., Tingley, D., & Airoldi, E. M. (2013). The Structural Topic Model and Applied Social Science. Advances in Neural Information Processing Systems Workshop on Topic Models: Computation, Application, and Evaluation.
Brookes, G., & McEnery, T. (2019). The utility of topic modelling for […]: A critical evaluation. Discourse Studies, 21(1), 3–[…].
Agerri, R., Cuadros, M., Gaines, S., & Rigau, G. (2013). OpeNER: Open Polarity Enhanced Named Entity Recognition. Procesamiento del Lenguaje Natural, 51, […].
Yadav, V., & Bethard, S. (2018). A Survey on Recent Advances in Named Entity Recognition from Deep Learning Models. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, August […].
ELAN (Version 4.9.4) [Computer software] (2016, May 19). Nijmegen: Max Planck Institute for Psycholinguistics. Retrieved from […]
Kong, A. P., Law, S., Kwan, C. C., Lai, C., & Lam, V. (2015). A Coding […] Independent Annotations of Gesture Forms and Functions during Verbal […] Development of a Database of Speech and GEsture (DoSaGE). Journal of Nonverbal Behavior, 39(1), 93-111.
Kilgarriff, A., et al. (2014). The Sketch Engine: ten years on. Lexicography, 1(1), 7-36. New York: Springer. Available online at: […]
Lakoff, G. (2014). Mapping the brain's metaphor circuitry: metaphorical […] everyday reason. Frontiers in Human Neuroscience, 8, 958.
Dodge, E. (2016). A deep semantic corpus-based approach to metaphor […] case study of metaphoric conceptualizations of poverty. Constructions and Frames, […].
David, O., Lakoff, G., & Stickles, E. (2016). Cascades in metaphor and […] case study of metaphors in the gun debate. Constructions and Frames, 8(2), […].
Citron, F., & Goldberg, A. E. (2014). Metaphorical sentences are more […] than their literal counterparts. Journal of Cognitive Neuroscience, 26(11), […].