This is the repository of the data used in the PhD dissertation *Color of corruption. Visual evidence of agenda-setting in a complex mass media ecosystem* by Pablo Rey-Mazón, submitted in December 2022 and defended on May 8th, 2023, in Barcelona, Spain (http://hdl.handle.net/10803/688629).
The code repositories used to produce and analyze these datasets are available on the project website (https://numeroteca.org/phd/), where all the extra material is published. Further explanations can be found in the dissertation manuscript.
## Data
The data are stored in two folders:
- A. Newspaper front page analysis on corruption stories (2009-2019)
- B. Cifuentes's master scandal (2018-03-21 to 2018-04-30)
### A. Newspaper front page surface area (2009-2019)
The *Color Corrupción* database (https://code.montera34.com/numeroteca/colorcorrupcion) analyzes the surface area dedicated to corruption news stories on Spanish newspaper front pages from 2009 to 2019, coded by institution accused of corruption, by scandal, and by framing (attack-defense). The position of each corruption story and the percentage of the front page it occupies are also recorded. The database is built with the aid of the Pageonex software.
- Units: front pages (jpeg format).
- Units of analysis: surface area of news stories on the front page.
- Metadata of each area: percentage of the front page, position of the area, institution related to corruption, corruption scandal, framing (attack-defense).
- Media source: Kiosko.net, an existing database of newspaper front pages from 2009-03 until today.
- Availability of data: from March 2009 to December 2019.
- News media outlets: printed newspapers: *El País*, *El Mundo*, *ABC*, *La Razón*, *La Vanguardia* and *El Periódico*. *Ara* has been coded as well, since its creation in 2011.
- Content analysis: Pageonex software. Online version available at https://pageonex.com. Threads available at https://pageonex.com/threads/search_by_category?q=colorcorrupci%C3%B3n and https://cloud.montera34.org/index.php/s/6nsNTPagkfXyms5.
Code:
- Analysis software: https://github.com/montera34/pageonex
- Data processing and visualization: https://code.montera34.com/numeroteca/pageonexR
- Data and data processing: https://code.montera34.com/numeroteca/colorcorrupcion
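
As an illustration of the "percentage of front page" metadata described above, the following minimal Python sketch computes the share of a page covered by rectangular areas and sums it per topic. It is not the Pageonex or pageonexR implementation: the `Area` fields, the topic labels, and the page dimensions are hypothetical, and the sum assumes that areas coded with the same topic do not overlap.

```python
from dataclasses import dataclass


@dataclass
class Area:
    """A highlighted rectangle on a front page, in pixels (hypothetical schema)."""
    x: float
    y: float
    width: float
    height: float
    topic: str


def percent_of_page(area, page_width, page_height):
    """Return the share of the front page covered by one area, in percent."""
    return 100.0 * (area.width * area.height) / (page_width * page_height)


def coverage_by_topic(areas, page_width, page_height):
    """Sum the percentage of the page per topic (assumes non-overlapping areas)."""
    totals = {}
    for area in areas:
        share = percent_of_page(area, page_width, page_height)
        totals[area.topic] = totals.get(area.topic, 0.0) + share
    return totals


# Made-up example: a 750 x 1000 px front page with three coded areas.
page_w, page_h = 750, 1000
areas = [
    Area(0, 0, 375, 200, "scandal_a"),
    Area(375, 800, 375, 200, "scandal_a"),
    Area(0, 500, 150, 100, "scandal_b"),
]
coverage = coverage_by_topic(areas, page_w, page_h)
print(coverage)
```

Working in percentages rather than raw pixel areas keeps front pages comparable across newspapers whose images have different dimensions.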
### B. Cifuentes's master scandal (March-April 2018)
- Surface area dedicated to the scandal on the front pages of printed newspapers.
- News site home page headlines, in two files: all the headlines, and only the headlines related to the scandal.
- TV newscasts subtitles related to the scandal.
- Twitter messages about the scandal.
#### Newspaper front pages
- News media outlets: printed newspapers: *El País*, *El Mundo*, *ABC*, *La Razón*, *La Vanguardia*, *El Periódico*, *Ara*, *El Correo* and *La Voz de Galicia*.
#### News sites headlines
This dataset has been built with the Homepagex software: first, it downloads the HTML of each home page every hour; then the HTML is parsed and the headlines are extracted together with their URLs.
Homepagex software code: https://code.montera34.com/numeroteca/homepagex
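
The parsing step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the Homepagex code: it assumes, as a simplification, that headlines are links nested inside `h1` to `h3` tags, whereas real news sites usually need per-site rules. The class and function names and the sample HTML are hypothetical, and the hourly download step is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HeadlineParser(HTMLParser):
    """Collect (headline text, absolute URL) pairs from links inside h1-h3 tags."""

    HEADING_TAGS = {"h1", "h2", "h3"}  # assumption: headlines live in these tags

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.headlines = []
        self._heading_depth = 0
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._heading_depth += 1
        elif tag == "a" and self._heading_depth > 0:
            # Start collecting the text of a link found inside a heading.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._heading_depth > 0:
            self._heading_depth -= 1
        elif tag == "a" and self._href is not None:
            # Collapse runs of whitespace and resolve relative URLs.
            title = " ".join("".join(self._text).split())
            if title:
                self.headlines.append((title, urljoin(self.base_url, self._href)))
            self._href = None


def extract_headlines(html, base_url):
    """Parse one home page snapshot and return its headlines with URLs."""
    parser = HeadlineParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.headlines


# Made-up snapshot standing in for one downloaded home page.
sample_html = """
<html><body>
  <h1><a href="/politica/noticia-1.html">First   headline</a></h1>
  <h2><a href="https://example.com/economia/noticia-2.html">Second headline</a></h2>
  <p><a href="/about">Not a headline</a></p>
</body></html>
"""
headlines = extract_headlines(sample_html, "https://example.com/")
for title, url in headlines:
    print(title, url)
```

Storing each headline with an absolute URL makes it possible to track the same story across hourly snapshots of a home page.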
#### TV newscasts subtitles
This dataset comes from Verba (https://verba.civio.es/), which compiles and serves a database of public television newscasts.
#### Twitter
Twitter's policies do not allow the content of the tweets to be published. The plan is to publish the tweet IDs here so that the tweets can be downloaded again (a process called "hydration").
## Abstract
[Extracted from the abstract of the dissertation].
The object of this research is the empirical analysis of how news and social networks contribute to shaping public opinion in the current digital era by studying how strongly mass media have an impact and how long it takes for them to have their maximum effect. Furthermore, the interaction among different media channels and pieces of the media ecosystem is considered, with a special dedication to the information flows between news and social media.
We use corruption in Spain in the last two decades as the subject of our analysis. As an unobtrusive issue, corruption is a good choice for agenda-setting analysis since the information that arrives to the readers is heavily or exclusively mediated. Corruption as a public problem is used in two long-term analyses –11 years-long studies– to study the correlation between news coverage on newspaper front pages and public opinion surveys. An in-depth analysis of how a corruption scandal unfolds –six weeks long– offers accurate metrics of the intermedia agenda-setting processes within the media ecosystem (front pages, home pages of online news sites, television newscasts, Twitter). The results suggest great convergence and synchronicity between all mass media channels. Google Search data, a proxy for people’s attention in the short term, show strong correlation results with the other mass media channels.
## License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License: http://creativecommons.org/licenses/by-sa/4.0/