NewsRu is a home-grown corpus of Russian-language internet news. It consists of 2,686,518 records stored in four JSON Lines files, one per source: fontanka.ru, interfax.ru, lenta.ru, and vesti.ru. NewsRu is a metadata-rich resource: in addition to the text of each news item, it contains the URL, headline, categories, tags, authors, and the date and time of publication. The data cover, in varying proportions, a period of approximately 19 years (from 2000-06-14 to 2019-12-31).
Corpus files can be easily processed in Python or with command-line tools such as jq (https://stedolan.github.io/jq/).
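
As a minimal sketch of the Python route, the snippet below streams one JSON Lines file record by record and counts records by the value of a single field. The file name `lenta.jsonl` and the field name `category` are assumptions made for illustration only; the actual file names and record keys should be checked against the corpus files themselves.

```python
import json
from collections import Counter
from pathlib import Path


def iter_records(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_by_field(path, field):
    """Count records by the value of `field`; records lacking it fall under None."""
    counts = Counter()
    for record in iter_records(path):
        value = record.get(field)
        # Lists (e.g. several tags) are not hashable, so store them as tuples.
        if isinstance(value, list):
            value = tuple(value)
        counts[value] += 1
    return counts


if __name__ == "__main__":
    corpus_file = Path("lenta.jsonl")  # hypothetical file name
    if corpus_file.exists():
        # "category" is an assumed key, not confirmed by the corpus description.
        for value, n in count_by_field(corpus_file, "category").most_common(10):
            print(n, value)
```

Reading line by line keeps memory use flat regardless of file size, which matters for a corpus of several million records; loading a whole file with a single `json.load` call would not work for JSON Lines in any case, since each line is a separate JSON document.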