NewsRu is a home-grown corpus of Russian-language internet news. It consists of 2,686,518 records stored in four JSON Lines files, one per source: fontanka.ru, interfax.ru, lenta.ru, and vesti.ru. NewsRu is a metadata-rich resource: in addition to the text of each news item, it contains the URL, headline, categories, tags, authors, and date and time of publication. The data cover, in varying proportions, a period of approximately 19 years (2000-06-14 to 2019-12-31). The corpus files can be processed in Python or with command-line tools such as jq (https://stedolan.github.io/jq/).
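As an illustration of processing the corpus in Python, the following minimal sketch streams one JSON Lines file record by record and reports which fields occur and how often. The exact field names and file names are not given here, so the code does not assume any: it discovers the keys from the data. The function names `iter_records` and `field_coverage` are hypothetical helpers introduced for this example.

```python
import json
from collections import Counter
from typing import Iterator, Tuple


def iter_records(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def field_coverage(path: str) -> Tuple[int, Counter]:
    """Return the number of records and, per field name, how many records contain it."""
    total = 0
    counts: Counter = Counter()
    for record in iter_records(path):
        total += 1
        counts.update(record.keys())
    return total, counts
```

Calling `field_coverage` on one of the four source files (the path depends on how the files are named locally) returns the record count and a per-field tally. Because the file is read line by line, memory use stays small even for a source with hundreds of thousands of records.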