Twitter Hazard Datasets
Category: Data
Description: These three datasets were built from data gathered by LINKS Foundation during 2020-2021. The tweets concern catastrophes and hazards such as wildfires, hurricanes, extreme weather conditions, COVID-19, terrorist attacks and more. The data was originally collected by retrieving tweets that contained certain keywords: each label has a set of corresponding keywords that describe different variants of a hazard. This approach was used to gather large numbers of relevant examples and to create a dataset covering the many different ways a given disaster can be mentioned. The original data can be considered labeled in a distantly supervised fashion, that is, labels were assigned automatically based on the keywords used to retrieve each item. This approach produced many false positives in the original data, which is the reason for building these smaller but more refined datasets.
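The keyword-based labeling described above can be illustrated with a minimal sketch. The keyword lists, the `KEYWORDS` mapping and the `distant_labels` function are hypothetical and are not taken from the datasets; the actual keywords used for retrieval are not listed here. The sketch only shows how a label may be assigned automatically from a keyword match, and why that can yield false positives.

```python
import re

# Hypothetical keyword sets, for illustration only; the real keywords
# used to retrieve the tweets are not given in this description.
KEYWORDS = {
    "wildfire": ["wildfire", "forest fire", "bushfire"],
    "hurricane": ["hurricane", "cyclone", "typhoon"],
}


def _compile(keywords):
    # Case-insensitive match on whole words or phrases.
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


PATTERNS = {label: _compile(kws) for label, kws in KEYWORDS.items()}


def distant_labels(text):
    """Return every label whose keyword set matches the text."""
    return sorted(label for label, pat in PATTERNS.items() if pat.search(text))


# A relevant tweet receives the expected label.
relevant = distant_labels("Crews are battling a wildfire near the coast")

# A figurative use of a keyword also receives the label: a false positive.
figurative = distant_labels("This new album is a hurricane of great songs")

# A tweet without any keyword receives no label.
unrelated = distant_labels("Lovely sunny afternoon in the park")
```

In this sketch the figurative tweet is labeled `hurricane` even though it does not describe a hazard, which is the kind of noise that motivates a manually refined subset.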