# About
This project contains a dataset of health-related online advertising scraped
from the web.
The dataset contains the content of 765 unique health-related ads, each
labeled with the type of product advertised, the health condition addressed,
and the deceptive advertising techniques used.
For each ad, we collected the following data:
- Screenshot of the banner ad
- Screenshot of the landing page
- HTML content of the landing page
- Metadata, such as the landing page URL
- Qualitative labels (health condition addressed, type of product, deceptive techniques)
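
As a rough illustration of how the per-ad data listed above could be represented in code, here is a minimal Python sketch. The class name `AdRecord`, every field name, and the helper `count_by_product_type` are hypothetical and chosen only for this example. They do not describe the actual file layout or schema of the dataset, so check the released files before relying on any of them.

```python
from collections import Counter
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Iterable, List


@dataclass
class AdRecord:
    """One ad in the dataset. All field names are hypothetical."""

    banner_screenshot: Path         # screenshot of the banner ad
    landing_page_screenshot: Path   # screenshot of the landing page
    landing_page_html: Path         # saved HTML content of the landing page
    landing_page_url: str           # metadata: URL of the landing page
    health_condition: str           # label: health condition addressed
    product_type: str               # label: type of product advertised
    deceptive_techniques: List[str] = field(default_factory=list)

    def is_deceptive(self) -> bool:
        """Return True if at least one deceptive technique was labeled."""
        return bool(self.deceptive_techniques)


def count_by_product_type(records: Iterable[AdRecord]) -> Dict[str, int]:
    """Count how many ads carry each product-type label."""
    return dict(Counter(r.product_type for r in records))
```

Keeping the labels as plain strings and a list makes it easy to filter or group ads by label once the records are loaded from whatever format the dataset files actually use.
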

We collected this data using [adscraper](https://github.com/UWCSESecurityLab/adscraper), a web crawler for collecting online ad content.
You can read more about the data collection methodology and the results of
our analysis in the following paper:
> _Measuring Risks to Users' Health Privacy Posed by Third-Party Web Tracking and Targeted Advertising._ Eric Zeng, Xiaoyuan Wu, Emily Ertmann, Lily Huang, Danielle Johnson, Anusha Mehendale, Brandon Tang, Karolina Zhukoff, Michael Adjei-Poku, Lujo Bauer, Ari Friedman, and Matthew McCoy. In CHI '25: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, April 2025. ACM. DOI:10.1145/3706598.3714318

This dataset is a subset of the data collected for the paper. To protect the
privacy of our participants, we do not include data such as
their health conditions, browsing histories, or links from their browsing
profiles to the ads they saw. However, we hope that sharing a labeled dataset
of deceptive health-related ads will help others
understand the threats these ads pose to consumers.