WordPPR: A Researcher-Driven Computational Keyword Selection Method for Text Data Retrieval from Digital Media

doi:10.17605/OSF.IO/PCYBZ

Despite the increasing use of digital media data in communication research, a central challenge persists: retrieving data with maximal accuracy and coverage. Our investigation of keyword-based data collection practices in communication research reveals a rudimentary one-step process. Cross-disciplinary reviews instead suggest iterative query expansion guided by human knowledge and computer intelligence. We introduce WordPPR, a method for keyword selection and retrieval from large digital media corpora. The method entails four steps: 1) collecting an initial dataset using core/seed keyword(s); 2) constructing a word graph based on the dataset; 3) applying the Personalized PageRank (PPR) algorithm to rank words by their proximity to the seed word(s), then selecting new keywords that optimize retrieval precision and recall; 4) repeating steps 1-3 to determine whether additional data collection is needed. This method reduces the need for exhaustive corpus analysis and minimizes manual annotation, making it especially suited for large corpora. We validate WordPPR on specific Twitter topics through simulations, contrasting it with alternative methods and demonstrating its effectiveness in targeted data mining. By advancing a more systematic approach to text data retrieval, this study contributes to improving digital media data retrieval practices in communication research and beyond.
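As a rough illustration of steps 2 and 3, the sketch below builds a word co-occurrence graph from a handful of documents and ranks words with Personalized PageRank computed by power iteration. It is a minimal sketch under stated assumptions, not the authors' implementation: the whitespace tokenization, the document-level co-occurrence weighting, the damping value `alpha=0.85`, and the function names `build_word_graph`, `personalized_pagerank`, and `candidate_keywords` are all hypothetical choices made for this example. It only produces a ranked candidate list; choosing among candidates to optimize precision and recall (the second half of step 3) is left to the researcher.

```python
from collections import defaultdict
from itertools import combinations


def build_word_graph(docs):
    """Undirected co-occurrence graph. Assumed weighting: an edge's weight is
    the number of documents in which the two words appear together."""
    graph = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        words = sorted(set(doc.lower().split()))  # naive tokenizer (assumption)
        for u, v in combinations(words, 2):
            graph[u][v] += 1.0
            graph[v][u] += 1.0
    return graph


def personalized_pagerank(graph, seeds, alpha=0.85, iters=100):
    """Power iteration for PPR: with probability 1 - alpha the random walk
    restarts at one of the seed words instead of following an edge."""
    nodes = list(graph)
    seeds = [s for s in seeds if s in graph]
    if not seeds:
        raise ValueError("no seed word occurs in the graph")
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1.0 - alpha) * restart[n] for n in nodes}
        for u in nodes:
            for v, w in graph[u].items():
                nxt[v] += alpha * rank[u] * w / out_weight[u]
        rank = nxt
    return rank


def candidate_keywords(docs, seeds, top_k=5):
    """Rank non-seed words by PPR score; ties are broken alphabetically."""
    rank = personalized_pagerank(build_word_graph(docs), seeds)
    ranked = sorted((w for w in rank if w not in seeds),
                    key=lambda w: (-rank[w], w))
    return ranked[:top_k]
```

In a full loop, the top-ranked candidates would be reviewed by the researcher, added to the query, and the collection step repeated until new candidates stop improving retrieval (step 4). For graphs built from large corpora, a sparse-matrix or library implementation of PageRank would likely be preferable to this dictionary-based version.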
