No Longer Lost in Translation: Evidence that Google Translate Works for Comparative Bag-of-Words Text Applications
Description: Automated text analysis allows researchers to analyze large quantities of text. Yet comparative researchers face a major challenge: across countries, people speak different languages. To address this issue, some analysts have suggested using Google Translate to convert all texts into English before starting the analysis (Lucas et al., 2015). But in doing so, do we get lost in translation? This paper evaluates the usefulness of machine translation for bag-of-words models, such as topic models. We use the Europarl dataset and compare term-document matrices, as well as topic model results, from gold-standard human-translated text and machine-translated text. We evaluate results at both the document and the corpus level. We first find the term-document matrices for the two text corpora to be highly similar, with significant but minor differences across languages. Moreover, we find considerable overlap in the sets of features generated from human-translated and machine-translated texts. With regard to LDA topic models, we find topical prevalence and topical content to be highly similar, with only small differences across languages. We conclude that Google Translate is a useful tool for comparative researchers using bag-of-words text models.
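The document-level comparison described above can be sketched in a few lines of Python: build a bag-of-words vector for a human translation and a machine translation of the same document, then measure their cosine similarity. This is a minimal illustration, not the paper's actual pipeline; the example sentences, tokenizer, and function names are hypothetical.

```python
from collections import Counter
import math

def bow(text):
    """Build a simple bag-of-words term-frequency vector (lowercased, punctuation stripped)."""
    tokens = [w.strip(".,;:!?\"'()") for w in text.lower().split()]
    return Counter(t for t in tokens if t)

def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical pair: a gold-standard human translation vs. a machine translation
human = "The committee approved the budget proposal for the next fiscal year."
machine = "The committee has approved the budget proposal for next fiscal year."

sim = cosine_similarity(bow(human), bow(machine))
print(round(sim, 3))  # → 0.942
```

Repeating this per document, and aggregating the term-document matrices per corpus, yields the document- and corpus-level comparisons the abstract refers to; in practice one would use a library such as scikit-learn or quanteda rather than hand-rolled counters.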