A Benchmark Experiment on How to Encode Categorical Features in Predictive Modeling

doi:10.17605/OSF.IO/6FSTX

**UPDATE (March 2022):** A paper based on the results reported in the master thesis with some additional analyses has been published in *Computational Statistics*. Please cite this paper from now on:

**Pargent, F., Pfisterer, F., Thomas, J., & Bischl, B. Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. *Computational Statistics* (2022). https://doi.org/10.1007/s00180-022-01207-6**

For questions and remarks feel free to contact Florian.Pargent@psy.lmu.de

----------

CONTENT:

**upload_datasets/**:

- scripts were used to upload some benchmark datasets to OpenML

**analysis/high_cardinality_benchmark/**:

- *main.R* builds a batchtools registry containing all computational jobs; sources most other scripts.
- after jobs have been run on some compute cluster, *collect_results.R* extracts the results from the registry; saves the preprocessed results in *results.rds*

**doc/**:

- *high_card_final_datasets.Rmd* documents all datasets used in the benchmark along with some remarks on why they were included; outputs *high_card_final_datasets.html* as well as *analysis/high_cardinality_benchmark/oml_ids.rds* and *analysis/high_cardinality_benchmark/descr_dat.rds* which are used in the benchmark and the manuscript
- *paper.Rmd* with *appendix.Rmd* is a reproducible script to build the paper submitted as master thesis in March 2019; outputs *paper.pdf*
- *references.bib* contains all references
- *sessionInfo_220319* is a text file documenting the package versions used to run the benchmark analysis on the Linux Cluster of the Leibniz Supercomputing Centre in Garching
