# List of Files
* **gather_features.py**: Derives a set of features for each name: the number of occurrences in the Leipzig Wortschatz and the base form of the name without diacritics (derived using the Python unidecode package).
* **filter_diacritics.py**: Filters names based on their base form without diacritics.
* **filter_soundcodes.py**: Filters names based on sound and spelling. For spelling, the Jaro-Winkler similarity from the Python package jellyfish is used.
* **sort_names.py**: Sorts the names by number of occurrences.
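
The diacritics and sorting steps listed above can be illustrated with a minimal sketch. It is not the code from the scripts themselves. It uses the standard-library `unicodedata` module as a stand-in for unidecode, so characters without a Unicode decomposition (such as "ø" or "ß") are left unchanged here, whereas unidecode would transliterate them. The function names, the sample counts, and the rule of keeping only the most frequent spelling per base form are all assumptions made for illustration.

```python
import unicodedata


def base_form(name):
    """Strip diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def filter_by_base_form(counts):
    """Keep one spelling per base form (assumed rule: the most frequent one)."""
    best = {}
    for name, n in counts.items():
        key = base_form(name).lower()
        if key not in best or n > best[key][1]:
            best[key] = (name, n)
    return dict(best.values())


def sort_by_occurrences(counts):
    """Return the names ordered from most to least frequent."""
    return sorted(counts, key=counts.get, reverse=True)


# Hypothetical occurrence counts, for illustration only.
counts = {"René": 120, "Rene": 45, "Zoë": 30, "Anna": 300}
filtered = filter_by_base_form(counts)
ranked = sort_by_occurrences(filtered)
print(ranked)
```

In this sketch, "René" and "Rene" share the base form "rene", so only the more frequent spelling survives the filter before the names are ranked.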
# Package Versions Used During Filtering
All package versions were collected using the "pip show" command.
## Package: unidecode
Metadata-Version: 2.0
Name: Unidecode
Version: 0.4.20
Summary: ASCII transliterations of Unicode text
Home-page: UNKNOWN
Author: Tomaz Solc
Author-email: tomaz.solc@tablix.org
License: GPL
Location: /usr/lib64/python3.4/site-packages
Requires:
## Package: jellyfish
Metadata-Version: 1.1
Name: jellyfish
Version: 0.5.6
Summary: a library for doing approximate and phonetic matching of strings.
Home-page: http://github.com/jamesturk/jellyfish
Author: UNKNOWN
Author-email: UNKNOWN
License: UNKNOWN
Location: /usr/lib64/python3.4/site-packages
Requires: