<p><strong>About</strong></p>
<p><em>The Worldwide Megadataset</em> is a large dataset of variables that describe countries, or that describe persons grouped by their country of origin. It is freely available to everyone.</p>
<p>The dataset was built with code written by Emil OW Kirkegaard that merges two existing datasets. The motivation was that merging datasets by hand is laborious, because the datasets 1) do not contain exactly the same countries, 2) use different names for the same country, and as a result 3) do not list the countries in the same order.</p>
<p>The dataset is maintained by Emil OW Kirkegaard and John Fuerst.</p>
<p><strong>Variables</strong></p>
<p>Information about the variables in the dataset is available in <a href="https://docs.google.com/document/d/147sMlw2lOYu_8cXRKgVFFNAPjYo0qWedciEbdowmbAU/edit#" rel="nofollow">a Google Drive file</a>. As of version 1.6b, the dataset contains 296 variables covering 269 countries or regions. The old website for the dataset is <a href="http://www.emilkirkegaard.dk/megadataset" rel="nofollow">here</a>.</p>
<p><strong>Source code</strong></p>
<p>The dataset merger is written in R and was ported from an earlier Python version. The translator from country names to ISO-3 (ISO 3166-1 alpha-3) codes is still written in Python but will be ported to R "soon". The source code is in this repository.</p>
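<p>The merging problem described under <strong>About</strong> can be illustrated with a small Python sketch. It is not the project's R merger or its Python translator; it only shows the general idea under stated assumptions. The alias table <code>ALIASES</code> and the functions <code>normalize</code>, <code>to_iso3</code> and <code>merge</code> are hypothetical, and each input table is assumed to be a plain dict mapping a country name to a dict of variable values. Country names are translated to ISO-3 codes, and the two tables are then outer-joined on those codes.</p>

```python
import unicodedata

# Hypothetical alias table: normalized country-name variants -> ISO 3166-1
# alpha-3 codes. A real translator would need a far more complete list.
ALIASES = {
    "denmark": "DNK",
    "kingdom of denmark": "DNK",
    "united states": "USA",
    "united states of america": "USA",
    "usa": "USA",
    "south korea": "KOR",
    "republic of korea": "KOR",
    "korea, republic of": "KOR",
    "cote d'ivoire": "CIV",
    "ivory coast": "CIV",
}


def normalize(name):
    """Strip accents, lower-case, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(ascii_only.lower().split())


def to_iso3(name):
    """Return the ISO-3 code for a country name, or None if it is unknown."""
    return ALIASES.get(normalize(name))


def merge(a, b):
    """Outer-join two {country name: {variable: value}} tables on ISO-3 code.

    Countries present in only one table are kept (problem 1); differently
    named countries meet under one code (problem 2). Names missing from the
    alias table are returned separately so they can be added by hand.
    If both tables share a variable name, the value from b wins.
    """
    merged = {}
    unmatched = []
    for table in (a, b):
        for name, row in table.items():
            code = to_iso3(name)
            if code is None:
                unmatched.append(name)
                continue
            merged.setdefault(code, {}).update(row)
    # Sorting by code makes the output order independent of the inputs (problem 3).
    return dict(sorted(merged.items())), unmatched


if __name__ == "__main__":
    # Toy placeholder values, not real data.
    a = {"Denmark": {"x": 1}, "United States of America": {"x": 2}}
    b = {"USA": {"y": 3}, "Côte d'Ivoire": {"y": 4}, "Atlantis": {"y": 5}}
    table, unmatched = merge(a, b)
    print(table)      # {'CIV': {'y': 4}, 'DNK': {'x': 1}, 'USA': {'x': 2, 'y': 3}}
    print(unmatched)  # ['Atlantis']
```

<p>Keying the join on a standard code rather than on the raw names is what removes the need to line up rows manually: once every name is translated, the order and spelling used by each source dataset no longer matter.</p>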
