Here you'll find the metadata for a collection of 500 CDs covering jazz history from its beginnings in the 1920s to the end of the 1950s.

- 'JECompleteIndex_original.csv' is the original file supplied with the CD box.
- 'JECompleteIndex_cleaned.csv' is the product of cleaning the original file (which, it seems, was at least partly created through OCR), completing truncated lines, etc.
- 'JE_PyRDF.ttl' is the RDF repository in the Turtle format. It adheres to the semantic model presented in our project.

Some random notes:

- Musicians have been matched by name. This is suboptimal, since different musicians can have the same name, e.g. Bill Evans the pianist and Bill Evans the saxophone player. Since we didn't have a better method for entity resolution at this stage, we decided in the project to assume equivalence by name.
- Leaders have been inferred from band names: each musician whose name appears in the band name is a leader (sometimes there are two or even more). Bands without musician names in their titles have no leader.
- Bands are matched by name.
- Sessions: tracks from one CD with the same date and place are combined into a session. There is no matching across CDs.
- Instruments are also matched by name. We do not have an instrument thesaurus.
- Place is only recorded as a string; no geomatching was performed.
- Dates were originally structured (but not very consistent) text strings referring to dates or datespans, often approximate. We developed a parser, a format and an RDF class for approximate datespans.
- Track titles on the CDs sometimes have additional info in brackets. We tried to extract the semantic meaning of the text in brackets as much as we could. Some tunes can therefore have more than one title.
- Medleys have been properly handled.
- Musicians have been matched with names in the LinkedJazz repository, and sameAs links to Linked Open Data were created. Approximately 10% of our musicians are covered in LinkedJazz. Most links point to DBpedia, but some point to MusicBrainz or Library of Congress authority files.
- We added relationships between musicians based on inference from the band lineup information:
    - lj:bandmember
    - lj:bandLeaderOf
    - rel:knowsOf
    - rel:hasMet
    - mo:collaborated_with
    - lj:inBandTogether
    - lj:playedTogether

Jazz Encyclopedia collection
----------------------------

The collection is called "The Encyclopedia of Jazz: The World's Greatest Jazz Collection". It was released by Membran, a music label group headquartered in Germany which specialises in large CD sets. The collection is enormous and is probably among the largest compilations ever commercially released. It consists of five parts: classic jazz, swing time, big bands, bebop and modern jazz, each comprising 100 compact discs. Altogether the dataset consists of 9065 tracks, which are recordings of 6255 distinct tunes performed by 898 bands. A CD usually presents one band or, in some rare cases, several bands from the same time/area/style. Some bands have more than one CD dedicated to them.

In terms of content, the collection seems to be broadly representative of US American jazz of the period. No major performers are omitted, though it certainly does not cover the entirety of jazz. There is one notable exception: the collection seems to focus on instrumental music. Singers only appear intermittently, and none of the seminal singers such as Ella Fitzgerald, Sarah Vaughan or Bessie Smith are present. The reason might be that Membran had previously published a large compilation of jazz singers.

Each part is accompanied by a printed booklet. Each CD has a dedicated two-page spread, which contains the track list, the band's lineup and sometimes brief liner notes. Additionally, a CSV file with metadata is attached. It lists the CDs with their order number and title, and for each track it contains the following information: title(s), composer(s), band name, date string, area string, and a list of musicians with their corresponding instruments. The file is not in machine-readable form, though the creators obviously put considerable effort into keeping it structured.
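
Two of the rules from the notes above, leader inference from band names and session grouping within a CD, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function names, the dict keys `cd`, `date`, `place` and `title`, and the use of case-insensitive substring matching are all hypothetical choices made for the example.

```python
from collections import defaultdict


def infer_leaders(band_name, musicians):
    """Return every musician whose full name occurs in the band name.

    Assumption: a plain case-insensitive substring test stands in for
    whatever name matching the project really used.
    """
    lowered = band_name.lower()
    return [name for name in musicians if name.lower() in lowered]


def group_sessions(tracks):
    """Group track titles into sessions keyed by (cd, date, place).

    The CD identifier is part of the key, so tracks with identical
    date and place strings on different CDs stay in separate sessions.
    Each track is a dict with the hypothetical keys
    'cd', 'date', 'place' and 'title'.
    """
    sessions = defaultdict(list)
    for track in tracks:
        key = (track["cd"], track["date"], track["place"])
        sessions[key].append(track["title"])
    return dict(sessions)
```

With made-up inputs, `infer_leaders("Ann Lee & Bob Ray Quintet", ["Ann Lee", "Bob Ray", "Cy Doe"])` yields two leaders, and a band name that contains no musician name yields an empty list, i.e. a band without a leader. Because matching is by name only, the sketch inherits the same-name ambiguity described in the notes.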