Hear about our process for greatly increasing the likelihood that the first match returned is the “best” match for most strings. While automatically reconciling lists of strings representing entities from bibliographic metadata against a range of target vocabularies for a project, we found that we could use how those target vocabularies are represented in a separately managed, large data aggregation. This representation provided an additional weighting to apply to the standard Levenshtein distance calculations, and thus a much higher likelihood that the first match is the best match. We’ll describe the steps in the project, our success metrics, and reflections on other data reconciliation projects that could benefit from this approach.
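To make the idea concrete, here is a minimal sketch of one way a usage-based weighting could be combined with Levenshtein distance. It is not the project's actual implementation: the abstract does not say how the weighting was computed or applied, so the function names (`levenshtein`, `rank_candidates`), the log-scaled boost, the `alpha` parameter, and the sample names and counts are all assumptions made up for illustration.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def rank_candidates(query, candidates, usage_counts, alpha=0.1):
    """Rank vocabulary terms for a query string, best match first.

    Assumption: `usage_counts` maps a vocabulary term to how often it
    appears in the separate aggregation, and the weighting is a
    log-scaled multiplicative boost. Both choices are hypothetical.
    """
    scored = []
    for term in candidates:
        longest = max(len(query), len(term)) or 1
        # Normalize edit distance into a 0..1 similarity.
        similarity = 1.0 - levenshtein(query.lower(), term.lower()) / longest
        # Terms that are common in the aggregation get a modest boost.
        boost = 1.0 + alpha * math.log1p(usage_counts.get(term, 0))
        scored.append((similarity * boost, term))
    # Highest score first; ties fall back to alphabetical order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored]


# Invented example data: both candidates are one edit away from the query.
candidates = ["Smith, Joan", "Smith, John"]
counts = {"Smith, John": 500, "Smith, Joan": 3}
unweighted = rank_candidates("Smith, Jon", candidates, {})
weighted = rank_candidates("Smith, Jon", candidates, counts)
```

In this invented example both candidates sit at the same edit distance from the query, so distance alone cannot separate them and the unweighted ranking falls back to alphabetical order. With the counts applied, the term that is more heavily represented in the aggregation moves to the first position.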