Authority Management of People Names (a working meeting)
========================================================

Biodiversity Next Preconference Workshop
----------------------------------------

Date: Monday, October 21st 2019

Participants:

- Quentin Groom (Meise Botanic Garden)
- Elspeth Haston (Royal Botanic Garden Edinburgh)
- Anne Thessen (Oregon State University, USA)
- Anton Güntsch (Free University of Berlin, Germany)
- Brenda Daly (South African National Biodiversity Institute)
- Chloé Besombes (National Museum of Natural History, Paris, France)
- Christian Bräuchler (Naturhistorisches Museum Wien)
- David Shorthouse (AAFC, Ottawa, Canada)
- Dominik Röpert (Botanic Garden and Botanical Museum Berlin)
- Frederik Berger (Museum of Natural Science, Berlin, Germany)
- Heather Lindon (Royal Botanic Gardens, Kew)
- Iris Sampaio (University of the Azores / Senckenberg am Meer)
- Jiri Frank (National Museum, Prague, Czech Republic)
- Jonathan Krieger (Royal Botanic Gardens, Kew, UK)
- Laurence Livermore (Natural History Museum, London)
- Matt Woodburn (Natural History Museum, London)
- Nicky Nicolson (Royal Botanic Gardens, Kew, UK)
- Nicole Kearney (Biodiversity Heritage Library, Australia)
- Paul Braun (National Museum of Natural History, Luxembourg)
- Rafaël Govaerts (Royal Botanic Gardens, Kew, UK)
- Robert Cubey (Royal Botanic Garden Edinburgh, UK)
- Ron Canepa (iDigBio, USA)
- Rod Page (Glasgow University)
- Sarah Phillips (Royal Botanic Gardens, Kew, UK)
- Simon Chagnoux (National Museum of Natural History, Paris, France)
- Sharif Islam (Naturalis, Netherlands)

Aims:

Major Challenges identified:

- Multiple identifiers
  - Wikidata, one ID to rule them all?
- Multiple individuals in a single record
- Order of names
- Disambiguation
- Transliteration
- The needs and motivations of the authors/collectors/institutions
- Ownership of your ID (for the living) and updating metadata such as affiliations and publications
- Identifiers for living collectors who do not want ORCID or Wikidata
- Volume of unknown people: converting strings to things
- Mobilising existing datasets and resources

Activities:

Break up into smaller task groups. The groups below are current suggestions.

**1. Analysis & visualisations (Tech Group)**

- Creating visualisations of data from Wikidata to illustrate the coverage of biological collectors and their links to other identifiers and different data types:
  - Number of specimens per linked person
  - Number of identifiers per person
  - Number of people without identifiers
    - Without identifiers, but with biographical details
    - Totally anonymous people
  - Demography of linked versus unlinked people
- Creating visualisations that show the value of connecting collections, collectors and authors with identifiers:
  - Reveal a collector's travels
  - Uncover mistakes in the digitization of specimens by cross-referencing them against external information about people, and produce recommendations for data quality filters

**2. Engagement group**

- Draft a position/opinion piece on why all taxonomists should have an ORCID ID.
- How do we encourage uptake of ORCID IDs among biological collectors and taxonomists?
- How do we encourage the linking and accumulation of biographical details of biological collectors?

**3. Darwin Core and TDWG group**

- Writing a charter for a TDWG Task Group on person identifiers under the Attribution Interest Group, defining its rationale and aims.
- Best practices for storing people's names:
  - Collector teams
  - Where should identifiers be stored?
  - Where should bibliographic details be stored?
- [Extension to Darwin Core Archive][1]: produce definitions for actions (e.g. collected, identified, georeferenced), and decide how to reconcile it with other extensions where people's names are recorded (e.g. Darwin Core Identification History)

**4. Paper writing group**

- Developing the introduction to the draft paper titled "Identifiers for people working on biodiversity", reviewing what has already been published and what approaches other disciplines use to identify people uniquely.
- Develop the thesis: challenges, solutions, next steps
- Identify cultural biases
- Reconcile with GDPR

**5. Disambiguation group**

- Drafting best-practice guidelines for the disambiguation of people
- How can the disambiguation of people be improved? Can it be automated? Are there suitable algorithms? Are there statistical methods that can indicate the likelihood of a match?

**6. [Datasets][3] for Challenges group**

Expanding existing tests and pilots (looking at options for including some of this work within SYNTHESYS+):

- BGBM model: participating institutes could select their most common collectors and add identifiers
- MNHN model: participating institutes could try the protocol on the top collectors in their own collections
- Develop a communications plan to approach the owners of relevant datasets that could be made openly available and mobilised

Prior to the workshop: increase the inclusion of Wikidata identifiers in the RDF served by stable URI implementations.

### Reading Materials ###

- Groom, Q.J., C. O’Reilly, and T. Humphrey. 2014. Herbarium specimens reveal the exchange network of British and Irish botanists, 1856–1932. *New Journal of Botany* 4: 95–103.
- Penn, M.G., S. Cafferty, and M. Carine. 2017. Mapping the history of botanical collectors: spatial patterns, diversity, and uniqueness through time. *Systematics and Biodiversity* 16: 1–13.

----------

## Timetable ##

09:00-09:30 Introductions

- Analysis & visualisations (Nicole, Íris, Rod, Ron, David, Dominik)
- Disambiguation guidelines (Quentin, Paul, Chloé, Anton, Rob)
- Writing the introduction to the paper (Elspeth, Sarah, Simon, Anton, Anne, Jiri)

10:30-11:00 Coffee Break

11:00-11:10 Regroup/Report

- Analysis & visualisations (Rod, Ron, David, Dominik)
- Disambiguation guidelines (Quentin, Paul, Chloé, Rob)
- Writing the introduction to the paper (Elspeth, Íris, Sarah, Simon, Anton, Anne, Jiri)

12:30-13:30 Lunch

13:30-13:40 Regroup/Report

- Analysis & visualisations (Ron, David, Dominik)
- TDWG Task Group Proposal (Quentin, Íris, Chloé, Paul, ...)
- Engagement (Nicole, Anne, Jiri)
- Datasets Group (Elspeth, Sarah, Simon, Anton, Rob)

15:00-15:30 Coffee Break

15:30-15:40 Regroup/Report

- Analysis & visualisations (Ron, David, Dominik)
- TDWG Task Group Proposal (Quentin, Íris, Chloé, Paul)
- Engagement (Nicole, Anne, Jiri)
- Datasets Group (Elspeth, Sarah, Simon, Anton, Rob)

16:30-17:00 Wrap-up

- What is the next set of challenges?