Debriefing, Discussion, and Next Steps


# Overview from Day 1:

Discussion of outcomes from the data into OSF group:

- Data Rescue in a Box
- Content Aggregation
- For more see [https://osf.io/rvyub/][1] and Google doc with write-up

## Discussion and feedback:

#### Need to be collecting methods and tools and not building more tools

- Could be space for a new group, co-locating and emphasising what different groups are doing and where they have been successful
- Harness what people have done in the past.
- Complexity of checklist of some working groups may be difficult to re-use/work with - but taking a small/simplified mechanism from others
- Want to have formal channels for feedback
- Important to note about multi-pronged efforts: some scale and some don't

#### EDGI success

- level of awareness
- Struggled with what is going to be done and by whom: critical eye to "scale and governance of process"
- Aware of our communities: What is the purpose of some of our communities?

#### Warning about Automation:

- Automation can lead to missing metadata
  - We need concerned individuals to help with this
- Make sure that the automation approach actually facilitates metadata creation
  - Raw data
  - Taxonomy of the data and the metadata
- Need tools for citizens/students
- Intuition/affiliation needs to be included... also missing in automation
- Augmenting record metadata is a larger challenge
  - Refining of records, we need something to make them meaningful
  - Not only related to data rescue
    - Humanitarian Open Street Map
    - Ushahidi

#### Documenting process/workflow to share - more important than any specific technology

- Need formal channels of feedback of what works and what doesn't
- Seriously consider **scale**
- Transparent governance of process
- Maximize available resources
- Maximize people skills:
  - Habitat for Humanity - "real time adjustment - makes everybody feel like they are contributing"
  - What can we learn from this kind of model?
  - Acknowledge that not everybody has the specific skills, but that there is something for everyone to work on to contribute:
    - "Even if you didn't rescue data directly, you still contributed to our success"
    - EDGI has been doing this, has different categories of tasks for different abilities (and may have resources related to assigning people effectively)
    - EDGI Hackathon events: everyone given something constructive to do
  - Capacity building - people will gain skills through practice-based learning
  - A series of questions can help to identify volunteers
  - Be prepared to say “I know you said you have this skillset, but we really need your help over here” (*positive re-direction*)

#### Needs in this space:

- Metadata: experienced people and more contributions
- Education resources for how to use the "rescued data" in its "new, secondary location"
- Directory of data rescue contact persons in libraries?
  - Make sure they have resources and have been reached out to first
  - Putting together a directory of librarians is feasible
- Funding:
  - Who is going to pay for this...
  - How do you show the geographically broad spectrum of patrons that would be available -- who could call?

#### Considerations

- Conversation that looks at multiple roles and expertise - it takes a village:
  - Federal ecosystem - careers have been spent working with this data - whole other set of people who are very invested (thinking across communities)
- Keep focus on serving the community: "What can we do to support this work?" and "What can libraries do to support Data Rescue efforts?"
  - See example of Spark
  - Dropping your ideas about your "role" to serve the sense of urgency
- Connections with agencies whose data we want to protect:
  - May need to get away from the term "rescue"
    - "Data Justice" as alternative
  - Who at the agency you are rescuing data from should you be talking to? Step 1: talk to the people
  - Must be very careful to protect people who work with people.
    Anonymity considerations.

#### Collection Development

- Accurate reframing of this work in libraries
  - This is a collection development effort - libraries have a fair amount of capacity to deal with collection development
  - Ways to frame by speaking to a library community
- Should become a persistent annual effort
  - There are collection funds that could start to shift in this direction
- Terminology: still need to maintain the urgency.
  - This is a way for collectors and liaisons to pay attention to these items.
  - Remember the "data justice" elements of this work
  - Fatigue around "Data rescue"
  - This reframing could support the long game direction of this work -> data repository
  - How do we keep that sense that action is needed?
- Library Roles:
  - Collections
  - Infrastructure
  - Funding
  - Not necessarily (and probably shouldn't be) ownership
    - Guardianship?
- Cooperative collection development
  - Concept used by Architecture group

#### Suggested next steps to consider:

- EDGI shifting more into "Data Justice" - there are plenty of places for libraries to get involved in this moving forward.
- Article for D-Lib on Justice Issues around data access (data justice)
  - Important to frame it as a justice issue
- White paper, Tangible set of recommendations:
  - Standard resources and delineation of roles
  - Data rescue in a box
  - Delineation of roles
  - What libraries can do instead?
    - E.g. what if every library was part of an IPFS network
    - There are too many degrees of freedom, we need to add some constraints
  - ARL Libraries as a possible audience for paper. Perhaps also research communities?
- Ambassador Program?
  - Mechanism for ambassadors to connect
  - JHU could take a leadership role in organizing this
  - Need funding for this
    - Grants can both slow and speed the process
    - Concerns expressed that it is easier to find funding for "flashy" work

#### Multi directional approach for success:

- Verticals: Research community, Gov, Organizations
  - RDA is interested in this
  - Pilot project to set up this vertical involvement with a data center
- Horizontal: with libraries
- Create bridges

#### Logistics (TBD):

- We need a home: even a rudimentary home
  - Could be distributed among the group?
  - Internet Archive - as a resource and on short notice...
- End user access:
  - Need to keep user communities in mind when rescuing data
  - Are we considering access?
- Chain of custody?
  - Preserving every event in PREMIS?
  - Workflows that don't require provenance data? (i.e. hashes to verify accuracy and reliability of data files)
- Different pipes for different disciplines' metadata?
  - Would be great to have a way to attach the additional metadata information (for things that are dependent e.g. on date)

#### Contributors

- COS
  - Support via infrastructure and metadata aspects
  - Happy to be an interface (via OSF, SHARE, Ember platform)
  - No funding for providing all the storage necessary
- Fedora
  - Positive support interest in involvement: levels of storage aspects, incorporating aspects of curation, preservation
- Justin - working with end user perspective - functionality of results
  - Preserving more than just the bits
  - ESIP has been doing some of this work
- Joan: metadata needs its own pipe for complex relationships - there needs to be a way to indicate diversity of metadata - different dates etc. Timing of these efforts?
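
The Logistics notes above raise the idea of using hashes to verify the accuracy and reliability of rescued data files. A minimal sketch of that idea, assuming SHA-256 as the digest and a simple path-to-digest mapping as the manifest (neither was specified in the discussion, and the function names here are hypothetical):

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root, POSIX style) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return the listed paths that are missing or whose digest has changed."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items()
        if current.get(name) != digest
    )
```

A manifest built at rescue time can be stored next to the data; running `verify_manifest` against a later copy lists any files that no longer match. This checks only file fixity, not the fuller chain-of-custody events that the PREMIS question refers to.
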
## LUNCH BREAK

#### Libraries (includes some possible deliverables):

- Identify and Notify Ambassadors
- Metadata Model (For Data Registry) (Including concepts of triage)
- 2 different guides - external and internal facing, also covering actions as a reference service or actions as a library - there is autonomy in letting the institute figure out what this means: mapping to reference interview philosophy
- Two guides:
  - External focus - One for the public at large
    - Reference interview: Preparation
    - OSF in a box, to empower your patrons
    - Connect the person to an ongoing event/collaboration
  - Internal focus - One for libraries
    - Collection development: how to nominate and prioritize data
    - Identify risks
    - OSF in a box, supported by libraries
- Tim: suggests LibGuides
- Model for branded portal (SHARE/EMBER/OSF project), method of aggregated discovery...
- A SHARE portal (for aggregating "rescued" data):
  - All data from all DataRescue participants would be exposed in a branded portal using the SHARE APIs
  - Filter by provider
  - Obtaining SHARE keys for each provider is needed.

#### Data Rescue in a Box

See <https://osf.io/6eyfu/> for full details

General outline:

- Create an inventory of what’s been done already:
  - Identify what’s already been happening in Data Rescue
  - Solicit submissions/notes to make us aware of what people are doing
  - GitHub (managed by EDGI) Repo as a source
- Create a toolkit based on these resources
  - Toolkit should include a directory of names of people/communities
  - Should include a Community coordination registry/directory: (Justin & Lynn)
    - Place and topic of interest
    - Determine terminology (e.g. a name for the “communities of practice” that you would be pointing people to)
    - Ambassadors that you could connect with at different organizations
- Technical next steps Re: COS. Brian from COS can fill in more info
  - Rsync/OSF, a topic we discussed.
  - Rclone plugin for OSF
  - SHARE data portal for datasets: registry of all the datasets
  - Metadata Model for Data registry
    - What the metadata should look like for the datasets that have already been “rescued”
    - This model needs to include “Triage”

What's already happening?

1. Inventory Tools, Processes, etc./Who has been doing data rescue? This includes: Location and type (What do we call this representation -- community) (Also ARL)
2. Information Toolkit (Also ARL)
3. Convening discussion
4. Community Coordination Registry (Justin & Lynn)
5. SHARE data portal -- part of data in a box -- how to hook in to the broader world (This is also for ARL libraries...)

Ambassadors would want to be listed within the registry

Metadata discussion

### **ARL and the distributed web**

*We should also have a vision for communicating this work to the other audiences as well (not just ARL)*

(Elements of output: debating whether libraries need to have mission/visions dissemination to the community)

#### How to present this work (esp. to ARL):

- Support to member directors for whom this work is important
- Collection Development
  - Patterns and tools (Sayeed and Others)
  - Alignments with traditional collection development.
  - There are many analogies between a given book and a given part of datasets
- Aligns with the work on the distributed web
  - This will allow ARL to work as a network
  - Become tolerant of risk and such
  - Different from federation!
  - The bandwidth, concepts, etc.
- We are increasing the catalog in this way
- “Here’s what the world would look like if you were to go this path”
- “Here are the problems you are currently having”
  - E.g. “this as a solution to your local storage problems”
- There could be a resource to assist with implementation:
  - "This is what you should do / Here is a technical stack"

#### **Next Layer: ARL as IPFS Network**

Who should be involved in this conversation and implementations?
This is to answer the “why” not as much how or practical implementation

2 Parallel tracks (or more)

- IT Directors and experts
- Practicing / practical experts
- Data Together (working on bridging the technical and knowledge gaps)
  - Focused on solving the problem of making IPFS useful for use cases like data rescue (Brandon)
- EDGI
- IPFS - Protocol Labs
- Fedora and COS/OSF folks
- ESIP (represent end users)
- Representative organizations that support communities
  - Empowering under-resourced communities to hold their own data
- Program Manager (possibly, especially for rollout)

**This needs to be at least 2 conversations:**

- Consensus building
- Technical structure, building tools etc.

**Remember:** Libraries are still in a supporting role

- In this space, we're a part of the community beyond libraries

#### Cascading effect

- Need to build in connections to the rest of the conversation
- The collective infrastructure needs to be prepared to support the whole before roll-out (?)
- Bring metadata work into rescue event structure
  - Populate metadata - nominate and prioritize things to be described

TIMELINE

- Megan and Reid: Data Rescue In a Box - Notes: Things in and around ... August 4th
- Sayeed - Winston: Update ... August 4th
- Sayeed and others - ARL as distributed web ... August 11th
- Matt - ARL as IPFS Network ... August 31st

Coordinating Google doc folder in OSF

## ***An alternate set of notes: <https://osf.io/t8a9r/>***

[1]: https://osf.io/rvyub/
