# Overview from Day 1:
Discussion of outcomes from the data into OSF group:
- Data Rescue in a Box
- Content Aggregation
- For more see [https://osf.io/rvyub/][1] and Google doc with write-up
## Discussion and feedback:
#### Need to be collecting methods and tools and not building more tools
- Could be space for a new group, co-locating and emphasising what different groups are doing and where they have been successful
- Harness what people have done in the past.
- Complexity of some working groups' checklists may make them difficult to re-use/work with - but we could take a small/simplified mechanism from others
- Want to have formal channels for feedback
- Important to note about multi-pronged efforts: some scale and some don't
#### EDGI success
- level of awareness
- Struggled with what is going to be done and by whom: critical eye to "scale and governance of process"
- Aware of our communities: What is the purpose of some of our communities?
#### Warning about Automation:
- Automation can lead to missing metadata
- We need concerned individuals to help with this
- Make sure that the automation approach actually facilitates metadata creation
- Raw data
- Taxonomy of the data and the metadata
- Need tools for citizens/students
- Intuition/affiliation needs to be included... also missing in automation
- Augmenting record metadata is a larger challenge
- Refining of records, we need something to make them meaningful
- Not only related to data rescue
- Humanitarian Open Street Map
- Ushahidi
#### Documenting process/workflow to share - more important than any specific technology
- Need formal channels of feedback of what works and what doesn't
- Seriously consider **scale**
- Transparent governance of process
- Maximize available resources
- Maximize people skills:
- Habitat for Humanity - "real time adjustment - makes everybody feel like they are contributing"
- What can we learn from this kind of model?
- Acknowledge that not everybody has the specific skills, but that there is something for everyone to work on to contribute:
- "Even if you didn't rescue data directly, you still contributed to our success"
- EDGI has been doing this, has different categories of tasks for different abilities (and may have resources related to assigning people effectively)
- EDGI Hackathon events: everyone given something constructive to do
- Capacity building - people will gain skills through practice-based learning
- A series of questions to ask to help to identify volunteers can help
- Be prepared to say “I know you said you have this skillset, but we really need your help over here” (*positive re-direction*)
#### Needs in this space:
- Metadata: experienced people and more contributions
- Education resources for how to use the "rescued data" in its "new, secondary location"
- Directory of data rescue contact persons in libraries?
- Make sure they have resources and have been reached out to first
- Putting together a directory of librarians is feasible
- Funding:
- Who is going to pay for this...
- How do you show the geographically broad spectrum of patrons that would be available -- who could call?
#### Considerations
- Conversation that looks at multiple roles and expertise - it takes a village:
- Federal ecosystem - careers have been spent working with this data - whole other set of people who are very invested (thinking across communities)
- Keep the focus on serving the community: "What can we do to support this work?" and "What can libraries do to support Data Rescue efforts?"
- See example of Spark
- Dropping your ideas about your "role" to serve the sense of urgency
- Connections with agencies whose data we want to protect:
- May need to get away from the term "rescue"
- "Data Justice" as alternative
- Who at the agency you are rescuing data from should you be talking to? Step 1: talk to the people
- Must be very careful to protect people who work with people. Anonymity considerations.
#### Collection Development
- Accurate reframing of this work in libraries
- This is a collection development effort - libraries have a fair amount of capacity to deal with collection development - ways to frame by speaking to a library community
- Should become a persistent annual effort
- There are collection funds that could start to shift in this direction
- Terminology: still need to maintain the urgency.
- This is a way for collectors and liaisons to pay attention to these items.
- Remember the "data justice" elements of this work
- Fatigue around "Data rescue"
- This reframing could support the long game direction of this work -> data repository
- How do we keep that sense that action is needed?
- Library Roles:
- Collections,
- Infrastructure,
- Funding
- Not necessarily (and probably shouldn't be) ownership
- Guardianship?
- Cooperative collection development
- Concept used by Architecture group
#### Suggested next steps to consider:
- EDGI shifting more into "Data Justice" - there are plenty of places for libraries to get involved in this moving forward.
- Article for D-Lib on Justice Issues around data access (data justice)
- Important to frame it as a justice issue
- White paper, Tangible set of recommendations:
- Standard resources and delineation of roles
- Data rescue in a box
- Delineation of roles
- What libraries can do instead?
- E.g. what if every library was part of an IPFS network
- There are too many degrees of freedom, we need to add some constraints
- ARL Libraries as a possible audience for paper. Perhaps also research communities?
- Ambassador Program?
- Mechanism for ambassadors to connect
- JHU could take a leadership role in organizing this
- Need funding for this
- Grants can both slow and speed the process
- Concerns expressed that it is easier to find funding for "flashy" work
#### Multi directional approach for success:
- Verticals — Research community, Gov, Organizations
- RDA is interested in this
- Pilot project to set up this vertical involvement with a data center
- Horizontal — with libraries
- Create bridges
#### Logistics (TBD):
- We need a home: even a rudimentary home
- Could be distributed among the group?
- Internet Archive - as a resource and on short notice...
- End user access:
- Need to keep user communities in mind when rescuing data
- Are we considering access?
- Chain of custody?
- Preserving every event in PREMIS?
- Workflows that don't require provenance data? (i.e. hashes to verify accuracy and reliability of data files)
- Different pipes for different disciplines' metadata?
- Would be great to have a way to attach the additional metadata information (for things that are dependent e.g. on date)
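
A minimal sketch of the "hashes to verify accuracy and reliability of data files" idea, assuming SHA-256 and a plain dictionary as the manifest. The function names (`build_manifest`, `fixity_events`) and the event keys are hypothetical and are not the PREMIS schema; they only illustrate recording one event per check so that a chain of custody could be reconstructed later.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file under root (relative POSIX path) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def fixity_events(root: Path, manifest: Dict[str, str]) -> List[Dict[str, str]]:
    """Re-hash every manifest entry and log one event per file.

    The keys below are hypothetical, loosely inspired by the idea of a
    preservation event; they are not taken from the PREMIS data dictionary.
    """
    now = datetime.now(timezone.utc).isoformat()
    events = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            outcome = "missing"
        elif sha256_of(target) == expected:
            outcome = "pass"
        else:
            outcome = "fail"
        events.append(
            {
                "file": rel_path,
                "event_type": "fixity check",
                "date_time": now,
                "outcome": outcome,
            }
        )
    return events
```

The manifest would be built once at rescue time and stored alongside the files; re-running `fixity_events` after each transfer gives a simple per-file log that could later be mapped to a formal preservation schema.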
#### Contributors
- COS
- Support via infrastructure and metadata aspects
- Happy to be an interface (via OSF, SHARE, Ember platform)
- No funding for providing all the storage necessary
- Fedora
- Positive support and interest in involvement: levels of storage aspects, incorporating aspects of curation, preservation
- Justin - working with end user perspective - functionality of results; preserving more than just the bits
- ESIP has been doing some of this work
- Joan: metadata needs its own pipe for complex relationships
- Needs to be a way to indicate diversity of metadata
- different dates etc. timing of these efforts?
## LUNCH BREAK
#### Libraries (includes some possible deliverables):
- Identify and Notify Ambassadors
- Metadata Model (For Data Registry)(Including concepts of triage)
- 2 different guides - external- and internal-facing; also acting as a reference service or acting as a library - there is autonomy in letting the institution figure out what this means: mapping to reference interview philosophy
- Two guides:
- External focus - One for the public at large
- Reference interview — Preparation
- OSF in a box, to empower your patrons
- Connect the person to an ongoing event/collaboration
- Internal focus - One for libraries
- Collection development: how to nominate and prioritize data
- Identify risks
- OSF in a box, supported by libraries
- Tim: suggests LibGuides
- Model for branded portal (SHARE/EMBER/OSF project): method of aggregated discovery...
- A SHARE portal (for aggregating "rescued" data):
- All data from all DataRescue participants would be exposed in a branded portal using the SHARE APIs
- Filter by provider
- Obtaining SHARE keys for each provider is needed.
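
The aggregation idea above (all participants' records in one portal, filterable by provider) can be sketched as follows. This is only an in-memory illustration: it does not call the SHARE APIs, and the function names and the `provider` field are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, List


def aggregate(feeds: Dict[str, Iterable[dict]]) -> List[dict]:
    """Merge per-provider record lists, tagging each record with its provider."""
    merged = []
    for provider, records in feeds.items():
        for record in records:
            # Copy the record so the provider's own data is left untouched.
            merged.append({**record, "provider": provider})
    return merged


def filter_by_provider(records: Iterable[dict], provider: str) -> List[dict]:
    """Keep only the records contributed by one provider."""
    return [r for r in records if r["provider"] == provider]


def provider_counts(records: Iterable[dict]) -> Counter:
    """Count records per provider, e.g. to drive a filter sidebar."""
    return Counter(r["provider"] for r in records)
```

In a real portal the feeds would come from each provider's harvested records; tagging at merge time is what makes the later "filter by provider" step trivial.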
#### Data Rescue in a Box
See <https://osf.io/6eyfu/> for full details
General outline:
- Create an inventory of what’s been done already:
- Identify what’s already been happening in Data Rescue
- Solicit submissions/notes to make us aware of what people are doing
- GitHub repo (managed by EDGI) as a source
- Create a toolkit based on these resources
- Toolkit should include a directory of names of people/communities
- Should include a Community coordination registry/directory: (Justin & Lynn)
- Place and topic of interest
- Determine terminology (e.g. a name for the “communities of practice” that you would be pointing people to)
- Ambassadors that you could connect with at diff organizations
- Technical next steps Re: COS. Brian from COS can fill in more info
- Rsync/OSF, A topic we discussed. Rclone plugin for OSF
- SHARE data portal for datasets — registry of all the datasets
- Metadata Model for Data registry
- What the metadata should look like for the datasets that have already been “rescued”
- This model needs to include “Triage”
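
One possible shape for a registry record that includes triage, as a rough sketch only. Every field name and the three triage levels are hypothetical; the group had not agreed on a metadata model at this point.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Iterable, List


class Triage(IntEnum):
    """Hypothetical urgency levels; a lower value means rescue sooner."""
    URGENT = 1
    SOON = 2
    MONITOR = 3


@dataclass
class DatasetRecord:
    """Illustrative registry entry for one dataset (assumed fields)."""
    title: str
    source_agency: str
    source_url: str
    rescued: bool = False
    triage: Triage = Triage.MONITOR
    keywords: List[str] = field(default_factory=list)


def rescue_queue(records: Iterable[DatasetRecord]) -> List[DatasetRecord]:
    """Return not-yet-rescued records, most urgent first (stable sort)."""
    pending = [r for r in records if not r.rescued]
    return sorted(pending, key=lambda r: r.triage)
```

Keeping `rescued` and `triage` as separate fields lets the same registry describe both datasets that have already been “rescued” and those still waiting to be prioritized.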
What's already happening?
1. Inventory Tools, Processes, etc./Who has been doing data rescue? This includes: Location and type (What do we call this representation -- community)(Also ARL)
2. Information Toolkit (Also ARL)
3. Convening discussion
4. Community Coordination Registry (Justin & Lynn)
5. SHARE data portal -- part of data in a box -- how to hook in to the broader world (This is also for ARL libraries...)
Ambassadors would want to be listed within the registry
Metadata discussion
### **ARL and the distributed web**
*We should also have a vision for communicating this work to the other audiences as well (not just ARL)*
(Elements of output: debating whether libraries need to have mission/visions dissemination to the community)
#### How to present this work (esp. to ARL):
- Support to member directors for whom this work is important
- Collection Development
- Patterns and tools (Sayeed and Others)
- Alignments with traditional collection development.
- There are many analogies between a given book and a given part of datasets
- Aligns with the work on the distributed web
- This will allow ARL to work as a network
- Become tolerant of risk and such
- Different from federation!
- The bandwidth, concepts, etc.
- We are increasing the catalog in this way
- “Here’s what the world would look like if you were to go this path”
- “Here are the problems you are currently having”
- E.g. “this as a solution to your local storage problems”
- There could be a resource to assist with implementation:
- "This is what you should do / Here is a technical stack"
#### **Next Layer: ARL as IPFS Network**
Who should be involved in this conversation and implementations?
This is to answer the “why” not as much how or practical implementation
2 Parallel tracks (or more)
- IT Directors and experts
- Practicing / practical experts
- Data Together (working on bridging the technical and knowledge gaps)
- Focused on solving the problem of making IPFS useful for use cases like data rescue (Brandon)
- EDGI
- IPFS
- Protocol Labs
- Fedora and COS/OSF folks
- ESIP (represent end users)
- Representative organizations that support communities
- Empowering under-resourced communities to hold their own data
- Program Manager (possibly, especially for rollout)
**This needs to be at least 2 conversations:**
- Consensus building
- Technical structure, building tools etc.
**Remember:** Libraries are still in a supporting role
- In this space, we're a part of the community beyond libraries
#### Cascading effect
- Need to build in connections to the rest of the conversation
- The collective infrastructure needs to be prepared to support the whole before roll-out (?)
- Bring metadata work into rescue event structure
- Populate metadata
- nominate and prioritize things to be described
TIMELINE
- Megan and Reid: Data Rescue In a Box - Notes: Things in and around ... August 4th
- Sayeed - Winston: Update ... August 4th
- Sayeed and others - ARL as distributed web ... August 11th
- Matt - ARL as IPFS Network ... August 31st
Coordinating Google doc folder in OSF
## ***An alternate set of notes: <https://osf.io/t8a9r/>***
[1]: https://osf.io/rvyub/