# Overview from Day 1:

Discussion of outcomes from the data into OSF group:

- Data Rescue in a Box
- Content Aggregation
- For more see [https://osf.io/rvyub/][1] and Google doc with write-up

## Discussion and feedback:

#### Need to be collecting methods and tools and not building more tools

- Could be space for a new group, co-locating and emphasising what different groups are doing and where they have been successful
- Harness what people have done in the past.
- Complexity of checklist of some working groups may be difficult to re-use/work with
    - but taking a small/simplified mechanism from others
- Want to have formal channels for feedback
- Important to note about multi-pronged efforts: some scale and some don't

#### EDGI success - level of awareness

- Struggled with what is going to be done and by whom: critical eye to "scale and governance of process"
- Aware of our communities: What is the purpose of some of our communities?

#### Warning about Automation:

- Automation can lead to missing metadata
- We need concerned individuals to help with this
- Make sure that the automation approach actually facilitates metadata creation
    - Raw data
    - Taxonomy of the data and the metadata
- Need tools for citizens/students
- Intuition/affiliation needs to be included... also missing in automation
- Augmenting record metadata is a larger challenge
    - Refining of records, we need something to make them meaningful
    - Not only related to data rescue
        - Humanitarian Open Street Map
        - Ushahidi

#### Documenting process/workflow to share - more important than any specific technology

- Need formal channels of feedback of what works and what doesn't
- Seriously consider **scale**
- Transparent governance of process
- Maximize available resources
- Maximize people skills:
    - Habitat for Humanity - "real time adjustment - makes everybody feel like they are contributing"
    - What can we learn from this kind of model?
    - Acknowledge that not everybody has the specific skills, but that there is something for everyone to work on to contribute:
        - "Even if you didn't rescue data directly, you still contributed to our success"
        - EDGI has been doing this, has different categories of tasks for different abilities (and may have resources related to assigning people effectively)
        - EDGI Hackathon events: everyone given something constructive to do
    - Capacity building - people will gain skills through practice-based learning
    - A series of questions can help to identify volunteers
    - Be prepared to say “I know you said you have this skillset, but we really need your help over here” (*positive re-direction*)

#### Needs in this space:

- Metadata: experienced people and more contributions
- Education resources for how to use the "rescued data" in its "new, secondary location"
- Directory of data rescue contact persons in libraries?
    - Make sure they have resources and have been reached out to first
    - Putting together a directory of librarians is feasible
- Funding:
    - Who is going to pay for this...
    - How do you show the geographically broad spectrum of patrons that would be available -- who could call?

#### Considerations

- Conversation that looks at multiple roles and expertise - it takes a village:
    - Federal ecosystem - careers have been spent working with this data - whole other set of people who are very invested (thinking across communities)
- Keep focus on serving the community: "What can we do to support this work?" and "What can libraries do to support Data Rescue efforts?"
    - See example of Spark
    - Dropping your ideas about your "role" to serve the sense of urgency
- Connections with agencies whose data we want to protect:
    - May need to get away from the term "rescue"
        - "Data Justice" as alternative
    - Who at the agency you are rescuing data from should you be talking to? Step 1: talk to the people
    - Must be very careful to protect people who work with people.
      Anonymity considerations.

#### Collection Development

- Accurate reframing of this work in libraries
    - This is a collection development effort - libraries have a fair amount of capacity to deal with collection development
    - Ways to frame by speaking to a library community
    - Should become a persistent annual effort
    - There are collection funds that could start to shift in this direction
- Terminology: still need to maintain the urgency.
    - This is a way for collectors and liaisons to pay attention to these items.
    - Remember the "data justice" elements of this work
    - Fatigue around "Data rescue"
    - This reframing could support the long game direction of this work -> data repository
    - How do we keep that sense that action is needed?
- Library Roles:
    - Collections
    - Infrastructure
    - Funding
    - Not necessarily (and probably shouldn't be) ownership
        - Guardianship?
- Cooperative collection development
    - Concept used by Architecture group

#### Suggested next steps to consider:

- EDGI shifting more into "Data Justice" - there are plenty of places for libraries to get involved in this moving forward.
- Article for D-Lib on Justice Issues around data access (data justice)
    - Important to frame it as a justice issue
- White paper, Tangible set of recommendations:
    - Standard resources and delineation of roles
    - Data rescue in a box
    - Delineation of roles
    - What libraries can do instead?
        - E.g. what if every library was part of an IPFS network
    - There are too many degrees of freedom, we need to add some constraints
    - ARL Libraries as a possible audience for paper. Perhaps also research communities?
- Ambassador Program?
    - Mechanism for ambassadors to connect
    - JHU could take a leadership role in organizing this
- Need funding for this
    - Grants can both slow and speed the process
    - Concerns expressed that it is easier to find funding for "flashy" work

#### Multi directional approach for success:

- Verticals - Research community, Gov, Organizations
    - RDA is interested in this
    - Pilot project to set up this vertical involvement with a data center
- Horizontal - with libraries
- Create bridges

#### Logistics (TBD):

- We need a home: even a rudimentary home
    - Could be distributed among the group?
    - Internet Archive - as a resource and on short notice...
- End user access:
    - Need to keep user communities in mind when rescuing data
    - Are we considering access?
- Chain of custody?
    - Preserving every event in PREMIS?
    - Workflows that don't require provenance data? (i.e. hashes to verify accuracy and reliability of data files)
- Different pipes for different disciplines' metadata?
    - Would be great to have a way to attach the additional metadata information (for things that are dependent e.g. on date)

#### Contributors

- COS
    - Support via infrastructure and metadata aspects
    - Happy to be an interface (via OSF, SHARE, Ember platform)
    - No funding for providing all the storage necessary
- Fedora
    - Positive support interest in involvement: levels of storage aspects, incorporating aspects of curation, preservation
- Justin - working with end user perspective - functionality of results. Preserving more than just the bits
    - ESIP has been doing some of this work
- Joan: metadata needs its own pipe for complex relationships - need to be a way to indicate diversity of metadata - different dates etc. timing of these efforts?
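
The chain-of-custody question under Logistics mentions hashes to verify the accuracy and reliability of data files. As a rough illustration only (the group did not specify a tool or format), a checksum manifest could be sketched as below. The function names and the choice of SHA-256 are assumptions made for this example, not something agreed in the discussion.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that were added, removed, or altered since the manifest was built."""
    current = build_manifest(root)
    return sorted(
        name
        for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )
```

A manifest built at rescue time could travel with the files, so that any later holder can re-run `changed_files` and confirm nothing changed in transit. This covers fixity only; it records nothing about who handled the data or when.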
## LUNCH BREAK

#### Libraries (includes some possible deliverables):

- Identify and Notify Ambassadors
- Metadata Model (For Data Registry) (Including concepts of triage)
- 2 different guides - External and internal facing
    - Also actions as a reference service or actions as a library - there is autonomy in letting the institute figure out what this means: mapping to reference interview philosophy
- Two guides:
    - External focus - One for the public at large
        - Reference interview - Preparation
        - OSF in a box, to empower your patrons
        - Connect the person to an ongoing event/collaboration
    - Internal focus - One for libraries
        - Collection development - How to nominate and prioritize data
        - Identify risks
        - OSF in a box, supported by libraries
- Tim: suggests LibGuides
- Model for branded portal (SHARE/EMBER/OSF project) method of aggregated discovery...
- A SHARE portal (for aggregating "rescued" data):
    - All data from all DataRescue participants would be exposed in a branded portal using the SHARE APIs
    - Filter by provider
    - Obtaining SHARE keys for each provider is needed.

#### Data Rescue in a Box

See <https://osf.io/6eyfu/> for full details

General outline:

- Create an inventory of what’s been done already:
    - Identify what’s already been happening in Data Rescue
    - Solicit submissions/notes to make us aware of what people are doing
    - GitHub (managed by EDGI) Repo as a source
- Create a toolkit based on these resources
    - Toolkit should include a directory of names of people/communities
- Should include a Community coordination registry/directory: (Justin & Lynn)
    - Place and topic of interest
    - Determine terminology (e.g. a name for the “communities of practice” that you would be pointing people to)
    - Ambassadors that you could connect with at different organizations
- Technical next steps Re: COS. Brian from COS can fill in more info
    - Rsync/OSF, A topic we discussed.
      Rclone plugin for OSF
    - SHARE data portal for datasets - registry of all the datasets
    - Metadata Model for Data registry
        - What the metadata should look like for the datasets that have already been “rescued”
        - This model needs to include “Triage”

What's already happening?

1. Inventory Tools, Processes, etc./Who has been doing data rescue? This includes: Location and type (What do we call this representation -- community) (Also ARL)
2. Information Toolkit (Also ARL)
3. Convening discussion
4. Community Coordination Registry (Justin & Lynn)
5. SHARE data portal -- part of data in a box -- how to hook in to the broader world (This is also for ARL libraries...)

Ambassadors would want to be listed within the registry

Metadata discussion

### **ARL and the distributed web**

*We should also have a vision for communicating this work to the other audiences as well (not just ARL)*

(Elements of output: debating whether libraries need to have mission/visions dissemination to the community)

#### How to present this work (esp. to ARL):

- Support to member directors for whom this work is important
- Collection Development
    - Patterns and tools (Sayeed and Others)
    - Alignments with traditional collection development.
    - There are many analogies between a given book and a given part of datasets
- Aligns with the work on the distributed web
    - This will allow ARL to work as a network
    - Become tolerant of risk and such
    - Different from federation!
    - The bandwidth, concepts, etc.
- We are increasing the catalog in this way
- “Here’s what the world would look like if you were to go this path”
- “Here are the problems you are currently having”
    - E.g. “this as a solution to your local storage problems”
- There could be a resource to assist with implementation:
    - "This is what you should do / Here is a technical stack"

#### **Next Layer: ARL as IPFS Network**

Who should be involved in this conversation and implementations?
This is to answer the “why”, not as much how or practical implementation

2 Parallel tracks (or more)

- IT Directors and experts
- Practicing / practical experts
- Data Together (working on bridging the technical and knowledge gaps)
    - Focused on solving the problem of making IPFS useful for use cases like data rescue (Brandon)
- EDGI
- IPFS - Protocol Labs
- Fedora and COS/OSF folks
- ESIP (represent end users)
- Representative organizations that support communities
    - Empowering under-resourced communities to hold their own data
- Program Manager (possibly, especially for rollout)

**This needs to be at least 2 conversations:**

- Consensus building
- Technical structure, building tools etc.

**Remember:** Libraries are still in a supporting role

- In this space, we're a part of the community beyond libraries

#### Cascading effect

- Need to build in connections to the rest of the conversation
- The collective infrastructure needs to be prepared to support the whole before roll-out (?)
- Bring metadata work into rescue event structure
    - Populate metadata - nominate and prioritize things to be described

TIMELINE

- Megan and Reid: Data Rescue In a Box - Notes: Things in and around ... August 4th
- Sayeed - Winston: Update ... August 4th
- Sayeed and others - ARL as distributed web ... August 11th
- Matt - ARL as IPFS Network ... August 31st

Coordinating Google doc folder in OSF

## ***An alternate set of notes: <https://osf.io/t8a9r/>***

[1]: https://osf.io/rvyub/