## Notes for Tuesday July 18th Morning Presentations ##
Sayeed Opening Remarks:<br>
Scale<br>
Sense of Urgency from Faculty about Federal Data
- this is not an issue that we can "depoliticize"
- "We" not looking to own this initiative but share
- How would you deal with these distributed efforts?
- Putting the pieces together with a blueprint to share with the library community...
Assumptions:
- we won't have the agencies represented - but we have the ability to connect and we will
- The OSF is a good way to start to share and get others involved
Comments/Questions:<br>
OSF as a framework for further development and not intended as solely data storage
Question of whether the OSF will be adopted by others for this activity
Agenda overview...
Joan Saez Presentation:<br>
CloudBIRST - hosts data, financially supports DRB - federal data
   - Federal data has a lot of errors from a tech standpoint
   - Volunteers primarily have an IT background - but also fed data individuals, environmental eng. Many were personally responsible for originally creating this data.
- Never intended to be a repository for data
   - feel like this is a short term solution; believe these data belong somewhere beyond the private company
- the importance of preserving the file structure - try not to touch any of these files
   - Shares 4 projects: 2 in process, 2 in queue
- Social media and network spaces
   -- GitHub has code for tools - target servers Portal in Beta - very large data sets
- Mass Rescue Methodology:
- id problems
- pose solutions
- proof of concept
- Refine solution - repeat until stable
- Execute - repeat as needed
- Event Day
- org team: train captains (SMEs) and coders
- wholesale capture
- portal configuration
- Data defenders
- Shows tools used - 12 or so - all used in the back room not with GUI
- created metadata behind the scenes to go with the materials
- standard naming convention is built into the portal
   - Challenges: logistics, scope larger than anticipated, tools existing and created, data can be old, chain of custody, integrity, privacy, and language
-Language: need to agree on terms, glossary
"incrawlables": often databases with data tables without referential data
<br>
"Data Recipes": info pulled from multiple tables
- What is the value of these data recipes?
-
Differentiators for success:
- Tech expertise
- bias towards action
- focus on results
- retain tech structure
- narrow definition for action
- autonomy to act
- shallow and informal approval chain
- requirements to validate
- divide and conquer mentality
- common language
- project language
- reiterate remote process
- online resources
-
Questions:
  - WaterButler - more later
- Issue of Chain of Custody
-(particularly private)
- who is less important than how sometimes
- the who can actually be agents, don't need to disclose the identity, but just share the UUID to see patterns of problems
- ?Librarians as SMEs?
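
The chain-of-custody point above (record how something was handled, and identify the agent only by a UUID) can be sketched roughly as follows. This is a minimal illustration, not the tooling shown in the presentation; `CustodyEvent`, `new_agent_id`, and `failures_by_agent` are hypothetical names.

```python
import uuid
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class CustodyEvent:
    """One step in a file's chain of custody (hypothetical record layout)."""
    agent_id: str   # pseudonymous UUID; no name or email is stored
    action: str     # the "how": e.g. "harvest", "verify", "upload"
    file_path: str
    ok: bool        # whether the step succeeded
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def new_agent_id() -> str:
    """Issue a random UUID that stands in for a person or a script."""
    return str(uuid.uuid4())


def failures_by_agent(events):
    """Count failed steps per agent UUID to surface patterns of problems."""
    return Counter(e.agent_id for e in events if not e.ok)
```

Grouping failures by UUID shows whether problems cluster around one agent or one kind of action without disclosing who the agent is.
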
Aaron - Introducing Data Conservancy
Fedora 4 and the linked data platform (manages both RDF and binaries)
- API Extension Architecture API-X diagram
- glue to plug in these different extensions
  - bind services based on the content/description of the resources
 - Packaging: Semantic format - profile of BagIt; primary use case is creating packages in these preservation services
- Package Ingest Extension
- RMap
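
Since the packaging format is described as a profile of BagIt, a generic BagIt-style layout may help as a reference point. The sketch below assumes only the basic BagIt conventions (a `bagit.txt` declaration, a `data/` payload directory, and a `manifest-sha256.txt` of checksums). It does not implement the Data Conservancy profile itself, and `make_bag` and `verify_bag` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def make_bag(bag_dir: Path, payload: dict[str, bytes]) -> Path:
    """Write payload files under data/ plus a SHA-256 manifest (illustrative only)."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for rel_path, content in sorted(payload.items()):
        target = data_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)  # keep the original file structure
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{rel_path}")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def verify_bag(bag_dir: Path) -> list[str]:
    """Return payload paths whose current checksum differs from the manifest."""
    bad = []
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue  # skip blank lines (e.g. an empty bag)
        digest, rel_path = line.split(maxsplit=1)
        current = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if current != digest:
            bad.append(rel_path)
    return bad
```

An empty list from `verify_bag` means every payload file still matches the checksum recorded at packaging time.
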
**Fedora Overview - David Wilcox**
DuraSpace: DSpace, VIVO, Fedora (also services: DuraCloud, HykuDirect, DSpaceDirect, ArchivesDirect) <br>
Open source repository software - preserves and supplies access to digital objects
- based in the world of linked data
- interoperability is key
- flexibility, adaptability, durability, standards-based, thriving community
 - Fedora is middleware -- Samvera, Islandora
- Storage and Preservation: RDF and not RDF (binary), versioning, checksums
- Provenance: optional audit
- metadata and data can be accessed without fedora
- Is a linked data infrastructure
 - Ecosystem: Fedora + messaging + ext: triple store + website? + IIIF Server? + Research Data? + Profiles? + import and export: FS or DDP (API-X is a layer in between that can mediate..)
- There is a lot of training available and mailing lists
- See links in David's presentation...
-
**Questions:**
- Are there any prepackaged - usable blank platforms? Yes - Islandora and Samvera, depends on use case...
**Brian - OSF**
What words would you use to describe the OSF?
platform, collaboration,
- You need at least one bibliographic contributor on a project
- Nodes/Components
- Example Project
- Glacier storage - keep parity information, have checksums for OSF storage (or other providers)
- version control
- Registration metadata
- API - Metadata Schema
  - WaterButler - manages streams (Python, Tornado)
 - test rescue projects: issue of FTP - no WaterButler mechanism...
- "Fish Guy" - scans all the fish in the world with CT scanner
- Bash scripting: on GitHub - mostly complete...
- Brian has added links to Dev Docs API documentation and relevant resources
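
The notes mention stream handling and checksums for stored files. As a rough illustration of the general idea (not WaterButler's actual code or API), the sketch below copies a stream in chunks and computes a SHA-256 checksum along the way, so a large file never has to be held in memory. `copy_with_checksum` is a hypothetical helper.

```python
import hashlib
import io
from typing import BinaryIO


def copy_with_checksum(
    src: BinaryIO, dst: BinaryIO, chunk_size: int = 64 * 1024
) -> tuple[int, str]:
    """Copy src to dst chunk by chunk; return (bytes copied, SHA-256 hex digest)."""
    hasher = hashlib.sha256()
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        hasher.update(chunk)
        dst.write(chunk)
        total += len(chunk)
    return total, hasher.hexdigest()
```

Comparing the returned digest with one recorded at the source is one way to confirm that a transfer arrived intact.
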
Questions:<br>
Provenance aspects available - COS philosophy is not to tell people what to do - don't have standards, but have API application standards
SHARE: SHared Access Research Ecosystem - takes metadata from the outputs of scholarly research and standardizes and normalizes it (normalization is a major challenge). Original intent was a notification service for academic services - what institutional outputs are happening within a certain amount of time. Some comparison with CHORUS...
Sayeed: Karen and Elliot - RMap and SHARE: did start to find cool patterns of people's work - version 1 lots of things to work out - looking forward to seeing how RMap works with V2.
Preprint services
Megan: SHARE as a place to document and discover data that had already been rescued
Ruth: Defining a project - how do we establish this
Joan: Project Management permissions aspects - Collaborators can have different access for each component
Brian shows the institutions API nodes and linking projects to other projects
Ruth - are you pursuing other types of integrations, specifically publication services?
- Yes - examples, Evernote
-
Dave asks about Dataverse - it is possible to connect with DVN
**Matt Zumwalt - Decentralized Technologies**
The decentralized web is a decentralized and distributed network...
Paradigm shift - heavy shift in the direction of P2P from centralized...
Natural mapping with distr.
Patterns emerging to manage decentralized
- model for activities
- view into activities
- tools for activities
-
public record with communities and collections
activities:
- Harvesting
- Monitoring
- Storing,
- Analyzing,
- Rescue
-
Matt's article: "The Internet has been stolen from you - take it back"
Hash-linked Data Structures: git ex
Benefits: Cryptographic, immutable, decoupling the location from the content's identity
Precarious Web (Right Now): location address - fallacy that there is only one copy
Storing together: Peers Coordinating - holding a copy with discovery services
Content Addressing - not where but verifying the exact content is the important thing..
- identify exact content by cryptographic hash
IPFS - transition to content-addressed protocol
- dweb address space
- http gateway
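
The ideas above (content addressing, and hash-linked data structures as in Git) can be sketched in a few lines. This is a toy in-memory model under simplifying assumptions: it uses a plain SHA-256 hex digest as the address, whereas IPFS uses its own identifier format, and `ContentStore`, `put_version`, and `history` are hypothetical names.

```python
import hashlib
import json


class ContentStore:
    """Toy content-addressed store: the address is the hash of the bytes."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data  # same bytes always land at the same address
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Any peer's copy can be checked against the address itself.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


def put_version(store, content_address, parent=None):
    """Store a small node that links to content and to the previous version by hash."""
    node = json.dumps(
        {"content": content_address, "parent": parent}, sort_keys=True
    ).encode("utf-8")
    return store.put(node)


def history(store, head):
    """Walk the hash links from the newest version back to the first."""
    addresses = []
    while head is not None:
        node = json.loads(store.get(head))
        addresses.append(node["content"])
        head = node["parent"]
    return addresses
```

Because each version node embeds the hash of its parent, changing any earlier content would change every later address, which is the immutability benefit noted above.
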
-
Article "instructions for saving endangered data: It's time to get decentralized"
Issue of learning new tools during a disaster...
No transparency - archivers 2.0 addressing this
Turkey - banned Wikipedia - IPFS worked on uploading materials
- arising issues of provenance
- language
- motivations for interactions
Providing a reference point - what does it look like to use decentralized?
Not a done deal, this is a conversation
Questions:
- Aaron - provisioning, names = trust SSL and TLS, how does this deal with trust issue
- cultural changes - projects like keybase and matrix
- Jason - where does this come into play - validation: level 0 compliance
- Sayeed: are libraries still trusted?
- Tim: also going back to the researchers
 - Ruth: Infrastructure bottlenecks related to net neutrality - any action there?
- more flexibility to circumvent - services from the closest possible source
- IPFS is transport agnostic
 - Ruth: example of NASA where DBs are petabytes in size -- how would this work?
   - constrained by the size of the pipes, but individuals only need a portion - the ability to subselect is easily done...
- Protocol Labs creates an open market that allows participation
- Sayeed - can network of libraries be the actors behind this type of decentralized situation.
 - Elliot - collection vs. time series based view of data: can IPFS index and display?
- IPFS is an intermediate layer
 - Elliot: Is there a particular format?
Hashes the bits that you gave it - not a sense of canonical representation, but you can choose what format you want to use.
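
A small illustration of that answer, using SHA-256 as a stand-in for whatever hash the system applies: the same record serialized two ways produces two different addresses, because the hash covers the exact bytes and not the meaning. The record and the `address` helper are made up for the example.

```python
import hashlib
import json

# Hypothetical record, serialized in two equally valid JSON layouts.
record = {"station": "A1", "temp_c": 21.5}
compact = json.dumps(record, separators=(",", ":")).encode("utf-8")
pretty = json.dumps(record, indent=2).encode("utf-8")


def address(data: bytes) -> str:
    """Hash of the exact bytes given, with no canonicalization step."""
    return hashlib.sha256(data).hexdigest()


same_record = json.loads(compact) == json.loads(pretty)   # the data is identical
same_address = address(compact) == address(pretty)        # the bytes are not
```

So a community that wants one address per dataset would need to agree on a serialization before hashing.
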