## Notes for Tuesday July 18th Morning Presentations ##

**Sayeed Opening Remarks**
- Scale
- Sense of urgency from faculty about federal data
- This is not an issue that we can "depoliticize"
- "We" are not looking to own this initiative but to share it
- How would you deal with these distributed efforts?
- Putting the pieces together with a blueprint to share with the library community...

Assumptions:
- We won't have the agencies represented
- But we have the ability to connect, and we will
- The OSF is a good way to start to share and get others involved

Comments/Questions:
- The OSF as a framework for further development, not intended solely as data storage
- Question of whether the OSF will be adopted by others for this activity

Agenda overview...

**Joan Saez Presentation**
- CloudBIRST hosts data and financially supports DRB (federal data)
- Federal data has a lot of errors from a technical standpoint
- Volunteers are primarily from IT backgrounds, but also include federal data people and environmental engineers; many were personally responsible for originally creating this data
- Never intended to be a repository for data: they feel this is a short-term solution and believe these data belong somewhere beyond a private company
- The importance of preserving the file structure: try not to touch any of these files
- Shares 4 projects: 2 in process, 2 in queue
- Social media and network spaces; GitHub has code for the tools
- Target servers
- Portal in beta
- Very large data sets
- Mass Rescue Methodology:
  - Identify problems
  - Pose solutions
  - Proof of concept
  - Refine solution; repeat until stable
  - Execute; repeat as needed
- Event Day:
  - Org team: train captains (SMEs) and coders
  - Wholesale capture
  - Portal configuration
  - Data defenders
- Shows the tools used (12 or so), all used in the back room rather than through a GUI
- Metadata was created behind the scenes to go with the materials
- A standard naming convention is built into the portal
- Challenges: logistics, scope larger than anticipated, tools (existing and created), data can be old, chain of custody, integrity, privacy, and language
- Language: need to agree on terms (a glossary)
  - "Incrawlables": often databases with data tables without referential data
  - "Data Recipes": information pulled from multiple tables
  - What is the value of these data recipes?
- Differentiators for success:
  - Tech expertise
  - Bias towards action
  - Focus on results
  - Retain tech structure
  - Narrow definition for action
  - Autonomy to act
  - Shallow and informal approval chain
  - Requirements to validate
  - Divide-and-conquer mentality
  - Common language
  - Project language
  - Reiterate remote process
  - Online resources

Questions:
- WaterButler: more later
- Issue of chain of custody (particularly private):
  - Sometimes the who is less important than the how
  - The who can actually be agents; you don't need to disclose the identity, just share the UUID to see patterns of problems
- Librarians as SMEs?
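The chain-of-custody point above (share an agent's UUID rather than their identity, then look for patterns of problems) can be sketched in a few lines. The `CustodyLog` class, its event fields, and the assumption that the identity-to-UUID mapping stays private with the event organizers are all hypothetical; the notes do not describe how CloudBIRST or the DRB portal actually log custody.

```python
import hashlib
import uuid
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CustodyLog:
    """Hypothetical chain-of-custody log: records *how* each file was handled
    and by which pseudonymous agent, but never *who* the agent is."""

    # identity -> UUID mapping; assumed to stay private with the organizers,
    # and excluded from repr() so it is not printed by accident
    _pseudonyms: dict = field(default_factory=dict, repr=False)
    # the shareable part of the log
    events: list = field(default_factory=list)

    def agent_id(self, identity: str) -> str:
        # Assign a random UUID the first time an agent appears.
        if identity not in self._pseudonyms:
            self._pseudonyms[identity] = str(uuid.uuid4())
        return self._pseudonyms[identity]

    def record(self, identity: str, action: str, payload: bytes, ok: bool = True) -> dict:
        # Each event keeps the agent's UUID, the action, a SHA-256 of the
        # bytes handled (for integrity), and whether the step succeeded.
        event = {
            "agent": self.agent_id(identity),
            "action": action,
            "sha256": hashlib.sha256(payload).hexdigest(),
            "ok": ok,
        }
        self.events.append(event)
        return event

    def problems_by_agent(self) -> Counter:
        # Count failed steps per pseudonymous agent to surface patterns.
        return Counter(e["agent"] for e in self.events if not e["ok"])


log = CustodyLog()
log.record("volunteer-a@example.org", "download", b"stations.csv v1")
log.record("volunteer-b@example.org", "download", b"stations.csv v1", ok=False)
log.record("volunteer-b@example.org", "upload", b"stations.csv v1", ok=False)
print(log.problems_by_agent().most_common(1))  # one UUID with a count of 2
```

Using a random UUID rather than, say, a hash of a volunteer's email address is a deliberate choice in this sketch: a hash of a guessable identifier can be reversed by trying candidate values, while a random UUID carries no information about the person.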
**Aaron - Introducing Data Conservancy**
- Fedora 4 and the Linked Data Platform (manages both RDF and binaries)
- API Extension Architecture (API-X) diagram
  - Glue to plug in these different extensions
  - Binds services based on the content/description of the resources
- Packaging: semantic format, a profile of BagIt; the primary use case is creating packages for these preservation services
- Package Ingest Extension
- RMap

**Fedora Overview - David Wilcox**

DuraSpace: DSpace, VIVO, Fedora (also services: DuraCloud, HykuDirect, DSpaceDirect, ArchivesDirect)
- Open-source repository software: preserves and supplies access to digital objects
- Based in the world of linked data; interoperability is key
- Flexibility, adaptability, durability, standards-based, thriving community
- Fedora is middleware (Samvera, Islandora)
- Storage and preservation: RDF and non-RDF (binary), versioning, checksums
- Provenance: optional audit
- Metadata and data can be accessed without Fedora
- Is a linked data infrastructure
- Ecosystem: Fedora + messaging + external triple store + website? + IIIF server? + research data? + profiles? + import and export: FS or DDP (API-X is a layer in between that can mediate)
- There is a lot of training available, plus mailing lists
- See links in David's presentation...
- **Questions:**
  - Are there any prepackaged, usable blank platforms? Yes: Islandora and Samvera, depending on the use case...

**Brian - OSF**
- What words would you use to describe the OSF? Platform, collaboration, ...
- You need at least one bibliographic contributor on a project
- Nodes/components
- Example project
- Glacier storage: keeps parity information; checksums for OSF storage (or other providers)
- Version control
- Registration metadata
- API
- Metadata schema
- WaterButler: manages streams (Python, Tornado)
- Test rescue projects: issue of FTP, which has no WaterButler mechanism...
- "Fish Guy": scans all the fish in the world with a CT scanner
- Bash scripting: on GitHub, mostly complete...
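Checksums come up in several of the talks above (Fedora's storage and preservation layer, the OSF's storage checksums), alongside Aaron's mention of a BagIt profile for packaging. The sketch below writes a minimal bag using the general layout from the BagIt specification (RFC 8493): a `bagit.txt` declaration, the payload under `data/`, and a `manifest-sha256.txt` listing one checksum and relative path per line; it then re-hashes the payload to check fixity. It is not the Data Conservancy packaging profile, Fedora's API, or the OSF's storage code; `make_bag`, `verify_bag`, and the example file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so very large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(bag_dir: Path, files: dict) -> None:
    """Write a minimal BagIt-style bag (hypothetical helper).

    `files` maps payload paths (relative, POSIX-style) to their bytes.
    """
    bag_dir.mkdir(parents=True, exist_ok=True)
    data_dir = bag_dir / "data"
    for rel, content in files.items():
        target = data_dir / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)

    # Bag declaration, following the layout in RFC 8493.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )

    # Payload manifest: one "<checksum> <path relative to the bag root>" per line.
    lines = [
        f"{sha256_file(p)}  {p.relative_to(bag_dir).as_posix()}"
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    ]
    (bag_dir / "manifest-sha256.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify_bag(bag_dir: Path) -> list:
    """Re-hash each file listed in the manifest; return the paths that fail fixity."""
    failures = []
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        expected, rel = line.split(maxsplit=1)
        path = bag_dir / rel
        if not path.is_file() or sha256_file(path) != expected:
            failures.append(rel)
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp) / "rescued-dataset"
        make_bag(bag, {"tables/stations.csv": b"id,name\n1,Alpha\n"})
        print(verify_bag(bag))  # []
        (bag / "data" / "tables" / "stations.csv").write_bytes(b"edited")
        print(verify_bag(bag))  # ['data/tables/stations.csv']
```

Keeping each file's relative path in the manifest preserves the original file structure, in line with the earlier point about not touching rescued files. A complete BagIt validator would also flag payload files that are missing from the manifest and handle optional tag files such as `bag-info.txt`; this sketch skips both.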
Brian has added links to the Dev Docs API documentation and relevant resources.

Questions:
- Provenance aspects available
  - COS philosophy is not to tell people what to do: they don't have standards, but do have API application standards

SHARE (Shared Access Research Ecosystem):
- Takes metadata from the outputs of scholarly research, standardizing and normalizing it (normalization is a major challenge)
- Original intent was a notification service for academic services: what institutional outputs are happening within a certain amount of time
- Some comparison with CORUS...
- Sayeed: Karen and Elliot, RMap and SHARE: did start to find cool patterns in people's work; version 1 had lots of things to work out; looking forward to seeing how RMap works with V2
- Preprint services
- Megan: SHARE as a place to document and discover data that had already been rescued
- Ruth: Defining a project: how do we establish this?
- Joan: Project management permissions aspects
  - Collaborators can have different access for each component
  - Brian shows the institutions API, nodes, and linking projects to other projects
- Ruth: Are you pursuing other types of integrations, specifically publication services?
  - Yes, for example Evernote
- Dave asks about Dataverse: it is possible to connect with DVN

**Matt Zumwalt - Decentralized Technologies**
- The decentralized web is a decentralized and distributed network...
- Paradigm shift: a heavy shift in the direction of P2P from centralized...
- Natural mapping with distributed...
Patterns emerging to manage decentralized efforts:
- Model for activities
- View into activities
- Tools for activities
- Public record with communities and collections

Activities:
- Harvesting
- Monitoring
- Storing
- Analyzing
- Rescue

Matt's article: "The Internet has been stolen from you - take it back"

Hash-linked data structures (e.g., git):
- Benefits: cryptographic, immutable, decoupling the location from the content's identity

Precarious web (right now):
- Location addressing: the fallacy that there is only one copy

Storing together (peers coordinating):
- Holding a copy, with discovery services

Content addressing:
- Not where, but verifying the exact content is the important thing
- Identify the exact content by its cryptographic hash

IPFS:
- Transition to a content-addressed protocol
- dweb address space
- HTTP gateway
- Article: "Instructions for saving endangered data: It's time to get decentralized"
- Issue of learning new tools during a disaster...
- No transparency: archivers 2.0 is addressing this
- Turkey banned Wikipedia; IPFS worked on uploading the materials
- Arising issues: provenance, language, motivations for interactions
- Providing a reference point: what does it look like to use decentralized technologies?
- Not a done deal; this is a conversation

Questions:
- Aaron: provisioning; names = trust (SSL and TLS). How does this deal with the trust issue?
  - Cultural changes
  - Projects like Keybase and Matrix
- Jason: Where does this come into play?
  - Validation: level 0 compliance
- Sayeed: Are libraries still trusted?
- Tim: Also going back to the researchers
- Ruth: Infrastructure bottlenecks related to net neutrality: any action there?
  - More flexibility to circumvent them
  - Services from the closest possible source
  - IPFS is transport-agnostic
- Ruth: Example of NASA, where databases are petabytes in size: how would this work?
  - Constrained by the size of the pipes, but individuals only need a portion
  - Subselecting is easily done...
  - Protocol Labs creates an open market that allows participation
- Sayeed: Can a network of libraries be the actors behind this type of decentralized situation?
- Elliot: Collection vs. time-series-based view of data: can IPFS index and display it?
  - IPFS is an intermediate layer
- Elliot: Is there a particular format?
  - IPFS hashes the bits that you give it; there is no sense of a canonical representation, but you can choose what format you want to use
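Matt's points on content addressing, hash-linked data structures, and subselecting a portion of a very large dataset can be illustrated with a toy block store. This is a sketch under stated assumptions, not IPFS: the `BlockStore` class, `add_file`, `read_range`, the 4-byte chunk size, and the JSON list used as the file's index are all hypothetical, and IPFS's real chunking, CID encoding, and DAG format differ. It shows that each block lives under the SHA-256 of its exact bytes, that a reader re-hashes whatever a peer returns, that the root address depends on every chunk, and that a reader can fetch and verify only the chunks they need.

```python
import hashlib
import json

CHUNK_SIZE = 4  # deliberately tiny so the example has several chunks


def address(data: bytes) -> str:
    """Content address: the SHA-256 of the exact bytes, independent of location."""
    return hashlib.sha256(data).hexdigest()


class BlockStore:
    """Hypothetical in-memory peer that keeps blocks keyed by their own hash."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = address(data)
        self.blocks[key] = data  # identical content always lands on the same key
        return key

    def get(self, key: str) -> bytes:
        data = self.blocks[key]
        # The reader re-hashes what it received, so it need not trust which
        # peer served the block, only that the bytes match the address.
        if address(data) != key:
            raise ValueError(f"block {key[:12]}... failed verification")
        return data


def add_file(store: BlockStore, data: bytes) -> str:
    """Store data as fixed-size chunks plus an index of their addresses.

    The index's own address is the file's root: changing any chunk changes its
    address, which changes the index, which changes the root (hash-linking).
    """
    chunk_keys = [store.put(data[i:i + CHUNK_SIZE]) for i in range(0, len(data), CHUNK_SIZE)]
    return store.put(json.dumps(chunk_keys).encode("utf-8"))


def read_range(store: BlockStore, root: str, first: int, last: int) -> bytes:
    """Fetch and verify only chunks first..last (inclusive), i.e. a subselection."""
    chunk_keys = json.loads(store.get(root))
    return b"".join(store.get(k) for k in chunk_keys[first:last + 1])


store = BlockStore()
root = add_file(store, b"temperature,2017\n")
print(read_range(store, root, 3, 3))  # b'2017'
```

Because the address is computed over the exact bytes, the same table serialized once as CSV and once as JSON ends up with two different addresses. That matches the answer to Elliot's format question: the hash carries no notion of a canonical representation, so the choice of format is up to the people publishing the data.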