# SHARE Barn Raising Hackathon
SHARE is an open source project dedicated to taking metadata on new research from around the web and bringing it into one normalized format, accessible via a JSON API.
SHARE just had a Beta release this April, and we're searching for ways to make it better. We need your help to include more sources, make some cool visualizations of what the existing SHARE metadata contains, and develop tools to connect and enrich the data by combining it in new ways.
SHARE uses Cassandra along with Elasticsearch to make aggregations and data manipulation a breeze.
Day 1 Coffee, tea, juice in the morning, lunch in the afternoon. Day 2
More information about the SHARE organization is available at http://share-research.org
## Activities
----
### Harvesters
So far, we have over 43 content providers, and we need your help to add more! Have a favorite research aggregator or journal with an API with open metadata? Want to browse our list of APIs that have open research metadata? That's all you need to get started.
* [SHARE Core - API Harvesting and Normalization - scrAPI](https://github.com/fabianvf/scrapi)
* [Creating a metadata harvester for SHARE](https://osf.io/wur56/wiki/Creating%20a%20Harvester/)
* [SHARE Schema](https://github.com/JeffSpies/SHARE-Schema)
### Data Analysis/Visualization
We have a nearly full Elasticsearch API, and have started developing a Python library to facilitate analysis of SHARE data. You can help contribute to this library, including porting it to other languages (like R). You can also use the library to create interesting and meaningful analyses of the data, looking at terms, fields provided, etc.
* [SHARE Parsing and Analysis Python library - sharepa](https://github.com/fabianvf/sharepa)
* [Elasticsearch API](http://osf.io/api/v1/share/search/?raw) and [API Documentation](http://osf.io/share/docs)
* [Old tool for visualizing SHARE data](http://github.com/erinspace/scrapi_stats)
* [SHARE Schema](https://github.com/JeffSpies/SHARE-Schema)
* [Metadata Analysis](https://osf.io/wur56/wiki/Metadata%20Analysis/) of what each provider has and how it is mapped to our schema
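
As a starting point for analysis, here is a minimal sketch of an Elasticsearch terms-aggregation request body that counts the most common values of a field. The helper name `build_terms_aggregation` and the field name `tags` are assumptions for illustration; check the API documentation above for the real field names and for how the search endpoint expects the query to be sent.

```python
import json


def build_terms_aggregation(field, size=10):
    """Return a request body that counts the most common values of `field`."""
    return {
        "size": 0,  # skip individual hits; only the aggregation buckets are needed
        "query": {"match_all": {}},
        "aggs": {
            "top_values": {
                "terms": {"field": field, "size": size}
            }
        },
    }


if __name__ == "__main__":
    # "tags" is an assumed field name, not confirmed against the SHARE schema.
    print(json.dumps(build_terms_aggregation("tags"), indent=2))
```

The printed JSON can be pasted into any HTTP client to experiment with different fields and bucket sizes.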
Here are some ideas for creating visualizations:
* Analyzing what keywords appear the most across services
* Analyzing identifiers that appear across services (DOIs, URLs, etc.)
* Analyzing contributors that appear across services
* Top keywords/contributors/titles/identifiers/etc
* Analyzing the number of providers that include certain fields
* Histograms would be a nice addition to the command line tool
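
Several of the ideas above reduce to counting values and displaying the counts. The sketch below counts keywords across a list of documents and renders a plain-text histogram of the kind a command line tool could print. It assumes each document is a dict with a `tags` list; that field name and both function names are hypothetical and not taken from sharepa.

```python
from collections import Counter


def top_keywords(documents, n=10):
    """Count keywords across documents, ignoring case and surrounding whitespace."""
    counts = Counter()
    for doc in documents:
        # A set ensures a keyword repeated within one document is counted once.
        counts.update({kw.strip().lower() for kw in doc.get("tags", []) if kw.strip()})
    # Sort by count (descending), then alphabetically so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def text_histogram(pairs, width=40):
    """Render (label, count) pairs as lines of '#' bars scaled to `width`."""
    if not pairs:
        return ""
    longest = max(len(label) for label, _ in pairs)
    biggest = max(count for _, count in pairs)
    lines = []
    for label, count in pairs:
        bar = "#" * max(1, round(width * count / biggest))
        lines.append(f"{label.ljust(longest)} | {bar} {count}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [{"tags": ["Biology", "genomics"]}, {"tags": ["biology", "physics"]}]
    print(text_histogram(top_keywords(sample)))
```

The same two functions work for contributors or identifiers if the assumed `tags` lookup is swapped for the relevant field.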
### Data Enrichment
We gather a lot of unique identifiers (DOIs, URLs, etc.), which are an easy way to connect data from disparate sources. Using information like contributors, titles, descriptions, and sponsors to perform cross-provider analyses will allow us to enrich the data we are gathering. We can also pull information about authors and sponsors from third-party websites (ORCID, Impactstory, Altmetrics, etc.), and add that data to the aggregate metadata for a document.
* [Gist for adding text storage instead of using Cassandra](https://gist.github.com/fabianvf/597f57ffe8351156bb98)
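
One way to connect records through shared identifiers is to reduce every DOI to a common form and group records by it. The sketch below is an illustration under stated assumptions: the record layout (`provider` and `identifiers` keys) and the function names are hypothetical, and the regular expression is a simple approximation of DOI syntax rather than a full validator.

```python
import re
from collections import defaultdict

# Finds the DOI itself inside strings such as "doi:10.1234/x" or a resolver URL.
DOI_PATTERN = re.compile(r"(10\.\d{4,9}/\S+)")


def normalize_doi(raw):
    """Return a lowercase bare DOI, or None if `raw` does not contain one."""
    match = DOI_PATTERN.search(raw.strip())
    return match.group(1).lower() if match else None


def link_by_doi(records):
    """Map each DOI seen from more than one provider to those providers."""
    groups = defaultdict(set)
    for record in records:
        for identifier in record.get("identifiers", []):
            doi = normalize_doi(identifier)
            if doi:
                groups[doi].add(record["provider"])
    return {doi: sorted(providers) for doi, providers in groups.items() if len(providers) > 1}
```

DOIs are case-insensitive, so lowercasing them before grouping avoids treating `10.1234/ABC` and `10.1234/abc` as different documents.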
### Dev Ops
* Work on improving the dev-ops flow
* We use Docker for almost all of scrAPI
* Automatically scaling Cassandra, Elasticsearch, and Celery workers
### Documentation
* Find places where documentation is ambiguous or missing
* Add new examples, fix wording, or make other improvements
---
## More Information
#### General SHARE Information
* [SHARE organization website](http://share-research.org)
* [SHARE Notification Service Repo](https://github.com/CenterForOpenScience/share) - used for issue tracking and discussion
#### Potential new Provider Sources
If you'd like to add a new provider to SHARE, here are a few places to get started.
When looking for a new provider, keep these things in mind:
* Service has an API that we can query by date, for periodic harvesting.
* Service provides metadata that includes, at a minimum, the required fields for the [normalized SHARE schema](https://github.com/JeffSpies/SHARE-Schema/blob/master/share.yaml): title, contributors, a link to the original source, and the last time the document was updated.
* Metadata provided can explicitly be redistributed freely under the service's terms of service and license.
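
When evaluating a candidate provider, it can help to check a sample of its records against the required fields listed above. The sketch below is one possible check; the key names in `REQUIRED_FIELDS` are placeholders chosen for this example and are not the names used in the SHARE schema, so map them to the schema before relying on the result.

```python
# Placeholder key names (assumptions): title, contributors, link to the
# original source, and last-updated time, in that order.
REQUIRED_FIELDS = ("title", "contributors", "source_url", "updated")


def missing_required_fields(record):
    """Return the required fields that are absent or empty in `record`."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def coverage(records):
    """Return the fraction of records that have every required field."""
    if not records:
        return 0.0
    complete = sum(1 for record in records if not missing_required_fields(record))
    return complete / len(records)
```

A provider whose sample scores well below 1.0 on `coverage` probably needs extra handling before it can be harvested.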
Here are some places to look for sources:
* [OpenDOAR: Directory of Open Access Repositories](http://opendoar.org/)
    - We need sources that are licensed CC0. OpenDOAR has an API that allows searching by subject, metadata licensing state, existence of an OAI URL, and others.
    - [OpenDOAR API documentation](http://www.opendoar.org/tools/api.html). NOTE: the PDF contains more information about search parameters; read that first.
    - [Small Python script](https://gist.github.com/stitchinthyme/dfeac2c8579bbd2d2fb0) that uses the above API to query for all sources that are in English, have science content, and allow free access to their metadata. The script parses the XML output and returns a JSON-formatted list of dicts containing the repository name, main URL, and OAI URL.
* Open Archives list of OAI-PMH repositories - http://www.openarchives.org/Register/BrowseSites
* Mendeley - http://dev.mendeley.com/
* Budapest Open Access Initiative - search for BOAI, or look on this page for sources dedicated to open access to data: http://www.budapestopenaccessinitiative.org/list_signatures
* CalTech Library - http://caltechs.library.caltech.edu/cgi/oai2
* Oklahoma State Thesis and Dissertation Archive - http://www.library.okstate.edu/thesis/
* Oklahoma Library: General archive, some non-science content - http://www.library.okstate.edu/digital/
* Aberdeen University Research Archive - http://eprints.aston.ac.uk/cgi/oai2?verb=Identify
* Digital Commons Network - http://network.bepress.com
* Birkbeck Institutional Research Online - general archive, science and non-science content - http://eprints.bbk.ac.uk/cgi/oai2?verb=Identify
* Bournemouth University Research Online - http://eprints.bournemouth.ac.uk/cgi/oai2?verb=Identify (general - science at http://eprints.bournemouth.ac.uk/view/subjects/sci.html)
* Bradford Scholars - general archive, science and non-science content - http://bradscholars.brad.ac.uk/dspace-oai/request?verb=Identify
* Canterbury Research and Theses Environment - http://create.canterbury.ac.uk/cgi/oai2?verb=Identify (general - science at http://create.canterbury.ac.uk/view/subjects/Q.html)
* CEDA (Centre for Environmental Data Archival) - http://cedadocs.badc.rl.ac.uk/cgi/oai2?verb=Identify
* CentAUR (Central Archive at the University of Reading) - general archive, science and non-science content - http://centaur.reading.ac.uk/cgi/oai2?verb=Identify
* CADAIR (Aberystwyth University Repository) - http://cadair.aber.ac.uk/dspace-oai/request?verb=Identify
* William & Mary Virginia Institute of Marine Science - https://digitalarchive.wm.edu/handle/10288/615
* ARRO (Anglia Ruskin Research Online) - general archive, includes science - http://angliaruskin.openrepository.com/arro/oai/request?verb=Identify
* University of California - http://escholarship.org/
* Aston University Research Archive - http://eprints.aston.ac.uk/cgi/oai2?verb=Identify
* Cognitive Sciences ePrint Archive - http://cogprints.org/cgi/oai2?verb=Identify
* Central Lancashire Online Knowledge - http://clok.uclan.ac.uk/cgi/oai2?verb=Identify
* City University Research Online - http://openaccess.city.ac.uk/cgi/oai2?verb=Identify (general - science at http://openaccess.city.ac.uk/view/subjects/Q.html)
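
Many of the endpoints above speak OAI-PMH, which supports the date-based querying that periodic harvesting needs. The sketch below builds a `ListRecords` request for a date window and extracts a few Dublin Core fields from a response. It is a rough illustration rather than scrAPI's harvester code: the function names and the output keys are assumptions, and it does not fetch anything or follow resumption tokens.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(endpoint, start, end, prefix="oai_dc"):
    """Build a ListRecords URL for records changed between `start` and `end`."""
    base = endpoint.split("?", 1)[0]  # drop any existing query such as ?verb=Identify
    params = {"verb": "ListRecords", "metadataPrefix": prefix, "from": start, "until": end}
    return base + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract title, creators, and datestamp from an oai_dc ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "contributors": [e.text for e in rec.iterfind(".//dc:creator", NS) if e.text],
            "updated": rec.findtext("oai:header/oai:datestamp", default="", namespaces=NS),
        })
    return records
```

For example, `list_records_url("http://cogprints.org/cgi/oai2?verb=Identify", "2015-04-01", "2015-04-30")` produces a URL whose response text can be passed to `parse_records`.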