@[toc]

# Using the SHARE JSON API

You can query the JSON feed using any valid [Lucene query syntax](http://lucene.apache.org/core/2_9_4/queryparsersyntax.html). You can also filter your search on any of the [metadata fields that each data provider gives to SHARE](https://osf.io/wur56/wiki/Metadata%20Analysis/), query by date ranges (using [Lucene date formats](https://lucene.apache.org/solr/4_10_3/solr-core/org/apache/solr/schema/DateField.html)), by keywords, or by any combination of the above.

Here are some examples of valid queries:

- [?q=open AND science AND source:scitech](https://osf.io/api/v1/share/search?q=open%20AND%20science%20AND%20source:scitech)
- [?q=tags:frogs](https://osf.io/api/v1/share/search?q=tags:frogs)
- [?q=providerUpdatedDateTime:[2014-10-01 TO 2014-10-10]](https://osf.io/api/v1/share/search?q=providerUpdatedDateTime:[2014-10-01T00:00:00Z%20TO%202014-10-10T00:00:59Z])

Even more flexibility is available via the elasticsearch query DSL. Access the raw elasticsearch results by adding the parameter [raw=True](https://osf.io/api/v1/share/search?raw=True). You can read more about the [elasticsearch query DSL](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html).

# Using the SHARE Atom feed

The [SHARE Atom feed](https://osf.io/share/atom) can be found at: https://osf.io/share/atom

The feed is a layer on top of the existing scrAPI search API developed for SHARE. It takes a query URL parameter and returns an Atom feed with the results of that query. The query syntax is as follows:

    /atom/?q=query_term

You can also specify the field you would like to search by adding the prefix `field:` to the query, like so:

    /atom/?q=source:plos

The fields themselves are documented on our [metadata schema page](https://github.com/CenterForOpenScience/SHARE/wiki/SHARE-schema).

To query the Atom feed, you can use any valid [Lucene query syntax](http://lucene.apache.org/core/2_9_4/queryparsersyntax.html).
You can query by any field in the metadata schema, by date ranges (using [Lucene date formats](https://lucene.apache.org/solr/4_10_3/solr-core/org/apache/solr/schema/DateField.html)), by keywords, or any combination of the above.

Here are some examples of valid queries:

- [/atom/?q=open AND science AND source:scitech](https://osf.io/share/atom?q=open%20AND%20science%20AND%20source:scitech)
- [/atom/?q=tags:frogs](https://osf.io/share/atom/?q=tags:frogs)
- [/atom/?q=providerUpdatedDateTime:[2014-10-01 TO 2014-10-10]](https://osf.io/share/atom/?q=providerUpdatedDateTime:[2014-10-01T00:00:00Z%20TO%202014-10-10T00:00:59Z])

Note: Atom allows for somewhat greater flexibility in the way the metadata payload is structured. It also allows for some flexibility in specifying the date range of items to be returned. While this flexibility has not yet been leveraged in this Atom feed, we expect to improve it in time. For now, you can treat it much as you would the RSS feed.

# Exploring the SHARE NS with curl and underscore

This tutorial explores querying the previous version of the SHARE Notification Service API (NS) and the data it offers to its consumers.

## Getting Set Up

This tutorial uses the [bash](http://www.gnu.org/software/bash/) shell (though in a simple enough way that most any Unix shell should do the trick), [curl](http://curl.haxx.se), which is available on most Unix and Mac systems, and [underscore](https://github.com/ddopson/underscore-cli), which is not very common yet and requires that [Node.js](http://nodejs.org) be installed. On most Macs this boils down to:

1. [Download and install Node.js.](http://nodejs.org/#download)
2. In a terminal window, do the following to install underscore:

```
npm install -g underscore-cli
```

Once you have done this, you should be able to try the following command (the "%" is the Unix prompt in a terminal window, like the one provided by the Mac Terminal program):

```
% curl -s "https://osf.io/api/v1/share/search/" | underscore extract 'count'
331116
```

The number you see will differ, since the count of records in the NS continually changes.

## A Few Basics

The NS dataset is indexed in elasticsearch (ES), and ES natively prefers talking via POST rather than GET. Also, to see the raw ES results we need to tweak our URL a bit.

First the URL. We want to talk to raw ES, so we will add a bit to the end of the URL. Since that makes it long and awkward, we will put this URL in a variable for future use. Since the NS uses the Open Science Framework to handle searches, let's call this variable OSF.

```
% OSF='https://osf.io/api/v1/share/search/?raw=True'
```

You can test this URL...

```
% curl -s "$OSF" | underscore extract 'took'
25
```

Just like the 'count' above, the 'took' number will change frequently, since it is a measure of how long (in milliseconds) the query took to execute on ES.

To try a real ES query, though, we have to add POST to the mix. Since ES queries are written in JSON, it can be helpful to place the JSON code in a file that is easier to read and edit. Let's put this one in a file called 'sources.json':

```
{
  "size" : 0,
  "aggs" : {
    "sources" : {
      "terms" : {
        "field" : "source",
        "size" : 0
      }
    }
  }
}
```

Send it off to elasticsearch like this:

```
% curl -s "$OSF" -H "Content-Type: application/json" --data-binary @sources.json
```

ES should return a blob of JSON describing all the sources in the NS.
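
The steps above can be gathered into a small script. What follows is a minimal sketch rather than part of the original tutorial: the function names `write_sources_query` and `post_query` are made up for illustration, and it assumes the endpoint still accepts raw elasticsearch queries as described above.

```shell
#!/usr/bin/env bash
# Sketch only: write_sources_query and post_query are hypothetical helper
# names, not part of SHARE, the OSF, or curl.

# Raw elasticsearch endpoint from the tutorial text.
OSF='https://osf.io/api/v1/share/search/?raw=True'

# Write the source-aggregation query from the tutorial to the given file
# (default: sources.json) using a here-document.
write_sources_query() {
  local outfile="${1:-sources.json}"
  cat > "$outfile" <<'EOF'
{
  "size" : 0,
  "aggs" : {
    "sources" : {
      "terms" : { "field" : "source", "size" : 0 }
    }
  }
}
EOF
}

# POST a JSON query file to the endpoint. --data-binary makes curl send a
# POST request with the file contents as the request body.
post_query() {
  curl -s "$OSF" -H "Content-Type: application/json" --data-binary "@$1"
}

# Usage (needs network access):
#   write_sources_query sources.json
#   post_query sources.json
```

Keeping the file-writing step separate from the network call means the query can be inspected or edited before anything is sent.
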
Let's create a bash alias to make this easier to manage in future examples:

```
% alias OSFCURL='curl -s "$OSF" -H "Content-Type: application/json" --data-binary'
```

Now we can simplify the submission of an elasticsearch query to OSF like so:

```
% OSFCURL @sources.json
```

And if we want the response to be extra legible, we can pipe the resulting JSON to underscore:

```
% OSFCURL @sources.json | underscore pretty
```

## Getting the most recent titles from a source

To get the most recent records from a particular source, you need to know the key for that source. See the section above to learn how to find the keys of all the sources in the NS. Once you have your key, include it in a JSON query like the example here, which we will call "stcloud.json":

```
{
  "size" : 5,
  "filter" : {
    "term" : { "source" : "stcloud" }
  }
}
```

This query combined with underscore will provide you with a list of recent titles from St. Cloud State:

```
% OSFCURL @stcloud.json | underscore select .title
```

## Get the most recent item added to the NS

_Note: this does not actually return the most recent item, yet. We're working on that. This will be updated as soon as we have a way to ensure the item returned is the most recent addition to the NS._

Put the query into "recent.json":

```
{
  "size" : 1,
  "filter" : {
    "match_all" : { }
  }
}
```

Use underscore to make the result more manageable:

```
% OSFCURL @recent.json | underscore pretty
```

Actually, this case is a touch more complicated than that. At the moment we have two types of records in the NS, and you are probably interested in "event" records rather than "resource" records (which are only used to document collisions in the NS).
To limit our result to the most recent non-resource record, replace "recent.json" with this:

```
{
  "size" : 1,
  "filter" : {
    "missing" : { "field" : "isResource" }
  }
}
```

And run the query again:

```
% OSFCURL @recent.json | underscore pretty
```

## Learn more

You can run any elasticsearch query against the Notification Service dataset with this technique. To learn more about the queries that are possible, consult the [elasticsearch query dsl documentation](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html) and our own [NS metadata schema](https://osf.io/wur56/wiki/Schema/).
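
As one illustration of reusing this technique, the per-source query from "stcloud.json" can be generated for any source key instead of being edited by hand. This is a rough sketch under stated assumptions: `source_query` is a hypothetical helper name, source keys are assumed to be plain identifiers that need no JSON escaping, and the curl line assumes the `OSF` variable defined earlier.

```shell
#!/usr/bin/env bash
# Sketch only: source_query is a hypothetical helper, not part of SHARE or the OSF.

# Print a query body that asks for up to SIZE records from one source,
# following the shape of stcloud.json in the tutorial.
# Arguments: $1 = source key (e.g. stcloud), $2 = number of records (default 5).
source_query() {
  local key="$1" size="${2:-5}"
  # Source keys are assumed to contain no quotes or backslashes,
  # so no JSON escaping is attempted here.
  printf '{ "size" : %d, "filter" : { "term" : { "source" : "%s" } } }\n' \
    "$size" "$key"
}

# Usage (needs network access and the OSF variable from the tutorial);
# "@-" tells curl to read the request body from standard input:
#   source_query plos 10 | curl -s "$OSF" -H "Content-Type: application/json" --data-binary @-
```

The function only prints the JSON, so its output can be checked on its own before it is piped to curl.
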