<h1>Using the SHARE JSON API</h1>
<p>You can query the JSON feed using any valid <a href="http://lucene.apache.org/core/2_9_4/queryparsersyntax.html" rel="nofollow">Lucene query syntax</a>. You can filter your search on any of the <a href="https://osf.io/wur56/wiki/Metadata%20Analysis/" rel="nofollow">metadata fields that each data provider gives to SHARE</a>, by date ranges (using <a href="https://lucene.apache.org/solr/4_10_3/solr-core/org/apache/solr/schema/DateField.html" rel="nofollow">Lucene date formats</a>), by keywords, or by any combination of the above.</p>
<p>Here are some examples of valid queries:</p>
<ul>
<li> <p><a href="https://osf.io/api/v1/share/search?q=open%20AND%20science%20AND%20source:scitech" rel="nofollow">?q=open AND science AND source:scitech</a></p> </li>
<li> <p><a href="https://osf.io/api/v1/share/search?q=tags:frogs" rel="nofollow">?q=tags:frogs</a></p> </li>
<li> <p><a href="https://osf.io/api/v1/share/search?q=providerUpdatedDateTime:[2014-10-01T00:00:00Z%20TO%202014-10-10T00:00:59Z]" rel="nofollow">?q=providerUpdatedDateTime:[2014-10-01 TO 2014-10-10]</a></p> </li>
</ul>
<p>Even more flexibility is available via the elasticsearch query DSL. To access the raw elasticsearch results, add the parameter <a href="https://osf.io/api/v1/share/search?raw=True" rel="nofollow">raw=True</a> to the URL. You can read more about the <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html" rel="nofollow">elasticsearch DSL</a>.</p>
<h1>Using the SHARE Atom feed</h1>
<p>The <a href="https://osf.io/share/atom" rel="nofollow">SHARE Atom feed</a> can be found at:</p>
<pre class="highlight"><code><a href="https://osf.io/share/atom" rel="nofollow">https://osf.io/share/atom</a></code></pre>
<p>The feed is a layer on top of the existing scrAPI search API developed for SHARE. It takes a query URL parameter and returns an Atom feed with the results of that query.
The query syntax is as follows:</p>
<pre class="highlight"><code>/atom/?q=query_term</code></pre>
<p>You can also specify the field you would like to search by prefixing the query with <code>field:</code>, like so:</p>
<pre class="highlight"><code>/atom/?q=source:plos</code></pre>
<p>The fields themselves are documented on our <a href="https://github.com/CenterForOpenScience/SHARE/wiki/SHARE-schema" rel="nofollow">metadata schema page</a>.</p>
<p>To query the Atom feed, you can use any valid <a href="http://lucene.apache.org/core/2_9_4/queryparsersyntax.html" rel="nofollow">Lucene query syntax</a>. You can query by any field in the metadata schema, by date ranges (using <a href="https://lucene.apache.org/solr/4_10_3/solr-core/org/apache/solr/schema/DateField.html" rel="nofollow">Lucene date formats</a>), by keywords, or by any combination of the above.</p>
<p>Here are some examples of valid queries:</p>
<ul>
<li> <p><a href="https://osf.io/share/atom?q=open%20AND%20science%20AND%20source:scitech" rel="nofollow">/atom/?q=open AND science AND source:scitech</a></p> </li>
<li> <p><a href="https://osf.io/share/atom/?q=tags:frogs" rel="nofollow">/atom/?q=tags:frogs</a></p> </li>
<li> <p><a href="https://osf.io/share/atom/?q=providerUpdatedDateTime:[2014-10-01T00:00:00Z%20TO%202014-10-10T00:00:59Z]" rel="nofollow">/atom/?q=providerUpdatedDateTime:[2014-10-01 TO 2014-10-10]</a></p> </li>
</ul>
<p>Note: Atom allows for some greater flexibility in the way the metadata payload is structured. It also allows for some flexibility in specifying the date range of items to be returned. While this flexibility has not yet been leveraged in this Atom feed, we expect to improve it in time.
For now, you can treat it much as you would the RSS feed.</p>
<h1>Exploring the SHARE NS with curl and underscore</h1>
<p>This tutorial explores querying the previous version of the SHARE Notification Service API (NS) and the data it offers to its consumers.</p>
<h2>Getting Set Up</h2>
<p>This tutorial will use the <a href="http://www.gnu.org/software/bash/" rel="nofollow">bash</a> shell (though in a simple enough way that most any Unix shell should do the trick), <a href="http://curl.haxx.se" rel="nofollow">curl</a>, which is available on most Unix and Mac systems, and <a href="https://github.com/ddopson/underscore-cli" rel="nofollow">underscore</a>, which is not yet very common and requires that <a href="http://nodejs.org" rel="nofollow">Node.js</a> be installed. On most Macs this boils down to:</p>
<ol>
<li><a href="http://nodejs.org/#download" rel="nofollow">Download and install Node.js.</a></li>
<li>In a terminal window do the following to install underscore:</li>
</ol>
<pre class="highlight"><code>npm install -g underscore-cli</code></pre>
<p>Once you have done this, you should be able to try the following command (the "%" is the Unix prompt in a terminal window, like the one provided by the Mac Terminal program):</p>
<pre class="highlight"><code>% curl -s &quot;<a href="https://osf.io/api/v1/share/search/" rel="nofollow">https://osf.io/api/v1/share/search/</a>&quot; | underscore extract 'count'
331116</code></pre>
<p>Your number will differ, and will most likely be bigger, since the count of records in the NS continually changes.</p>
<h2>A Few Basics</h2>
<p>The NS dataset is indexed in elasticsearch (ES), and ES natively prefers to receive queries via POST rather than GET. Also, to see the raw ES results we need to tweak our URL a bit.</p>
<p>First the URL. We want to talk to raw ES, so we will add a bit to the end of the URL. Since that makes it long and awkward we will put this URL in a variable for future use.
Since the NS uses the Open Science Framework to handle searches, let's call this variable OSF.</p>
<pre class="highlight"><code>% OSF='<a href="https://osf.io/api/v1/share/search/?raw=True" rel="nofollow">https://osf.io/api/v1/share/search/?raw=True</a>'</code></pre>
<p>You can test this URL...</p>
<pre class="highlight"><code>% curl -s &quot;$OSF&quot; | underscore extract 'took'
25</code></pre>
<p>Just like the 'count' above, the 'took' number will change frequently, since it is a measure of how long the query took to execute on ES.</p>
<p>To try a real ES query, though, we have to add POST to the mix. Since ES queries are written in JSON, it can be helpful to place the JSON code in a file that is easier to read and edit. Let's put this one in a file called 'sources.json':</p>
<pre class="highlight"><code>{
  &quot;size&quot; : 0,
  &quot;aggs&quot; : {
    &quot;sources&quot; : {
      &quot;terms&quot; : {
        &quot;field&quot; : &quot;source&quot;,
        &quot;size&quot; : 0
      }
    }
  }
}</code></pre>
<p>Send it off to elasticsearch like this:</p>
<pre class="highlight"><code>% curl -s &quot;$OSF&quot; -H &quot;Content-Type: application/json&quot; --data-binary @sources.json</code></pre>
<p>ES should return a blob of JSON describing all the sources in the NS.</p>
<p>Let's create a bash alias to make this easier to manage in future examples:</p>
<pre class="highlight"><code>% alias OSFCURL='curl -s &quot;$OSF&quot; -H &quot;Content-Type: application/json&quot; --data-binary'</code></pre>
<p>Now we can simplify the submission of an elasticsearch query to OSF like so:</p>
<pre class="highlight"><code>% OSFCURL @sources.json</code></pre>
<p>And if we want the response to be extra legible we can pipe the resulting JSON to underscore:</p>
<pre class="highlight"><code>% OSFCURL @sources.json | underscore pretty</code></pre>
<h2>Getting the most recent titles from a source</h2>
<p>To get the most recent records from a particular source, you need to know the key for that source.
See the section above to learn how to find the keys of all the sources in the NS. Once you have your key, include it in a JSON query like the example here, which we will call "stcloud.json":</p>
<pre class="highlight"><code>{
  &quot;size&quot; : 5,
  &quot;filter&quot; : {
    &quot;term&quot; : { &quot;source&quot; : &quot;stcloud&quot; }
  }
}</code></pre>
<p>This query combined with underscore will provide you with a list of recent titles from St. Cloud State:</p>
<pre class="highlight"><code>% OSFCURL @stcloud.json | underscore select .title</code></pre>
<h2>Get the most recent item added to the NS</h2>
<p><em>Note: this does not actually return the most recent item, yet. We're working on that. This will be updated as soon as we have a way to ensure the item returned is the most recent addition to the NS.</em></p>
<p>Put the query into "recent.json":</p>
<pre class="highlight"><code>{
  &quot;size&quot; : 1,
  &quot;filter&quot; : {
    &quot;match_all&quot; : { }
  }
}</code></pre>
<p>Use underscore to make the result more manageable:</p>
<pre class="highlight"><code>% OSFCURL @recent.json | underscore pretty</code></pre>
<p>Actually, this case is a touch more complicated than that. At the moment we have two types of records in the NS and you are probably interested in "event" records rather than "resource" records (which are only used to document collisions in the NS). To limit our result to the most recent non-resource record, replace "recent.json" with this:</p>
<pre class="highlight"><code>{
  &quot;size&quot; : 1,
  &quot;filter&quot; : {
    &quot;missing&quot; : { &quot;field&quot; : &quot;isResource&quot; }
  }
}</code></pre>
<p>And run the query again:</p>
<pre class="highlight"><code>% OSFCURL @recent.json | underscore pretty</code></pre>
<h2>Learn more</h2>
<p>You can run any elasticsearch query against the Notification Service dataset with this technique.
To learn more about the queries that are possible, consult the <a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html" rel="nofollow">elasticsearch query DSL documentation</a> and our own <a href="https://osf.io/wur56/wiki/Schema/" rel="nofollow">NS metadata schema</a>.</p>
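<p>The Lucene queries used with the JSON API and the Atom feed contain spaces and square brackets, which have to be percent-encoded before they go into a URL. The following is a minimal sketch of one way to build such URLs, assuming Python 3 and only its standard library. The endpoint URLs come from this page; the helper names <code>date_range</code> and <code>build_url</code> are hypothetical and are not part of SHARE or the OSF.</p>

```python
from urllib.parse import quote, urlencode

# Endpoints as documented on this page.
JSON_API = "https://osf.io/api/v1/share/search/"
ATOM_FEED = "https://osf.io/share/atom/"


def date_range(field, start, end):
    """Return an inclusive Lucene range clause: field:[start TO end].

    Hypothetical helper; start and end are expected to be Lucene/Solr
    date strings such as 2014-10-01T00:00:00Z.
    """
    return f"{field}:[{start} TO {end}]"


def build_url(endpoint, query, raw=False):
    """Append a percent-encoded Lucene query to an endpoint URL.

    Hypothetical helper. Colons are left unencoded so that field
    prefixes such as source:plos stay readable; spaces become %20
    and square brackets become %5B and %5D.
    """
    params = {"q": query}
    if raw:
        # raw=True asks the JSON API for the raw elasticsearch results.
        params["raw"] = "True"
    return endpoint + "?" + urlencode(params, safe=":", quote_via=quote)


if __name__ == "__main__":
    print(build_url(JSON_API, "open AND science AND source:scitech"))
    print(build_url(ATOM_FEED, "tags:frogs"))
    updated = date_range(
        "providerUpdatedDateTime",
        "2014-10-01T00:00:00Z",
        "2014-10-10T00:00:59Z",
    )
    print(build_url(JSON_API, updated, raw=True))
```

<p>The printed URLs can be passed to curl in place of the hand-written ones. Keeping the colon unencoded is a readability choice in this sketch; encoding it as <code>%3A</code> should be equally valid.</p>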