Welcome to the [NYU Health Sciences Library's Data Catalog project][1]. Our aim is to encourage the sharing and reuse of research data among institutions and individuals by providing a simple yet powerful search platform that exposes existing datasets to the researchers who can use them. There is a basic backend interface for administrators to manage the metadata that describes these datasets.

Here is the documentation (also available on [GitHub][2]).

**Components**

The Data Catalog runs on Symfony2, a popular PHP application framework. Installation and management of this package is best performed by a PHP developer familiar with this framework.

The search functionality is powered by Solr, which must be running and accessible to the server hosting the website. A sample Solr schema is included with this package. The Solr index can be updated regularly by setting up a cron job that calls an update script. A sample update script is also included with this package.

The metadata and some information about users are stored in a database. We used MySQL and there's a good chance you will too.

IMPORTANT NOTE: This package comes with a very basic form of authentication that should only be used in a local development environment. There are methods in place to use your institution's LDAP server, or you can use Symfony's built-in user management. Please read `app/config/common/security.yml` for more info.

**Installation**

This repository is essentially a Symfony2 distribution (i.e. it is not simply a Symfony "bundle"). As such, you should be able to install this site fairly easily, after configuring it for your environment.

1. Install Composer and Solr, and set up suitable database software such as MySQL. Create an empty database schema for this application.
2. Clone this repository into a directory your web server can serve: `git clone https://github.com/nyuhsl/datacatalog.git`
3. Run `composer install` to install any dependencies.
4. Read `app/config/parameters.yml.example`.
   Fill in the information about your MySQL server, and the URL where your Solr installation lives. You'll need a version of this file in both `app/config/dev` and `app/config/prod`. Remember to choose a "secret" according to the documentation [here][3]. There is also a README file in `app/config` with more information.
5. [Configure your web server][4] to work with Symfony.
   NOTE: You will want to require HTTPS connections on the login and administrative pages (at least), so remember to set up an SSL certificate for your server when you move the site to production. There is code in `app/config/common/security.yml` that will tell Symfony to require HTTPS connections.
6. In the root of your Symfony installation, run `php app/console doctrine:schema:update --force`. If you have configured your database correctly, this will set up your database to match the data model used in this application. If you haven't, the command will let you know.
7. Copy the example Solr schema ("SolrSchemaSample.xml") from the root site directory to your Solr installation's configuration directory. Back up the default "schema.xml" that came with Solr as "schema.xml.default", then rename the sample schema to "schema.xml". Perform any customizations you require, or leave as is.
8. At this point, the site should function, but you won't see any search results because there is nothing in the database, and thus nothing to be indexed by Solr. Click on the "Admin" tab, click "Add a New Dataset" in the sidebar menu, and get going!
9. Once you've added some test data, you'll want to index it in Solr. Navigate to your site's base directory and edit the file "SolrIndexer.py" to specify the URL of your Solr server where indicated. Then, run the script.

**Follow-up Tasks**

10. You'll most likely want to regularly re-index Solr to account for datasets you add or edit using the Admin section. There is a script in the root directory called "SolrUpdater.py" which can update a Solr index.
    You'll probably want to call this script or something similar with a cron job every Sunday, every night, or at whatever interval seems appropriate, depending on how much updating you do. I recommend weekly, since you can also run this script on demand from the command line if you want.
11. You'll most likely want to brand the site with your institution's logo or color scheme. Some placeholders have been left in `app/Resources/views/base.html.twig` that should get you started.
12. You'll most likely want to have some datasets to search. Get to it!!

**Licensing**

All files in this repository that are NOT components of the main Symfony distribution are Copyright 2016 NYU Health Sciences Library. This application is distributed under the GNU General Public License v3.0. For more information see the LICENSE file included in this repository.

**Bibliography / Presentations**

Lamb I, Larson C. Shining a light on scientific data: Building a data catalog to foster data sharing and reuse. Code4Lib. 2016;32. Available from: [http://journal.code4lib.org/articles/11421][5]

Read K, Surkis A. Building a data catalog: Promoting data reuse and collaboration at an academic medical center. Presented at BioCADDIE Webinar. UCSD: 2014 Nov 13; San Diego, CA. Available from: [https://biocaddie.org/events/webinars/building-data-catalog-promoting-data-reuse-and-collaboration-academic-medical-center][6]

Read K, Surkis A, Lamb I, Athens J, Chin S, Xu J, Rambo N. Promoting data reuse and collaboration at an academic medical center. IJDC. 2015;10(1): 260-267. Available from: [http://www.ijdc.net/index.php/ijdc/article/view/366][7]

Surkis A, Read K, Lamb I, Athens J, Nicholson J, Chin S, Xu J, Hanson K, Larson C. Building a data catalog: Promoting data reuse and collaboration at an academic medical center. JMLA Virtual Projects;103(4): 222-231.
[doi:10.3163/1536-5050.103.4.015][8]

[1]: https://datacatalog.med.nyu.edu/
[2]: https://github.com/nyuhsl/data-catalog
[3]: http://symfony.com/doc/current/reference/configuration/framework.html#secret
[4]: http://symfony.com/doc/current/cookbook/configuration/web_server_configuration.html
[5]: http://journal.code4lib.org/articles/11421
[6]: https://biocaddie.org/events/webinars/building-data-catalog-promoting-data-reuse-and-collaboration-academic-medical-center
[7]: http://www.ijdc.net/index.php/ijdc/article/view/366
[8]: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4613389/
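
The indexing steps above (9 and 10) rely on scripts that push dataset records to Solr. For readers who want to see roughly what such a script does, here is a minimal sketch. It is not the contents of "SolrIndexer.py" or "SolrUpdater.py". It assumes a Solr core reachable at a hypothetical URL and uses Solr's JSON update handler (`/update?commit=true`). The names `SOLR_CORE_URL`, `build_update_request`, and `send_update`, and the field `dataset_title`, are illustrative only; real field names must match your "schema.xml".

```python
import json
import urllib.request

# Hypothetical Solr core URL; replace with the address of your own installation.
SOLR_CORE_URL = "http://localhost:8983/solr/data_catalog"


def build_update_request(solr_core_url, datasets):
    """Build (but do not send) a POST request that adds documents to Solr.

    `datasets` is a list of dicts whose keys must match fields in your schema.
    """
    url = solr_core_url.rstrip("/") + "/update?commit=true"
    body = json.dumps(datasets).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_update(request, timeout=30):
    """Send the request to Solr and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Illustrative record only; the field names are assumptions.
    sample = [{"id": "1", "dataset_title": "Example dataset"}]
    req = build_update_request(SOLR_CORE_URL, sample)
    # Inspect the request before sending it with send_update(req).
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it easy to check the target URL and payload before touching a live index. A cron job could then run a script along these lines at whatever interval suits your update volume.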