<p>See <strong><a href="https://osf.io/uz4wc/wiki/OntoSoft%20Metadata/" rel="nofollow">here</a></strong> for more detailed information about the software (the same information is contained in SoftwareMetadata.zip).</p>
<p><strong>Downloads</strong>:</p>
<ul>
<li><em>dr-boulder-master.zip</em> - The GitHub repo at commit <a href="https://github.com/rchakra3/dr-boulder/commit/ee0dd20ba77d27bc0edeceeb268dd8c3e59d44f6" rel="nofollow">ee0dd20 (Feb 18, 2017)</a></li>
<li><em>SoftwareMetadata.zip</em> - Detailed information about the software</li>
</ul>
<p><strong>Notes:</strong></p>
<p>The original repository, which holds the scripts used to download data from FTP servers and crawled websites, can be found here: <a href="https://github.com/rchakra3/dr-boulder" rel="nofollow">https://github.com/rchakra3/dr-boulder</a></p>
<p>While many tools are available for Linux, a large percentage of the participants at the Boulder event were expected to bring Windows systems. I decided to write Python scripts for three main reasons:</p>
<ul>
<li>The organizers would only need to explain a single method for pulling data</li>
<li>The scripts worked across platforms with little extra effort</li>
<li>It would be easier to help with technical questions if only a single tool was being used</li>
</ul>
<p>In terms of the tools themselves, there were three important pieces:</p>
<ul>
<li>A script that crawled through an FTP server and generated a list of files</li>
<li>A script that took the list of FTP files as input and downloaded the files to a local folder with exactly the same folder structure as the server</li>
<li>A third script that could be fed a list of files being served over HTTP</li>
</ul>
<p>Given more time, the goal was to write crawlers for all the high-priority sites that did not have an FTP server. However, given the time constraints, we only managed to write one for an EPA site.
A description of that script can be found at <a href="https://github.com/rchakra3/dr-boulder#domain-specific-scripts" rel="nofollow">https://github.com/rchakra3/dr-boulder#domain-specific-scripts</a></p>
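<p>The three pieces described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code from the dr-boulder repository. All function names (<code>local_path_for</code>, <code>crawl_ftp</code>, <code>download_ftp_files</code>, <code>download_http_files</code>) are hypothetical, and the "try to enter it, otherwise treat it as a file" test for directories is an assumed heuristic that may misclassify directories the user is not allowed to enter.</p>

```python
import os
import posixpath
import shutil
from ftplib import error_perm
from urllib.parse import urlparse
from urllib.request import urlopen


def local_path_for(remote_path, dest_root):
    """Map a server path like /pub/data/a.csv to dest_root/pub/data/a.csv."""
    # Drop empty, "." and ".." segments so the result stays under dest_root.
    parts = [p for p in remote_path.split("/") if p not in ("", ".", "..")]
    return os.path.join(dest_root, *parts)


def crawl_ftp(ftp, start_dir="/"):
    """Walk an FTP server from start_dir and return a list of file paths.

    `ftp` is expected to be a connected, logged-in ftplib.FTP object.
    """
    files, stack = [], [start_dir]
    while stack:
        current = stack.pop()
        try:
            ftp.cwd(current)  # assumed heuristic: only directories can be entered
        except error_perm:
            files.append(current)
            continue
        try:
            names = ftp.nlst()
        except error_perm:
            names = []  # some servers reply with an error for empty directories
        for name in names:
            if name not in (".", ".."):
                stack.append(posixpath.join(current, name))
    return files


def download_ftp_files(ftp, remote_paths, dest_root):
    """Download each listed file, mirroring the server's folder structure."""
    for remote in remote_paths:
        target = local_path_for(remote, dest_root)
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        with open(target, "wb") as fh:
            ftp.retrbinary("RETR " + remote, fh.write)


def local_path_for_url(url, dest_root):
    """Map an HTTP(S) URL to dest_root/<host>/<path>."""
    parsed = urlparse(url)
    path = parsed.path
    if not path or path.endswith("/"):
        path += "index.html"  # assumed file name for directory-style URLs
    return local_path_for(parsed.netloc + "/" + path, dest_root)


def download_http_files(urls, dest_root):
    """Download each URL in the list to a path derived from the URL."""
    for url in urls:
        target = local_path_for_url(url, dest_root)
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        with urlopen(url) as response, open(target, "wb") as fh:
            shutil.copyfileobj(response, fh)
```

<p>Under these assumptions, a run would connect with <code>ftplib.FTP(host)</code>, call <code>login()</code>, pass the connection to <code>crawl_ftp</code>, and then hand the resulting list to <code>download_ftp_files</code>. Keeping the crawl and the download as separate steps means the file list can be saved, reviewed, or split among participants before anything is downloaded.</p>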