See **[here][2]** for more detailed information about the software (the same information is contained in SoftwareMetadata.zip).
**Downloads**:
- *dr-boulder-master.zip* - The GitHub repo at commit [ee0dd20 (Feb 18, 2017)][1]
- *SoftwareMetadata.zip* - Detailed information about the software
**Notes:**
The original repository used to download data from FTP servers and crawled websites can be found here: <https://github.com/rchakra3/dr-boulder>
While many tools are available for the Linux platform, a large percentage of the participants at the Boulder event were expected to bring Windows systems. I decided to write Python scripts for three main reasons:
- The organizers would only need to explain a single method for pulling data
- They worked across platforms without a lot of extra work
- It would be easier to help with technical questions if there was only a single tool being used
In terms of the tools themselves, there were three important pieces (a sketch of the FTP approach follows this list):
- A script that crawled an FTP server and generated a list of its files
- A script that took that list of FTP files as input and downloaded them to a local folder with exactly the same folder structure as the server
- A third script that could be fed a list of files being served over HTTP and downloaded them
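The first two pieces can be illustrated together. The following is a minimal sketch of the crawl-and-mirror approach, not the actual dr-boulder code; the host, root path, and function names are placeholders, and it assumes a server that accepts anonymous logins:

```python
import os
from ftplib import FTP, error_perm

def crawl(ftp, path, found):
    """Recursively walk the server, collecting remote file paths."""
    for name in ftp.nlst(path):
        if name in (".", ".."):
            continue
        # Some servers return full paths from NLST, others bare names.
        entry = name if name.startswith(path) else f"{path.rstrip('/')}/{name}"
        try:
            ftp.cwd(entry)            # succeeds -> it's a directory
            crawl(ftp, entry, found)
        except error_perm:            # fails -> treat it as a file
            found.append(entry)
    return found

def mirror(host, root, dest):
    """Download every file under `root`, recreating the server's layout."""
    ftp = FTP(host)
    ftp.login()                       # anonymous login
    for remote in crawl(ftp, root, []):
        local = os.path.join(dest, remote.lstrip("/"))
        os.makedirs(os.path.dirname(local), exist_ok=True)
        with open(local, "wb") as fh:
            ftp.retrbinary(f"RETR {remote}", fh.write)
    ftp.quit()

# Example (placeholder host): mirror("ftp.example.gov", "/pub/data", "./rescued")
```

Keeping the crawl and the download as separate steps, as the actual scripts did, means the generated file list can be checked or split up before any downloading starts.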
Given more time, the goal was to write crawlers for all of the high-priority sites that did not have an FTP server. Given the time constraints, however, we only managed to write one for an EPA site. A description of that script can be found at <https://github.com/rchakra3/dr-boulder#domain-specific-scripts>.
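For sites without an FTP server, a domain-specific crawler largely boils down to walking HTML index pages and collecting file links. The sketch below shows that shape using only the standard library; the URL, file extensions, and names are illustrative and not taken from the actual EPA script:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen, urlretrieve

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)

def crawl_index(url, extensions=(".csv", ".zip", ".pdf")):
    """Return absolute URLs of linked files matching the given extensions."""
    parser = LinkCollector()
    parser.feed(urlopen(url).read().decode("utf-8", errors="replace"))
    return [urljoin(url, href) for href in parser.links
            if href.lower().endswith(extensions)]

# Example (placeholder URL):
# for f in crawl_index("https://www.example.gov/data/"):
#     urlretrieve(f, f.rsplit("/", 1)[-1])
```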
[1]: https://github.com/rchakra3/dr-boulder/commit/ee0dd20ba77d27bc0edeceeb268dd8c3e59d44f6
[2]: https://osf.io/uz4wc/wiki/OntoSoft%20Metadata/