## CAN-SAR: A Database of Canadian Species at Risk Information

![Overview of the CAN-SAR: A database of Canadian Species at Risk Information][1]

***CAN-SAR: A Database of Canadian Species at Risk Information*** is an initiative led by Dr. Ilona Naujokaitis-Lewis of Environment and Climate Change Canada. The database aims to provide open, accessible data drawn from Canadian species at risk listing and recovery planning documents. Ongoing work includes developing a living database that will make it easier for other parties to contribute. This should increase efficiency and reduce redundant effort, with the overarching goal of improving the conservation of species at risk.

**NOTE:** The current version of CAN-SAR includes documents available as of **March 23, 2021** for species with a SARA status of Endangered, Threatened or Special Concern. For the authoritative source of current species at risk information, please consult the [SARA Public Registry](

### Contributing to the CAN-SAR database

We intend to update the database as time and resources allow, but we also encourage anyone interested in extending or expanding the CAN-SAR database to reach out to discuss a collaboration. We have not provided a tool for adding to the database automatically. Although the data extraction process is well documented, some training and validation is needed to ensure that variables are interpreted consistently.

### Project Files

The CAN-SAR database consists of the following files:

1. CAN-SAR_database.csv: the current version of the database. Each row represents a single document and species or designatable unit, and the columns contain information extracted from status reports, management plans and recovery documents. File format: ".csv"
2. CAN-SAR_data_dictionary.xlsx: the data dictionary, which contains three spreadsheets: Database Fields, which describes each field in the database, how it was extracted and the format of its data; Action Types, a table defining the action types and subtypes used; and Threat Classes, a copy of version 1.1 of the IUCN threats classification system. File format: ".xlsx"
3. eml.xml: machine-readable metadata for the CAN-SAR database in the Ecological Metadata Language (EML)

### Data validation

We created an automated process using continuous integration to test the internal consistency of the database. Whenever changes are made to the [GitHub repository]( that holds the development version of the data, a GitHub Action runs an R script that tests the data. If errors are found, a notification is sent to the database maintainer so that they can be fixed. The script checks that required fields have no missing values, that each field has the correct data type and values within the expected range, and that the data is internally consistent.

In addition, an independent validation using 10% of records has been conducted on data extracted before 2018. Validation of data extracted in 2021 is in progress.

|Subset of database|Overall error rate|Error rate for threats|Error rate for other fields|
|------------------|------------------|----------------------|---------------------------|
|With threats calculator|3.1%|3.5%|1.5%|
|Without threats calculator|4.3%|11.8%|2.8%|

[1]:
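The R script run by the GitHub Action is not reproduced here. As a rough illustration of the kinds of checks described under Data validation (missing required values, data types, expected ranges and internal consistency), the following is a minimal Python sketch. The column names `document_id`, `species` and `year`, the year range and the duplicate-row rule are hypothetical and chosen only for illustration. The duplicate check assumes that, because each row represents a single document and species or designatable unit, a document/species pair should appear only once. The real fields and formats are defined in CAN-SAR_data_dictionary.xlsx.

```python
import csv
import io

# Hypothetical column names and rules for illustration only; the real fields,
# types and ranges are documented in CAN-SAR_data_dictionary.xlsx.
RULES = {
    "document_id": {"required": True, "type": str},
    "species": {"required": True, "type": str},
    "year": {"required": True, "type": int, "min": 1970, "max": 2021},
}
# Assumed key: each row should be a unique document x species pair.
KEY_FIELDS = ("document_id", "species")


def validate(csv_text, rules=RULES, key_fields=KEY_FIELDS):
    """Return a list of (row_number, field, message) problems found in csv_text."""
    errors = []
    seen_keys = set()
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_num, row in enumerate(reader, start=1):
        for field, rule in rules.items():
            raw = (row.get(field) or "").strip()
            if not raw:
                # Check 1: required fields must not be empty.
                if rule.get("required"):
                    errors.append((row_num, field, "missing required value"))
                continue
            try:
                # Check 2: the value must parse as the expected type.
                value = rule["type"](raw)
            except ValueError:
                errors.append((row_num, field, "wrong type"))
                continue
            # Check 3: values must fall within the expected range, if one is set.
            if "min" in rule and value < rule["min"]:
                errors.append((row_num, field, "below expected range"))
            if "max" in rule and value > rule["max"]:
                errors.append((row_num, field, "above expected range"))
        # Check 4 (internal consistency): no duplicate document/species rows.
        key = tuple((row.get(f) or "").strip() for f in key_fields)
        if all(key) and key in seen_keys:
            errors.append((row_num, ",".join(key_fields), "duplicate row"))
        seen_keys.add(key)
    return errors


if __name__ == "__main__":
    sample = (
        "document_id,species,year\n"
        "D1,Species A,2010\n"
        "D1,Species A,2012\n"
        "D2,,2025\n"
        "D3,Species C,abc\n"
    )
    for problem in validate(sample):
        print(problem)
```

On the small made-up sample, the sketch reports four problems: a duplicate document/species row (row 2), a missing species and an out-of-range year (row 3), and a non-numeric year (row 4). Returning a list of problems rather than stopping at the first one mirrors the idea of sending the maintainer a single notification covering every error found.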
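The independent validation re-examines 10% of records. The following is a minimal Python sketch of how such a subset could be drawn and an error rate computed. It rests on assumptions that the source does not confirm. It uses a simple random sample with a fixed, hypothetical seed, and rounds the sample size up so that any non-empty input yields at least one record. It also defines the error rate as erroneous values divided by values checked.

```python
import random


def validation_sample(record_ids, percent=10, seed=2021):
    """Draw a reproducible simple random sample of `percent`% of the records.

    The sample size is rounded up with integer arithmetic so that any
    non-empty input yields at least one record.
    """
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    ids = list(record_ids)
    k = (len(ids) * percent + 99) // 100  # ceiling of n * percent / 100
    rng = random.Random(seed)  # fixed (hypothetical) seed for reproducibility
    return sorted(rng.sample(ids, k))


def error_rate(n_errors, n_checked):
    """Assumed definition: erroneous values / values checked (0.0 if none)."""
    return n_errors / n_checked if n_checked else 0.0


if __name__ == "__main__":
    sample = validation_sample(range(1, 1001))
    print(len(sample))  # 100 records drawn from 1,000
    print(f"{error_rate(7, 200):.1%}")  # made-up numbers: 7 errors in 200 values
```

Integer ceiling division is used instead of `math.ceil(n * 0.1)` because floating-point products such as `30 * 0.1` evaluate to slightly more than 3 and would round up to 4.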