Course outline


**!! IMPORTANT NOTE: The dates and times of the course are for 2021, but the content in the course outline has not yet been updated**

## **Scientific Data Management in Ecology and Evolution**

#### **Instructors**

- Sally Taylor (Research Librarian, UBC-Vancouver) - sally <dot> taylor <at> ubc <dot> ca
- Raymond Ng (Professor, UBC-Vancouver) - rng <at> cs <dot> ubc <dot> ca
- Diane Srivastava (Professor, UBC-Vancouver) - srivast <at> zoology <dot> ubc <dot> ca
- Postdocs and TAs from the Living Data Project

#### **Course description**

This course will develop best practices in data management in ecology and evolution research. We will use a combination of instruction, in-class activities and projects to guide students through all parts of the research data lifecycle, starting with the collection and storage of data, progressing through the organizing of data (database design, “tidy” data principles, data versioning), the cleaning of data (quality assessment, geospatial and taxonomic data standards), and ending with the sharing of data (metadata and documentation, and archiving and accessing data in digital repositories following the new FAIR principles).

Each student will work progressively through the course on an individual data management plan for the data they will collect - or have already collected - for one of their research projects. As well, students will work in small groups on preparing an existing biological dataset for archiving using R scripts.

This course, one of the first such courses in Canada specifically geared to ecology and evolution, will give students the tools for managing their own research data as well as rescuing previously collected data.

#### **Meeting Times**

- 10:00am - 11:30am PDT (British Columbia)
- 11:00am - 12:30pm CST (Saskatchewan)
- 1:00pm - 2:30pm EDT (Ontario/Quebec)

**Week 1**: Tues Sept. 07, Thurs Sept. 09

**Week 2**: Tues Sept. 14, Thurs Sept. 16

**Week 3**: Tues Sept. 21, Thurs Sept. 23

**Week 4**: Tues Sept.
28, Thurs Sept. 30

#### **Pre-requisites**

- Graduate student conducting thesis research specifically in Ecology or Evolutionary Biology
- To create a Data Management Plan (a major part of the course grade), you should be at a stage where you have an idea of what data you are *likely* to collect for part of your thesis, or you have already collected data for your thesis or a previous research project.
- Introductory R programming experience (i.e., base R)

#### **Delivery format**

- 8 sessions, 1.5 hours per session
- each session generally includes a lecture and a hands-on component, each varying in length among sessions

#### **Required materials**

- Personal computer with videoconferencing ability
- Internet access

**NOTE**: We focus on the use of open-source and free software and tools to maximize accessibility

#### **Online Resources**

All materials for this course will be available through the [OSF site for the Living Data Project]. Here you can find pre-reading materials, lecture slides and recordings, additional resources, and details on how to coordinate and communicate with your groups.

#### **Assessment**

- **Data management plan** (individual) = 60%
    - Due date: December 10, 2020 (final version)
    - See: [assignment description and rubric]
- **Data cleaning and standards** (group) = 40%
    - Due date: December 1st, 2020
    - See: [assignment description and rubric]

#### **Workload**

In general, students in each Living Data Project module should anticipate approximately 30-45 work hours, in line with the normal expectations for one credit of coursework. This will comprise the following activities: 12 hours of formal instruction in a class setting, 4 hours of individual/small group mentoring, 5 hours of preparatory reading, and up to 15 hours on required assignments.
Assignments are completed progressively as the course proceeds to allow instructor feedback on drafts, with a final version of major projects due within a week of course completion.

#### **Logistics**

For help on course material, students are encouraged to contact their primary mentor as follows (but if you cannot reach them, feel free to contact another instructor):

- **UBC-V and Regina**\* - Ellen Bledsoe (ellen <dot> bledsoe <at> uregina <dot> ca)
- **UdeM, SFU, Toronto, McGill**\* - Joey Burant (jburant <at> uoguelph <dot> ca)
- **UBC-O** - Mauro Sugawara (sugawara <at> zoology <dot> ubc <dot> ca)
- **Carleton, Guelph, Lethbridge, Manitoba, UQAM** - Bruno Carturan (bruno <dot> carturan <at> ubc <dot> ca)

*\*including students doing transfer credits via these institutions*

Participation in this course requires adherence to the Living Data Project [Code of Conduct].

We have set up a **course chat forum** and an **assignment submission portal** on OSF. This is the preferred platform, but the instructional team can also accommodate students who prefer not to use OSF. Instructions on how to set up and use OSF for these purposes are found [here], and we are happy to help you with any questions.

**International students at UBC** are directed to read [this] statement.

* * *

### **Approximate schedule of topics and activities (8 sessions)**

* * *

`This schedule was last updated on 30 November 2020`

* * *

#### **Session 1: Nov. 3rd**

**Suggested pre-session readings:**

**Note for students**: PDFs of suggested and required pre-readings are available in the ["Session materials - private" folder] within the Course Communications component.

- Roche, D.G., Kruuk, L.E., Lanfear, R., and Binning, S.A. (2015). Public data archiving in ecology and evolution: how well are we doing? *PLoS Biology*. [PDF]
- Williams, M., Bagwell, J., and Nahm Zozus, M. (2017). Data management plans: the missing perspective. *Journal of Biomedical Informatics*. [PDF]

**In-session:**

1. **Introductions** (Diane Srivastava; 15 min)
    - See: [slides for introduction]
    - Introduction to instructors, postdocs, and teaching assistants
    - Overview of the course and expectations around activities and assignments
    - Code of Conduct: respectful participation in the Living Data Project modules
    - Inform students of privacy requirements and precautions
    - Refer to web resources, including instructions for assignment submission etc.
2. **Brief introduction to OSF** (Ellen Bledsoe; 5 min)
    - See: [instructions for setting up OSF]
3. **Lecture** (Sally Taylor; 25 min)
    - Introduction to data management plans
    - See: [session 1 lecture slides]
    - [RDM institutional contacts][1]
4. **Activities** (40-45 min)
    - Create [DMP Assistant] account
    - Begin working on Plan Details, Data Collection, and Storage & Back-up sections of the DMP
    - Group discussion (10 min)

**Homework:**

- Complete and submit the "Plan Details", "Data Collection", and "Storage & Back-up" sections of your DMP -- see [DMP assignment description]
    - **Due**: before the beginning of next session (Thursday, November 5th)
- Please also be sure to set up your OSF submissions component!

* * *

#### **Session 2: Nov. 5th**

**Suggested pre-session readings:**

- Borer, E.T., Seabloom, E.W., Jones, M.B., and Schildhauer, M. (2009). Some simple guidelines for effective data management. *ESA Bulletin*. [PDF]

**In-session:**

1. **Q & A / Admin** (5 min)
    - Address questions
2. **Lecture/Workshop** (Raymond Ng; 25 min)
    - Designing databases (part 1)
    - entity-relationship models
    - See: [session 2 lecture slides]
3. **Activities** (50 min)
    - Develop entity-relationship models for example datasets
    - See: [session 2 activity]; [BWG database relationship-entity diagram]

**Follow-up materials:**

- Looking for more information on structuring databases with entity-relationship diagrams? Check out [this YouTube video -- ERD Part 1]

* * *

#### **Session 3: Nov. 17th**

**Required pre-session activities:**

- Complete this [self-directed exploration of the BWG database structure] (includes R script and the Bromeliad data)

**In-session:**

1. **Q & A / Admin** (5 min)
    - Address questions
2. **Lecture** (Raymond Ng; 25 min)
    - Designing databases (part 2)
    - choosing relations and constraints
    - See: [session 3 lecture slides]
3. **Activities** (50 min)
    - Talk through cardinality relationships, participation constraints, and primary/foreign keys in the BWG database
    - See: [session 3 activity]

**Follow-up materials:**

- Looking for more information on primary and foreign keys? Check out [this YouTube video -- ERD Part 2]

**Homework:**

- Design a database for the data and submit it as part of the "Data Collection" section of your Data Management Plan (DMP)
    - **Due**: before 5pm Monday, November 23rd

* * *

#### **Session 4: Nov. 19th**

**Required pre-session activities:**

- Complete this [self-directed tidyverse tutorial] (includes R script and the Bromeliad data)

**Suggested pre-session readings:**

- Feel free to check out the ["cheat sheets"] that have been uploaded in the folder for this session's materials.

**In-session:**

1. **Q & A / Admin** (5 min)
    - Address questions
2. **Lecture** (Diane Srivastava; 25 min)
    - Working with data: script-based approaches to data
    - Principles of tidy data
    - See: [session 4 lecture slides]
3. **Activities** (50 min)
    - Break-out groups (~4 people grouped according to R skill level): work on one of the questions in the activity (of increasing difficulty)
    - Class discussion (10 min)
    - See: [session 4 activity]
    - See: [session 4 activity solutions]

**Follow-up materials:**

- Here is a short introduction to [tidyr, dplyr, and the pipe %>%], which was provided in the Synthesis Stats module.
- An excellent tutorial on [10 up-to-date ways to do common data tasks in R]

* * *

#### **Session 5: Nov. 24th**

**Required pre-session activities:**

- Please run this [R code] so that you are familiar with the functions that we will mention in class. Note that you will first need to install a few packages: `assertr`, `stringdist`, `GGally`, and `palmerpenguins`.

**Optional pre-session readings:**

- Broman, K.W., and Woo, K.H. (2018). Data organization in spreadsheets. *American Statistician*. [PDF]
- de Jonge, E., and van der Loo, M. (2013). An introduction to data cleaning with R. *Statistics Netherlands*. [PDF]

**In-session:**

1. **Q & A / Admin** (5 min)
    - Address questions
2. **Lecture** (Diane Srivastava; 25 min)
    - Data cleaning and quality control
    - identifying outliers/typos/etc. with functions and algorithms
    - data standards for time, species, space
    - See: [session 5 lecture slides]
3. **Activities** (50 min)
    - Break-out groups (~4 people): data cleaning exercise
    - Choose a [data cleaning question] to work on in your group
    - for reference on cleaning functions, see:
        - [session 5 pre-tutorial code]
        - [assertr vignette]
        - [lubridate cheatsheet]
        - [stringr cheatsheet]

**Homework:**

- Complete and submit the group data cleaning assignment
- One person from each group should upload the completed assignment (R script, cleaned data, and description) to their OSF personal submissions component.
- **Due**: Tuesday, December 1st by 5:00 pm PST

* * *

#### **Session 6: Nov. 26th**

**Suggested pre-session activity:**

- Register for an [ORCID iD]

**Optional pre-session reading:**

- Fegraus, E.H., Andelman, S., Jones, M.B., and Schildhauer, M. (2005). Maximizing the value of ecological data with structured metadata: an introduction to Ecological Metadata Language (EML) and principles for metadata creation. *Bulletin of the Ecological Society of America*. [PDF]

**In-session:**

1. **Q & A / Admin** (5 min)
    - Address questions
2. **Lecture** (Sally Taylor; 25 min)
    - Describing datasets (documentation and metadata)
    - Metadata standards in E&E (e.g., DataCite, EML, Darwin Core)
    - See: [session 6 lecture slides]
3. **Activity** (50 min)
    - Break-out groups (~3 people): submit a sample dataset to the sandbox KNB repository following the EML standard
    - See: [session 6 activity]; `palmerpenguins` [dataset]

**Homework:**

- Complete and submit your group project on data organization and cleaning
    - **Due**: before the beginning of the next session (Tuesday, December 1st)

* * *

#### **Session 7: Dec. 1st**

**Suggested pre-session readings:**

- Carroll, M. (2015). Sharing research data and intellectual property law: a primer. *PLoS Biology*. [PDF]

**In-session:**

1. **Q & A / Admin** (5 min)
    - Address questions
2. **Lecture** (Sally Taylor; 25 min)
    - FAIR principles for scientific data management
    - data sharing, archiving, licencing
    - accessing and citing publicly-available data
    - See: [session 7 lecture slides]
3. **Activities** (50 min)
    - Identifying suitable repositories for depositing and finding datasets
    - See: [session 7 activity]; [activity Google sheet]

**Homework:**

- Complete and submit the "Documentation and Metadata", "Data Preservation", and "Data Sharing and Reuse" sections of your DMP.
    - **Due**: before the beginning of the next session (Thursday, December 3rd)

* * *

#### **Session 8: Dec. 3rd**

**Required pre-session readings:**

- Roche, D.G., Kruuk, L.E., Lanfear, R., and Binning, S.A. (2015). Public data archiving in ecology and evolution: how well are we doing? *PLoS Biology*. [Open Access]

In addition, please skim one of these two articles:

- Vines, T.H., Albert, A.Y., Andrew, R.L., Débarre, F., Bock, D.G., Franklin, M.T., Gilbert, K.J., Moore, J.S., Renaut, S., and Rennison, D.J. (2014). The availability of research data declines rapidly with article age. *Current Biology*. [PDF]
- Soeharjono, Sandrine, and Dominique Roche. 2020.
“Individual Costs and Benefits of Sharing Open Data in Ecology and Evolution.” *EcoEvoRxiv*. [Pre-print]

**In-session:**

1. **Q & A / Admin** (5 min)
    - Address questions
2. **Presentation and class discussion** (Dominique Roche; 30 min)
    - current quality of archived data
    - why people don't archive
3. **Lecture** (Diane Srivastava; 30 min)
    - Loss of data in ecology, evolution, environment
    - Cost of losing data
    - Data rescue examples
    - See: [session 8 lecture slides]
4. **Course wrap-up** (25 min)
    - Q & A on individual data management plans
    - additional discussion of course material

**Follow-up materials:**

- [The state of open data 2020]. *Digital Science*.

**Homework:**

- Complete the "Responsibilities and Resources" and "Ethics and Legal Compliance" sections of your DMP.
- Submit the final version of your DMP
    - **Due**: Thursday, December 10th.

* * *

[1]: