# Start Here
To go directly to data files, click the "Files" link above, or go to the [Files page][1].
To view or use data, we typically recommend downloading the most recent available dataset. Data is organized by date.
Data sharing is in an early "beta"-like period, and we welcome suggestions about better organization, different shared data, or any other ways to improve the accessibility and usability of shared data. CCAB normative data is shared under a [CC-BY-NC-SA][2] license and cannot be used for commercial purposes. For questions, please send us an [email][6].
### Folder Structure
Each batch of normative data is contained in a single folder organized by date. Normative data in each folder contains the following:
* Data for each task, with each task in a single file
* A matching data dictionary for each task, which contains explanations of individual column headers for that sheet only
* A "global" data dictionary which contains descriptions of individual conditions/measures that make up column headers
### File Naming
For convenience, each task in CCAB is assigned a unique, short "task id" used to identify it in column and file names. For example, the Auditory Screen has a task id of **AS**, and data for that task will be found in a file named **AS_(date).xlsx**.
In some cases, results from multiple tasks are compiled into a single table/file (for example, Trails A and Trails B results). These groupings also have a unique short name, e.g., **Trails_(date).xlsx**.
For a complete list of task abbreviations, see the [Tests][3] page.
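The naming scheme above can be taken apart programmatically. The following is a minimal sketch, not an official tool: the function and class names are hypothetical, and the date string `2021-06-01` is only a placeholder, since the exact date format used in file names is not specified here. It also accounts for the **dict_** prefix used by data dictionary files.

```python
from typing import NamedTuple


class DataFileName(NamedTuple):
    is_dictionary: bool  # True when the name starts with the "dict_" prefix
    task_id: str         # task id (e.g. "AS") or grouping name (e.g. "Trails")
    date_text: str       # date portion, kept as raw text (format is an assumption)


def split_data_filename(filename: str) -> DataFileName:
    """Split '<task id>_<date>.xlsx' (optionally prefixed with 'dict_')."""
    stem = filename
    if stem.lower().endswith(".xlsx"):
        stem = stem[: -len(".xlsx")]

    is_dictionary = stem.startswith("dict_")
    if is_dictionary:
        stem = stem[len("dict_"):]

    # Assumption: the task id itself contains no underscore, so the first
    # underscore separates the task id from the date.
    task_id, sep, date_text = stem.partition("_")
    if not (task_id and sep and date_text):
        raise ValueError(f"not a '<task id>_<date>' file name: {filename!r}")
    return DataFileName(is_dictionary, task_id, date_text)


# Placeholder date; real file names may format the date differently.
print(split_data_filename("AS_2021-06-01.xlsx"))
print(split_data_filename("dict_Trails_2021-06-01.xlsx"))
```

The date is deliberately left as text so that the sketch does not guess at a format.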
### Sheet Structure
Normative data sheets are structured so that each row represents a unique subject/participant. In addition, each sheet contains the same subset of subject-specific demographic information prepended as the first ~50 columns. All subsequent columns contain task-specific results.
CCAB uses a longitudinal design, meaning that participants perform most tests multiple times, at different test sessions. Columns are ordered left-to-right by time point, such that results from the first time a given subject ran the test are in the leftmost columns, and results from the last time the test was run are in the rightmost columns.
Below the subject data is a set of rows containing summary data for each column (e.g., means, ranges, and standard deviations).
### Test Sessions
CCAB contains multiple tests and comprises multiple studies and subject pools.
In general, we refer to a "test session" as the sequence of tests run by a single subject on a single day. Each of these test sessions is assigned a label, which defines both (a) what tests were run, or expected to be run, and (b) when in the longitudinal sequence the session occurred.
The following session labels are used:
* **E0**: Enrollment session (day 1 of original 3-day test sequence)
* **E1**: Enrollment session (day 2, repeating day 1, enrollment test-retest study only)
* **M1**: CCAB session 1 (day 2 of original 3-day test sequence)
* **M2**: CCAB session 2 (day 3 of original 3-day test sequence)
* **M3**: First follow-up, at 6 months or 1 year
* **M4**: Second follow-up, 1 year after the previous follow-up
* **M5**: Third follow-up, 1 year after the previous follow-up
* **R1**: Enrollment session for new 1-day only sequence
Because the different sessions and test orders within sessions can make cross-study comparison difficult, we instead label results by the test *repetition* and the *interval* since the first time that specific test was run. For example, in *r1_d0*, r1 refers to test repetition 1 and d0 refers to 0 days since the first test run. All r1_ labels refer to the first time a test was run for a given subject, so they are always assigned an interval of d0. The following repetition labels are used:
| Repetition Label | Description | Session(s) Associated |
| --- | --- | --- |
| r1_d0 | First test run | E0, M1, R1 |
| r2_d1 | Second test run, next day | E1, M2 |
| r3_m12 | Third test run, 12 months | M3 |
| r4_m24 | Fourth test run, 24 months | M4 |
| r5_m36 | Fifth test run, 36 months | M5 |
For more information about sessions and repetition labels, see [Test Sessions][4].
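As an illustration of how the repetition labels in the table can be read, here is a small sketch that splits a label into its repetition number and interval. The function name and the lookup table name are hypothetical; the label-to-session mapping is copied from the table above, and the reading of `d` as days and `m` as months is inferred from the descriptions in that table.

```python
import re

# Label -> associated sessions, copied from the repetition label table.
SESSIONS_BY_LABEL = {
    "r1_d0": ("E0", "M1", "R1"),
    "r2_d1": ("E1", "M2"),
    "r3_m12": ("M3",),
    "r4_m24": ("M4",),
    "r5_m36": ("M5",),
}

# r<repetition>_<unit><count>, where unit is d (days) or m (months).
LABEL_RE = re.compile(r"^r(?P<rep>\d+)_(?P<unit>[dm])(?P<count>\d+)$")
UNIT_NAMES = {"d": "days", "m": "months"}


def parse_repetition_label(label):
    """Return (repetition, interval_count, interval_unit) for a label."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"not a repetition label: {label!r}")
    return int(match["rep"]), int(match["count"]), UNIT_NAMES[match["unit"]]


for label in SESSIONS_BY_LABEL:
    print(label, parse_repetition_label(label), SESSIONS_BY_LABEL[label])
```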
### Column Naming Conventions
Specific information about each column name can be found in the data dictionary for each sheet (see below).
Column names use the following convention: **[task id]\_[session/repetition id]\_[measure name]**
For example, a column name of *AS\_r1\_d0\_hit\_ct* can be parsed as task AS (Auditory Screen), repetition 1, interval 0 days, and measure hit\_ct (hit count).
Additional test repetitions are appended as additional columns, in sequential order. For example, repetition r2 measures appear to the left of repetition r3 measures for a given task.
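The naming convention can be applied mechanically when working with the sheets in code. The sketch below is one possible way to do it, under two assumptions that are not stated in this page: task ids contain no underscore, and any column that does not match the pattern (for example, a demographic column) is simply not a task result column. The names `ColumnName` and `parse_column_name` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ColumnName(NamedTuple):
    task_id: str     # e.g. "AS"
    repetition: int  # e.g. 1 for r1
    interval: str    # e.g. "d0" (0 days) or "m12" (12 months)
    measure: str     # e.g. "hit_ct"; may itself contain underscores


COLUMN_RE = re.compile(
    r"^(?P<task>[^_]+)_r(?P<rep>\d+)_(?P<interval>[dm]\d+)_(?P<measure>.+)$"
)


def parse_column_name(name: str) -> Optional[ColumnName]:
    """Parse '<task id>_<repetition>_<interval>_<measure>'; None if no match."""
    match = COLUMN_RE.match(name)
    if match is None:
        return None
    return ColumnName(
        task_id=match["task"],
        repetition=int(match["rep"]),
        interval=match["interval"],
        measure=match["measure"],
    )


print(parse_column_name("AS_r1_d0_hit_ct"))
```

Returning `None` for non-matching names, rather than raising an error, makes it easy to filter a full header row down to task result columns.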
### Data Dictionary
For each task results file, there is a paired data dictionary. The filename of the data dictionary is the same as the filename of the sheet, but with a **dict_** prefix. Data dictionaries contain one row for each column in the associated file, and have the following columns:
| Column | Description |
| -------- | -------- |
| *col_name* | Column name from data table |
| *col_pretty_name* | Alternate, unabbreviated column name; can be used in graphs or charts |
| *description* | Description of data in that column |
| *task* | Associated unique task id, if any |
| *session* | Associated session or interval id, if any |
| *data_type* | Data type (string, float, int, bool, etc.) |
| *data_range* | Possible range of values, if any. Note that square brackets are inclusive and parentheses are exclusive; [0,2) implies values >=0 and <2. |
| *data_units* | Unit name, if any |
| *notes* | Any additional information about the measure |
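To make the bracket notation in *data_range* concrete, here is a minimal sketch that checks whether a value falls inside a range such as `[0,2)`. It assumes the cell holds exactly two comma-separated numeric bounds wrapped in one opening and one closing bracket; how open-ended or non-numeric ranges are written in the dictionaries is not covered here. The function names are hypothetical.

```python
import re

# Opening bracket, lower bound, comma, upper bound, closing bracket.
RANGE_RE = re.compile(r"^\s*([\[(])\s*([^,]+?)\s*,\s*([^,]+?)\s*([\])])\s*$")


def parse_range(text):
    """Return (low, low_inclusive, high, high_inclusive) for e.g. '[0,2)'."""
    match = RANGE_RE.match(text)
    if match is None:
        raise ValueError(f"not a two-bound range: {text!r}")
    open_bracket, low, high, close_bracket = match.groups()
    # Square brackets are inclusive, parentheses are exclusive.
    return float(low), open_bracket == "[", float(high), close_bracket == "]"


def in_range(value, text):
    """True when value satisfies the range described by text."""
    low, low_inclusive, high, high_inclusive = parse_range(text)
    above = value >= low if low_inclusive else value > low
    below = value <= high if high_inclusive else value < high
    return above and below


print(in_range(0, "[0,2)"))  # lower bound is inclusive
print(in_range(2, "[0,2)"))  # upper bound is exclusive
```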
### Test Sequences
A test sequence defines the number, order, and selection of tests run in a given test session. The current test orders for each session can be found on the [Test Orders][5] page.
### Data Collation
Test results are only listed in data tables for individual test runs in which the subject successfully completed the task. There are several reasons why data may be "missing" for a given subject and session, even if the given task should have been run by that subject in that study session:
* The subject may have started, but not completed, the task due to poor performance.
* The examiner may have manually skipped the task due to time constraints, concerns about subject performance, or other reasons.
* Technical errors may have prevented the task from completing.
In the case where the subject successfully completed the task but certain individual measures are missing, it generally means that specific measure could not be computed. Most often, this occurs for measures based on questionnaire items, which cannot be scored if the subject does not respond to a given item.
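The two kinds of missing data described above can be told apart per subject and repetition. The sketch below is illustrative only. It assumes that a row has been read into a dictionary keyed by column name, and that missing cells show up as `None`, an empty string, or NaN, which depends on how the sheet is loaded. The measure name `miss_ct` is a made-up example; only `hit_ct` appears in this page.

```python
def is_missing(value):
    """Treat None, empty strings, and NaN as missing cells (an assumption)."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and value != value  # NaN is not equal to itself


def classify_task_run(row, prefix):
    """Classify one subject's results for a '<task id>_<repetition>_<interval>_' prefix."""
    values = [value for name, value in row.items() if name.startswith(prefix)]
    if not values:
        return "no columns"
    missing = [is_missing(value) for value in values]
    if all(missing):
        return "run missing"      # task not completed, skipped, or failed
    if any(missing):
        return "measure missing"  # task completed, some measures not computable
    return "complete"


# Hypothetical row for one subject; 'miss_ct' is an invented measure name.
row = {
    "AS_r1_d0_hit_ct": 12,
    "AS_r1_d0_miss_ct": None,
    "AS_r2_d1_hit_ct": None,
    "AS_r2_d1_miss_ct": "",
}
print(classify_task_run(row, "AS_r1_d0_"))
print(classify_task_run(row, "AS_r2_d1_"))
```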
[1]: https://osf.io/3kprb/files/
[2]: https://creativecommons.org/licenses/by-nc-sa/4.0/
[3]: https://osf.io/x8u5z/wiki/CCAB%20Tests%20Home/
[4]: https://osf.io/x8u5z/wiki/Test%20Sessionttps://osf.io/x8u5z/wiki/Tests/
[5]: https://osf.io/x8u5z/wiki/Test%20Orders/
[6]: mailto:osf_support@neurobs.com