# **FMRI Open QC Project**
This project is an invitation for the FMRI community to share quality control steps broadly. We provide one task-based collection and one resting-state collection, so that participating groups can all discuss a consistent set of underlying data. We hope this serves as a useful educational resource for FMRI researchers.
**The very brief project summary:** all participants will QC the same rest and/or task FMRI data, and describe their QC criteria in detail, highlighting some representative examples for clarity. This will help compile a reference of QC strategies from across the research community. Please see below for a detailed description of the data, as well as the detailed format for describing procedures in "Processing and QC description" and "Items to include in submitted articles."
To participate in this project and submit an article, please visit the [Demonstrating Quality Control (QC) Procedures in fMRI](https://www.frontiersin.org/research-topics/33922/demonstrating-quality-control-qc-procedures-in-fmri) Research Topic webpage at Frontiers, and click the "Participate" button just below the title. The due date for submissions is ~~Sept. 16, 2022~~ **Oct. 14, 2022** (updated!). We ask that submissions be made shortly before this deadline, so that all articles undergo revision simultaneously.
Please contact the Topic Editors (same webpage) with any scientific questions, and the journal editors for any article/journal questions.
### **Overview**
Quality control (QC) has long been an important part of FMRI processing, but it is typically underreported and too often underappreciated, whether for small or large, public or local datasets. This project aims to showcase examples of QC practices across institutions and to foster discussions within the field. Here, we invite researchers and developers across the globe to describe their QC methods in detail and to show them "in action" for a varied dataset acquired across multiple sites and scanners.
Current imaging practices and acquisition details vary widely across the neuroimaging community, depending on study aims, available hardware and more. QC procedures then typically vary along with these different acquisitions (for example, different criteria are relevant to single- or multi-echo data), as well as by software used and by the people interpreting it. The current project focuses on what is likely the most common MRI acquisition protocol both historically and today: non-accelerated (single-band) FMRI. QC for single-band EPI datasets is in some ways the foundation for all others; it is something that all neuroimaging researchers will encounter either directly or indirectly through the literature, making it the ideal target for the current project. To provide a common set of examples with which to demonstrate different QC practices, we have gathered data from several open, public FMRI repositories: specifically, datasets which have already been used in many studies and can be considered fairly representative of MRI data forming the basis of studies in the field.
We invite researchers to present their quality control assessments of the subjects in the provided data collections, listing which would be included or excluded from further analyses, and which might be considered borderline or "uncertain." One task cohort and several resting state cohorts are included, and participating researchers can choose to examine either or both types of data.
The goals of this project are:
* For researchers to be detailed and didactic about their quality control methods;
* To present possible QC pipelines (with an emphasis on visualization and understanding) in the context of several "real world" data packages;
* To share ideas of quality control more broadly among all researchers.
We fully expect that no two groups perform QC the same way, and we note that there is no single "correct" set of QC steps nor one "correct" set of answers for categorizing subjects. We expect the results to show a diverse set of tools and ideas that can be applied generally to FMRI studies and to enrich the wider neuroscience community with useful ways to look at and understand their data. One main result of this project will be an assemblage of QC criteria from active researchers around the FMRI community, with detailed descriptions and examples.
-------------------------
### **Description of provided data**
**Task data.** There is one collection of task FMRI data, called fmri-open-qc-task, which has 30 subjects:
+ sub-000, sub-001, ... from data-site 0
The task FMRI tarball `fmri-open-qc-task.tar.gz` is available on this webpage under the Files section, and is about 1.5 GB.
The data-package contains the original events TSV file for each subject. Since these have a large number of columns and multiple interpretations, we also provide separate, simplified BIDS-formatted event files, which may be used instead (contained in: `simplified_task_fmri_timing.tgz`). Note: this project is focused on QC and not on modeling/interpretation, so participants may choose to ignore event timing information altogether. However, those who do utilize event information within their task FMRI QC have a choice of which stimulus timing files to use. The simplified event files represent only a single possible interpretation of the original ones. Since this project does not include group analysis, potential heterogeneity of modeling is not problematic; just explain your procedures.
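For reference, a BIDS-formatted events file is a tab-separated table whose required columns are `onset` and `duration` (both in seconds), often accompanied by a column such as `trial_type`. A purely schematic example, with made-up values not taken from the provided files, might look like:

```
onset    duration    trial_type
0.0      2.0         stimA
12.5     2.0         stimB
24.0     2.0         stimA
```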
**Rest data.** There is one collection of resting state FMRI data, called fmri-open-qc-rest, which has subjects originating from 7 different sites, each with about 20 subjects (total N = 139). For each subject ID, the first number reflects the data-site from which it originally came:
+ sub-100, sub-101, ... from data-site 1
+ sub-200, sub-201, ... from data-site 2
+ sub-300, sub-301, ... from data-site 3
+ ... and so on, up through data-site 7.

Note: for downloading in more bite-sized (or byte-sized) parcels, the rest data has been split into 7 separate tarballs: `fmri-open-qc-rest0?.tar.gz`. Each tarball is about 1-2 GB, and they all unpack into the same `fmri-open-qc-rest/` directory.
In both collections, each subject has one session consisting of one T1w anatomical volume and one or two EPI time series datasets, collected from a 3T MRI scanner. Each EPI dataset is single echo, without slice acceleration/multiband, and does not include B0 inhomogeneity correction.
The datasets in this project are subsamples of public data-packages from various sites around the world (ABIDE, ABIDE-II, Functional Connectome Project and OpenNeuro). For convenience of processing, subjects were given new subject IDs and a uniform directory structure: for each subject, a single session directory with one anatomical and one or two EPI volumes. No dataset was otherwise altered or manipulated.
**Download note:** The downloads are all available from this OSF website, under the Files section, as compressed tarballs. To unpack a tarball `EXAMPLE.tgz`, you can type `tar -xf EXAMPLE.tgz` and then go into the newly created `EXAMPLE/` directory (see the note above about how the `fmri*rest*tgz` tarballs all unpack into one `fmri-open-qc-rest/` directory).
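For example, a minimal shell sketch to unpack all of the `fmri-open-qc-*` tarballs sitting in the current directory:

```bash
# unpack each downloaded fmri-open-qc-* tarball;
# the rest tarballs all land in a single fmri-open-qc-rest/ directory
for ff in fmri-open-qc-*.t*gz ; do
    tar -xf "$ff"
done
```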
-------------------------
### **QC (and processing) goals**
Participating researchers can choose to analyze either the task data collection, the rest data collection, or both. Researchers can choose to process and QC their data for volumetric, surface-wise, or grayordinate-based analysis.
For the purposes of the QC, assume that the data from each data-site will be analyzed as part of a whole brain study, in which the final EPI data would be aligned/warped/"normalized" to a standard template, specifically to `mni_icbm152_t1_tal_nlin_asym_09c.nii` ([available here](https://www.bic.mni.mcgill.ca/ServicesAtlases/ICBM152NLin2009), specifically [from this package](http://www.bic.mni.mcgill.ca/~vfonov/icbm/2009/mni_icbm152_nlin_asym_09c_nifti.zip)). In this case "whole brain" means cortex and subcortical structures, but not the cerebellum (many public datasets do not include the whole cerebellum).
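For example, one way to fetch and unpack that template package from the command line (assuming `curl` and `unzip` are available; any download method works):

```bash
# download and unpack the MNI ICBM 152 2009c nonlinear asymmetric template package
curl -O http://www.bic.mni.mcgill.ca/~vfonov/icbm/2009/mni_icbm152_nlin_asym_09c_nifti.zip
unzip mni_icbm152_nlin_asym_09c_nifti.zip
```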
Perform whatever QC steps your group would normally use for such an analysis, which may utilize any software, visualization or (pre)processing steps: e.g., alignment, SNR, regression modeling. For each data-site, place each subject into one of the following categories:
+ **Include:** those who pass QC criteria, and whom you have high confidence to use in the hypothetical study;
+ **Exclude:** those who fail one or more QC criteria, and whom you have high confidence to remove;
+ **Uncertain:** those for whom there is a question about whether to include.
For each data-site, there may end up being zero, one or more subjects in any of the Include/Exclude/Uncertain categories---that depends on your QC and interpretation.
Researchers may choose their own QC and processing steps. They may select the final EPI voxel resolution, as well as processing steps such as blurring/smoothing, how to perform modeling (e.g., what regressors to include, whether/how to censor volumes), etc. Groups are asked to provide examples of their QC process, including visualization if applicable.
Below are detailed guidelines for each participating group to use when presenting their methods and results.
-------------------------
## **Items to include in submitted articles**
### **Descriptions in the Methods section**
**Data processing.** Include a detailed description of any processing steps performed on the datasets, in a step-by-step or "protocol" style. Processing can optionally include any of the following, as well as other steps: motion correction, despiking, alignment, smoothing, list of regressors if modeling (both nuisance and those of interest), bandpassing, etc. Specify details in each case (type of alignment; blur radius; etc.). The goal is to provide enough details to allow a reader to closely reproduce the processing.
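As a purely illustrative sketch of how specific a protocol-style description can be, here is a hypothetical minimal command using AFNI's `afni_proc.py` (an arbitrary software choice; the file names and option values are placeholders, not recommendations):

```bash
# hypothetical single-subject processing specification (illustration only)
afni_proc.py                                                              \
    -subj_id               sub-101                                        \
    -copy_anat             sub-101_T1w.nii.gz                             \
    -dsets                 sub-101_task-rest_bold.nii.gz                  \
    -blocks                tshift align tlrc volreg blur mask scale regress \
    -blur_size             4                                              \
    -regress_censor_motion 0.3
```

The particular tool does not matter; what matters is that each such parameter is stated explicitly.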
If processing the resting state data, researchers can choose to process all subjects identically, or they might prefer to process each data-site's set of subjects separately (since the properties of each data-site's data are expected to differ; again, the subject IDs reflect their data-site).
**Resources.** Please clearly list any software (with version number) utilized in data processing and/or quality control assessment. If and where possible, please provide any processing-related scripts in a Supplement.
**QC criteria summary table(s).** Create a brief summary/reference table of your QC criteria for rest FMRI data, and similarly one for task FMRI (unless only analyzing one data collection). Include all criteria used, even if no dataset here actually fails a particular criterion. Assign a letter to each item for reference (A, B, C, ..., AA, BB, ...), listing all quantitative criteria before qualitative criteria, for clarity. Consider categorizing via:
1. *quantitative*, thresholded, automated or other value-based items;
2. *qualitative*, visual or other assessment.
For example (using possibly exaggerated thresholds):

Table 1. Resting state FMRI QC criteria: Exclude a subject if:
+ A) censoring removes >80% of time points
+ B) maximum motion >25 km
+ C) part of cortex out of the field of view
+ ...

Table 2. Task-based FMRI QC criteria: Exclude a subject if:
+ any criterion from Table 1 is met, but with a censoring threshold of 75% in criterion A
+ D) censoring removes >71% of the events of a particular stimulus type
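
As a concrete illustration of how a quantitative criterion such as A above might be evaluated, here is a minimal shell sketch, assuming a hypothetical one-column 0/1 censor file (`censor.1D`, in which 0 marks a censored time point; both the file name and the convention are assumptions):

```bash
# fraction of time points censored (criterion A), from a hypothetical 0/1 censor file
ntot=$(wc -l < censor.1D)                    # total number of time points
ncen=$(awk '$1 == 0' censor.1D | wc -l)      # number of censored time points
echo "censored fraction: $(echo "scale=3; $ncen / $ntot" | bc)"
```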
**QC criteria details.** Describe each item listed in the QC criteria table(s) in sufficient detail for others to apply the same criteria. The criteria may also be structured as a protocol. Write the descriptions in a didactic manner, as if explaining each item to a new research assistant. Please detail quantities used (e.g., TSNR can be defined in many ways, so please specify how it should be calculated to apply your criteria).
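For instance, one common TSNR recipe is the voxelwise mean of the motion-corrected EPI time series divided by its voxelwise standard deviation; a minimal sketch with AFNI programs (an arbitrary tool choice, with placeholder file names) could be:

```bash
# one possible TSNR definition: temporal mean / temporal standard deviation, per voxel
3dTstat -mean  -prefix epi_mean.nii.gz  epi_volreg.nii.gz
3dTstat -stdev -prefix epi_stdev.nii.gz epi_volreg.nii.gz
3dcalc  -a epi_mean.nii.gz -b epi_stdev.nii.gz -expr 'a/b' -prefix tsnr.nii.gz
```

Stating whether the input is detrended, motion corrected, masked, etc., removes ambiguity for readers applying the same criterion.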
-------------------------
### **Descriptions in the Results section**
**Presenting QC examples.** For each data-site, provide interesting and/or representative examples of QC items for which subjects were categorized as Exclude or Uncertain, or representative data of subjects in these categories. In particular, display example images for visual/qualitative criteria that lead to exclusion or uncertainty (for example, an EPI truncated by the FOV, affected by severe ghosting, or containing artifactual patterns), contrasted where possible with the corresponding data of an Include subject. For failures of quantitative QC criteria, researchers may also find it useful to show visualizations of data from the Exclude or Uncertain subjects (e.g., the correlation map of a subject failing a motion censoring criterion), again contrasted with an Include subject; this can provide a posteriori evidence for why such subjects would/could be removed.
In this section, it is most useful to highlight a variety of QC methods and the kinds of issues they can detect, rather than showing repeated examples of the same issue (at least within the same data-site; similar QC issues may present differently across data-sites, so some repetition between sites can still be informative). For example, if 4 subjects all have huge motion throughout the scan, showing just one of them as representative of a data-site is adequate, particularly if other subjects have other QC considerations; the aim is to show an interesting subset of cases identified by your QC procedure, rather than a comprehensive catalog of all decisions.
-------------------------
### **Descriptions in a Supplement**
**Exclusion/uncertain subject lists.** Make a table listing subjects who were categorized as Exclude or Uncertain, for the task and/or rest data separately. In each table, include a Comment column with one or more reasons for these evaluations, using the criteria listed in the corresponding Methods section Tables 1 and/or 2. Subjects may be excludable based on several QC criteria, but only one or two root/predominant criteria need be included. For example:
Table S1. Excluded/uncertain subjects in resting state data-sites

| Subj ID | Exclude | Uncertain | Comment (failed QC criteria, etc.) |
|---------|---------|-----------|------------------------------------|
| sub-937 | X       |           | A (censor frac 0.95) |
| sub-956 | X       |           | C (no visual cortex in FOV) |
| sub-999 |         | X         | A, B (censor frac 0.81, max disp 30 mm) |
| ...     |         |           | |

(and similarly for the task data, if applicable).
-------------------------
### **Data citations to include**
Please cite the following public data collections as part of your write up, as these are the origins for the datasets used here (exact datasets and subject listings will be noted once the submissions are complete; see below in 'Facts and Questions').
* **Functional Connectome Project (FCP):** Biswal BB, Mennes M, Zuo XN, Gohel S, Kelly C, Smith SM, Beckmann CF, Adelstein JS, et al (2010). Toward discovery science of human brain function. Proc Natl Acad Sci USA 107(10):4734-9. doi: 10.1073/pnas.0911855107.
* **ABIDE:** Di Martino A, Yan CG, Li Q, Denio E, Castellanos FX, Alaerts K, Anderson JS, Assaf M, et al (2014). The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. Mol Psychiatry. (6):659-67. doi: 10.1038/mp.2013.78.
* **OpenNeuro:** Markiewicz CJ, Gorgolewski KJ, Feingold F, Blair R, Halchenko YO, Miller E, Hardcastle N, Wexler J, Esteban O, Goncavles M, Jwa A, Poldrack R (2021). The OpenNeuro resource for sharing of neuroscience data. Elife 10:e71774. doi: 10.7554/eLife.71774.
-------------------------
-------------------------
-------------------------
# **Facts and Questions**
1. **When will we see the "correct" answer for these datasets?**<br/>
It is important to note that there is no single, correct way to do QC, nor are there absolute judgments for the subjects in these collections. Indeed, subjects with borderline or tricky data quality were intentionally left in these collections---because that is a reality of data analysis (we reiterate: none of the datasets were altered; these are real, publicly available datasets from widely used repositories). The organizers themselves use different QC protocols, and will likely vary in their QC criteria and subject exclusion tables. Again, the point of this project is to promote methodological clarity, not a particular protocol.
2. **What do we do about timing for the task-based data-package?**<br/>
The question of how to translate a BIDS TSV event file into specific event timing with duration can lead to many debates. Since this project does not include a group level analysis, we have decided to skip this point of possible contention. The original BIDS-format timing information is included with the data; but because it has many columns and multiple potential interpretations, we have also made a separate and simplified set of unambiguous TSV event files available for download, which some participants may prefer to use. The folder of simplified timings contains a description of how they were derived.
3. **Where did these datasets come from? And when will their origins be revealed?**<br/>
Each data-package here came from a major, open repository containing FMRI and structural MRI data. The datasets were selected to be representative acquisitions, with very common acquisition parameters (field strength, voxel size, TR, etc.). <br/>
The subjects have been de-identified temporarily so that the focus of the participants' analyses is just on the presented data: "what you see is all there is". When the project is complete and the participants have finished their studies, the origins of the datasets will be provided.<br/>
The task data-package is minimally de-identified. Throughout this project, we have aimed to avoid altering the original data; therefore, each subject's events file was left unchanged, including column headers, even though these can be used to find the specific task dataset. As a result, the identity of this task dataset is knowable (even from the event names).
4. **Why are there so many details about how to submit dataset Methods and Results?**<br/>
The aim of this project is to be educational about QC methods in FMRI. We ask for a great amount of detail to maximize that aspect for all readers. Some assumptions made by one group might not be made by another; clarity is key. <br/>
Because so many different approaches are taken to FMRI QC, having a fairly uniform reporting/description style will help readers understand the various methods and think about combining them.
-------------------------
-------------------------
-------------------------
# **Note on downloading+unpacking the data**
The data can be downloaded directly by going to the "Files" tab in this OSF webpage, and then clicking on each file. Each file is a compressed tarball, which can be uncompressed+unpacked with `tar -xf NAME.tgz`.
Alternatively (and maybe unnecessarily complicatedly, depending on your viewpoint), these datasets can be downloaded via the command line in at least one way:
* Using the ``osfclient``, whose installation with ``pip`` is described here: <br/> https://github.com/osfclient/osfclient#osfclient<br/>
or which can be added to a conda environment, as described here: <br/> https://anaconda.org/search?q=osfclient
The syntax of the ``osfclient`` program is described here:<br/>
https://osfclient.readthedocs.io/en/latest/cli-usage.html<br/>
namely:

```bash
# fetch all files from a project and store them in `output_directory`
$ osf -p <projectid> clone [output_directory]
```
Once downloads are complete, use the same `tar -xf NAME.tgz` command to unpack each one.
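Putting those pieces together, a hypothetical end-to-end sketch might look like the following (here `PROJECT_ID` stands in for this project's OSF id, and the clone directory name is arbitrary):

```bash
# clone all files from the OSF project into a local directory
osf -p PROJECT_ID clone fmri-open-qc-download

# find and unpack every downloaded tarball
find fmri-open-qc-download \( -name "*.tgz" -o -name "*.tar.gz" \) -exec tar -xf {} \;
```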