# Data, Scripts, and Output for the use-case **BMI and Cortisol**

This repository contains the information for the systematic review published by [Valk et al. (2021)][1]. It also contains the scripts and output of a simulation study that re-enacts the screening phase using active learning in ASReview ([Van de Schoot et al., 2021][2]).

## Overview

To increase the quality of a systematic review on Body Mass Index (BMI) and cortisol, [Valk et al. (2021)][3] applied a two-step screening procedure. First, a search was performed that identified 996 records, of which 93 were deemed relevant. Then, a broader search was conducted that identified 3477 records. The 93 relevant records from the first search, plus 93 irrelevant records, were used as training data for screening the second search in ASReview. In ASReview, 996 records were screened; the first 28 additionally found relevant records were then inspected in detail on full text, and 3 of them turned out to be relevant.

## Search

The search terms for the original search are available in the file 201116 Search query original search.docx and were used for the publication by [Valk et al. (2021)][4]. A new search was conducted with broader search terms, as listed in the file 210111 Search query new search.docx.

## Raw data

The data obtained in the first search are available in the file BMIandCortisol_original_data, which includes (at least) the following columns:

- *Included_abstract*, indicating which records were included in the abstract inclusion phase (n = 351);
- *Included_fulltext*, indicating which records were included in the full-text inclusion phase (n = 93);
- *Title*, containing the titles of the records;
- *Abstract Note*, containing the abstracts of the records.

The second search resulted in an additional 3477 records; see the file BMIandCortisol_newdata. A column titled *included* was added in the file BMIandCortisol_newdata_priorknowledge, which contains the prior knowledge.
The 93 relevant records from BMIandCortisol_prior_inclusions.cs were added to the new data and labelled '1'. In addition, 93 irrelevant records were selected (see BMIandCortisol_prior_exclusions.csv), added to the new data, and labelled '0'.

## Data

The raw data, BMIandCortisol_original_data.csv, were split into two subsets

    python scripts/split_data_with_multiple_labels.py BMIandCortisol_original_data.csv bmi_and_cortisol/data --split Included_abstract Included_fulltext

so that each subset contains only one label column, denoting the relevant papers for the:

1. abstract inclusions: file name BMIandCortisol_original_data_Included_abstract.csv;
2. full-text inclusions: file name BMIandCortisol_original_data_Included_fulltext.csv.

A wordcloud of the words used in the abstracts of the 93 included papers was created using [asreview-wordcloud][5] (the command can be found in jobs.sh):

![Wordcloud of the abstracts of the 93 included papers][6]

## Installation requirements

To install the requirements, run the following command in the CLI:

    pip install -r requirements.txt

## Simulation

The data files in the data folder were used to run a simulation study. To run the simulation, run

    sh jobs.sh

The results are stored in output/simulation.

The simulation was conducted on the original data and consisted of 93 runs. In each run, one of the 93 relevant records served as prior inclusion, together with 10 randomly chosen irrelevant records; the same 10 irrelevant records were used in every run. To extract this prior knowledge, run

    python scripts/get_prior_knowledge.py

The results are stored in output/tables. The dataset characteristics are obtained with

    python scripts/merge_descriptives.py

and are stored in output/tables. The metrics per simulation run can be obtained with

    python scripts/merge_metrics.py

and are stored in output/tables.
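
Two of the reported metrics, WSS@95 and RRF@10, can be computed from a single run. The sketch below uses one common definition of both, assuming a run is represented as a hypothetical list `labels` holding the true label (1 = relevant, 0 = irrelevant) of each record in the order in which it was presented for screening. The function names are illustrative only; this is not the code in scripts/merge_metrics.py or in ASReview.

```python
import math

# Illustrative sketch; `labels` is a hypothetical list with the true label
# (1 = relevant, 0 = irrelevant) of each record, in screening order.


def records_needed_for_recall(labels: list[int], recall: float) -> int:
    """Return how many records must be screened to reach `recall`."""
    target = math.ceil(recall * sum(labels))  # e.g. 95% of 93 -> 89 records
    found = 0
    for position, label in enumerate(labels, start=1):
        found += label
        if found >= target:
            return position
    raise ValueError("the ranking never reaches the requested recall")


def wss(labels: list[int], recall: float = 0.95) -> float:
    """Work Saved over Sampling at `recall` (WSS@95 by default)."""
    screened = records_needed_for_recall(labels, recall)
    unscreened_fraction = (len(labels) - screened) / len(labels)
    # Random screening leaves only (1 - recall) of the records unscreened
    # at the same recall, so subtract that baseline.
    return unscreened_fraction - (1 - recall)


def rrf(labels: list[int], fraction: float = 0.10) -> float:
    """Relevant Records Found after screening `fraction` of all records."""
    n_screened = math.floor(fraction * len(labels))
    return sum(labels[:n_screened]) / sum(labels)
```

Because the unscreened fraction minus (1 − recall) equals the recall level minus the screened fraction, WSS@95 can also be read as 95% minus the percentage of records screened; with about 56% of the records screened this gives roughly 39%, close to the reported average of 40%.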
## Simulation Results

The results are summarized in this recall plot:

![Recall plot of the simulation results][7]

On average, after screening 56% of the records (n = 567), you would have found 95% of all relevant records (89 out of 93). If you screened the records in random order, at this point you would have found 52 of the relevant records, and finding 89 of the relevant records would take on average 972 records. In other words, compared with screening in random order, active learning saves screening 40% (sd = 2.4) of all records while still identifying 95% of the relevant records. This metric is known as the Work Saved over Sampling at 95% recall (WSS@95).

Another way to interpret the results is the Relevant Records Found (RRF) metric. The RRF@10 is 39% (sd = 1.76), meaning that after screening 10% of the records, 39% of the relevant records have already been identified.

The relevant record in row 44 of the dataset was the easiest to find across all runs (title: *Hair Cortisol Concentrations in Overweight and Obese Children and Adolescents*). On average, it was found after screening 24 records, that is, 2.35% of all records in the BMIandCortisol_original_data_Included_fulltext dataset. The record that was most difficult to find was in row 2 of the dataset (title: *A double-edged sword: Relationship between full-range leadership behaviors and followers’ hair cortisol level*); it was found after screening on average 702 records, 69.48% of all records in this dataset.

## ASReview Files

The file BMIandCortisol_newdata_priorknowledge was used for screening in ASReview and contains (at least) the following columns:

- title / abstract note of 3663 records, of which 186 are labelled and 3477 are unlabelled;
- *included*, with 93 relevant ('1') and 93 irrelevant ('0') records.
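
The layout of BMIandCortisol_newdata_priorknowledge described above can be sketched in plain Python. This is a minimal sketch under assumptions: records are dictionaries with a hypothetical `title` key, and unlabelled records get an empty *included* value. The helper names are hypothetical, and this is not necessarily how the authors created the file.

```python
import csv
import io
from collections import Counter


def build_prior_knowledge(new_records, prior_inclusions, prior_exclusions):
    """Combine unlabelled search results with labelled prior knowledge.

    Returns new dicts (inputs are not modified) whose 'included' key is
    1 for prior inclusions, 0 for prior exclusions and None (unlabelled)
    for the records of the new search.
    """
    combined = [dict(r, included=1) for r in prior_inclusions]
    combined += [dict(r, included=0) for r in prior_exclusions]
    combined += [dict(r, included=None) for r in new_records]
    return combined


def count_labels(records):
    """Count labelled relevant, labelled irrelevant and unlabelled records."""
    counts = Counter(r["included"] for r in records)
    return {
        "relevant": counts[1],
        "irrelevant": counts[0],
        "unlabelled": counts[None],
    }


def to_csv(records, fieldnames):
    """Serialize records to CSV text; None is written as an empty field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Applied to the real data, `count_labels` would be expected to report 93 relevant, 93 irrelevant, and 3477 unlabelled records, 3663 in total, matching the counts listed above.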
The file was uploaded to ASReview and the 186 labelled records were used as prior knowledge. Using the default model settings in ASReview version 0.16, **996** records were screened (see asreview-hairf-and-anthropometrics.asreview for the project files), of which **28** were labelled as relevant; see the file asreview_result_asreview-hairf-and-anthropometrics.xlsx. The number 996 was chosen so that the screening effort equals the effort invested in the original search, which also comprised 996 records. Then, these **28** additionally found relevant records were inspected in detail, and 3 of them turned out to be relevant; see the file new_records_classified.xlsx.

## Contact

For any questions or remarks, please send an email to m.mohseni@erasmusmc.nl.

[1]: https://doi.org/10.1111/obr.13376
[2]: https://doi.org/10.1038/s42256-020-00287-7
[3]: https://doi.org/10.1111/obr.13376
[4]: https://doi.org/10.1111/obr.13376
[5]: https://zenodo.org/record/6625855
[6]: https://mfr.osf.io/export?url=https://osf.io/download/zj8d4/?direct=%26mode=render&format=2400x2400.jpeg
[7]: https://mfr.osf.io/export?url=https://osf.io/download/2ufsv/?direct=%26mode=render&format=2400x2400.jpeg