The form of the test week is still unknown, so most scripts are not complete. We expect that the activity of BAT (the dependent variable) is reliably predicted by the scores on the ECR and STRAQ-1 questionnaires. We therefore conduct the following analyses:

- A Bayesian Sequential Analysis (BSA): Bayes factors are interesting in exploratory studies because they quantify the relative evidence for the null and the alternative hypothesis, which in this case have not been characterized yet [(Dienes, 2016)][1]. Data collection stops as soon as the BSA provides substantial evidence for either the null or the alternative hypothesis.
- A Conditional Random Forest (CRF) with all variables included, investigating which individual item(s) is (are) the best predictor(s) of BAT. There are then two possibilities:
    - If the sample required by the BSA to reach a decision is small (say fewer than 125 participants, i.e., half of the expected maximum sample size, see [Power Analysis][2]), we will try to replicate the results with another sample. We will complete the analysis with cross-validation, testing whether the predictors identified in the first sample generalize to the second sample. This evaluates the accuracy of the model predictions and their external validity, and compensates for the fact that when a sequential analysis stops early, the resulting evidence is likely to be far from the true value [(Schönbrodt & Wagenmakers, 2018)][3].
    - If the sample required by the BSA is large (say more than 125 participants), we may use a split-half cross-validation approach with the CRF. We split the dataset into two groups; as in the first possibility, the first group is used to train and build the CRF model, and we then test whether the model generalizes to the second group.

----------

**Preprocessing the data**

We will check for statistical outliers following [Judd et al., 2017][4] and [Leys et al., 2018][5]. They recommend eliminating observations with a very large Cook's distance (conventionally two times larger than the next largest Cook's distance value) or with standardized deleted residuals (SDR) larger than 4. The script designed to preprocess the data is not complete yet. Our method is similar to a previous study of the CORE lab, which you can find [here][6]. The raw data (see the (incomplete) [codebook][7]) undergo several preprocessing steps:

- Predictor variables: data for each item are preprocessed depending on the scale of the questionnaire (see [Methods, Procedure][8]). Demographic qualitative elements are coded with dummy variables (see [codebook][9]).
- Outcome variable: the BAT activation variable is the rise in temperature (ΔBAT) attributable to BAT in response to cooling. It is the difference between the temperature of BAT during acclimation and its temperature during cooling, normalized by the corresponding difference for the sternal skin area (see [codebook][10]). A sketch of this computation is given after this section.
- Control variables: qualitative variables are coded with dummies. All control variables are included as covariates in the regression for the BSA.

Preprocessed data are gathered in the file "preprocessed_data.xlsx".
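As an illustration of this step, here is a minimal R sketch of the ΔBAT computation and the outlier screening. The file name, model formula, and column names (`bat_acclimation`, `bat_cooling`, `sternum_acclimation`, `sternum_cooling`, `ecr_score`, `straq_score`) are placeholders rather than the names used in the codebook, and we assume here that "normalized" means divided by the sternal temperature difference.

```r
# Minimal sketch of the planned preprocessing (file name, column names and
# model formula are placeholders, not the actual codebook names).
library(readxl)

raw <- read_excel("raw_data.xlsx")  # hypothetical file name

# DeltaBAT: temperature change of BAT between acclimation and cooling,
# normalized by the same difference measured on the sternal skin area
# (assuming "normalized" means divided by).
raw$delta_bat <- (raw$bat_acclimation - raw$bat_cooling) /
                 (raw$sternum_acclimation - raw$sternum_cooling)

# Outlier screening (Judd et al., 2017; Leys et al., 2018): flag observations
# with a very large Cook's distance or with standardized deleted residuals
# (SDR) larger than 4, based on a placeholder regression model.
fit  <- lm(delta_bat ~ ecr_score + straq_score, data = raw)
cook <- sort(cooks.distance(fit), decreasing = TRUE)
sdr  <- rstudent(fit)

flag_cook <- cook[1] > 2 * cook[2]   # largest Cook's distance > 2x the next largest
flag_sdr  <- which(abs(sdr) > 4)     # indices of observations with extreme SDR
```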
----------

**Data analytic plan**

*BSA*

For our Bayesian Sequential Analysis (sequential hypothesis testing with Bayes factors), we will focus on the relations between the predictor variables and the outcome variable, BAT activation. The BF mentioned below refers to the BF comparing the null and the alternative hypothesis for the correlation between BAT and the score on the ECR questionnaire. The BSA relies on several steps (see [Schönbrodt et al., 2015][11]):

- Defining a priori a BF threshold that indicates when we decide that our results support one of the two hypotheses. For our stopping rule, we choose BF > 10 as the threshold for H1 and, accordingly, BF < 0.1 for the null hypothesis.
- Choosing a prior distribution for the effect sizes under H1. Here, we use the default setting of a non-selective prior for the Bayes factor of a regression: 0.5 ([source][12]). The pertinence of this choice will be evaluated with a sensitivity analysis (see below).
- Collecting a minimal sample and then computing the Bayes factor (see our [script][13]). In our case, we will start computing the BF after 20 participants, although we know that we will require many more (see Power Analysis). If the BF exceeds one of the thresholds, we stop sampling; if not, we add a new participant.
- Reporting the posterior distribution of the effect-size estimate.
- (Optional) Running a sensitivity analysis, which consists of using a different prior to see whether the conclusion of the BF is invariant to reasonable changes of prior. It shows whether the inference is robust [(Schönbrodt & Wagenmakers, 2018)][14].

Source: Kevin Vezirian's script (https://osf.io/ut28c/). A minimal sketch of the sequential procedure is given below.

Once sampling has stopped, the results of the BSA may provide us with one or several predictors of BAT activation. Further analysis of the BF of each relation among our variables will give us a first approximation of the best predictors of BAT.
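For illustration, a minimal R sketch of this sequential procedure, assuming the `BayesFactor` package and a simple correlation test between the ECR score and ΔBAT (the actual script may instead use a regression including the control variables as covariates). The column names, the maximum sample size, and the batch-of-one sampling are placeholders; the prior scale and thresholds are taken from the steps above.

```r
# Minimal sketch of the sequential Bayes factor procedure (not the actual
# analysis script; data frame and column names are placeholders).
library(BayesFactor)

bf_threshold <- 10    # stopping threshold for H1; 1/10 (= 0.1) for H0
n_min        <- 20    # start computing the BF after 20 participants
n_max        <- 250   # assumed maximum sample size (twice the 125 mentioned above)

n  <- n_min
bf <- 1
while (n <= n_max && bf < bf_threshold && bf > 1 / bf_threshold) {
  current <- preprocessed_data[1:n, ]            # data collected so far
  result  <- correlationBF(y = current$delta_bat,
                           x = current$ecr_score,
                           rscale = 0.5)          # prior scale from the plan
  bf <- extractBF(result)$bf                      # Bayes factor for H1 over H0
  n  <- n + 1                                     # add one participant and retest
}
```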
*CRF*

To obtain a more robust estimate of the best predictors of BAT activity among the items of the questionnaires, we use a supervised machine learning method known as conditional random forests (CRF). In supervised machine learning, the algorithm learns a pattern from data labelled with an outcome variable. The method relies on repeated sampling from a training dataset [(Breiman, 2001)][15]. Multiple trees are formed by assessing whether each variable influences the outcome variable (here, BAT activity). The trees, which constitute votes on whether variables matter for the outcome variable or not, are then assembled into a forest, and an assembled model summarizes all information from the forest. The output of a conditional random forest is a variable importance list. This list allows us to identify the best predictors of the outcome variable and which of the computed variables differ from random noise when predicting the variable of interest (see also [Wittmann et al., 2020][16]). These results would allow us to form mediation hypotheses.

We chose conditional random forests over regression to explore our data for several reasons. First, linear regression is a parametric approach that requires a priori predictions of relationships between variables, as well as hypotheses regarding potential non-linearities and interactions [(Grömping, 2012)][17]. As we are interested in exploring our data, we rely on a non-parametric type of machine learning, with a flexible number of parameters that can grow as the algorithm processes more data. Such methods are well suited to situations in which researchers have no specific a priori predictions, as in exploratory research. Further, random forests are less prone to overfitting in relatively small samples with multiple variables [(Grömping, 2012)][18]. Finally, random forests have a smaller chance of collinearity problems when multiple predictors are used, as in the current situation [(Matsuki, Kuperman, & Van Dyke, 2016)][19].

The CRF consists of two steps: first, the model is fitted to our data for one variable (training); then we use the trained model to predict the dataset for that variable, over 100 iterations. The importance and the R² are averaged over the 100 models to evaluate the robustness of the model for that variable. If we find predictors that differ from random noise, we will compare the traditional regression lines to locally estimated scatterplot smoothing (LOESS) curves, calculated by fitting as many local least-squares regression functions as there are data segments. Based on these analyses, we will then generate regression models for out-of-sample predictions.

If the sample we had to recruit for the BSA is large (say more than 100), we may use a split-half cross-validation approach with the CRF: we split the dataset into two groups, fit the CRF model on the first group, and then use it to predict the data of the second group (see the sketch after the Software and scripts section). If the sample is small, we will use the same dataset in both steps and try to replicate the results (see below).

*(Optional) Replicating the results*

If a low sample size enables us to reach the threshold required by the BSA (BF10) and if we can still recruit participants, we will try to replicate the results. We will estimate the new sample size required to replicate the results and start a new BSA. If the second BSA provides evidence for the alternative hypothesis, we will use the same method for a CRF and see whether our results replicate.

*Interpretation*

The BSA will provide a first approximation of the effect size and of the likelihood of the association between BAT activity and the ECR score. The CRF can then identify whether the scores on the ECR and STRAQ-1 questionnaires (among others) are strong predictors of BAT activity. The analyses will potentially suggest a link between social behaviours and BAT that has never been explored before, and may favour one of the two directional hypotheses. Overall, they will assess to what extent individual BAT thermogenesis accounts for social behaviours and social thermoregulation.

----------

**Software and scripts**

All our analyses will be conducted with the R programming language (R Core Team, 2020, https://www.r-project.org/). Some of our R scripts for planned analyses are attached to this OSF page. However, the form of the test week is still unknown, so most scripts are not complete.
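As an illustration of the CRF step with split-half cross-validation, here is a minimal R sketch. The plan does not name a package; we assume here the `party` package (`cforest` with conditional permutation importance), and the data frame, formula, and tuning values (`ntree`, `mtry`) are placeholders rather than the settings of the final script.

```r
# Minimal sketch of a conditional random forest with split-half cross-validation
# (package choice, variable names and tuning values are assumptions).
library(party)

set.seed(42)
n     <- nrow(preprocessed_data)
half  <- sample(n, size = floor(n / 2))
train <- preprocessed_data[half, ]   # first half: training
test  <- preprocessed_data[-half, ]  # second half: out-of-sample prediction

# Fit the conditional random forest on the training half.
crf <- cforest(delta_bat ~ .,        # all remaining columns as predictors (placeholder)
               data     = train,
               controls = cforest_unbiased(ntree = 500, mtry = 5))

# Conditional permutation importance: which items best predict BAT activity?
importance <- varimp(crf, conditional = TRUE)
sort(importance, decreasing = TRUE)

# Predict the second half and compute a simple out-of-sample R^2.
pred <- predict(crf, newdata = test)
r2   <- 1 - sum((test$delta_bat - pred)^2) /
            sum((test$delta_bat - mean(test$delta_bat))^2)
r2
```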
[1]: https://www.researchgate.net/publication/289569505_How_Bayes_factors_change_scientific_practice
[2]: https://osf.io/tva7q/wiki/home/
[3]: https://www.researchgate.net/publication/314158724_Bayes_factor_design_analysis_Planning_for_compelling_evidence
[4]: https://www.researchgate.net/publication/316552712_Data_Analysis_A_Model_Comparison_Approach_to_Regression_ANOVA_and_Beyond
[5]: https://core.ac.uk/download/pdf/147045405.pdf
[6]: https://osf.io/49vgb/
[7]: https://osf.io/f2dhp/
[8]: https://osf.io/43yqc/wiki/home/
[9]: https://osf.io/f2dhp/
[10]: https://osf.io/f2dhp/
[11]: https://www.researchgate.net/publication/314554498_Sequential_Hypothesis_Testing_with_Bayes_Factors_Efficiently_Testing_Mean_Differences
[12]: https://richarddmorey.github.io/BayesFactor/#regression
[13]: https://osf.io/hgz7r/
[14]: https://www.researchgate.net/publication/314158724_Bayes_factor_design_analysis_Planning_for_compelling_evidence
[15]: https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf
[16]: https://psyarxiv.com/35jtx/
[17]: https://www.researchgate.net/publication/279546803_Relative_Importance_for_Linear_Regression_in_R_The_Package_relaimpo
[18]: https://www.researchgate.net/publication/279546803_Relative_Importance_for_Linear_Regression_in_R_The_Package_relaimpo
[19]: https://www.researchgate.net/publication/287996404_The_Random_Forests_statistical_technique_An_examination_of_its_value_for_the_study_of_reading