Secondary Data Preregistration


Preregistration Template

@[toc]

## Study Information ##

**Title**

*Description*: Provide the working title of your study. It is helpful if this is the same title that you submit for publication of your final manuscript, but it is not a requirement.

Example: Effect of sugar on brownie tastiness.

**Question title: Authors**

*Example*: Jimmy Stewart, Ava Gardner, Bob Hope, Greta Garbo

**Question title: Research Questions**

- A. Research Question 1
  - A1: Hypothesis 1 (related to RQ1)
    - A1.1: Statistical test of hypothesis 1
    - A1.2: Statistical test of hypothesis 1
  - A2: Hypothesis 2 (related to RQ1)
    - A2.1: Statistical test of hypothesis 2
- B. Research Question 2
  - B1: Hypothesis 3 (related to RQ2)
    - B1.1: Statistical test of hypothesis 3

Description: Please list each research question included in this study. Include only one question in each box. If you need to add additional questions, use the “Add Another Research Question” button.

Example: Though there is strong evidence to suggest that sugar affects taste preferences, the effect has never been demonstrated in brownies. Therefore, we will measure taste preference for four different levels of sugar concentration in a standard brownie recipe to determine if the effect exists in this pastry.

**Question title: Hypotheses and/or Estimates**

Description: For each of the research questions listed in the previous section, provide one or multiple specific and testable hypotheses, or one or more specific estimates related to those research questions. If doing hypothesis testing, please state if the hypotheses are directional or non-directional. If directional, state the direction. A predicted effect is also appropriate here.

## Data description ##

**Name or brief description of data set(s):**

**Is this data open or publicly available? [Yes/No] [Required]**

**How can the data be accessed?** Provide link if available online:

**Date of download or access:**

**Question title: Data Source**

Description: Please describe what entity originally collected this data.
Question type: Check all that apply

Response options:

- National Data Set: a nationally representative sample collected by another team of researchers
- Private Organizational Data: internally collected data by an organization made available for academic purposes
- Own Lab Collection: data were collected by one of the analysts’ labs
- Other Lab Collection: data were collected by another researcher’s lab (analysts were not involved in data collection)
- Meta-Analysis: a systematic review of published studies
- Multi-lab collaboration: data were collected at several sites using the same procedure
- Other: please explain

**Question title: Codebook**

Description: Some studies (usually publicly available) offer codebooks to describe their data. If such a codebook is available, please link to it here or upload the document.

Question type: Open-ended

**Question title: Sampling and data collection procedures**

If the data collection procedure is already well documented, please provide a link to the information. If the data collection procedure is not yet well documented, please describe, to the best of your ability, how data were collected. What populations were sampled from, what were the recruitment efforts, what was the procedure for running participants through the study, were researchers blind to the research question, hypotheses or conditions, was randomization of any kind used, etc.?

Question type: Open-ended

Example: Participants will be recruited through advertisements at local pastry shops. Participants will be paid $10 for agreeing to participate (raised to $30 if our sample size is not reached within 15 days of beginning recruitment). Participants must be at least 18 years old and be able to eat the ingredients of the pastries.
## Knowledge of data ##

**Question Title: Prior work based on the dataset**

Option: Never worked with this dataset

Description: List any publications, conference presentations (papers, posters), or working papers (in-prep, unpublished, preprints) based on this data set that you have worked on. Describe what information, down to the level of the variable, you have previously analyzed. If this dataset is longitudinal, include here information about what wave of data was previously analyzed. You do not have to describe the results, simply indicate which aspects of the data you have analyzed. [Open-ended textbox here.]

**Question Title: Prior Research Activity**

Select one of the following items:

- I have never analyzed these data before.
- I have used this dataset before, but I am using variables and measures that I have never analyzed.
- I have used this dataset before, including at least some of the variables and measures in this study. However, all analyses were on a mutually exclusive subset (e.g., different participants or different waves) of the dataset.
- I have used all of these variables before on a different and mutually exclusive subset of these data.
- I have used some of these variables before on this (sub)set of these data.
- I have used all of these variables before on this (sub)set of data.

**Question title: Prior Knowledge current dataset**

Option: No prior knowledge

Description: What amount of prior knowledge do you already have of the specific data set you will be working with? For example, are you aware of descriptive statistics or covariation between variables from previously published research or codebooks? Please provide information about your first-hand knowledge of the data set or your familiarity with existing publications that use this data set.

Example A: There are a number of variables that overlap between this current question and previous work.
Specifically, I have read about the use of variables W, X, Y, and Z (and a number of controls) in the publications listed below. However, to my knowledge, variables D, E, and F have not been used previously. I have seen descriptive statistics of the variables used in this study.

**Question title: Moment of preregistration**

Description: Preregistration is designed to make clear the distinction between confirmatory tests, specified prior to seeing the data, and exploratory analyses conducted after observing the data. Therefore, creating a research plan in which existing data will be used presents unique challenges. It is very important that you specify the exact moment of the preregistration. Please select the description that best describes your situation. For example, if you will be using longitudinal data in a national panel study that has not been collected, select “Registration prior to creation of data.” If you are using data that will be provided by a private organization, and people there have already looked at the data, but your team has not accessed it, pick “Registration prior to accessing the data.” Please do not hesitate to contact us if you have questions about how to answer this question (prereg@cos.io).

Question type: Choose one

Response options:

- Registration prior to creation of data
- Registration prior to any human observation of the data
- Registration prior to any researcher on this team accessing the data
- Registration prior to any researcher on this team handling or analysis of the data
- Registration after data cleaning, but before any main analyses
- Registration following analysis of the data

## Current study: Variables ##

**Question title: Manipulated variables**

Description: Identify the manipulated variables you plan to use. Describe these variables and the levels or treatment arms of each variable. For observational studies and meta-analyses, simply state that this is not applicable.
If you are collapsing groups across variables, this should be explicitly stated and the formula provided.

Question type: Open-ended

Example: The percentage of sugar by mass added to brownies was manipulated in the original collection of data. The four levels of this categorical variable are: 15%, 20%, 25%, or 40% cane sugar by mass. This variable is not included in the present analyses.

**Question title: Measured variables**

Description: Describe each variable that was measured. This will include outcome measures, as well as any predictors or covariates that were measured. You do not need to include any variables that are included in the dataset if they are not going to be included in the confirmatory analyses of this study.

Question type: Open-ended

Example: The single outcome variable will be the perceived tastiness of the single brownie each participant will eat. We will measure this by asking participants ‘How much did you enjoy eating the brownie’ (on a scale of 1-7, 1 being ‘not at all’, 7 being ‘a great deal’) and ‘How good did the brownie taste’ (on a scale of 1-7, 1 being ‘very bad’, 7 being ‘very good’).

**Question title: Scales**

Description: A scale is a measure of a construct that includes at least two items. These items are then aggregated into a smaller set of scores which are then incorporated into statistical models. If you are using a scale, what construct does this scale represent? (Describe a single scale here. You will have the opportunity to add additional scales if needed.)

Question type: Open-ended

Description: Please indicate which variables in the data set will be used to create this scale. Add a new line for each scale you will create or use.

Type: Open-ended

Description: How will the variables be aggregated?
Type: Drop-down

Options:

- Mean score
- Sum score
- Weighted mean or weighted sum (provide more detail about how weights will be determined below)
- Exploratory Factor Analysis (provide more detail below, e.g., rotation, how number of factors will be determined, how best fit will be selected)
- Structural Equation Modeling/CFA (provide more detail below, e.g., how loadings will be specified, how fit will be assessed, which residual variance terms will be correlated)
- Other (provide more detail)

[also have open-ended text box for more detail]

Description: Is this aggregation based on recommendations from the study codebook or validation research?

Question type: Select one: Yes, No (if yes, provide detail in text box)

**Question title: Index/Indices**

Description: An index is an indicator of a value or quantity. For example, an exam score is an indicator of a student’s understanding of course material. If you are using or creating an index, please indicate which variables in the data set will be used to create this index, exactly how those variables will be aggregated, what you believe this variable or aggregation represents, and whether this aggregation is based on recommendations from the study codebook or validation research.

Question type: Open-ended

Example: We will take the mean of the two questions above to create a single measure of ‘brownie enjoyment.’

**Question title: Transformations**

Description: If you plan on transforming, centering, or recoding the data, or will require a coding scheme for categorical variables, please describe that process.

Question type: Dropdown menu

Options:

- Grand mean centering
- Group mean centering
- Standardizing
- Log (log e)
- Log (base 10)
- Square-root transform
- 1/x
- Winsorizing
- Other (text box appears)

Example: The “Effect of sugar on brownie tastiness” does not require any additional transformations. However, if it were using a regression analysis and each level of sweet had been categorically described (e.g.
not sweet, somewhat sweet, sweet, and very sweet), ‘sweet’ could be dummy coded with ‘not sweet’ as the reference category.

**Question title: Data Inclusion/exclusion**

Description: Which units of analysis will be included or excluded in your study? Please consider not only participants, but cohorts, waves, even data sets.

Example: All participants who have complete data at Wave 1 will be included. We will use data from waves 1, 3 and 5.

**Question title: Outliers**

Description: How will you define what an outlier is in your data, and what will you do when you encounter them?

Example: Any data point outside the interquartile range will be considered an outlier. If these are encountered, I will use bootstrapping to estimate my parameters.

**Question title: Weights**

Description: Are there sampling weights available with this data set? If so, are you using them and how?

Question type: Open-ended

Example: This data set provides sampling weights to better mirror the national population. We will use these to weight responses in our regression model.

**Question title: Sample size**

Description: What is the sample size (to the best of your knowledge)? For each of your research questions, what is the smallest effect size you will consider meaningful? How much power will you have to detect each of those effects?

Question type: Open-ended

Example: The existing data set has 1,500 participants. Based on our inclusion criteria, we expect to use 1,200 participants. The smallest effect size that we will consider meaningful is r = .10. We have 80% power to detect this effect.

**Question title: Missing data**

Description: What do you know about missing data in the data set already (e.g., overall missingness rate, information about differential dropout)? How will you deal with incomplete or missing data?

Example: If a subject does not complete any of the three indices of tastiness, that subject will not be included in the analysis.
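
The sample size question asks how much power an analysis has for the smallest effect of interest. For a simple correlation, one way to check such a statement is the Fisher z approximation. The Python sketch below is only an illustration under stated assumptions: a two-tailed test, alpha = .05 by default, and a plain bivariate correlation with no covariates or sampling weights. The function names `correlation_power` and `required_n` are hypothetical and not part of the template.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def correlation_power(r, n, alpha=0.05):
    """Approximate two-tailed power to detect a correlation r with n cases.

    Assumption: Fisher z approximation, where atanh(r_hat) is roughly
    normal with standard error 1 / sqrt(n - 3).
    """
    noncentrality = atanh(r) * sqrt(n - 3)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Probability of landing in either rejection region.
    upper = STD_NORMAL.cdf(noncentrality - z_crit)
    lower = STD_NORMAL.cdf(-noncentrality - z_crit)
    return upper + lower


def required_n(r, power=0.80, alpha=0.05):
    """Approximate smallest n giving the requested two-tailed power."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(((z_alpha + z_power) / atanh(r)) ** 2 + 3)


if __name__ == "__main__":
    # Figures taken from the sample size example: r = .10, 1,200 participants.
    print(f"power at n = 1200: {correlation_power(0.10, 1200):.3f}")
    print(f"n needed for 80% power: {required_n(0.10)}")
```

Running the script with the figures from the example shows how the expected sample size and the smallest meaningful effect translate into a power statement. More complex models (weighted regression, multilevel models) would usually need a simulation-based calculation instead.
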
## Current study: Analyses ##

**Question title: Statistical models**

Required: Yes

Description: For each hypothesis, describe the statistical model you will use to test the hypothesis. Please include the type of model (e.g. ANOVA, multiple regression, SEM, etc.) and the specification of the model (this includes each variable that will be included as predictors, outcomes, or covariates). Please specify any interactions and post-hoc analyses that will be tested and remember that any test not included here must be noted as an exploratory test in your final article.

Example: We will use a one-way between subjects ANOVA to analyze our results. The manipulated, categorical independent variable is 'sugar' whereas the dependent variable is our taste index.

**Question title: Follow-up analyses**

Description: If not specified previously, will you be conducting any confirmatory analyses to follow up on effects in your statistical model, such as subgroup analyses, pairwise or complex contrasts, or follow-up tests from interactions? Remember that any analyses not specified in this research plan must be noted as exploratory.

Example: If the ANOVA indicates that the mean taste perceptions are significantly different (p<.05), then we will use a Tukey-Kramer HSD test to conduct all possible pairwise comparisons.

**Question title: Inference criteria**

Description: What criteria will you use to make inferences? Please describe the information you’ll use (e.g. specify the p-values, effect sizes, confidence intervals, Bayes factors, specific model fit indices), as well as cut-off criteria, where appropriate. Will you be using one or two tailed tests for each of your analyses? If you are comparing multiple conditions or testing multiple hypotheses, will you account for this?

Examples: We will use the standard p<.05 criterion for determining if the ANOVA and the post hoc test suggest that the results are significantly different from those expected if the null hypothesis were correct.
The post-hoc Tukey-Kramer test adjusts for multiple comparisons.

**Question title: Sensitivity Analyses**

Description: Provide a series of decisions about evaluating the strength, reliability, or robustness of your focal hypothesis test. This may include additional control variables, cross-validation efforts (out-of-sample replication, split/hold-out sample), any machine learning application/analyses, applying weights, selectively applying constraints in an SEM context (e.g., comparing model fit statistics), any alpha or multiple comparison adjustments, overfitting adjustment techniques used (e.g., regularization approaches such as ridge regression), or some other simulation/sampling/bootstrapping.

Question type: Open-ended box

Example A: In interrogating the association between X and Y, we will submit the data to a specific type of sensitivity analysis. Specifically, we will take two random halves of the broader sample. We will conduct our primary analyses in the first sample and reproduce our efforts in the second sample; both will be reported in the final manuscript.

Example B: In interrogating the association between X and Y, we ran two models. The first model tested our focal hypothesis of interest without the covariates; the second model tested the same hypothesis with covariates. Both associations will be reported. See variable/design question for full list of covariates. (Likewise, we tested two sets of covariates: some standard and some substantial ones that might explain our focal hypothesis, to serve as an alternative explanation.) We will use ridge regression to regularize all our model coefficients. We will use leave-one-person-out cross validation to measure root-mean-squared predictive error of taste preference indices.

**Question title: Statistical Analysis Backup Plan**

Description: Describe what you will do should your data violate assumptions, your model not converge, or some other analytic problem arise.
Example: If our multilevel model with random variables A, B, and C predicting Y using X does not converge, we will utilize smaller models (A and B only/B and C/A and C) and remove random variables that are unnecessary (i.e. with close to 0 variance) according to these smaller models in order to choose the optimal model.

**Question title: Exploratory analysis**

Description: If you plan to explore your data set to look for unexpected differences or relationships, you may describe those tests here. An exploratory test is any test where a prediction is not made up front, or there are multiple possible tests that you are going to use. A statistically significant finding in an exploratory test is a great way to form a new confirmatory hypothesis, which could be registered at a later time in a different dataset.

Example: We expect that certain demographic traits may be related to taste preferences. Therefore, we will look for relationships between demographic variables (age, gender, income, and marital status) and the primary outcome measures of taste preferences.

## Other relevant information ##

Other

If there is any other relevant information, please add it here.
