#### [&larr; Back to Experiment Design](https://osf.io/2tqj7/wiki/Experiment%20Design/)

## Validation Plan

This wiki article describes how the key data for the project will be collected and analyzed. The following sections refine the hypothesis and goals of this project to establish statistical boundaries, present the details of the data to be collected, and specify how the collected data will be analyzed to establish the statistical significance of the findings of this research.

### Refinement of Hypothesis and Goals

This project will utilize the two-sample t-test to determine the statistical significance of applying a new teaching method that incorporates the Framework. We acknowledge that the research hypothesis and the goals stated so far for the project require further refinement before statistical analysis can be applied. More narrowly defined null and alternate hypotheses are constructed below:

#### Null Hypothesis

> Use of a teaching method consisting of Design Recipes, Code Outlining, and Peer Review practices backed by Automatic Code Template and Unit Test Generation *does not* generate any difference in beginning students' performance on quiz and exam questions that test their *ability to program*.

#### Alternate Hypothesis

> Use of a teaching method consisting of Design Recipes, Code Outlining, and Peer Review practices backed by Automatic Code Template and Unit Test Generation *does* generate improvement in beginning students' performance on quiz and exam questions that test their *ability to program*.

With the hypotheses refined, we next define the variable and significance level (Alpha) to be used in the t-test and discuss the sample and population.

#### Selection of Variable to Analyze

A single variable, `Composite ATP Score`, will be used in the statistical significance test. It will be composed of the scores earned by a single subject on quiz and exam questions developed to evaluate their *ability to program* (ATP) throughout the academic term. Below is an example procedure that may be used to compute the `Composite ATP Score` for a single subject (a code sketch follows the list):

1. Collect applicable quiz or exam question responses from the subject.
2. Evaluate the subject's responses based on a class-wide rubric.
3. Convert the score earned to a percentage of the maximum achievable score for the given quiz or exam question.
4. At the end of the term, collect all scores and compute a simple mean (sum of percentages divided by total number of scores collected) for the subject. This is the subject's `Composite ATP Score`.
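As an illustration, the following Python sketch computes the `Composite ATP Score` from per-question results. The `QuestionScore` record and its field names are hypothetical, chosen only to mirror steps 3 and 4 above.

```python
from dataclasses import dataclass

@dataclass
class QuestionScore:
    """One ATP-tagged quiz or exam question (hypothetical record)."""
    earned: float   # points the subject earned on the question
    maximum: float  # maximum achievable points for the question

def composite_atp_score(scores: list[QuestionScore]) -> float:
    """Mean of per-question percentages, per steps 3 and 4 above."""
    if not scores:
        raise ValueError("no ATP question scores collected for this subject")
    percentages = [100.0 * s.earned / s.maximum for s in scores]
    return sum(percentages) / len(percentages)

# Example: three ATP questions collected over the term.
term_scores = [QuestionScore(8, 10), QuestionScore(3, 5), QuestionScore(9, 12)]
print(round(composite_atp_score(term_scores), 1))  # 71.7
```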
#### Fixture of Significance Level (Alpha)

In this subsection, we pre-define the significance level upfront. By doing so, we aim to prevent any post-experiment alteration of the parameters of the statistical analysis to fit the results to the hypothesis. In performing the t-test, an Alpha of 0.05 (5%) will be used to determine the statistical significance of the results. If the t-test yields a p-value exceeding 0.05, we will fail to reject the null hypothesis and conclude that the new teaching method incorporating the Framework *does not* yield a statistically significant difference in students' *ability to program*. If the t-test yields a p-value less than or equal to 0.05, we will reject the null hypothesis and conclude that the new teaching method incorporating the Framework *does* yield a statistically significant difference in students' *ability to program*.

In the first case, where we fail to reject the null hypothesis, we will still report any findings, such as the difference in `Composite ATP Score` found between the control and experimental groups, and future improvements to the Framework that may potentially yield a statistically significant difference. In the latter case, where we reject the null hypothesis, we will state any potential threats to validity, further compare the p-value against the stronger Alpha levels of 0.025 and 0.01, and report the findings.
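As a sketch of the planned analysis, assuming the two groups' `Composite ATP Score` values are available as two lists (the sample values below are invented for illustration, and SciPy's default equal-variance, two-sided test is assumed), the two-sample t-test and the Alpha comparison could look like this:

```python
from scipy import stats

# Hypothetical Composite ATP Scores (percentages) for each group.
control = [62.5, 70.0, 55.0, 68.0, 74.5, 60.0]
experimental = [71.0, 78.5, 66.0, 80.0, 69.5, 75.0]

ALPHA = 0.05  # pre-defined significance level

# Two-sample (independent) t-test on the two groups' scores.
t_stat, p_value = stats.ttest_ind(experimental, control)

if p_value <= ALPHA:
    print(f"p = {p_value:.4f}: reject the null hypothesis at Alpha = {ALPHA}")
else:
    print(f"p = {p_value:.4f}: fail to reject the null hypothesis at Alpha = {ALPHA}")
```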
#### Remarks on Sample Selection and the Population

The sample of prospective subjects for this experiment is considered to have been randomly selected, as student enrollment in the different course offerings of Fundamentals of Computer Science I occurs in a manner that is not in any way selective. Because the prospective subjects are students enrolled in the course offered exclusively at the Computer Science and Software Engineering (CSSE) Department of California Polytechnic State University (Cal Poly), San Luis Obispo, we consider the population to be all students enrolled in the same course, across different course offerings, during the Winter term of 2018. The results of this research may provide insight into the effects of the new teaching method that incorporates the Framework in beginning programming courses. However, we acknowledge upfront that we will not be able to establish sound statistical support for making any generalized claims about all beginning CSSE students beyond the population at Cal Poly.

### Statistical Implication of Project Goals

As outlined in the **Main Goals** section of the [Home](https://osf.io/2tqj7/wiki/) article of this wiki, this project has primary and secondary goals. We will limit the scope of statistical analysis to the primary goal (validating the effectiveness of incorporating the Framework into teaching introductory CSSE students). Because the secondary goal (developing a tool and a workflow for the incorporation) is more of an implementation detail than an integral component of the Framework, we will not be collecting any data or performing analysis to determine statistical significance for the secondary goal.

### Data Collection for the Validation of **the Framework**

Any data collected for the validation of the Framework will be data populating the `Composite ATP Score` variable defined earlier. Throughout the academic term, quizzes and exams given to evaluate student understanding of the course material will include questions designed to test students' *ability to program*. A plan to test each component of *ability to program* is listed below:

1. Students' ability to "effectively decompose the given problem into discrete subproblems" could be tested with a multiple-choice-style quiz or exam question that presents a moderately complex problem (e.g., given a CSV file of student records consisting of `FirstName`, `LastName`, `StudentId`, and `ClassLevel`, sort the records by a specified attribute and output the result to a new CSV file) and asks students to select discrete subproblems appropriate for the decomposition of the given problem.
2. Ability to "devise solutions to the subproblems in terms of the implementation plan" could be evaluated with a short-response follow-up to question 1 that asks for an implementation plan for one of the subproblems identified previously.
3. Students' ability to "communicate the plans of implementation" with other students could be tested with questions designed to evaluate the ability to understand and critique given implementation plans. Such a question on a quiz or an exam could present a sample implementation plan and ask students to find semantic flaws or points of improvement.
4. Ability to "devise a wide range of test values for the program-to-be-implemented" can be tested with a question that provides a sample implementation plan and asks students to provide test values for the input and the output of the program implemented using the plan.
5. Students' ability to "follow the implementation plan and test values to compose an executable program" can be tested with a fill-in-the-blank question that provides a sample implementation plan and a set of test values and asks for a correct executable implementation.

Questions that test *ability to program* will be mixed in with other questions on the quiz or exam that test students' understanding of other key concepts covered so far in the course, so as not to skew the students' attitude in attempting to respond. Once the quizzes or exams are graded, scores earned by students on questions composed to test *ability to program* will be tracked separately from the overall quiz or exam scores, and the overall `Composite ATP Score` will be computed at the end of the academic term, following the procedure laid out in the **Selection of Variable to Analyze** section above.

### Data Collection for the Measure of **Friction**

The measure of Friction will not contribute to the determination of statistical significance of the Framework. Nevertheless, it is still important to measure Friction: even if the Framework proves to generate statistically significant improvements in students' *ability to program*, notably negative student morale and emotional responses would suggest that the implementation of the Framework needs modification. Thus, surveys and interview processes will be developed to collect students' morale and emotional responses to the incorporation of the Framework into the course administration. For the measure of Friction, the following statements could be given to students with a numerical scale that allows varying responses based on the individual student's degree of agreement with the statement:

* "Using the design recipe was helpful in designing parts of my program."
* "Code outlining is an important process in constructing a working program."
* "The peer review process has helped me better plan my implementation for a program."
* "The automatically generated code template made writing and completing my programs easier."
* "Automatic unit test generation produced helpful unit tests I used throughout the course."

Surveys and interviews can include a free-response component that allows discussion of students' sentiment towards the incorporation of the Framework. If a large volume of negative responses is received from the surveys and interviews, the reasons for the negative responses will be itemized and reported as potential future work to improve the implementation of the Framework. In addition, if the Framework does not yield any improvement in students' *ability to program*, further discussion will be included in the final report speculating on the relationship between the failure of the Framework and the measure of Friction.
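As an illustration of how such responses might be summarized, the sketch below assumes a 1-5 agreement scale (1 = strongly disagree, 5 = strongly agree); the statement keys, the scale, and the response values are assumptions for illustration, not a fixed survey design.

```python
from statistics import mean

# Hypothetical 1-5 agreement responses (1 = strongly disagree, 5 = strongly agree),
# one list per survey statement, collected across all participating students.
responses = {
    "design_recipe_helpful": [4, 5, 3, 4, 2],
    "code_outlining_important": [3, 4, 4, 5, 3],
    "peer_review_helped_planning": [2, 3, 4, 3, 3],
    "code_template_made_easier": [5, 4, 4, 5, 4],
    "unit_test_generation_helpful": [3, 3, 2, 4, 3],
}

# Mean agreement per statement; lower means indicate higher Friction.
for statement, scores in responses.items():
    print(f"{statement}: mean agreement = {mean(scores):.2f}")
```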
### Expected Findings

Although the Framework has been thoughtfully designed and will be carefully implemented to minimize Friction and maximize students' learning of the *ability to program*, showing a statistically significant improvement over the control group is not very likely. Nonetheless, incorporation of the Framework is still expected to produce some notable improvement in general. "Some notable improvement" may not be statistically quantifiable with the data collected for the validation of the Framework alone. However, in the case that the Framework yields some improvement that is not statistically significant, the measure of Friction will be further studied to determine whether replication studies that utilize the same Framework have the potential to yield statistically significant results. That is, if the participating students found the Framework to be helpful and some evidence is found in their `Composite ATP Score` to substantiate their responses, then we will at least be able to report the *potential* benefits of the Framework in studies of larger scale, or with a more efficient implementation in the introductory courses.

### Reporting the Findings

After the conclusion of the research, the anonymized data used to derive the statistical significance of the `Composite ATP Score` will be made available on this OSF project for validation by any third party.

<br>

#### [&uarr; Return to Home](https://osf.io/2tqj7/wiki/)