## Background ##

In [Experiment 6][1], we found some evidence that **a)** participants may have varied more in the strategies they adopted at test for CR, relative to FR, **b)** high-performers on CR and FR tended to report more use of *imagery*- and *story*-based strategies, relative to low-performers, and **c)** use of these strategies may be more predictive of performance for CR than for FR. Taken together, these results suggest a potential role of imagery and imageability in the effects we have observed thus far.

Previous research has found that both the imageability of word pairs ([Madan et al., 2010][2]) and imagery instructions ([Hockley & Christi, 1996][3]) impact cued recall performance, ostensibly by improving memory for the association. Applying these findings to the current effect, it could be that participants vary in their tendency to adopt imagery-based strategies (either similarly for FR and CR, or more for CR). Because imagery/imageability primarily affect association memory, perhaps the variance in adoption of these strategies (either at study or at test) inflates performance variance more for CR than for FR. Or, it could be that the varying imageability of the words in the pairs inflated variance in the memorability of CR pairs (more so than the memorability of items in FR lists). Although we attempted to restrict imageability in our primary wordset to a middle/average range, we did not specifically control for the relative imageability of cues and targets within pairs, and did not restrict imageability in any of the secondary wordsets (i.e., object words, DRM word pairs).

[Madan et al. (2010)][4] tested cued recall of pairs of varying imageability (e.g., high-high, high-low), holding other important word characteristics constant. They found that high-high pairs were significantly more memorable than low-low and mixed pairs -- suggesting that relative cue-target imageability matters. The authors did not examine variability, but visual inspection of error bars (Fig. 3a) did not seem to indicate a difference, i.e., it was not obvious that performance on mixed pairs was more or less variable than on pure pairs. So maybe imageability impacts mean performance but not variability. Of course, [Madan et al. (2010)][5] did not directly analyze variability, did not test FR of the words, and their participants were not specifically encouraged to adopt an imagery-based strategy.

## Proposed Experiment ##

A stronger test of the hypothesis that imageability and imagery-based strategies explain differences in CR and FR variability would be an experiment in which participants completed FR and/or CR test(s) for [Madan et al.'s (2010)][6] high-imageability wordset whilst being instructed to adopt imagery-based strategies as best they can (e.g., [Hockley & Christi, 1996][3], or perhaps some hybrid of imagery- and story-based strategies), both at study and at test. Participants would then be queried on the degree to which they were able to use imagery and the perceived effectiveness of such strategies, and would complete some measure of mental imagery. Such an experiment would allow for a more controlled examination of the impacts of stimulus imageability *and* a deeper look at variability in imagery across participants.
Specifically, participants would:

1) Complete the Paper Folding Task ([French, Ekstrom, & Price, 1963](https://apps.dtic.mil/sti/citations/AD0410915)), a behavioural measure of mental imagery in which participants must judge the outcome of folding and hole-punching a piece of paper, e.g.:

@[osf](xbr79)

This task has been shown to correlate with CR memory performance as well as with other associative memory tasks (Thomas et al., 2023), although this may be due to some more general factor such as motivation. In addition to measuring imagery ability, the idea would be that this task would, in a sense, prime the use of imagery in participants. However, some have criticized the PFT because it predicted memory performance even in aphantasics ([Thomas et al., 2023](https://www-sciencedirect-com.ezproxy.library.uvic.ca/science/article/pii/S0191886918304495?via%3Dihub#s0090)) and because participants are prone to adopting simple, non-imagery-based heuristic strategies.

An alternative measure is the *Image Comparisons Task* (ICT), in which participants are presented with a perceptual adjective (e.g., "shiny") and two nouns (e.g., "trumpet", "violin"), and must decide which of the two nouns best fits the adjective ([Suggate & Lenhard, 2022](https://www-sciencedirect-com.ezproxy.library.uvic.ca/science/article/pii/S0959475222000548?via%3Dihub)). An example of ICT trials:

@[osf](w67kf)

This task predicts reading performance in adults, and other versions of the task (e.g., comparing the size of stimuli, whether animals have a long tail compared to their body, whether the colors of two objects are similar) have been widely used in the literature to measure imagery ([Pearson et al., 2013](dx.doi.org/10.1016/j.cpr.2012.09.001)). Unlike the PFT, this task is closer to the imagery and memory tasks we want participants to engage in (i.e., imagining pairs of concrete nouns). In terms of implementation, participants view 75 adjectives (13 unique) and on each trial must adjudicate between two noun options (141 unique nouns). The adjectives and attendant noun pairs were translated from Suggate & Lenhard's (2022) German set.

2) Complete one FR study-test cycle *or* one CR study-test cycle (order counterbalanced). Words for both tasks would be drawn from the "high-imageability" word pool used by Madan et al. (2010). Due to the increased memorability of high-imageability words, lists would be slightly longer than in previous experiments (e.g., 18 words/pairs per list, plus primacy/recency buffers). Importantly, prior to each study phase, participants would be given explicit instructions to use interactive imagery, drawn from Thomas et al. (2023), who found that such instructions improved memory performance relative to control instructions:

**CR: Studies have indicated that forming mental images of words significantly improves one's memory for them. Please try this technique for the pairs you are about to study. Form a mental image with both of the words interacting together when you are presented with a word pair. For example: For the word pair CAT-DOG, you could imagine the cat chasing the dog.**

**FR: Studies have indicated that forming mental images of words significantly improves one's memory for them. Please try this technique for the words you are about to study. Form a mental image of the words presented, and try to imagine them interacting. For example: If you study the words CAT, DOG, and HOUSE you could imagine the cat chasing the dog in front of the house.**

*[Thomas et al. did not have FR, so these instructions were modified. Might need to tweak due to the task differences]*
As in Experiment 6, participants would complete a brief task-familiarization phase before each study-test cycle (i.e., participants would study a list of 5 words before FR, 5 pairs before CR). Also as in Experiment 6, the study phases would be self-paced (up to 30s per word/pair). Finally, at test, participants would be encouraged to rely on mental imagery when trying to recall the words. For CR, the cues would be presented in the same order in which they were studied.

3) Answer self-report questions about strategy use, e.g.:

- When studying and recalling the words, how often would you say you used the strategy of generating mental images of the words interacting? [1-5]
- If you used any memory strategies other than mental imagery, what were those strategies (e.g., repeating the words)?

4) Complete the Object and Spatial Imagery Questionnaire (OSIQ; Blajenkova et al., 2006), a 30-item self-report measure of object and spatial imagery that has been shown to correlate with associative memory (albeit in aphantasics but not controls; Wittman & Şatırer, 2022). *[Scale's not perfect, but seems to be the best questionnaire-based imagery measure among those available.]*

For this experiment, the focal analyses would be:

- Comparison of CR vs. FR variability under "high-imageability" conditions (predict that CR would not be more variable)
- Basic examination of variability in objective/subjective imagery ability (predict that there is a good degree of individual differences)
- Predicting CR & FR accuracy from objective/subjective imagery ability (predict that these measures more strongly predict CR than FR accuracy)
- Correlation between objective/subjective imagery ability (predict that these measures correlate)

The idea would be to run on Prolific, probably in the neighborhood of 120 participants.

## Pilot Testing ##

We piloted the above design with *N* = 24 Prolific participants (12 Free, 12 Cued), with the primary aim of determining **a)** whether memory performance was suitably away from floor/ceiling for FR and CR, and **b)** how participants did on the novel imagery measures. Pilot participants were restricted to those who reported English as a first & fluent language, had at least a 95% Prolific approval rating, and had completed at least 3 submissions on Prolific.

For recall accuracy:

@[osf](gmzq3)

Well away from floor/ceiling, and another case where we might observe higher CR than FR accuracy (although the variability difference is not promising for our hypotheses!).

For accuracy on the ICT (i.e., responding with the correct noun in the pair given a target adjective):

@[osf](vcg4b)

Accuracy was good overall (especially given that some of the comparisons were a bit ambiguous), with not a lot of variability.

For reaction time on the ICT (conditional on accuracy):

@[osf](ar3ke)

There was more variability across participants than for accuracy, and relatively consistent individual differences for correct and incorrect responses. RT on this task might be a better candidate for a predictor/covariate than accuracy.

Finally, some preliminary explorations of potential relationships between memory accuracy, ICT performance, and scores on the OSIQ. Predicting recall accuracy from OSIQ scores (both the 'object' and 'spatial' subscales):

@[osf](ubqcv)

Hard to say much about any potential relationships with data this sparse, but at the very least there is a good amount of variability in OSIQ scores.
And, relationships between RT on the ICT and OSIQ scores:

@[osf](8derx)

Again, sparse data, but it seems there might be a modest relationship between the 'objective' and 'subjective' imagery measures.

**Overall, this design seems suitable for a full-blown experiment.**

## Preliminary data check ##

At post-exclusion *N* = 44 (just under our preregistered preliminary data check *N* = 50), we examined the data, primarily to check exclusion rates for our various criteria and to determine whether average performance levels allowed for suitable wiggle-room.

For exclusions: of a full sample of 62, we excluded 7 participants who did not get at least 3/18 correct, 2 participants who did not report understanding at least 75% of the presented words, 2 participants who reported cheating, 1 who reported a substantial technical difficulty, 2 participants with more than 5 CR "fast skips" (empty responses with RT < 1s), and 11 participants who did not report using the assigned imagery strategy at least most of the time.

For average performance:

@[osf](zbm96)

FR accuracy was around the optimal level of .5, but CR accuracy was a bit too close to ceiling for comfort (24% of CR responses were at ceiling). In light of this, we chose to **increase the number of studied items from 18 words/pairs to 21 words/pairs**, working on the crude assumption that a 1.16x increase in list length (21/18) might scale performance by a proportional factor of .86x (18/21), ideally putting CR and FR performance roughly equidistant from the optimal .5. To test the effect of the longer study lists, we planned to collect data up to a post-exclusion *N* = 100 and check average performance again. Additionally, in light of the relatively high exclusion rate on the imagery-strategy criterion, we chose to relax that criterion such that participants would only be excluded if they reported never using the assigned imagery-based strategy. Of course, we planned to conduct our primary analyses with both this new relaxed criterion and the original criterion.

## Results ##

Of a total sample of *N* = 246, 38 participants were excluded:

- 23 who did not get at least 3/21 correct at recall (15 CR, 8 FR)
- 3 who reported cheating (1 CR, 2 FR)
- 2 who reported technical difficulties (1 CR, 1 FR)
- 5 CR participants with 6 or more "fast skips"
- 9 who reported "Never" using the assigned imagery strategy (1 CR, 8 FR)

*(The sum of the above does not equal 38 because some participants were excluded on multiple criteria.)*

...leaving us with a final sample of *N* = 208 (95 CR, 113 FR), for which we calculated recall accuracy:

@[osf](92guh)

The bootstrapped CR:FR variance ratio was 1.03 [95% CI: .87, 1.19]. Levene's test of the CR and FR variances was non-significant, *F*(94) = 1.06, *p* = .38. And crucially, the TOST equivalence test of the bootstrapped variance ratio was significant, *t*(999) = 27.64, *p* < .001, providing evidence that the observed variance ratio did not fall outside our prespecified equivalence bounds (.9 and 1.1). Thus, for the first time across our experiments, we did not observe greater CR than FR variance. Because we obtained compelling evidence for equivalence at our first prespecified stopping point, we chose not to continue data collection to our final prespecified stopping point (*N* = 300).
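To make the logic of this test concrete, here is a minimal Python sketch of a bootstrapped variance-ratio comparison with a Levene's test and a TOST-style check against the (.9, 1.1) equivalence bounds. The simulated accuracy vectors, the seed, and the one-sample tests on the bootstrapped ratios are illustrative placeholders, not our preregistered analysis code.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2023)

def bootstrap_variance_ratio(cr, fr, n_boot=1000):
    """Resample participants within each group and return CR:FR variance ratios."""
    cr, fr = np.asarray(cr), np.asarray(fr)
    ratios = np.empty(n_boot)
    for i in range(n_boot):
        cr_bs = rng.choice(cr, size=cr.size, replace=True)
        fr_bs = rng.choice(fr, size=fr.size, replace=True)
        ratios[i] = np.var(cr_bs, ddof=1) / np.var(fr_bs, ddof=1)
    return ratios

# Hypothetical per-participant proportion-correct scores (placeholders only).
cr_acc = rng.beta(6, 4, size=95)
fr_acc = rng.beta(5, 5, size=113)

ratios = bootstrap_variance_ratio(cr_acc, fr_acc)
obs_ratio = np.var(cr_acc, ddof=1) / np.var(fr_acc, ddof=1)
ci_lo, ci_hi = np.percentile(ratios, [2.5, 97.5])
print(f"CR:FR variance ratio = {obs_ratio:.2f} [95% CI: {ci_lo:.2f}, {ci_hi:.2f}]")

# Levene's test of equality of variances between the two groups.
levene_stat, levene_p = stats.levene(cr_acc, fr_acc)
print(f"Levene: stat = {levene_stat:.2f}, p = {levene_p:.3f}")

# TOST-style equivalence check: the ratio counts as equivalent to 1 only if the
# bootstrap distribution is reliably above the lower bound AND below the upper bound.
_, p_lower = stats.ttest_1samp(ratios, 0.9, alternative="greater")
_, p_upper = stats.ttest_1samp(ratios, 1.1, alternative="less")
print(f"TOST p = {max(p_lower, p_upper):.4f}")
```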
There are some considerations to address before concluding that imagery explains the previously-observed CR:FR variance effect.

First, there is the ever-present threat of ceiling/floor effects constraining variability. For instance, could it be that high CR performance artificially capped CR variability? I would argue against this -- in the final sample only 7% of CR responses were at ceiling, and the same total number of participants were at ceiling or (task) floor in CR and FR (*n* = 9, and similar proportions: 9% and 8%). Qualitatively, the distributions appear almost mirrored (whereas in previous experiments the CR distribution often had a noticeably different shape).

Second, what of the higher rate of CR participants excluded for low performance (nearly double that of FR)? Does including these participants change the overall pattern of results?

@[osf](wgr87)

With the addition of 10 CR and 3 FR previously-excluded participants, the bootstrapped CR:FR variance ratio jumps to 1.23 [95% CI: 1.06, 1.40], with a significant Levene's test, *F*(104) = 1.49, *p* = .02, and a non-significant equivalence test, *t*(999) = 44.20, *p* = 1. Putting aside for a moment the question of whether these exclusions are justified (e.g., these participants blew off the task vs. these participants made a bona fide attempt but were simply low CR-performers), what do these differing results imply for this and prior experiments? First, it is merely a statistical fact that extreme scores exert greater influence on measures of central tendency and variability, so it is not surprising that 10 outliers substantially shift the standard deviation. Second, in all prior experiments, the CR:FR variability difference was observed even when excluding low performers on criteria similar to this experiment's -- not so this time. Still, the fact that there were more low-performance exclusions for CR than for FR is potentially noteworthy. One possibility is that participants may have been more easily overwhelmed or fatigued by the CR task (i.e., double the words to study), and checked out when it came time to test.

### Exploratory analyses of low CR performers ###

To better understand these "low CR" performers, we conducted exploratory analyses of other variables. For example, comparing the item-wise study and test RTs of those above and below our exclusion threshold:

@[osf](6s5p9)

Although low performers spent on average 3.9s less studying the pairs, this difference was not significant, *z* = 1.52, *p* = .43. Interestingly, low performers also spent about 2.8s more at test on each pair -- this difference was significant, *z* = 2.66, *p* = .04. So, we cannot say with confidence that the low performers were merely blazing through the tasks.

Did low performers tend to leave answers blank rather than tender responses?

@[osf](s2vxp)

It appears not -- those above and below the accuracy threshold did not differ in the proportion of their errors that were omissions, *X2*(1) = .66, *p* = .42.

What kind of responses did low performers give (e.g., were they responding to cues with the wrong targets, other cues, or non-studied words)? The vast majority of incorrect responses for low performers were **non-studied** words (i.e., neither cues, targets, nor words from the ICT task) -- for all low-performing CR participants, 89-100% of their commission errors were non-studied words.

Finally, we compared performance (accuracy & RT) on the ICT for those above and below the performance exclusion threshold:

@[osf](y5324)

Low performers were neither more nor less accurate, *X2*(1) = .6, *p* = .44, nor faster or slower, *X2*(1) = 1.73, *p* = .19.
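As a side note on the error-classification analysis above, the response categories can be made explicit with a small sketch like the one below. The helper function, word sets, and normalization are hypothetical simplifications (the actual coding also had to accommodate minor spelling variants):

```python
def classify_commission_error(response, cues, targets, ict_words):
    """Label an incorrect, non-empty CR response by where (if anywhere) it was seen."""
    word = response.strip().lower()
    if word in targets:
        return "other target"   # a studied target, but paired with a different cue
    if word in cues:
        return "cue"            # one of the studied cue words
    if word in ict_words:
        return "ICT word"       # a word from the Image Comparisons Task
    return "non-studied"        # neither a cue, a target, nor an ICT word

# Toy word sets (placeholders, not the actual stimuli).
cues = {"river", "candle"}
targets = {"stone", "garden"}
ict_words = {"trumpet", "violin"}

for resp in ["house", "garden", " Trumpet "]:
    print(resp.strip(), "->", classify_commission_error(resp, cues, targets, ict_words))
# house -> non-studied, garden -> other target, Trumpet -> ICT word
```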
So, what to conclude about the CR low performers whose inclusion potentially changes the fundamental conclusions? The high proportion of non-studied words given as responses and the slightly faster study RTs, contrasted with good performance on the ICT, suggest that these participants, while probably not outright shirking, may not have made a serious, legitimate attempt at the CR test phase. Thus, we argue that their exclusion is justified, although it remains a point of interest that there were more such exclusions for CR than for FR.

### Self-reports of imagery strategy use ###

After the memory test, we asked participants to self-report how often they made use of the assigned imagery strategy when memorizing/recalling the words. The proportion of responses by test type was as follows:

@[osf](thqzv)

CR participants were significantly more likely to report using the assigned strategy "all of the time" (*p* = .02), and the average reported frequency overall was higher for CR (*p* < .001).

Did FR and CR participants differ in the frequency with which they reported using strategies *other* than imagery? To assess this, we coded qualitative reports of other strategies used (using the same general coding scheme as in past experiments where we coded strategy use), and compared the frequencies by test type:

@[osf](w4efy)

The proportions were similar for FR and CR, with no significant differences. Overall, the strategy analyses suggest that **a)** CR participants made use of the imagery strategy more often than FR participants, but **b)** the majority of participants used *only* imagery-based strategies, and potentially **c)** in addition to imagery, about 1 in 5 participants also reported using rehearsal.

### Predicting recall from imagery measures ###

We also had predictions about the relationship between imagery ability (as measured using the OSIQ and ICT) and recall performance. Specifically, we predicted that imagery ability would be more predictive of recall for CR than for FR. If this is the case, it could be that variability in imagery ability amplifies variability in CR performance relative to variability in FR performance.

First, we examined recall as predicted by the OSIQ Object and Spatial subscales (created by averaging across 15 items each):

@[osf](ypvkj)

There was no effect of OSIQ average on memory test performance, *X2*(1) = 1.17, *p* = .28, and no interactions between subscale, test type, and OSIQ score (all *p*s > .39). This is unlikely to be due to any restriction of range, as we had a good amount of variability in both performance and OSIQ scores.

Next, we examined the relationship between accuracy on the ICT and the proportion correctly recalled:

@[osf](8jrsu)

Performance on the ICT significantly predicted performance on the memory test, *F*(1) = 5.32, *p* = .02, but this appears to be largely driven by a few low-performing outliers. Performance on the ICT was generally quite high, with little variability across participants.

What about reaction time -- a potentially more sensitive measure? Examining the relationship between participant-level average RT on correct ICT trials and recall:

@[osf](m72w5)

There was no significant relationship between average RT and memory test performance, *F*(1) = .01, *p* = .93. Again there were some outliers here, but even when conducting an exploratory analysis restricted to participants with an avg. RT < 5s (*n* = 188), the relationship was not significant, *F*(1) = .67, *p* = .42, and similarly when restricting to participants with an avg. RT < 2s (*n* = 173), *F*(1) = .31, *p* = .58.

Results were similar when looking at average RT on *incorrect* ICT trials:

@[osf](h4gy8)

...with a non-significant effect of RT, *F*(1) = .11, *p* = .74 (*p*s = .89 and .06 when restricting to participants with < 5s and < 2s avg. RT, respectively). Thus, we found little evidence that our imagery measures meaningfully predicted recall performance.
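For readers curious about the general form of the models in this section, below is a minimal sketch of a binomial GLM predicting recall accuracy from OSIQ score, test type, and subscale (with their interactions). The simulated data are placeholders and the model is a simplification (e.g., it treats the two subscales as if they were independent observations), so it stands in for, rather than reproduces, the analyses reported above.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 200  # hypothetical participants

df = pd.DataFrame({
    "test": rng.choice(["CR", "FR"], size=n),
    "osiq_object": rng.uniform(1, 5, size=n),   # mean of the 15 'object' items
    "osiq_spatial": rng.uniform(1, 5, size=n),  # mean of the 15 'spatial' items
    "n_correct": rng.integers(3, 22, size=n),   # out of 21 studied words/pairs
})
df["n_wrong"] = 21 - df["n_correct"]

# Long format so 'subscale' can enter the model as a factor.
long = df.melt(id_vars=["test", "n_correct", "n_wrong"],
               value_vars=["osiq_object", "osiq_spatial"],
               var_name="subscale", value_name="osiq")

# Binomial GLM of recall accuracy on OSIQ score x test type x subscale.
fit = smf.glm("n_correct + n_wrong ~ osiq * test * subscale",
              data=long, family=sm.families.Binomial()).fit()
print(fit.summary())
```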
### Interrelations among imagery measures ###

The imagery measures were not related to memory performance, but were they related to one another? First, the OSIQ subscales and accuracy on the ICT:

@[osf](bspk9)

Surprisingly, there was a (modest) negative relationship between OSIQ score and ICT performance, *F*(1) = 4.91, *p* = .03 (but no interaction between subscale and score, *F*(1) = .06, *p* = .80).

For RT on the ICT:

@[osf](t94vx)

Also interesting was the significant relationship between ICT RT and OSIQ scores, *F*(1) = 49.52, *p* < .001, with a similar relationship for both subscales and for correct and incorrect ICT trials (all interaction *p*s > .73). These results held when restricting to participants with average RT < 5s (*p* < .001), but not when restricting to participants with average RT < 2s (*p* = .09).

Thus, it appears that better self-reported imagery ability (as indexed by the OSIQ) predicted *poorer* performance on the ICT, in terms of both (slightly) lower accuracy and longer reaction times. One possibility is that participants higher in imagery spent longer imagining and comparing the words in the pairs. But this relationship was not of primary interest for this experiment.

### An increase in FR variability or a decrease in CR variability (or both)? ###

A question that naturally arises is whether the imagery manipulation reduced the CR:FR variability ratio by reducing CR variability, increasing FR variability, or some combination of the two. Because the current experiment did not include a control condition (e.g., one without imagery instructions), it is difficult to say. However, a comparison with the results of prior experiments might offer some insight.

First, we plotted the CR and FR distributions for the current and past experiments, mean-centering accuracy to make cross-experiment comparisons of variability easier:

@[osf](ku9ps)

At least from an interocular test, FR variability seems similar to that observed in prior experiments, with CR variability potentially decreased. This comparison also reveals that CR performance in the current experiment lacks the apparent bi-modality characteristic of most of the other experiments.

As a more quantitative test, we compared the bootstrapped FR and CR standard deviations across experiments:

@[osf](zda6y)

For FR, variability in the current experiment was reliably greater than in 4/7 previous experiments (and not reliably different from the remaining 3/7). For CR, variability in the current experiment was reliably less than in 5/7 previous experiments (and not reliably different from the remaining 2/7). Although these cross-experiment comparisons should be taken with a grain of salt, it does seem that the imagery manipulation exerted its effects by both increasing FR variability and decreasing CR variability.
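For completeness, here is a minimal sketch of how a bootstrapped SD comparison between the current and a prior experiment could be run. The simulated accuracy vectors and the CI-on-the-difference decision rule are illustrative assumptions rather than our actual cross-experiment pipeline.

```python
import numpy as np

rng = np.random.default_rng(11)

def bootstrap_sd(acc, n_boot=1000):
    """Bootstrap the SD of per-participant accuracy within one experiment/test type."""
    acc = np.asarray(acc)
    return np.array([np.std(rng.choice(acc, size=acc.size, replace=True), ddof=1)
                     for _ in range(n_boot)])

# Hypothetical CR accuracy for the current experiment and one prior experiment.
cr_current = rng.beta(6, 4, size=95)
cr_prior = rng.beta(2, 2, size=90)

# Bootstrap distribution of the SD difference; "reliably less variable" if the
# 95% CI of (current - prior) lies entirely below zero.
sd_diff = bootstrap_sd(cr_current) - bootstrap_sd(cr_prior)
lo, hi = np.percentile(sd_diff, [2.5, 97.5])
print(f"SD(current) - SD(prior): 95% CI [{lo:.3f}, {hi:.3f}]")
```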
[1]: https://osf.io/upwhb/
[2]: https://cmadan.com/papers/MadaEtal2010JML.pdf
[3]: https://link.springer.com/content/pdf/10.3758/BF03200881.pdf
[4]: https://cmadan.com/papers/MadaEtal2010JML.pdf
[5]: https://cmadan.com/papers/MadaEtal2010JML.pdf
[6]: https://cmadan.com/papers/MadaEtal2010JML.pdf
[7]: https://bpspsychub.onlinelibrary.wiley.com/doi/pdfdirect/10.1111/bjop.12050