# Methods

The aim of this study was to understand how users' underlying motivational regulations relate to their experiences with technologies they use frequently. It responds to recent calls to move beyond a narrow focus on Basic Needs Satisfaction in SDT-informed HCI research [@tyack_self-determination_2020], and beyond episodic understandings of user experience and autonomy [@bennett_how_2023].

We follow a well-established survey model, which has been used successfully in several studies on positive and negative aspects of episodic user experience [cite]{style="color: red"}. The approach has also been validated in addressing longer-term experiences of interaction [@bruhlmann_measuring_2018]. A benefit of this approach is that it allows the collection of both qualitative and quantitative data. It also avoids pre-judging what constitutes a positive or negative experience, and instead focuses on users' own perceptions. Unlike previous work in this mould, we deploy novel metrics and analytical approaches to understand not only the experience during the interactions, but also the processes of motivational regulation which impact on that experience. Our approach is theoretically grounded in Organismic Integration Theory, a sub-theory of Self-Determination Theory which has been neglected in HCI [cite]{style="color: red"}, and which deals with the longer-term integration of motivations for an activity.

## Participants

We estimated our required sample size based on precedent in prior literature, and on a pilot analysis of a relevant open data-set from a previous study validating the UMI [study 2 from @bruhlmann_measuring_2018]. It has recently been noted that power analysis for LPA is very rarely conducted, due to complexities in the process and a priori uncertainty about population parameters and class counts [@spurk_latent_2020]. As such, we followed guidance in basing our sample size estimate on a previous Monte Carlo simulation study [@nylund_deciding_2007], which indicated that, for an analysis comparable to our own with an unequally sized four-class structure, the correct number of classes was identified 100% of the time with a sample size of 200. Our pilot analysis, which included 460 participants, selected a 4-class structure and demonstrated strong model fit for this solution, with medium-to-large effect sizes for relationships to outcome variables. As such, we opted for a sample size of 550 participants, allowing for exclusions.

We excluded participants on the following criteria: text responses had to be in English, engage with the substance of the question, and have basic face validity as a description of a technology experience. We included check questions on each page of the questionnaire, instructing participants to select particular responses, and excluded participants who answered these incorrectly. We also conducted a longstring analysis to detect repeated answering schemes among the User Motivation Inventory (UMI) items [see @bruhlmann_measuring_2018]; a brief sketch of this screening step is given after the Design section below. After these exclusions, responses from the remaining participants were subjected to analysis.

## Design

The study used a one-factor, between-participants design. Participants were randomly assigned to one of two conditions, describing technologies with which they had either a positive or negative relationship. We aimed to understand the role played by motivational factors in both good and bad user experience over time.
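As a concrete illustration of the longstring screening described under Participants, the following is a minimal sketch in R. It assumes the `longstring()` helper from the `careless` package and a hypothetical data frame `umi_items` (one row per participant, one column per UMI item); the flagging threshold shown is illustrative rather than a preregistered value.

```r
library(careless)

# Longest run of identical responses per participant across the UMI items.
# `umi_items` is a hypothetical data frame: rows = participants, columns = items.
long_runs <- longstring(umi_items)

# Flag participants whose longest identical-response run covers most of the scale;
# the 80% cut-off is purely illustrative.
flagged <- which(long_runs >= ceiling(ncol(umi_items) * 0.8))
```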
## Procedure

Having clicked the link to the survey, participants were presented with an introduction page, which provided the basic information about the experiment and data handling needed to support informed consent. After giving consent, participants were asked to provide demographic information (gender, age, country of origin), before proceeding to the main part of the study.

First, participants were asked to bring to mind an interactive technology which they had used frequently in the past two months. Depending on condition, they were asked to focus on a technology with which they felt they had either a positive or negative relationship. They were then asked to spend a few moments thinking about their experiences with this technology. On the next page they were asked to name the technology involved in the experience, how long they had been using it, how regularly they used it, and when they had used it most recently. On the following page they were asked to describe their experience with this technology in their own words. After this, participants completed questionnaires for several measures (see section [1.4](#measures){reference-type="ref" reference="measures"}) addressing their experiences with the technology. Over these pages, questions from individual measures were presented together, but the order of items within each measure was randomised. Finally, participants were given the option to comment on the survey and asked to indicate whether they had answered questions in a conscious and engaged manner. All questions in the survey, except age and gender, were mandatory. On average the survey took [X]{style="color: red"} minutes to complete.

## Measures

We collected both qualitative responses and quantitative, self-report measures from participants. Except where noted otherwise, all quantitative measures were answered on a 7-point Likert scale, ranging from "strongly disagree" (1) to "strongly agree" (7).

### Qualitative Questions

[TODO - CONFIRM QUESTIONS]

We asked participants to describe their experience with the technology in their own words. To support descriptive writing we followed previous recommendations to structure the writing with multiple prompts [@tuch_analyzing_2013], which we developed from prompts used successfully in previous studies [@mekler_momentary_2016; @tuch_analyzing_2013; @bruhlmann_measuring_2018]. [refine these]{style="color: red"}

To support detailed writing and meaningful responses on the following measures, we first asked participants to bring to mind an experience and reflect on it for a few moments. They were presented with the following prompt and asked to click continue when ready.

*Bring to mind an interactive technology with which you feel you have a [positive/negative] relationship, and which you have used frequently over the last two months. You should understand "[positive/negative]" in whatever way makes sense to you. Once you have chosen a technology, please spend a moment remembering your experiences with this technology and your feelings about it.*

Then they were asked to name the technology and report how frequently they had used it in the last two months. This was followed by three questions on motivations for using the technology. First, a general question:

*Thinking about the technology you chose on the previous page (technologyName.shown), please describe why you use this technology. Describe this as accurately, and in as much detail, as you can.
Focus on your overall engagement with the technology over multiple interactions, rather than focusing only on one occasion. Focus your description on your own experience and motivations rather than on the technology itself. Make your description as concrete as possible. Try to write so that outsiders can easily understand your experience. Write at least 80 words (e.g. at least 3 full lines of text).*

Then two more specific questions:

- *Why do you think you began to use the technology?*
- *Why do you think you continue to use the technology?*

### User Motivation Inventory (UMI)

The six constructs captured in this measure will be subjected to latent profile analysis to identify motivational profiles for the participants.

### Needs Satisfaction

Needs Satisfaction and Frustration is another construct theoretically grounded in SDT. \db{describe} Where the UMI is a relatively new metric, and the constructs it measures have rarely been addressed in HCI, Basic Needs Satisfaction is a very commonly addressed construct in HCI research in general, and UX research in particular. In recent years needs frustration has also been increasingly addressed, since it has often been found that the impact of frustration is not identical to the absence of satisfaction.

Sheldon's need satisfaction scale has been commonly used to address satisfaction of the three basic needs alongside a wider range of less theoretically basic needs. In addition to the 3 core needs (Autonomy, Competence, and Relatedness) posited by SDT, Sheldon's scale captures a further 7 constructs. Among previous UX studies it has been common to drop particular items from the scale: some have addressed only the 3 basic needs, while others have tended to drop one or more of the constructs which are not considered relevant to the study at hand. We follow prior work in dropping the items which feel less relevant to technology use, namely those for money-luxury and physical thriving. Of the retained constructs, we form hypotheses only about Autonomy, Competence, and Relatedness. The remaining 5 constructs are retained for exploratory, post-hoc analysis, to shed further light on the results, inform future research directions, and relate our results to prior UX research; we did not form hypotheses about these needs. As in previous work on non-episodic experience, the introductory question for this measure was adapted to reflect the frequent use of technology: *"While using [technology], I feel ..."*

### Attribution

We expected that internalised motivation might result in greater self-attribution. Previous work has argued that greater autonomy satisfaction results in users taking more responsibility for the outcome of an interaction. This suggests that more-or-less autonomous self-regulation profiles may also have an impact here. We asked participants to answer three questions on 7-point Likert scales, asking the degree to which they felt that 1) they themselves, 2) the technology, or 3) other factors were responsible for the outcomes of interactions with the technology.

### Usability Metric for User Experience (UMUX)

We used the four-item UMUX [@finstad_usability_2010] to measure perceived usability of the reported technology. The UMUX is shorter than other candidate measures for usability, and normative data is available to ground its interpretation [@hodrien_review_2021]. While longer questionnaires support more detailed diagnostic analysis of issues with target technologies, this was not the focus of this study; here a measure of overall system usability was considered more expedient.
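For concreteness, the following is a minimal sketch of how the four UMUX items might be scored into the conventional 0-100 range, following the SUS-style scoring rule commonly applied to the UMUX (odd, positively worded items contribute the response minus 1; even, negatively worded items contribute 7 minus the response). The column names `umux1`-`umux4` and the data frame `survey_data` are hypothetical placeholders, and the scoring rule is stated as an assumption rather than the study's verified procedure.

```r
# Hypothetical columns umux1..umux4 hold responses on the 7-point scale.
umux_score <- function(d) {
  odd  <- (d$umux1 - 1) + (d$umux3 - 1)   # positively worded items
  even <- (7 - d$umux2) + (7 - d$umux4)   # negatively worded items
  (odd + even) / 24 * 100                 # rescale summed item scores to 0-100
}

survey_data$umux <- umux_score(survey_data)
```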
### Positive and Negative Affect Schedule (PANAS)

[NHST]{style="color: red"}

We used the 20-item short-form PANAS scale to measure positive and negative affect. The PANAS has been shown to be stable over time, and suitable for measuring both episodic and longer-term affect [cite]{style="color: red"}. Participants are presented with 20 positive or negative affect-related adjectives, and we asked them to indicate the extent to which they felt this way during their interactions with the technology in question. This is answered on a 5-point Likert scale from "very slightly or not at all" to "extremely".

# Analysis

## Hypotheses

We expected that motivational profiles identified in our data-set would be associated with significantly different outcomes in terms of Basic Needs Satisfaction, Perceived Usability, Affect (PANAS), and Attribution.

## Motivational Profile Analysis

### Confirmatory Factor Analysis (CFA)

Guidance on LPA suggests that the "arguably optimal" approach is to conduct the analysis not on the raw measure scores, but on factor scores. This ensures that the indicators used are truly continuous (as required by LPA) and that measurement error has been corrected for [@bauer_primer_2021]. This approach requires an initial factor analysis to be conducted, and that factor scores be scaled to have a meaningful mean structure [@bauer_primer_2021]. Such scaling can be achieved by *effects coding*: constraining the model such that the factor loadings of measured indicators average to 1 and the indicator intercepts sum to 0 [@little_non_2006]. As such, as a first step we conducted a six-factor CFA to test a UMI measurement model in which all items loaded onto their designated factors, and the effects coding constraints were observed.

### Latent Profile Analysis (LPA)

The factor scores calculated in the CFA were then subjected to LPA, in R using the tidySEM library [cite]. We did not have hypotheses about the number of motivational profiles which would be identified. A pilot analysis on positive user experiences identified 4 profiles, across which both the amount of motivation and the quality of motivational regulation varied, but it was not clear how much the different framing in this study would affect results. As such we evaluated a range of latent profile models, with different numbers of profiles, from 1 to 6, and with different constraints. For each profile count we tested models in which 1) variances were either a) constrained to be equal or b) allowed to vary between profiles, and 2) covariances were either i) constrained to zero, ii) constrained to be equal, or iii) allowed to vary. The final model was then selected on the basis of both theoretical and numerical considerations [@bauer_primer_2021]; a sketch of this model-fitting step is given after the decision criteria below. We constrained our analysis by the following decision process:

- First, solutions which failed to converge to an optimum were discarded.
- Solutions were discarded where the number of profile members was smaller than five times the number of parameters-per-class. Below this, the number of observations per parameter risks resulting in unstable fits [cite]{style="color: red"}.
- Solutions were discarded if they included profiles smaller than 7% of the sample, since this would reduce the meaningfulness of comparisons between profiles.
- After this, numerical criteria were used to select models. We referred to various metrics. Primarily we sought the solution with the lowest Bayesian Information Criterion (BIC) value, indicating quality of model fit [cite].
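A minimal sketch of how this two-step procedure might look in R, assuming lavaan's `effect.coding` option for the effects-coded CFA and the `mx_profiles()` / `table_fit()` interface from tidySEM; the argument names reflect our reading of the package documentation and may differ between versions, and the UMI item names are hypothetical placeholders (three items per subscale).

```r
library(lavaan)
library(tidySEM)

# Step 1: six-factor CFA of the UMI with effects-coding constraints
# (loadings average to 1, intercepts sum to 0 within each factor).
umi_model <- "
  amotivation =~ amo1 + amo2 + amo3
  external    =~ ext1 + ext2 + ext3
  introjected =~ intj1 + intj2 + intj3
  identified  =~ iden1 + iden2 + iden3
  integrated  =~ intg1 + intg2 + intg3
  intrinsic   =~ intr1 + intr2 + intr3
"
cfa_fit <- cfa(umi_model, data = survey_data,
               meanstructure = TRUE, effect.coding = TRUE)

# Factor scores serve as the continuous indicators for the LPA.
fscores <- as.data.frame(lavPredict(cfa_fit))

# Step 2: fit 1-6 profile solutions across the full constraint grid
# (equal vs. varying variances; zero vs. equal vs. varying covariances).
lpa_fits <- mx_profiles(data = fscores,
                        classes = 1:6,
                        variances   = c("equal", "varying"),
                        covariances = c("zero", "equal", "varying"),
                        expand_grid = TRUE)

# Fit indices (BIC, entropy, minimum classification probability, class sizes)
# used in the decision process described in the surrounding text.
table_fit(lpa_fits)
```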
Beyond BIC, we also paid attention to the minimum classification probability for a profile (indicating the certainty with which participants are assigned to profiles) and the entropy of the model (informative about the distinctness of the classes). We sought entropy values of 0.8 or higher, but did not exclude lower-valued solutions if BIC and classification probabilities were strong [cite].

Finally, we considered model constraints. Previous work points to the value of flexibility in deciding model constraints. Bauer argues that constraints on independence and homogeneity ensure that the observed variance in profiles is attributable to differences in the mean values of indicators, and not to the form of their variance-covariance relations [@bauer_primer_2021]. However, Peugh & Fan note that these are very strict assumptions whose external validity can be questioned, and in simulation studies they found that LPA models were more likely to recover true populations where these constraints were relaxed. As such we did not enforce constraints a priori but, where numeric values for models were close, preferred the more constrained (and thus more interpretable) model.

### Relating Profiles to Outcome Indicators

When linking profiles to outcome variables, previous work has often taken a naive approach: first participants are assigned to profiles, and then mean outcome variables are compared between profiles. Unfortunately this approach can be misleading. LPA does not directly assign data points to profiles, but assigns probabilities. The naive approach does not take this into account, risking error [@bauer_primer_2021; @bakk_relating_2021]. Recent literature demonstrates that other approaches, such as the Bolck-Croon-Hagenaars (BCH) approach, are more robust [@bauer_primer_2021; @bakk_relating_2021]. BCH takes classification uncertainty into account by linking profiles to outcomes through a weighted estimation of a linear regression model. The approach also avoids normality assumptions, and is insensitive to heteroscedasticity [@bakk_relating_2021]. As such this is the approach we used.

First we related profiles to outcome variables by estimating a BCH model in tidySEM. Then we conducted statistical tests to quantify the effect size and significance of results. First we conducted a likelihood ratio test for the whole model, to evaluate the null hypothesis that outcome means do not differ between profiles. Then we conducted pairwise Wald tests, between class pairs, for each outcome variable, to evaluate the null hypothesis that the two profiles did not differ. Finally, p-values were corrected for the number of hypotheses, using the Holm-Bonferroni correction via the p.adjust function in R.
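To make the final correction step concrete: the BCH estimation and the pairwise Wald tests are run in tidySEM as described above (the `BCH()`, `lr_test()` and `wald_test()` calls below are shown only as hedged, commented placeholders, since their exact signatures may differ between tidySEM versions), and the resulting pairwise p-values are then adjusted with base R's `p.adjust()`. The p-values shown are purely hypothetical.

```r
# Hedged placeholders for the BCH step; consult the tidySEM documentation for
# the exact interface of BCH(), lr_test() and wald_test() in the installed version.
# aux_fit <- BCH(final_model, model = "pa_affect ~ 1", data = outcome_data)
# lr_test(aux_fit)                      # omnibus test: do outcome means differ across profiles?
# wald_test(aux_fit, hypothesis = ...)  # pairwise profile comparisons

# Holm-Bonferroni correction of the pairwise Wald-test p-values (base R).
pairwise_p <- c(0.004, 0.021, 0.048, 0.310)  # hypothetical uncorrected p-values
p.adjust(pairwise_p, method = "holm")
```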