# Methods
The aim of this study was to understand how users' underlying
motivational regulations relate to their experiences with technologies
they use frequently. It responds to recent calls to move beyond a narrow
focus on Basic Needs Satisfaction in SDT-informed HCI research
[@tyack_self-determination_2020], and beyond episodic understandings of
user experience and autonomy [@bennett_how_2023]. We follow a
well-established survey model, which has been used successfully in
several studies on positive and negative aspects of episodic user
experience [cite]{style="color: red"}. The approach has also been
validated in addressing longer-term experiences of interaction
[@bruhlmann_measuring_2018]. A benefit of this approach is that it
allows the collection of both qualitative and quantitative data. It also
avoids pre-judging what constitutes a positive or negative experience,
and instead focuses on users' own perceptions.
Unlike previous work in this mould, we deploy novel metrics and
analytical approaches to understand not only the experience during the
interactions, but also the processes of motivational regulation which
impact on the experience. Our approach is theoretically grounded in
Organismic Integration Theory, a sub-theory of Self-Determination
Theory which has been neglected in HCI [cite]{style="color: red"}, and
which deals with the longer-term integration of motivations for an
activity.
## Participants
We estimated our required sample size based on precedent in prior
literature, and a pilot analysis of a relevant open data-set from a
previous study validating the UMI [study.2 from
@bruhlmann_measuring_2018]. It has recently been noted that power
analysis for LPA is very rarely conducted due to complexities in the
process, and a priori uncertainty about population parameters and
class-counts [@spurk_latent_2020]. As such we followed guidance in
basing our sample size estimate on a previous Monte-Carlo simulation
study [@nylund_deciding_2007], which indicated that for a comparable
analysis to our own, with an unequally sized four-class structure, the
correct number of classes was identified 100% of the time for a sample
size of 200. Our pilot analysis, which included 460 participants, selected a four-class structure and demonstrated strong model fit for this solution, with medium-to-large effect sizes for relationships to outcome variables. As such, we opted for a sample size of 550 participants, allowing for exclusions.
We excluded participants according to the following criteria: text responses had to be written in English, engage with the substance of the question, and have basic face validity as descriptions of a technology experience. We also included attention-check questions on each page of the questionnaire, instructing participants to select particular responses, and excluded participants who answered these incorrectly. In addition, we conducted a longstring analysis to detect repeated answering schemes among the User Motivation Inventory (UMI) items [see @bruhlmann_measuring_2018].
After these exclusions, responses from the remaining participants were subjected to analysis.
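A minimal sketch of the longstring screening, assuming the `careless` R package; the data frame, column prefix, and cutoff shown are placeholders rather than our actual script:

```r
# Illustrative sketch of the longstring screening, not the exact analysis
# script. Assumes the `careless` package; column names ("umi_" prefix) and
# the cutoff of 12 are hypothetical placeholders.
library(careless)

umi_items <- survey[, grep("^umi_", names(survey))]   # assumed UMI item columns

# Longest run of identical consecutive responses per participant
survey$longstring_max <- longstring(umi_items, avg = FALSE)

# Flag suspiciously long runs for review and possible exclusion
suspect <- which(survey$longstring_max >= 12)
```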
## Design
The study used a one-factor, between-participants design. Participants were randomly assigned to one of two conditions, describing technologies with which they had either a positive or negative relationship. We aimed to
understand the role played by motivational factors in both good and bad
user experience over time.
## Procedure
Having clicked the link to the survey, participants were presented with
an introduction page, which provided the basic information about the
experiment and data handling which supported informed consent. After
giving consent, participants were asked to provide demographic
information (gender, age, country of origin), before proceeding to the
main part of the study.
First, participants were asked to bring to mind an interactive technology which they had used frequently in the past two months. Depending on condition, they were asked to
focus on a technology with which they felt they had either a positive or
negative relationship. They were then asked to spend a few moments
thinking about their experiences with this technology.
On the next page they were asked to name the technology involved in the
experience, how long they had been using it, how regularly they used it,
and when they used it most recently. On the next page they were asked to
describe their experience with this technology in their own words. After
this, participants completed questionnaires for several measures (see
section [1.4](#measures){reference-type="ref" reference="measures"})
addressing their experiences with the technology. Over these pages,
questions from individual measures were presented together, but the
order of items within each measure was randomised.
Finally, participants were given the option to comment on the survey and
asked to indicate whether they had answered questions in a conscious and
engaged manner. All questions in the survey, except age and gender, were
mandatory. On average the survey took [X]{style="color: red"} minutes to
complete.
## Measures
We collected both qualitative responses and quantitative self-report measures from participants. Except where noted otherwise, all quantitative measures were answered on a 7-point Likert scale, ranging from "strongly disagree" (1) to "strongly agree" (7).
### Qualitative Questions
[TODO - CONFIRM QUESTIONS]
We asked participants to describe their experience with the technology
in their own words. To support descriptive writing we followed previous
recommendations to structure the writing with multiple prompts
[@tuch_analyzing_2013], which we developed from prompts used
successfully in previous studies
[@mekler_momentary_2016; @tuch_analyzing_2013; @bruhlmann_measuring_2018].
[refine these]{style="color: red"} To support detailed writing and
meaningful responses on the following measures, we first asked
participants to bring to mind an experience and reflect on it for a few
moments. They were presented with the following prompt and asked to
click continue when ready.
*Bring to mind an interactive technology with which you feel you have a [positive/negative] relationship, and which you have used frequently over the last two months. You should understand "[positive/negative]" in whatever way makes sense to you. Once you have chosen a technology, please spend a moment remembering your experiences with this technology and your feelings about it.*
Then they were asked to name the technology and report how frequently they had used it in the last two months.
This was followed by three questions on motivations for using the technology. First, a general question:
*Thinking about the technology you chose on the previous page (technologyName.shown), please describe why you use this technology. Describe this as accurately, and in as much detail, as you can. Focus on your overall engagement with the technology over multiple interactions, rather than focusing only on one occasion. Focus your description on your own experience and motivations rather than on the technology itself. Make your description as concrete as possible. Try to write so that outsiders can easily understand your experience. Write at least 80 words (e.g. at least 3 full lines of text)*
Then two more specific questions:
- *Why do you think you began to use the technology?*
- *Why do you think you continue to use the technology?*
### User Motivation Inventory (UMI)
We measured motivational regulation using the User Motivation Inventory (UMI) [@bruhlmann_measuring_2018], which captures six constructs drawn from Organismic Integration Theory: amotivation, external regulation, introjected regulation, identified regulation, integrated regulation, and intrinsic motivation. These six constructs were subjected to latent profile analysis to identify motivational profiles for the participants.
### Needs Satisfaction
Needs Satisfaction and Frustration are further constructs theoretically grounded in SDT. \db{describe}
Whereas the UMI is a relatively new metric, and the constructs it measures have rarely been addressed in HCI, Basic Needs Satisfaction is a very commonly addressed construct in HCI research in general, and in UX research in particular. In recent years, needs frustration has also received increasing attention, since its impact has often been found not to be identical with the absence of satisfaction.
Sheldon's need satisfaction scale has commonly been used to address satisfaction of the three basic needs posited by SDT (Autonomy, Competence, and Relatedness), alongside a wider range of less theoretically basic needs: in addition to the three core needs, the scale captures a further seven constructs. Previous UX studies have commonly dropped particular items from the scale; some address only the three basic needs, while others drop one or more constructs considered less relevant to the study at hand.
We follow prior work in dropping the constructs which are least relevant to technology use, namely money-luxury and physical thriving. Of the remainder, we form hypotheses only about Autonomy, Competence, and Relatedness. The other five constructs are retained for exploratory, post-hoc analysis, to shed further light on the results, inform future research directions, and relate our results to prior UX research. We did not form hypotheses about these additional needs.
As in previous work on non-episodic experience, the introductory question for this measure was adapted to reflect the frequent use of technology: *"While using [technology], I feel ..."*
### Attribution
We expected that internalised motivation might result in greater self-attribution. Previous work has argued that greater autonomy satisfaction results in users taking more responsibility for the outcomes of an interaction. This suggests that more or less autonomous self-regulation profiles may also influence attribution.
We asked participants to answer three questions on 7-point Likert scales, indicating the degree to which they felt that the outcomes of interactions with the technology were attributable to (1) themselves, (2) the technology, or (3) other factors.
### Usability Metric for User Experience (UMUX)
We used the four-item UMUX
[@finstad_usability_2010] to measure perceived usability of the reported
technology. The UMUX is shorter than other candidate measures for
usability, and normative data is available to ground its interpretation
[@hodrien_review_2021]. While longer questionnaires support more
detailed diagnostic analysis of issues with target technologies, this
was not the focus of this study. Here a measure of overall system
usability was considered more expedient.
### Positive and Negative Affect Schedule (PANAS)
[NHST]{style="color: red"} We used the short-form, 20-item PANAS-C scale
to measure positive and negative affect. The PANAS has been shown to be
stable over time, and suitable for measuring both episodic and
longer-term affect [cite]{style="color: red"}. Participants were presented with 20 positive and negative affect-related adjectives and asked to indicate the extent to which they felt this way during their interactions with the technology in question. Items were answered on a 5-point Likert scale from "very slightly or not at all" to "extremely".
# Analysis
## Hypotheses
We expected that motivational profiles identified in our data-set would be associated with significantly different outcomes in terms of Basic Needs Satisfaction, Perceived Usability, Affect (PANAS), and Attribution.
## Motivational Profile Analysis
### Confirmatory Factor Analysis (CFA)
Guidance on LPA suggests that the "arguably optimal" approach is to
conduct the analysis not on the raw measure scores, but on factor
scores. This ensures that the indicators used are truly continuous (as
required by LPA) and that measurement error has been corrected for
[@bauer_primer_2021]. This approach requires an initial factor analysis
to be conducted, and that factor scores be scaled to have meaningful
mean structure [@bauer_primer_2021]. Such scaling can be achieved by
*effects coding* --- constraining the model such that factor loadings of
measured indicators average to 1 and indicator intercepts sum to 0
[@little_non_2006]. As such, as a first step we conducted a six-factor
CFA to test a UMI measurement model in which all items loaded onto their
designated factors, and the effects coding constraints were observed.
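A minimal sketch of this step, assuming the CFA is estimated in R with lavaan; the item names below are placeholders rather than our actual variable names:

```r
# Sketch of the six-factor UMI measurement model with effects coding,
# assuming lavaan. Item names (amo1 ... intr3) are placeholders.
library(lavaan)

umi_model <- "
  amotivation =~ amo1 + amo2 + amo3
  external    =~ ext1 + ext2 + ext3
  introjected =~ intj1 + intj2 + intj3
  identified  =~ iden1 + iden2 + iden3
  integrated  =~ intg1 + intg2 + intg3
  intrinsic   =~ intr1 + intr2 + intr3
"

# effect.coding = TRUE imposes the constraints described above: per factor,
# loadings average to 1 and indicator intercepts sum to 0
fit <- cfa(umi_model, data = survey, effect.coding = TRUE)

# Factor scores carried forward as indicators for the LPA
fscores <- as.data.frame(lavPredict(fit))
```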
### Latent Profile Analysis (LPA)
The factor scores calculated in the CFA were then subjected to LPA, conducted in R using the tidySEM library [cite].
We did not have hypotheses about the number of motivational profiles
which would be identified. A pilot analysis of positive user experiences identified four profiles across which both the amount of motivation and the
quality of motivational regulation varied, but it was not clear how much
the different framing in this study would affect results.
As such we evaluated a range of latent profile models, with different
numbers of profiles, from 1 to 6, and with different constraints. For
each profile-count we tested models in which 1) variances were either a)
constrained to be equal or b) allowed to vary between profiles, and 2)
covariances were either i) constrained to zero, ii) constrained to be
equal, or else iii) allowed to vary.
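A rough, non-definitive sketch of how these models could be estimated, assuming tidySEM's mx_profiles() and table_fit() interface (argument names and defaults are assumptions about the installed version):

```r
# Sketch of the LPA step, assuming tidySEM's mx_profiles() interface
# (models estimated via OpenMx). The call below fits the default
# specification (equal variances, zero covariances) for 1-6 profiles;
# the freer variance-covariance specifications described above are fit
# analogously via the `variances` and `covariances` arguments (an
# assumption about the installed tidySEM version).
library(tidySEM)

set.seed(1234)  # mixture estimation uses random starting values
lpa_models <- mx_profiles(data = fscores, classes = 1:6)

# Fit indices for all fitted solutions (BIC, entropy, smallest class, etc.)
fits <- table_fit(lpa_models)
```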
The final model was therefore selected on the basis of both theoretical and numerical considerations [@bauer_primer_2021]. We constrained our analysis with the following decision process (a sketch applying these rules to the resulting fit table is shown after the list):
- First, solutions which failed to converge to an optimum were discarded.
- Solutions were discarded where the number of profile members was smaller than five times the number of parameters per class. Below this, the number of observations per parameter risks resulting in unstable fits [cite]{style="color: red"}.
- Solutions were discarded if they included profiles smaller than 7% of the sample, since this would reduce the meaningfulness of comparisons between profiles.
- After this, numerical criteria were used to select models. We referred to several metrics. Primarily, we sought the solution with the lowest Bayesian Information Criterion (BIC) value, indicating quality of model fit [cite]. We also paid attention to the minimum classification probability for a profile (indicating the certainty with which participants are assigned to profiles) and the entropy of the model (informative about the distinctness of the classes). We sought entropy values of 0.8 or higher, but did not exclude lower-valued solutions if BIC and classification probabilities were high [cite].
- Finally, we considered model constraints. Previous work points to the value of flexibility in deciding model constraints. Bauer argues that constraints on independence and homogeneity ensure that differences between profiles are attributable to differences in the mean values of indicators, and not to the form of their variance-covariance relations [@bauer_primer_2021]. However, Peugh & Fan note that these are very strict assumptions whose external validity can be questioned; in simulation studies, they found that LPA models were more likely to recover true populations where these constraints were relaxed. As such, we did not enforce constraints a priori but, where numeric values for models were close, preferred the more constrained (and thus more interpretable) model.
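As an illustration only, the numeric screening rules above might be applied to the fit table as follows; the column names (BIC, Entropy, prob_min, n_min) reflect our understanding of table_fit() output and may differ between tidySEM versions:

```r
# Applying the numeric screening rules above to the fit table. Column names
# (BIC, Entropy, prob_min, n_min) are assumptions about table_fit() output.
candidates <- subset(fits, n_min >= 0.07)   # smallest profile at least 7% of the sample

# Rank remaining solutions by BIC; entropy (>= .8 preferred), minimum
# classification probability, and the constraint structure are then weighed
# alongside theoretical considerations, as described above.
candidates[order(candidates$BIC), ]
```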
### Relating Profiles to Outcome Indicators
When linking profiles to outcome variables, previous work has often
taken a naive approach: first participants are assigned to profiles, and
then mean outcome variables are compared between profiles. Unfortunately
this approach can be misleading. LPA does not directly assign data
points to profiles, but assigns probabilities. The naive approach does
not take this into account, risking error
[@bauer_primer_2021; @bakk_relating_2021]. Recent literature
demonstrates that other approaches, such as the Bolck--Croon--Hagenaars
(BCH) approach, are more robust
[@bauer_primer_2021; @bakk_relating_2021]. BCH takes into account
classification uncertainty by linking profiles to outcomes by a weighted
estimation of a linear regression model. The approach also avoids
normality assumptions, and is insensitive to heteroscedasticity
[@bakk_relating_2021]. As such this is the approach we used.
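A minimal sketch of this estimation step, assuming tidySEM's BCH() helper accepts a fitted mixture model and a data frame of auxiliary outcome scores; the object names and index below are placeholders:

```r
# Sketch of the BCH step, assuming tidySEM's BCH() helper. `lpa_models` is
# the list of fitted solutions from the LPA step; the index used here is a
# placeholder for the solution actually selected. `outcomes` is a placeholder
# data frame of outcome scores (needs, UMUX, PANAS, attribution) for the same
# participants, in the same row order as the LPA indicators.
final_model <- lpa_models[[4]]                 # placeholder index of the selected solution

aux_bch <- BCH(final_model, data = outcomes)   # weighted per-class outcome estimates
```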
First we related profiles to outcome variables by estimating a BCH model
in tidySEM. Then we conducted statistical tests to quantify the effect
size and significance of results. First we conducted a likelihood ratio
test for the whole model, to evaluate the null hypothesis that outcome
means do not differ between profiles. Then we conducted pairwise Wald
tests, between class-pairs, for each outcome variable, to evaluate the
null hypothesis that the two profiles did not differ. Finally, p-values were corrected for the number of hypotheses using the Holm-Bonferroni correction, via the p.adjust function in R.
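As an illustration of the correction step only, using base R's p.adjust() with placeholder values:

```r
# Holm-Bonferroni correction with base R's p.adjust(). The raw p-values here
# are placeholders; in practice they are collected from the pairwise Wald
# tests across all profile pairs and outcome variables.
raw_p <- c(0.001, 0.004, 0.020, 0.048, 0.300)   # example values only
p.adjust(raw_p, method = "holm")
```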