Effective applications of task-oriented dialogue agents remain limited to
simple tasks. One of the reasons could be that agents fail to “understand”
more complicated requests owing to failures in attempts at communicative
grounding, the process that establishes mutually agreed-upon knowledge.
Unlike research that evaluates dialogue agents’ performance by the rate of
successfully completed tasks, this paper takes the linguistic approach of
discourse analysis and investigates practical differences in how human and
agent interlocutors use language to reach common ground in human-human and
human-agent (Siri) dialogues on the same tasks. Utilising the Degrees of
Grounding model (Roque and Traum, 2008), the paper shows how Siri’s
dispreferred signals for expressing groundedness hindered accurate
indication of whether information had been grounded. Interpretations with
the modified incremental semantic processing models (Eshghi et al., 2015)
pinpoint the exact point of breakdown in grounding, as well as discrepancies
in how human and agent interlocutors perceived updates of information. The
paper identifies four major weaknesses in Siri:
uninformative request repairs, greedy use of grounding evidence, difficulty
in interpreting resubmits, and an inability to understand human grounding
strategies. Beyond these surface differences in language use, the findings
hint at a deeper challenge: optimising agents’ presentation and image to
help users adjust the replies they anticipate. In view of Siri’s vague and
greedy expressions
that misled users about its perceptions, the paper proposes a number of
mitigation strategies, including increasing the accuracy and informativity
of agents’ delivery so that users can adapt to agents’ language style, much
as humans adapt to second-language speakers.