Keynotes


Are there purely multi-turn “dialog failures”?

Chris Welty


Abstract: The notion of dialog in AI dates back decades and has traditionally distinguished itself from other areas of NLP in several ways, most notably that dialogs contain multiple turns, and that any evaluation of a dialog must somehow combine fine-grained analysis of the individual turns with a more gestalt analysis of all the turns in ways that might be called conversationality. In the classic NLP “levels” (lexical, syntactic, semantic, discourse, pragmatic), this amorphous idea fell somewhere between discourse and pragmatics, and was never really understood deeply because, frankly, we had no systems capable of being evaluated at that level and therefore no real idea of where a human-computer dialog interaction might go wrong. Obviously, this has changed, but the lion’s share of the analysis of actual human-computer dialogs has focused on single turns – single prompt/response pairs. In most cases, it is easy to identify a “bad” multi-turn dialog because something went wrong in a single turn, like forgetting some previous context, self-contradiction, etc., and therefore this single-turn emphasis in multi-turn evaluation is justified as eminently practical. In this talk, I will address the question of whether there are dialog failures that can’t be found in any single turn, but that emerge from some gestalt analysis of all the turns, and through that I hope to illustrate that the answer is far from obvious, and leaves open what I believe to be a very fruitful area for academic research.

Bio: Dr. Chris Welty is a Research Scientist at Google in New York, and a Visiting Professor at Cornell Tech. Today he works on developing and evaluating frontier capabilities of large language models, especially aspects of conversationality and dialog. He has been part of the teams that launched Gemini and Gemma. While at Google he also worked on large-scale crowdsourcing for Google Maps, and is best known for his work on CrowdTruth: a multi-perspective approach to human evaluation.
Before Google, Dr. Welty was a member of the technical leadership team for IBM’s Watson – the question-answering computer that defeated the all-time best Jeopardy! champions in a widely televised contest. He appeared on the TV broadcast discussing the technology behind Watson, as well as in many articles in the popular and scientific press. His proudest moment was being interviewed for StarTrek.com about the project. He is a recipient of the AAAI Feigenbaum Prize for his work.
Welty previously played a seminal role in the development of the Semantic Web and ontologies, and co-developed OntoClean, the first formal methodology for evaluating ontologies. He holds a Ph.D. in Computer Science from Rensselaer Polytechnic Institute, where he worked, taught, and helped to form NYSERNet, and then PSINet, the first commercial internet access provider. He has also worked at AT&T Bell Labs, Vassar College, and the Italian Research Council (CNR).


Interactive Task Learning
Alex Lascarides


Abstract: This talk offers a learning framework in which the agent learns to solve novel tasks through evidence gained from its own actions in the environment and an embodied conversation with the teacher. We focus on scenarios where the agent’s pretraining is deficient: it is completely unaware of domain concepts that are necessary for solving the task, and so must discover them and learn to exploit them via its conversation with the user. This poses several demands on the type of learning required: it should be incremental (since the user expects the agent to change its beliefs and policies whenever she provides feedback); the agent must reason about and act within the domain without knowing the hypothesis space of possible states and actions; and, on discovering an unforeseen concept, the agent must expand its hypothesis space and adapt what it has learned from prior evidence to this newly expanded set of possibilities.

Our agent can learn, through its experience and conversations with users, a policy that addresses the dilemma of when (and what) questions to ask (which come at a cost because of the user’s effort) vs. when (and what) actions to perform in the environment (which risk a huge cost if wrong). Our main hypothesis in this learning paradigm is that formal semantics makes learning more data efficient. We demonstrate this by running experiments where the agent starts out unaware of all domain concepts and has to address reference resolution tasks and tasks involving rearrangements of objects in the scene so as to carry out the user’s instruction (e.g., “Put both granny smiths in the basket”). Our experiments show that an agent that exploits the truth conditions of quantifiers (e.g., “every”, “both”, “a”), generics (e.g., “fire trucks have ladders”), and the logical consequences of discourse coherence (in particular corrective moves, question answering, and explanations) learns faster than an agent that lacks these capabilities.


Bio: Prof. Alex Lascarides holds the Chair in Semantics at the School of Informatics, University of Edinburgh. Her research is in theoretical and computational linguistics and AI. It aims to model the semantics and pragmatics of communicative actions in conversation, mainly focussing on text and speech but also analysing non-verbal actions such as hand gestures. She has developed logical and computational models of how humans communicate with each other, and machine learning frameworks that enable software agents and robots to engage in, and learn from, verbal and non-verbal interactions with humans. She has also developed models of conversation where the participants’ goals diverge (e.g., courtroom cross-examination, negotiations over restricted resources, and political debate), as well as cases where they align (e.g., tourist information, scheduling). A common underlying theme of all this work is to exploit models of discourse coherence to constrain the inferential processes that underlie generating and interpreting language and gesture. Prof. Lascarides also has an ongoing interest in developing machine learning methods for learning optimal strategies, particularly for complex games such as Settlers of Catan, or for decision problems where the agent starts out unaware of possible states and/or actions that are critical to task success.


Ecological Study of Language Acquisition: The Role of Natural Conversation
Abdellah Fourtassi


Abstract: Human learners acquire language in childhood through interaction. In contrast, most modern dialogue systems begin with large-scale text pretraining and only later adapt to interaction. This talk explores the human path—where interaction precedes language—and considers its implications for rethinking the role of dialogue in AI systems. First, I present recent evidence that core dialogue mechanisms—as formalized in Traum’s grounding model and related theories of linguistic coordination (e.g., turn-taking, speech acts, grounding, and coherence)—emerge in child–caregiver exchanges even before full language mastery. Second, I show that these mechanisms are not merely communicative scaffolds, but active ingredients in learning: language models trained in child-sized data regimes improve when guided by caregiver feedback. Viewing interaction as a built-in learning mechanism—rather than a late-stage interface—offers a perspective for building AI systems that, like humans, can learn effectively in low-resource settings.

Bio: Abdellah Fourtassi is Associate Professor of Computer and Cognitive Science at Aix-Marseille University, where he leads the Computational Communicative Development (CoCoDev) team. His research investigates how children acquire language and communication skills through social interaction, using machine learning, particularly approaches inspired by multimodal dialogue systems, to study development in naturalistic settings. He is also a researcher at the Institute of Language, Communication and the Brain (ILCB) and a member of its governing board.