Scientific Papers
The following scientific publications (articles, papers and invited talks) were partially funded by the CALLAS project.
YEAR 2010

By J.Urbain [FPMS], E.Bevacqua [TCOM], T.Dutoit [FPMS], A.Moinet [FPMS], R.Niewiadomski [TCOM], C.Pelachaud [TCOM], B.Picart [FPMS], J.Tilmanne [FPMS], J.Wagner [UoA]
The AVLaughterCycle database

ABSTRACT: As part of the AVLaughterCycle project carried out during the eNTERFACE’09 Workshop held in Genoa, an audiovisual laughter database was recorded. The aim of this database is to provide a broad corpus for studying the acoustics of laughter, the facial movements involved, and the synchronization between these two signals. During the Workshop, the laughter database was used to drive the facial movements of a 3D humanoid virtual character, Greta, simultaneously with the audio laughter signal. This paper presents the database collection protocol.
Reference event: LREC'10, 17-23 May 2010, Malta.

By N.Bee, J.Wagner, E.André [UOA], F.Charles, D.Pizzi, M.Cavazza [TEES]
Multimodal Interaction with a Virtual Character in Interactive Storytelling

ABSTRACT: A frequent metaphor for interactive storytelling is that of the Holodeck, the science-fiction ultimate entertainment system, where narratives take the form of a virtual reality world in which the user is totally immersed, interacting with other characters and the environment in a way that drives the evolution of the narrative. As a character in the narrative, the user communicates with virtual characters much like an actor communicates with other actors. This requirement introduces a novel context for multimodal communication as well as several technical challenges. Acting involves attitudes and body gestures that are highly significant for both dramatic presentation and communication. Apart from our earlier work, where we developed a story character that responds to the user's emotive tone, there is hardly any conversational interface to interactive storytelling that emphasizes the socio-emotive aspects of interaction and integrates sophisticated technologies to recognize the user's emotive state. Furthermore, hardly any attempt has been made to study the role of eye gaze in interactive storytelling.
DOWNLOAD ARTICLE: In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2010), in press.
Reference event: AAMAS 2010, 10 May 2010, Toronto (Canada)

By C.Coutrix [AALTO], I.Avdouevski [AALTO], G.Jacucci [AALTO], V.Vervondel [TEES], S.Gilroy [TEES], M.Cavazza [TEES]
The Common Touch: Aesthetic and affective interaction in semi-public settings

ABSTRACT: The artistic community has been developing interactive works for public or semi-public settings for a number of years. These works utilise the public setting to tailor interaction and add an aesthetic dimension. In a similar approach, we are designing and implementing a system called The Common Touch…
Reference event: workshop “Designing for Crowd” at Pervasive 2010, 17 May 2010, Helsinki, Finland.

By Sevin, R.Niewiadomski, E.Bevacqua, A.Pez, M.Mancini, C.Pelachaud [TCOM]
Greta, une plateforme d'agent conversationnel expressif et interactif

ABSTRACT: This paper presents a generic, modular and interactive architecture for an embodied conversational agent called Greta. Greta is a 3D agent able to communicate with users through verbal and nonverbal channels such as gaze, head and torso movements, facial expressions and gestures. Our architecture follows the SAIBA framework, which defines the modular structure, functionalities and communication protocols for ECA systems. In this paper, we present our architecture and performance tests, as well as several interactive applications.
PUBLISHED on: (to appear) Technique et science informatiques, "Agents conversationnels animés", (vol. 29), 2010

By J.Wagner, E.André, M.Kugler, D.Leberle [UOA]
SSI/Model UI - A Tool for the Acquisition and Annotation of Human Generated Signals
ABSTRACT: Humans are used to expressing their needs and goals through various channels, such as speech, facial expressions and posture. The recognition and understanding of such behaviour is a key requirement for more natural human-computer interaction (HCI) and is believed to be an important part of next-generation user interfaces. Link
Reference event: DAGA 2010, 15-18 March 2010, Berlin (Germany)

By P.Bleakley [BBC], S.Hyniewska, R.Niewiadomski, C.Pelachaud [TCOM] and M.Price [BBC]
Emotional Interactive Storyteller System
ABSTRACT: This paper presents a concept and initial evaluation of the emotional interactive storyteller system. Our concept is derived from our knowledge of the pre-medieval oral storytelling tradition and its application to therapeutic storytelling. Our initial evaluations are based on a mock-up of the concept where emotional content of the story is reinforced with an expressive agent, which was subjectively tested with a sample of volunteer users. The results of the evaluation are very helpful, confirming that the concept is useful and meaningful in general to the user, and providing us with valuable feedback in shaping the ongoing design for a full implementation.
DOWNLOAD ARTICLE: From KEER 2010 Conference Proceedings link
Reference event: International Conference On Kansei Engineering And Emotion Research 2010, KEER 2010, March 2-4, 2010, Paris, France

By R.Niewiadomski, S.Hyniewska, C.Pelachaud [TCOM]
Introducing Multimodal Sequential Emotional Expressions for Virtual Characters
ABSTRACT: In this paper we present a system which allows an embodied conversational agent to display multimodal sequential expressions. Recent studies show that several emotions are expressed by a set of different nonverbal behaviors which involve different modalities: facial expressions, head and gaze movements, gestures, torso movements and posture. Multimodal sequential expressions of emotions may be composed of nonverbal behaviors displayed simultaneously over different modalities, of a sequence of behaviors, or of expressions that change dynamically within one modality. This paper presents the process of generating multimodal sequential expressions, from the annotation to the synthesis of the behavior, as well as the results of the evaluation of our system.
DOWNLOAD ARTICLE: From KEER 2010 Conference Proceedings link
Reference event: International Conference On Kansei Engineering And Emotion Research 2010, KEER 2010, March 2-4, 2010, Paris, France

By J. Kim, F. Jung [UOA]
Emotional Facial Expression Recognition from two different feature domains
ABSTRACT: There has been a significant amount of work on automatic facial expression recognition towards realizing affective interfaces in human-computer interaction (HCI). However, most previous works are based on specific users and dataset-specific methods, and the results are therefore strongly dependent on the lab settings in which they were obtained. This makes it difficult to attain a generalized recognition system for different applications. In this paper, we present an efficiency analysis of two feature domains, a Gabor wavelet-based feature space and a geometric position-based feature space, by applying them to two facial expression datasets generated in quite different environmental settings.
Reference event: ICAART 2010, 22-24 January 2010, Valencia, Spain

By J.Kim and F.Lingenfelser [UOA]
Ensemble approaches to parametric decision fusion for bimodal emotion recognition
ABSTRACT: In this paper, we present a novel multi-ensemble technique for decision fusion of bimodal information. Exploiting the dichotomic property of the 2D emotion model, various ensembles are built from a given bimodal dataset containing multichannel physiological measures and speech. Through a synergistic combination of the ensembles, we investigate parametric schemes of decision-level fusion. Improvements in recognition accuracy of up to 18% are achieved compared to the results from unimodal classification.
Reference event: B-Interface, 20-23 January 2010, Valencia, Spain
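The decision-level fusion described in the abstract can be sketched roughly as a weighted vote over the labels predicted by individual ensemble members. This is a minimal illustrative scheme, not the paper's actual parametric method; the labels and weights are invented for the example:

```python
from collections import Counter

def fuse_decisions(predictions, weights=None):
    """Weighted majority vote over emotion labels predicted by several
    ensemble members -- one simple decision-level fusion scheme."""
    if weights is None:
        weights = [1.0] * len(predictions)
    scores = Counter()
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # Return the label with the highest accumulated weight.
    return scores.most_common(1)[0][0]

# e.g. three classifiers trained on physiological channels, one on speech
votes = ["joy", "anger", "joy", "anger"]
print(fuse_decisions(votes, weights=[1.0, 0.5, 1.0, 0.5]))  # joy
```

Parametric variants would tune the per-classifier weights, e.g. by each member's validation accuracy.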

By J.Wagner, F.Jung, J.Kim, T.Vogt and E.André [UOA]
The Smart Sensor Integration Framework and its Application in EU Projects
ABSTRACT: Affect sensing by machines is an essential part of next-generation human-computer interaction (HCI). However, despite the large effort carried out in this field during the last decades, only few applications exist that are able to react to a user’s emotion in real time. This is partly because emotion recognition is a challenging task in itself, and partly because most effort has so far been put towards offline analysis. In response to this deficit we have developed a framework called Smart Sensor Integration (SSI), which considerably jump-starts the development of multimodal online emotion recognition (OER) systems. In this paper, we introduce the SSI framework and describe how it is successfully applied in different projects funded by the European Union, namely the CALLAS and METABO projects and the IRIS network.
Reference event: B-Interface, 20-23 January 2010, Valencia, Spain

YEAR 2009

By M.Obaid, R.Mukundan [HITNZ], R.Goecke [University of Canberra], M.Billinghurst, H.Seichter [HITNZ]
A Quadratic Deformation Model for Facial Expression Recognition
ABSTRACT: In this paper we propose a novel approach for recognizing facial expressions, based on using an Active Appearance Model facial feature tracking system with quadratic deformation model representations of facial expressions. Thirty-seven facial feature points are tracked based on the MPEG-4 Facial Animation Parameters layout. The proposed approach relies on the Euclidean distance measures between the tracked feature points and the reference deformed facial feature points of the six main expressions (smile, sad, fear, disgust, surprise, and anger). An evaluation of 30 model subjects, selected randomly from the Cohn-Kanade Database, was carried out. Results show that the six main facial expressions can successfully be recognized with an overall recognition accuracy of 89%. The proposed approach yields promising recognition rates and can be used in real-time applications.
DOWNLOAD ARTICLE: Proceedings of the International Conference on Digital Image Computing: Techniques and Applications, link
Reference Event: DICTA, 1-3 December 2009, Melbourne
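The distance-based matching the abstract describes — picking the expression whose reference deformation is closest to the tracked feature points — can be sketched as below. The coordinates are invented two-point placeholders, not the actual quadratic deformation model values or the 37-point MPEG-4 layout:

```python
import math

# Hypothetical reference deformations: positions of tracked facial feature
# points for each expression (here just two 2D points per expression).
REFERENCES = {
    "smile":    [(0.2, 0.1), (0.8, 0.1)],
    "sad":      [(0.2, 0.0), (0.8, 0.0)],
    "surprise": [(0.2, 0.3), (0.8, 0.3)],
}

def classify(points):
    """Pick the expression whose reference points are closest to the
    tracked feature points (summed Euclidean distance)."""
    def distance(refs):
        return sum(math.dist(p, r) for p, r in zip(points, refs))
    return min(REFERENCES, key=lambda e: distance(REFERENCES[e]))

print(classify([(0.2, 0.28), (0.8, 0.31)]))  # surprise
```

A real system would normalize the tracked points for head pose and scale before measuring distances.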

By E.Bevacqua, K.Prepin, R.Niewiadomski, Sevin and C.Pelachaud [TCOM]
GRETA: Towards an Interactive conversational virtual companion
ABSTRACT: Research has shown that people tend to interact with computers characterized by humanlike attributes as if they were really humans (Nass et al., 1997; Reeves and Nass, 1996). For example, in their studies Nass and Reeves saw that, when interacting with computers, people applied rules of politeness and felt uneasy when large faces were displayed on a screen, as if the talking head was invading their personal space (Reeves and Nass, 1996). Consequently, human-machine interface designers should aim to implement interactive systems that simulate human-like interaction. The more consistent this type of interface is with a human style of communication, the easier and more accessible its use becomes (Ball and Breese, 2000). Such a level of consistency could be reached using humanoid artefacts able to apply the rich style of communication that characterizes human conversation. Recent technological progress has made the creation of this type of humanoid interfaces, called Embodied Conversational Agents (ECAs), possible. An ECA is a computer-generated animated character that is able to carry on natural, human-like communication with users (Cassell et al., 2000b). For this purpose, the researchers engaged in ECA development share common goals: they want to implement (...)
PUBLISHED on: Close Engagements with Artificial Companions: Key social, psychological, ethical and design issues. 2010. xxii, 315 pp. (pp. 143–156) link

By J.Kim, J.Wagner, T.Vogt, E.André, F.Jung, M.Rehm [UOA]
Emotional sensitivity in human-computer interaction

ABSTRACT: Human conversational partners usually try to interpret the speaker's or listener's affective cues and respond to them accordingly. Recently, the modelling and simulation of such behaviours has been recognized as an essential factor for more natural man-machine communication. So far, research on emotion recognition has mostly dealt with offline analysis of recorded emotion corpora, and online processing has hardly been addressed. Online processing is, however, a necessary prerequisite for the realization of human-computer interfaces that analyze and respond to the user's emotions while he or she is interacting with an application. In this paper, we first describe how we recognize emotions from various modalities including speech, gestures and biosignals. We then present Smart Sensor Integration (SSI), a framework which we developed to meet the specific requirements of online emotion recognition.
PUBLISHED on Information Technology, Oldenbourg Wissenschaftsverlag

By N. Bee, E. André, T. Vogt, P. Gebhard [UOA]
The use of affective and attentive cues in an empathic computer-based companion

ABSTRACT: Recently, a number of research projects have been started to create virtual agents that do not just serve as assistants to which tasks may be delegated, but that may even take on the role of a Companion. Such agents require a great deal of social intelligence, such as the ability to detect the user's affective state and to respond to it in an empathic manner. The objective of our work is to create an empathic listener that is capable of reacting to affective and attentive input cues from the user. In particular, we discuss various forms of empathy and how they may be realized based on these cues.
DOWNLOAD ARTICLE: Close Engagements with Artificial Companions: Key social, psychological, ethical and design issues. 2010. xxii, 315 pp. (pp. 143–156) link

By S.W.Gilroy, M.Cavazza [TEES], M.Benayoun
Using Affective Trajectories to Describe States of Flow in Interactive Art

ABSTRACT: Interactive Art installations often integrate sophisticated interaction techniques with visual presentations, contributing to a rich user experience. They also provide a privileged environment in which to study user experience, using the same sensing data that support interaction. In this paper, using the affective interface of an Augmented Reality Art installation, we introduce a framework relating real-time emotional data to phenomenological models of user experience, in particular the concept of Flow. We propose to analyse trajectories of affect in a continuous emotional space (Pleasure-Arousal-Dominance) to characterize user experience. Early experiments with several subjects interacting in pairs with the installation support this mapping on the basis of Flow questionnaires. This approach has potential implications for the analysis of user experience across Art and Entertainment applications.
Reference Event: ACE 2009, Athens: October 29-31, 2009

By R.Niewiadomski, S.Hyniewska, C.Pelachaud [TCOM]
Modeling emotional expressions as sequences of behaviors

ABSTRACT: In this paper we present a system which allows a virtual character to display multimodal sequential expressions, i.e. expressions that are composed of different signals partially ordered in time and belonging to different nonverbal communicative channels. It is composed of a language for describing such expressions from real data and of an algorithm that uses this description to automatically generate emotional displays. We explain in detail the process of creating multimodal sequential expressions, from the annotation to the synthesis of the behavior.
DOWNLOAD PAPER: published in Proceedings of the 9th International Conference on Intelligent Virtual Agents, Amsterdam, The Netherlands, 2009. Link
Reference Event: IVA 2009, Amsterdam: September 14-16, 2009

By J.Kim, E.André, T.Vogt [UOA]
Towards User-Independent Classification of Multimodal Emotional Signals

ABSTRACT: Coping with differences in the expression of emotions is a challenging task not only for a machine, but also for humans. Since individualism in the expression of emotions may occur at various stages of the emotion generation process, human beings may react quite differently to the same stimulus. Consequently, it comes as no surprise that recognition rates reported for a user-dependent system are significantly higher than recognition rates for a user-independent system. Based on empirical data we obtained in our earlier work on the recognition of emotions from biosignals, speech and their combination, we discuss which consequences arise from individual user differences for automated recognition systems and outline how these systems could be adapted to particular user groups.
Reference Event: ACII 2009, Amsterdam: September 10-12, 2009

By S.W.Gilroy, M.Cavazza [TEES], M.Niiranen, E.André, T.Vogt [UOA], J.Urbain [FPMS], M.Benayoun, H.Seichter, M.Billinghurst [HITNZ]
PAD-based Multimodal Affective Fusion

ABSTRACT: The study of multimodality is comparatively less developed for affective interfaces than for their traditional counterparts. However, one condition for the successful development of affective interface technologies is the availability of frameworks for real-time multimodal fusion. In this paper, we describe an approach to multimodal affective fusion which relies on a dimensional model, Pleasure-Arousal-Dominance (PAD), to support the fusion of affective modalities, each input modality being represented as a PAD vector. We describe how this model supports both affective content fusion and temporal fusion within a unified approach. We report results from early user studies which confirm the existence of a correlation between measured affective input and user temperament scores.
Reference Event: ACII 2009, Amsterdam: September 10-12, 2009
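Representing each modality as a PAD vector makes content fusion a vector operation. A minimal sketch, assuming a plain weighted average as the fusion rule (the paper's actual scheme also covers temporal fusion, which is omitted here; the input values are invented):

```python
def fuse_pad(vectors, weights=None):
    """Combine per-modality (pleasure, arousal, dominance) vectors into a
    single affective estimate via a weighted average -- one simple content
    fusion strategy over the PAD space."""
    if weights is None:
        weights = [1.0] * len(vectors)
    total = sum(weights)
    return tuple(
        sum(w * v[i] for w, v in zip(weights, vectors)) / total
        for i in range(3)  # pleasure, arousal, dominance
    )

speech = (0.4, 0.8, 0.1)  # e.g. from vocal emotion recognition
face   = (0.6, 0.6, 0.3)  # e.g. from facial expression analysis
print(fuse_pad([speech, face]))  # roughly (0.5, 0.7, 0.2)
```

Weighting lets a more reliable modality (e.g. speech during an utterance) dominate the fused estimate.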

By C.Coutrix, P.Narula, M.Helin, G.Jacucci [TKK], S.Roveda [SAZ]
Interactivity of an Affective Puppet 
ABSTRACT: This paper describes a computer-animated puppet responding to multimodal and affective inputs from a group of spectators in order to engage them in the visit to a science museum. In order to compute the emotional state of the audience, our system allows fusion of information acquired from microphone and camera input. The objective of our demonstration is to put forth a mechanism for using emotions for group interaction and to provide an example of real world application of this technology.
Reference Event: UBICOMP 2009, Orlando: October 1st, 2009

By T.Vogt, E.André [UOA], J.Wagner, S.W.Gilroy, F.Charles, M.Cavazza [TEES]
Real-time vocal emotion recognition in artistic installations and interactive storytelling: Experiences and lessons learnt from CALLAS and IRIS

ABSTRACT: Most emotion recognition systems still rely exclusively on prototypical emotional vocal expressions that may be uniquely assigned to a particular class. In realistic applications, there is, however, no guarantee that emotions are expressed in a prototypical manner. In this paper, we report on challenges that arise when coping with non-prototypical emotions in the context of the CALLAS project and the IRIS network. CALLAS aims to develop interactive art installations that respond to the multimodal emotional input of performers and spectators in real-time. IRIS is concerned with the development of novel technologies for interactive storytelling. Both research initiatives represent an extreme case of non-prototypicality since neither the stimuli nor the emotional responses to stimuli may be considered as prototypical.
DOWNLOAD PAPER: Published in Proceedings of the International Conference on Affective Computing and Intelligent Interaction (ACII 2009), Amsterdam, The Netherlands, link
Reference Event: ACII 2009, Amsterdam: September 10-12, 2009

By R.Niewiadomski, S.Hyniewska, C.Pelachaud [TCOM]
Evaluation of Multimodal Sequential Expressions of Emotions in ECA

ABSTRACT: A model of multimodal sequential expressions of emotion for an Embodied Conversational Agent was developed. The model is based on video annotations and on descriptions found in the literature. A language has been derived to describe expressions of emotions as a sequence of facial and body movement signals. An evaluation study of our model is presented in this paper. Animations of 8 sequential expressions corresponding to the emotions anger, anxiety, cheerfulness, embarrassment, panic fear, pride, relief and tension were realized with our model. The recognition rate of these expressions is higher than chance level, suggesting that our model is able to generate recognizable expressions of emotions, even for emotional expressions not considered to be universally recognized.
Reference Event: ACII 2009, Amsterdam: September 10-12, 2009

By E.Vildjiounaite, V.Kyllönen, O.Vuorinen, S.Mäkelä, T.Keränen, M.Niiranen, J.Knuutinen, J.Peltola [VTT]
Requirements and Software Framework for Adaptive Multimodal Affect Recognition

ABSTRACT: This work presents a software framework for real-time multimodal affect recognition. The framework supports categorical emotional models and simultaneous classification of emotional states along different dimensions. It also allows diverse state-of-the-art approaches to multimodal fusion to be incorporated, and can adapt to the context-dependency of emotional expression and to different application requirements. The results of using the framework for audio-video based emotion recognition of the audience of different shows (useful information, because the emotions of co-located people affect each other) confirm the capability of the framework to provide the desired functionalities conveniently and demonstrate that the use of contextual information increases recognition accuracy.
Reference Event: ACII 2009, Amsterdam: September 10-12, 2009

By J. Wagner, E. André, F.Jung [UOA]
Smart sensor integration: A framework for multimodal emotion recognition in real-time

ABSTRACT: Affect sensing by machines has been argued to be an essential part of next-generation human-computer interaction (HCI). To this end, a large number of studies have been conducted in recent years, which report automatic recognition of emotion as a difficult, but feasible, task. However, most effort has been put towards offline analysis, whereas to date only few applications exist which are able to react to a user's emotion in real time. In response to this deficit we introduce a framework we call Smart Sensor Integration (SSI), which considerably jump-starts the development of multimodal online emotion recognition (OER) systems. In particular, SSI supports the pattern recognition pipeline by offering tailored tools for data segmentation, feature extraction and pattern recognition, as well as tools to apply them offline (training phase) and online (real-time recognition). Furthermore, it has been designed to handle input from various modalities and to suit the fusion of multimodal information.
DOWNLOAD PAPER: published in Affective Computing and Intelligent Interaction (ACII 2009), 2009; link
Reference Event: ACII 2009, Amsterdam: September 10-12, 2009
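The pipeline the abstract names (segmentation, then feature extraction, then pattern recognition) can be sketched in miniature as below. The stage names, toy features and threshold rule are purely illustrative and are not the actual SSI API:

```python
def segment(stream, size):
    """Cut a signal stream into fixed-size, non-overlapping windows."""
    return [stream[i:i + size] for i in range(0, len(stream) - size + 1, size)]

def extract_features(window):
    """Toy features: mean level and peak level of the window."""
    return (sum(window) / len(window), max(window))

def classify(features):
    """Toy rule-based classifier on an arousal-like energy measure."""
    mean, peak = features
    return "aroused" if peak > 0.8 else "calm"

# Running the three stages over a short (invented) signal:
stream = [0.1, 0.2, 0.9, 0.3, 0.1, 0.2]
labels = [classify(extract_features(w)) for w in segment(stream, 3)]
print(labels)  # ['aroused', 'calm']
```

In a real OER system the classifier would be trained offline on labelled windows and then applied to the live stream, which is exactly the offline/online split the abstract describes.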

By A.Osherenko, E.André, T.Vogt [UOA]
Affect sensing in speech: Studying fusion of linguistic and acoustic features

ABSTRACT: Recently, there has been considerable interest in the recognition of affect in language. In this paper, we investigate how information fusion using linguistic (lexical, stylometric, deictic) and acoustic information can be utilized for this purpose and present a comprehensive study of fusion. We examine fusion at the decision level and the feature level and discuss the obtained results.
DOWNLOAD PAPER: Proceedings of Affective Computing and Intelligent Interaction ACII 2009, link, presentation
Reference Event: ACII 2009, Amsterdam: September 10-12, 2009

By A.Osherenko, E.André [UOA]
Differentiated semantic analysis in lexical affect sensing

ABSTRACT: Recently, there has been considerable interest in the recognition of affect from written and spoken language. In this paper, we describe an approach to lexical affect sensing that performs a semantic analysis of texts utilizing comprehensive grammatical information. The proposed approach differentiates between many affect classes. In addition, this paper reports and discusses the obtained results.
DOWNLOAD PAPER: Proceedings of Affective Computing and Intelligent Interaction ACII 2009, link, presentation
Reference Event: ACII 2009, Amsterdam: September 10-12, 2009

By A.Osherenko [UOA]
EmoText: Applying differentiated semantic analysis in lexical affect sensing (Demo)
ABSTRACT: Recently, there has been considerable interest in the recognition of affect from written and spoken language. We developed a computer system that implements a semantic approach to lexical affect sensing. This system analyses English sentences utilizing grammatical interdependencies between emotion words and intensifiers of emotional meaning.
Reference Event: ACII 2009, Amsterdam: September 10-12, 2009

By L.Liikkanen, G.Jacucci, M.Helin [TKK]
ElectroEmotion – A Tool for Producing Emotional Corpora Collaboratively

ABSTRACT: Emotion-aware applications supporting natural interaction are currently still a vision. One difficulty in developing these applications is the lack of multimodal corpora suitable for multiple use contexts, such as public spaces. Here, we introduce ElectroEmotion, a research tool prototype for collaboratively collecting vocal and gestural corpora in novel contexts. The ElectroEmotion concept includes a public walk-up-and-use interface that allows users to produce multimodal expressions in an interactive environment. We describe the design of this system and report an experimental study, which evaluated the importance of emotion induction and social influences in corpus acquisition. This preliminary investigation involved 12 users. By performing a video-based interaction analysis, we found that the participants demonstrated spontaneous multimodal activity and more distinctively emotional expressions in response to the emotion induction procedure. Social learning through examples provided by the experimenter influenced the way the subjects interacted. From these observations, we believe that the proposed concept could be developed into a functional system that can help to produce emotional corpora.
DOWNLOAD PAPER: Proceedings of Affective Computing and Intelligent Interaction ACII 2009, link
Reference Event: ACII 2009, Amsterdam: September 10-12, 2009

By T.Vogt, E.André [UOA]
Exploring the benefits of discretization of acoustic features for speech emotion recognition

ABSTRACT: We present a contribution to the Open Performance sub-challenge of the INTERSPEECH 2009 Emotion Challenge. We evaluate the feature extraction and classifier of EmoVoice, our framework for real-time emotion recognition from voice, on the challenge database and achieve competitive results. Furthermore, we explore the benefits of discretizing numeric acoustic features and find it beneficial in a multi-class task.
DOWNLOAD PAPER: Published in Proceedings of 10th Conference of the International Speech Communication Association INTERSPEECH 2009, pp. 328-331, link
Reference Event: INTERSPEECH 2009, Brighton, September 6-10, 2009
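Discretization of a numeric acoustic feature can be illustrated with simple equal-width binning; this is one common scheme, not necessarily the one used in the paper, and the pitch values below are invented:

```python
def discretize(values, bins=3):
    """Map numeric feature values to discrete bins via equal-width binning,
    one simple discretization scheme for acoustic features such as pitch."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against all-equal values
    # Clamp the maximum value into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]

pitches = [110.0, 180.0, 240.0, 300.0]  # invented pitch values in Hz
print(discretize(pitches))  # [0, 1, 2, 2]
```

Classifiers that operate on nominal attributes (e.g. decision trees or naive Bayes) can then treat each bin as a symbolic value instead of a raw number.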

By C.Pelachaud [TCOM]
Les Emotions dans l'interaction homme-machine

ABSTRACT: In this chapter we present the three major theories of emotion on which computational models are based. We highlight the complexity of emotion-related phenomena and its implications for the development of emotional systems. Theoretical as well as computational models must also rest on data; we discuss the difficulties involved in collecting such data and in annotating them. Models in the domains of perception, interaction and generation are then described. Finally, we focus on a particular element of emotional interaction: the embodied conversational agent. After defining it, we present computational models for obtaining expressive behaviour from a non-impulsive agent, that is, one capable of taking its social environment into account.
DOWNLOAD ARTICLE: (to appear) published on Informatique et Sciences Cognitives : influences ou confluences?, C. Garbay and D. Kaiser (Eds.)

By R.Niewiadomski, S.Hyniewska, C.Pelachaud [TCOM]
Modelisation des expressions faciales des emotions

ABSTRACT: In recent years there has been growing interest in the development of embodied conversational agents (ECAs) that express emotions. ECAs are software entities capable of communicating autonomously with a user, through both verbal and nonverbal channels. The interest in developing credible affective expressivity in ECAs is motivated by the goal of improving human-machine interaction. To be able to express emotions, the agent must have access to a model of communication that humans can understand, as well as the technical capability to communicate nonverbally (...) The following sections present several theoretical approaches from affective psychology and show how some of these theories have contributed to modelling the facial behaviour of conversational agents. In particular, we present how discrete emotion theories, dimensional emotion theories and componential theories of emotion treat facial expressions within the complex process that emotions constitute. The computational models used to determine the expressions of ECAs are described according to the theoretical models on which they rest. The chapter concludes by reporting work on expressions corresponding to blends of emotions and taking social constraints into account. (...)
DOWNLOAD ARTICLE: (to appear) Systemes d’Interaction Emotionnelle, (Ed.) C. Pelachaud

By S.Asteriadis, K.Karpouzis and S.Kollias [ICCS]
Feature Extraction and Selection for inferring user engagement in an HCI environment

ABSTRACT: We present our work towards estimating the engagement of a person with the information displayed on a computer monitor. Deciding whether a user is attentive or not, and frustrated or not, helps adapt the displayed information in special environments such as e-learning. The aim of the current work is the development of a method that works user-independently, without necessitating special lighting conditions, and with only minimal hardware requirements: a computer and a web camera.
Reference Event: HCI San Diego, 2009

By J.Urbain [FPMS], E.Bevacqua, T.Dutoit, A.Moinet, R.Niewiadomski, C.Pelachaud [TCOM], B.Picart, J.Tilmanne, and J.Wagner [UoA]
AVLaughterCycle: An audiovisual laughing machine
ABSTRACT: The AVLaughterCycle project aims at developing an audiovisual laughing machine, capable of recording the laughter of a user and responding to it with a machine-generated laughter linked to the input laughter. During the project, an audiovisual laughter database was recorded, including facial point tracking, thanks to the Smart Sensor Integration software developed by the University of Augsburg. This tool is also used to extract audio features, which are sent to a module called MediaCycle that evaluates similarities between a query input and the files in a given database. MediaCycle outputs a link to the most similar laughter, which is sent to Greta, an Embodied Conversational Agent that displays the facial animation corresponding to the laughter simultaneously with the audio laughter playback.
DOWNLOAD PAPER: published in Proceedings of the 5th International Summer School on Multimodal Interfaces, eNTERFACE 09, Genoa, Italy, 2009; link, project abstract, presentation
Reference event: 5th International Summer School on Multimodal Interfaces, eNTERFACE 09, Genoa, Italy, 2009 and Laughter Workshop 2009, Berlin: February 27, 2009

By J.Urbain, S.Dupont [FPMS], T.Dutoit, R.Niewiadomski, C.Pelachaud [TCOM]
Towards a virtual agent using similarity-based laughter production

ABSTRACT: In this abstract we present a collaborative project on creating a laughing machine. The machine automatically detects laughter; after clustering it, the machine finds the closest similar laughter, which is then synthesized acoustically and visually by a virtual agent.
Reference event: Laughter Workshop 2009, Berlin: February 27, 2009
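
The retrieval step described above (finding the closest laughter in the corpus for a given input) is, at its core, a nearest-neighbour search over audio feature vectors. A minimal sketch, assuming hypothetical three-dimensional feature vectors and Euclidean distance; the features and similarity measure actually used by the project's MediaCycle module may differ:

```python
import math

def most_similar_laughter(query, database):
    """Return the name of the stored laughter whose feature vector
    is closest to the query in Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(database, key=lambda name: dist(query, database[name]))

# Hypothetical corpus: laughter names mapped to feature vectors.
db = {
    "giggle_01": [0.2, 1.1, 0.5],
    "belly_03": [0.9, 0.1, 0.7],
}
print(most_similar_laughter([0.85, 0.2, 0.6], db))  # belly_03
```

The selected entry would then drive the acoustic and visual synthesis by the virtual agent.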

By G.Jacucci [TKK], A.Spagnolli [HTLab], A.Chalambalakis [HTLab], A. Morrison [TKK], L.Liikkanen [TKK], S.Roveda [SAZ], M.Bertoncini [ENG]
Bodily Explorations in Space: Social Experience of a Multimodal Art Installation

ABSTRACT: We contribute an extensive field study of a public interactive art installation that applies multimodal interface technologies. The installation is part of a theatre production on Galileo Galilei and includes: projected galaxies that are generated and move according to the motion of visitors, changing colour depending on their voices; and projected stars that configure themselves around the shadows of visitors. In the study we employ emotion scales (PANAS), qualitative analysis of questionnaire answers and video recordings. PANAS ratings indicate dominantly positive feelings, further described in the subjective verbalizations as gravitating around interest, ludic pleasure and transport. Through the video analysis, we identified three phases in the interaction with the artwork (circumspection, testing, play) and two pervasive features of these phases (experience sharing and imitation), which were also found in the verbalizations. Both video and verbalisations suggest that visitors' experience and ludic pleasure are rooted in the embodied, performative interaction with the installation and are negotiated with the other visitors.
Reference Event: 24-28/Aug/2009 CALLAS at Interact2009, Uppsala(Sweden)

By A. Badii, A.Khan, D.Fuschi [UOR]
One's own soundtrack: affective music synthesis
ABSTRACT: Computer music usually sounds mechanical; hence, if the musicality and musical expression of virtual actors could be enhanced according to the user's mood, the quality of experience would be amplified. We present a solution based on improvisation using cognitive models, case based reasoning (CBR) and fuzzy values acting on close-to-affect-target musical notes as retrieved from CBR per context. It modifies music pieces according to the interpretation of the user's emotive state as computed by the emotive input acquisition components of the CALLAS framework. The CALLAS framework incorporates the Pleasure-Arousal-Dominance (PAD) model, which reflects the emotive state of the user and represents the criteria for the music affectivisation process. Using combinations of positive and negative states for affective dynamics, the octants of temperament space as specified by this model are stored as base reference emotive states in the case repository, each case including a configurable mapping of affectivisation parameters. Suitable previous cases are selected and retrieved by the CBR subsystem to compute solutions for new cases; the affect values from these control the music synthesis process, allowing for a level of interactivity that makes for an interesting environment in which to experiment and learn about expression in music.
Reference Event: 13-14/July/09 CALLAS at EMCIS2009, Izmir (Turkey)
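
The octant-based case retrieval described above can be sketched as follows. The case keys are the sign patterns of the eight PAD octants; the configuration fields (`tempo`, `mode`, `velocity`) and their values are hypothetical illustrations, not the affectivisation parameters actually used by the CALLAS framework:

```python
def pad_octant(p, a, d):
    """Map a PAD state to one of the eight octants of temperament
    space by the sign of each dimension."""
    return tuple("+" if v >= 0 else "-" for v in (p, a, d))

# Hypothetical case repository: one affectivisation configuration
# per octant (tempo scaling, mode, velocity offset).
cases = {
    ("+", "+", "+"): {"tempo": 1.2, "mode": "major", "velocity": +10},
    ("-", "-", "-"): {"tempo": 0.8, "mode": "minor", "velocity": -10},
}

def retrieve_case(p, a, d, default=None):
    """Retrieve the stored case matching the octant of the new
    PAD state, or a default when no case is stored for it."""
    return cases.get(pad_octant(p, a, d), default)

print(retrieve_case(0.4, 0.7, 0.2))  # configuration for the +P +A +D octant
```

A full CBR subsystem would also adapt the retrieved configuration to the new case rather than reuse it verbatim; this sketch shows only the retrieval step.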

By J.Urbain, T.Dubuisson, S.Dupont, C.Frisson, R.Sebbe, N.D'Alessandro [FPMS]
Audiocycle: a similarity-based visualization of musical libraries
ABSTRACT: This paper presents AudioCycle, a prototype application for browsing through music loop libraries. AudioCycle provides the user with a graphical view where the audio extracts are visualized and organized according to their similarity in terms of musical properties, such as timbre, harmony, and rhythm. The user is able to navigate in this visual representation and listen to individual audio extracts, as well as query the database by providing audio examples. AudioCycle draws on a range of technologies, including audio analysis from music information retrieval research, 3D visualization, spatial auditory rendering, and audio time-scaling and pitch modification. The proposed approach extends previously described music and audio browsers. A possible extension to multimedia libraries is also suggested.
DOWNLOAD PAPER: Published in Proceedings of the IEEE 2009 International Conference on Multimedia and Expo, pp. 1847-1848, link to ACM
Reference Event: ICME 2009, New York: June 28 – July 3, 2009

By L.Malatesta, A.Raouzaiou, K.Karpouzis [ICCS], L.Pearce [XIM]
Affective Interface Adaptations in the Musickiosk Interactive Entertainment Application
ABSTRACT: The current work presents the affective interface adaptations in the Musickiosk application. Adaptive interaction poses several open questions, since there is no unique way of mapping affective factors of user behaviour to the output of the system. Musickiosk uses a non-contact interface and implicit interaction through emotional affect, rather than explicit interaction where a gesture, sound or other input directly maps to an output behaviour, as in traditional entertainment applications. The PAD model is used for characterizing the different affective states and emotions.
DOWNLOAD ARTICLE: Published in Intelligent Technologies for Interactive Entertainment. Proceedings of the 3rd International Conference INTETAIN 2009, pp. 102-109, link
Reference Event: 22-24/Jun/2009 CALLAS at INTETAIN'09

By J.Kim, J.Wagner, T.Vogt, E.André, F.Jung, M.Rehm [UoA]
Emotional Sensitivity in Human-Computer Interaction
ABSTRACT: Human conversational partners usually try to interpret the speaker's or listener's affective cues and respond to them accordingly. Recently, the modelling and simulation of such behaviours has been recognized as an essential factor for more natural man-machine communication. So far, research on emotion recognition has mostly dealt with offline analysis of recorded emotion corpora, and online processing has hardly been addressed. Online processing is, however, a necessary prerequisite for the realization of human-computer interfaces that analyze and respond to the user's emotions while he or she is interacting with an application. In this paper, we first describe how we recognize emotions from various modalities including speech, gestures and biosignals. We then present Smart Sensor Integration (SSI), a framework which we developed to meet the specific requirements of online emotion recognition.
DOWNLOAD ARTICLE: Publisher: Oldenbourg Wissenschaftsverlag, Print ISSN: 1611-2776, Volume 51, 06/2009, pp. 325-328, link

By D.Arnone, A.Rossi, M.Bertoncini [ENG]
An Open Source Integrated Framework for Rapid Prototyping of Multimodal Affective Applications in Digital Entertainment
ABSTRACT: The development of applications relying on multimodal interfaces is becoming more and more an emerging area of interest in the Digital Art and Entertainment Domain. This paper aims at proposing a new approach to the integration of multimodal modules that are able to gather the emotional states of the audience, to process them and to perform an emotional output. The proposed approach is being developed inside the FP6 EU co-funded CALLAS Project and finds its concretization in a Framework and a set of integrated multimodal components.
DOWNLOAD ARTICLE: published on Journal of Multimodal User Interfaces, link
Reference Event: position paper 13/May/09 CALLAS at OI2009, Bonn (Germany)

By R.Niewiadomski , E.Bevacqua , M.Mancini , C.Pelachaud [TCOM]
Greta: an interactive expressive ECA system
ABSTRACT: We have developed a general-purpose, modular architecture for an Embodied Conversational Agent (ECA) called Greta. Our 3D agent is able to communicate using verbal and nonverbal channels like gaze, head and torso movements, facial expressions and gestures. It follows the SAIBA framework [10] and the MPEG-4 [6] standards. Our system is optimized to be used in interactive applications.
DOWNLOAD PAPER: published in Proceedings of the Eighth International Conference on Autonomous Agents and Multiagent Systems, Budapest, 2009
Reference Event: AAMAS 2009, Budapest: May 14, 2009

By E.Bevacqua, K.Prepin, E. de Sevin, R.Niewiadomski, C.Pelachaud [TCOM]
Reactive behaviors in SAIBA architecture
ABSTRACT: In this paper we propose an extension of the current SAIBA architecture. The new parts of the architecture should manage the generation of Embodied Conversational Agents’ reactive behaviors during an interaction with users both while speaking and listening.
DOWNLOAD PAPER: published in Proceedings of the Eighth International Conference on Autonomous Agents and Multiagent Systems, Budapest, 2009, link
Reference Event: AAMAS 2009, Budapest: May 14, 2009

By F.Charles, D.Pizzi, M.Cavazza [TEES], T.Vogt, E.André [UOA]
EmoEmma: Emotional Speech Input for Interactive Storytelling (Demo Paper)
ABSTRACT: Whilst techniques for narrative generation and agent behaviour have made significant progress in recent years, natural language processing remains a bottleneck hampering the scalability of Interactive Storytelling systems. This demonstrator introduces a novel interaction technique based solely on emotional speech recognition. It allows the user to interact with virtual actors through speech, without any constraints on style or expressivity, by mapping the recognised emotional categories to narrative situations and the virtual characters' feelings.
Reference Event: AAMAS 2009, Budapest: May 14, 2009

By K.Karpouzis, I.Maglogiannis [ICCS]
Modeling and delivering heterogeneous audiovisual content for group consumption 
ABSTRACT: The abundance of broadcast material, especially when it becomes available from a variety of content providers, makes the choice of a program genre and the adaptation of presentation options to the preferences of a user a welcome feature of the modern-day TV viewing experience. From a technical point of view, assembling and transmitting such heterogeneous content is in itself a daunting task, especially when intellectual property rights issues must be tackled. In addition, while there are many options for filtering content with respect to the preferences of a single user, the common or aggregated choice of a group is hardly ever taken into account. Considering that TV viewing and multimedia consumption in general are essentially social activities, systems which package, filter and rank the available content, or propose content similar to what is currently viewed, should also integrate mechanisms to model group dynamics. This paper presents an integrated, end-to-end architecture which assembles multimedia material, respecting the IPR of the content provider, and delivers it to a client-side mechanism which considers the preferences of all the viewers currently watching in order to filter and rank the available programs. To respect the established methods of producing content, this system utilizes concepts from adopted standards (MPEG-7, MPEG-21) to model processes and represent data and relations between the different entities of the system.
DOWNLOAD ARTICLE: Special issue of Signal, Image and Video Processing, link

By K.Karpouzis [ICCS]
Multimodal Emotion Recognition in HCI Environments
ABSTRACT: Research on computational models of emotion and emotion recognition has been at the forefront of interest for more than a decade. The abundance of non-intrusive sensors (mainly cameras and microphones), data and ubiquitous computing power caters for real-time results in areas where this was deemed impossible a few years ago. As a result, emotion recognition and peripheral or related issues (body and hand gesture and gait analysis, speech recognition, eye gaze and head pose related to attention estimation in multi-person environments, etc.) can now benefit from the available resources, as well as from the interest shown in these applications by major authorities in psychology such as K. Scherer and P. Ekman. In addition, several research initiatives in the EU (TMR, FP5, 6 and 7: ICT, e-Health, Technology-Enhanced Learning and, recently, Digital Content and Libraries) promote research in this field and encourage researchers to establish strong connections with theoreticians via Networks of Excellence (Humaine, Similar, SSPNet), as well as to provide tangible results of applications that benefit from this technology (IP CALLAS, STREP Feelix-Growing, STREP Agent-Dysl, etc.). Another indication of the interest in emotion-related research is the fact that papers on affective computing appear in more than 90 conferences across disciplines, and almost 30 special issues in high-impact journals have been published or prepared; the momentum is such that more than 500 researchers participate in the Humaine Association, a follow-up initiative of the Humaine Network of Excellence, which also plans to produce a journal on related topics in association with the IEEE.
Reference event: Invited speech (Tutorial) at AIAI 2009, Thessaloniki: April 24, 2009

By Pelachaud C. [TCOM]
Modelling multimodal expression of emotion in a virtual agent
ABSTRACT:Over the past few years we have been developing an expressive embodied conversational agent system. In particular, we have developed a model of multimodal behaviours that includes dynamism and complex facial expressions. The first feature refers to the qualitative execution of behaviours. Our model is based on perceptual studies and encompasses several parameters that modulate multimodal behaviours. The second feature, the model of complex expressions, follows a componential approach where a new expression is obtained by combining facial areas of other expressions. Lately we have been working on adding temporal dynamism to expressions. So far they have been designed statically, typically at their apex. Only full-blown expressions could be modelled. To overcome this limitation, we have defined a representation scheme that describes the temporal evolution of the expression of an emotion. It is no longer represented by a static definition but by a temporally ordered sequence of multimodal signals.
Reference event: 2nd COST 2102 International Training School on Development of Multimodal Interfaces: Active Listening and Synchrony, Dublin, Ireland, 23-27 March 2009

By N.Bee, B.Falk, E.André [UOA]
Simplified facial animation control utilizing novel input devices: a comparative study
ABSTRACT: Editing facial expressions of virtual characters is quite a complex task. The face is made up of many muscles, which are partly activated concurrently. Virtual faces with human expressiveness are usually designed with a limited set of facial regulators, derived from the facial muscle groups that are concurrently activated. Common tools for editing such facial expressions use slider-based interfaces where only a single input at a time is possible. Novel input devices, such as gamepads or data gloves, which allow parallel editing, could not only speed up editing but also simplify the composition of new facial expressions. We created a virtual face with 23 facial controls and connected it with a slider-based GUI, a gamepad, and a data glove. We first conducted a survey with professional graphics designers to find out how the latter two input devices would be received in a commercial context. A second comparative study with 17 subjects was conducted to analyze the performance and quality of these two new input devices using subjective and objective measurements.
DOWNLOAD PAPER: Published on Proceedings of the 13th international conference on Intelligent user interfaces (Sanibel Island, Feb. 8-11, 2009), pp. 197-206
Reference Event: IUI 2009, Sanibel Island: February 10, 2009 

By R.Niewiadomski, M. Mancini, S. Hyniewska, C. Pelachaud[TCOM]
Communicating emotional states with the Greta agent
ABSTRACT: (…) In this Chapter we present our embodied conversational agent called Greta and focus on its capabilities of generating emotional expressive behaviours. Our 3D agent is able to communicate using the verbal and nonverbal channels like gaze, head and torso movements, facial expressions, and gestures. It follows the SAIBA framework that defines functionalities and communication protocols for ECA systems. The system generates the output animation in MPEG-4 standard. Our system is optimised to be used in interactive applications. It has a rich repertoire of expressive emotional behaviours (…)
DOWNLOAD ARTICLE: (to appear) on Scherer, K.R., Bänziger, T., & Roesch, E. (Eds.) A Blueprint for an affectively competent agent: Cross-fertilization between Emotion Psychology, Affective Neuroscience, and Affective Computing. Oxford: Oxford University Press
YEAR 2008

By C.Doukas, I.Maglogiannis, K.Karpouzis [ICCS]
Context-Aware Medical Content Adaptation through Semantic Representation and Rules Evaluation
ABSTRACT: Proper coding and transmission of medical and physiological data is a crucial issue for the effective deployment and performance of telemedicine services. The paper presents a platform for performing proper medical content adaptation based on context awareness. Sensors are used to determine the status of a patient being monitored through a medical network. Additional contextual information regarding the patient's environment (e.g. location, data transmission device, underlying network conditions, etc.) is represented through an ontological knowledge base model. Rule-based evaluation determines the proper coding and transmission of medical content (i.e. biosignals, medical video and audio), in order to optimize the telemedicine process. The paper discusses the design of the ontological model and provides an initial assessment.
DOWNLOAD PRESENTATION: link, presentation
Reference Event: 15-16/Dec/2008 CALLAS at SMAP'08

By T.Vogt, E.André and J.Wagner [UOA]
Automatic Recognition of Emotions from Speech: a Review of the Literature and Recommendations for Practical Realisation
ABSTRACT: In this article we give guidelines on how to address the major technical challenges of automatic emotion recognition from speech in human-computer interfaces, which include audio segmentation to find appropriate units for emotions, extraction of emotion relevant features, classification of emotions, and training databases with emotional speech. Research so far has mostly dealt with offline evaluation of vocal emotions, and online processing has hardly been addressed. Online processing is, however, a necessary prerequisite for the realization of human-computer interfaces that analyze and respond to the user’s emotions while he or she is interacting with an application. By means of a sample application, we demonstrate how the challenges arising from online processing may be solved. The overall objective of the paper is to help readers to assess the feasibility of human-computer interfaces that are sensitive to the user’s emotional voice and to provide them with guidelines of how to technically realize such interfaces.
DOWNLOAD ARTICLE: SpringerLink, Affect and Emotion in Human-Computer Interaction, Christian Peter and Russell Beale (Eds.), Springer, Heidelberg, Germany, 2008, link

By S.Asteriadis, P.Tzouveli, K. Karpouzis, S. Kollias [ICCS]
Estimation of behavioral user state based on eye gaze and head pose - application in an e-learning environment: article to be published in the Multimedia Tools and Applications journal (Elsevier)
ABSTRACT: Most e-learning environments which utilize user feedback or profiles collect such information through questionnaires, very often resulting in incomplete answers and sometimes deliberately misleading input. In this work, we present a mechanism which compiles feedback related to the behavioral state of the user (e.g. level of interest) in the context of reading an electronic document. This is achieved using a non-intrusive scheme, which uses a simple web camera to detect and track head, eye and hand movements, and provides an estimation of the level of interest and engagement using a neuro-fuzzy network initialized from evidence from the Theory of Mind and trained on expert-annotated data. The user does not need to interact with the proposed system and can act as if she were not monitored at all. The proposed scheme is tested in an e-learning environment, in order to adapt the presentation of the content to the user profile and current behavioral state. Experiments show that the proposed system detects reading- and attention-related user states very effectively, in a testbed where children's reading performance is tracked.

By S. Gilroy, M. Cavazza, R. Chaignon [TEES], S.-M. Mäkelä, M. Niiranen [VTT], E. André, T. Vogt [UOA], J. Urbain [FPMS], H. Seichter, M. Billinghurst [HIIT] and M. Benayoun [Citu, Université Paris 1 Panthéon-Sorbonne]
An affective model of user experience for interactive art
ABSTRACT: The development of Affective Interface technologies makes it possible to envision a new generation of Digital Arts and Entertainment applications, in which interaction will be based directly on the analysis of user experience. In this paper, we describe an approach to the development of Multimodal Affective Interfaces that supports real-time analysis of user experience as part of an Augmented Reality Art installation. The system relies on a PAD dimensional model of emotion to support the fusion of affective modalities, each input modality being represented as a PAD vector. A further advantage of the PAD model is that it can support a representation of affective responses that relate to aesthetic impressions.
DOWNLOAD ARTICLE: Published in ACM International Conference Proceeding Series; Vol. 352. Proceedings of the 2008 International Conference in Advances on Computer Entertainment Technology, pp. 107-110, link
Reference Event: 1-3/Dec/08 CALLAS at ACE 2008, Yokohama, Japan
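
The fusion scheme described in the abstract, where each input modality contributes a PAD (Pleasure-Arousal-Dominance) vector, can be sketched as a per-dimension weighted average. This is only an illustrative fusion rule with assumed example values; the abstract does not specify the installation's actual combination method:

```python
def fuse_pad(vectors, weights=None):
    """Fuse per-modality PAD vectors into a single affective state
    by (optionally weighted) averaging of each of the three dimensions."""
    if weights is None:
        weights = [1.0] * len(vectors)
    total = sum(weights)
    return tuple(
        sum(w * v[i] for w, v in zip(weights, vectors)) / total
        for i in range(3)
    )

# Hypothetical per-modality estimates as (P, A, D) vectors.
speech = (0.3, 0.8, 0.1)
gesture = (0.5, 0.4, 0.3)
print(fuse_pad([speech, gesture]))
```

Weights would let a more reliable modality (e.g. speech) dominate the fused estimate.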

By R.Kajastila, T.Takala [TKK]
Interaction in Digitally Augmented Opera
ABSTRACT: This paper reviews an experimental opera production in which digitally augmented content was used interactively during the performance. Projected graphics and spatialized sounds were designed to support the story. The animated 3D graphics acted as a virtual stage, a narrative element, or a reflection of a character's thoughts. The special effects were partly under the performers' direct control, which allowed natural timing and gave more freedom for artistic expression.
DOWNLOAD PAPER: link (slow download of 327 pages of all conference proceedings)
Reference Event: 7-8/Nov/08: CALLAS at ARTECH2008, Porto, Portugal

By S.W. Gilroy, M.Cavazza, R.Chaignon [TEES], S.Mäkelä, M.Niiranen [VTT], T.Vogt, E.André [UOA], M.Billinghurst, H.Seichter [HITNZ] and M.Benayoun (artist)
E-Tree: Emotionally Driven Augmented Reality Art
ABSTRACT:In this paper, we describe an Augmented Reality Art installation, which reacts to user behaviour using Multimodal analysis of affective signals. The installation features a virtual tree, whose growth is influenced by the perceived emotional response from spectators. The system implements a ‘magic mirror’ paradigm (using a large-screen display or projection system) and is based on the ARToolkit with extended representations for scene graphs. The system relies on a PAD dimensional model of affect to support the fusion of different affective modalities, while also supporting the representation of affective responses that relate to aesthetic impressions. The influence of affective input on the visual component is achieved by mapping affective data to an L-System governing virtual tree behaviour. We have performed an early evaluation of the system, both from the technical perspective and in terms of user experience. Post-hoc questionnaires were generally consistent with data from multimodal affective processing, and users rated the overall experience as positive and enjoyable, regardless of how proactive they were in their interaction with the installation.
DOWNLOAD PAPER: Published in Proceedings of the 16th International Conference on Multimedia 2008, Vancouver, British Columbia, Canada, October 26-31, 2008. ACM 2008: pp.945-948; link
Reference Event: 27-31/Oct/08: CALLAS at ACM2008, Vancouver, Canada

By R.Niewiadomski [Par8], M.Ochs [Université de Paris 6], C.Pelachaud [Par8]
Using Facial Expressions to Display Empathy in ECAs
ABSTRACT: In this paper, we propose and evaluate a novel approach for the expression of empathy using complex facial expressions like superposition and masking. Compared to other expressive empathic agents, our agent uses two types of facial expressions: simple and complex ones. By simple facial expressions we mean spontaneous facial displays of emotional states (which can be described by a one-word label), e.g. displays of anger or contempt. The term complex facial expressions describes expressions that are combinations of several simple facial displays (e.g. the superposition of two emotions) or that are modified voluntarily by the displayer (e.g. the masking of one emotion by another). Aiming at finding the appropriate facial expression for an empathic ECA, we have examined both types of expressions in empathic situations. The results of the evaluation show that people find more suitable facial expressions that contain elements of the emotion of empathy. In particular, complex facial expressions appear to be a good approach to express empathy.
Reference Event: 27-29/Oct/2008: Summer School Grenoble

By L. Malatesta, A. Raouzaiou, K. Karpouzis [ICCS]
Affective intelligence: the human face of AI
ABSTRACT: Affective computing has been an extremely active research and development area for some years now, with some of the early results already starting to be integrated in human-computer interaction systems. Driven mainly by research initiatives in Europe, USA and Japan and accelerated by the abundance of processing power and low-cost, unintrusive sensors like cameras and microphones, affective computing functions in an interdisciplinary fashion, sharing concepts from diverse fields, such as signal processing and computer vision, psychology and behavioral sciences, human-computer interaction and design, machine learning, and so on. In order to form relations between low-level input signals and features to high-level concepts such as emotions or moods, one needs to take into account the multitude of psychology and representation theories and research findings related to them and deploy machine learning techniques to actually form computational models of those. This chapter elaborates on the concepts related to affective computing, how these can be connected to measurable features via representation models and how they can be integrated into human-centric applications.
DOWNLOAD PAPER from State-of-the-Art in AI, an IFIP book published by Springer in the field of Human-Computer Interaction

By G.Jacucci [TKK], S.Roveda, D.Tonguet [SAZ], T.Takala [TKK]
Presence in Performing Digital Art
DOWNLOAD PAPER: Poster session published in Proceedings of the 11th Annual International Workshop on Presence, link
Reference Event: Presence 2008, Padova: October 17, 2008

By Christopher Peters [Coventry University], S.Asteriadis, K.Karpouzis [ICCS], E. de Sevin [INRIA Paris-Rocquencourt]
Towards a Real-time Gaze-based Shared Attention for a Virtual Agent
ABSTRACT: This paper presents work towards a real-time user interface for testing shared-attention behaviours with an embodied conversational agent. In two-party conversations, shared attention, and related aspects such as interest and engagement, are critical factors in gaining feedback from the other party and in maintaining awareness of the general state of the interaction. Taking input from a single standard web camera, our preliminary system is capable of processing the user's eye and head directions in real-time. We use this detection capability to inform the interaction behaviours of the agent and enable it to engage in simple shared-attention behaviours with the user and objects within the scene, in order to study in more depth some critical factors underpinning engagement.
Reference Event: 24/Oct/08: AFFINE workshop

By M.Rehm, N. Bee, E.André [UOA]
Wave Like an Egyptian — Accelerometer Based Gesture Recognition for Culture Specific Interactions
ABSTRACT: The user's behavior and his interpretation of interactions with others are influenced by his cultural background, which provides a number of heuristics or patterns of behavior and interpretation. This cultural influence on interaction has largely been neglected in HCI research due to two challenges: (i) grasping culture as a computational term and (ii) inferring the user's cultural background from observable measures. In this paper, we describe how the Wiimote can be utilized to uncover the user's cultural background by analyzing his patterns of gestural expressivity in a model based on cultural dimensions. With this information at hand, the behavior of an interactive system can be adapted to culture-dependent patterns of interaction.
DOWNLOAD PAPER from Proceedings of HCI 2008 Culture, Creativity, Interaction
Reference Event: 1-5/Sept/08: CALLAS at HCI2008, Liverpool, UK

By E.Bevacqua, M.Mancini, C.Pelachaud [PAR8]
A listening agent exhibiting personality traits 
ABSTRACT: Within the Sensitive Artificial Listening Agent project, we propose a system that computes the behaviour of a listening agent. Such an agent must exhibit behaviour variations depending not only on its mental state towards the interaction (e.g., whether it agrees with the speaker or not) but also on the agent's characteristics, such as its emotional traits and its behaviour style. Our system computes the behaviour of the listening agent in real-time.
DOWNLOAD PAPER: published in Proceedings of IVA 2008, link
Reference Event: 1-5/Sept/08: CALLAS at IVA2008, Tokyo, Japan

By R.Niewiadomski, M.Ochs, C.Pelachaud [PAR8]
Expressions of Empathy in ECAs
ABSTRACT: Recent research has shown that empathic virtual agents improve human-machine interaction. A virtual agent's expressions of empathy are generally fixed intuitively and are not evaluated. In this paper, we propose a novel approach for the expression of empathy using complex facial expressions like superposition and masking. An evaluation study has been conducted in order to identify the most appropriate way to express empathy. According to the evaluation results, people find more suitable facial expressions that contain elements of the emotion of empathy. In particular, complex facial expressions seem to be a good approach to express empathy.
Reference Event: 1-5/Sept/08: CALLAS at IVA2008, Tokyo, Japan

By S.Asteriadis, P.Tzouveli, K.Karpouzis, S.Kollias [ICCS]
A non-intrusive method for user focus of attention estimation in front of a computer monitor
ABSTRACT: In this work, we present a system that estimates a user's focus of attention in front of a computer monitor. The only requirements of the system are a simple web camera and software that detects and tracks the user's head position and eye movements. Based on a machine learning algorithm, the system gives real-time results regarding the user's attention or non-attention by combining information from eye gaze, head pose, and the user's distance from the monitor. The advantages of our system are that it is completely unintrusive and no special hardware (such as infrared cameras or wearable devices) is needed. Furthermore, it adjusts to every user, thus requiring no initial calibration, and works under real and unconstrained lighting conditions.
Reference Event: IEEE International Conference on Automatic Face and Gesture Recognition 17-19/Sept/08: FG2008
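The paper's trained classifier is not reproduced here, but the feature combination it describes (eye gaze, head pose and user distance) can be sketched as a simple threshold rule; all feature names and threshold values below are hypothetical illustrations, not the authors' model.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """One webcam frame's measurements (hypothetical units)."""
    yaw_deg: float      # head yaw relative to the monitor normal
    pitch_deg: float    # head pitch
    gaze_offset: float  # normalized gaze offset from screen centre (0..1)
    distance_cm: float  # estimated user-monitor distance

def is_attentive(f: Frame,
                 max_yaw: float = 20.0,
                 max_pitch: float = 15.0,
                 max_gaze: float = 0.4,
                 max_dist: float = 120.0) -> bool:
    """Combine head pose, eye gaze and distance into a binary decision.

    The paper trains a machine-learning classifier on these cues; this
    sketch replaces it with hand-set thresholds purely to illustrate
    how the three feature groups are combined.
    """
    return (abs(f.yaw_deg) <= max_yaw
            and abs(f.pitch_deg) <= max_pitch
            and f.gaze_offset <= max_gaze
            and f.distance_cm <= max_dist)
```

In the actual system the thresholds would be replaced by a classifier learned per user from the same features.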

By J. Kim, J. Wagner, M. Rehm, and E. André [University of Augsburg]
Bi-channel Sensor Fusion for Automatic Sign Language Recognition
ABSTRACT: In this paper, we investigate the mutually complementary functionality of accelerometer (ACC) and electromyogram (EMG) sensors for recognizing seven word-level sign vocabularies in German Sign Language (GSL). Using feature-level fusion of the bi-channel sensor data, we achieved an average accuracy of 99.82% for eight subjects and 88.75% for the subject-independent case. The most relevant features for all subjects are extracted, and their universal effectiveness is proven with an average accuracy of 96.31% across subjects. Finally, we discuss a problem of feature-level fusion caused by the high disparity between the accuracies of each single-channel classification.
Reference Event: IEEE International Conference on Automatic Face and Gesture Recognition 17-19/Sept/08: FG2008
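Feature-level fusion of the kind the abstract describes concatenates per-channel feature vectors into one vector before classification (as opposed to decision-level fusion, where each channel gets its own classifier). The sketch below uses a few invented, minimal time-domain features purely to illustrate the idea; the paper's actual feature set is not reproduced.

```python
import math

def channel_features(samples):
    """Simple per-channel time-domain features: mean, RMS, range.

    These three features are illustrative stand-ins, not the features
    used in the paper.
    """
    n = len(samples)
    mean = sum(samples) / n
    rms = math.sqrt(sum(x * x for x in samples) / n)
    rng = max(samples) - min(samples)
    return [mean, rms, rng]

def fuse_features(acc_window, emg_window):
    """Feature-level fusion: one concatenated vector per sign window,
    fed to a single classifier over both sensor channels."""
    return channel_features(acc_window) + channel_features(emg_window)
```

The disparity problem the authors mention arises here naturally: if one channel's features are far less discriminative, they still occupy half of the fused vector.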

By L.Liikkanen [TKK], L. Pearce [XIM]
MusicKiosk: When listeners become composers. An exploration into affective, interactive music
ABSTRACT: We present a case study of an interactive, assisted composition system called MusicKiosk. The system creates a composition based on the emotional states detected from users’ voices. The experience is augmented by visualizing the music with interactive, animated characters. Custom-made musical elements are added or removed dynamically according to the detected mood. The input for emotion detection is derived from the fusion of emotional speech recognition and keyword spotting. In an upcoming user evaluation, we will use this system to explore natural interaction and the capacity of the system to create emotional feedback loops.
Reference Event: 25-29/Aug/08: CALLAS at ICMPC2008, Sapporo, Japan

By S. Al Moubayed [KTH, SWEDEN], M. Baklouti [Thalès, FRANCE], M. Chetouani [PAR8], T. Dutoit [FPMS], A. Mahdhaoui [PAR8], J.-C. Martin [LIMSI – FRANCE], S. Ondas [Technical University of Košice – SLOVAKIA], C. Pelachaud [INRIA], J. Urbain [FPMS], M. Yilmas [Koc University – TURKEY]
Multimodal Feedback from Robots and Agents in a Storytelling Experiment
ABSTRACT: In this project, which lies at the intersection between Human-Robot Interaction (HRI) and Human-Computer Interaction (HCI), we have examined the design of an open-source, real-time software platform for controlling the feedback provided by an AIBO robot and/or by the GRETA Embodied Conversational Agent when listening to a story told by a human narrator. Based on ground truth data obtained from the recording and annotation of an audio-visual storytelling database containing various examples of human-human storytelling, we have implemented a proof-of-concept ECA/Robot listening system. As narrator input, our system uses face and head movement analysis, as well as speech analysis and speech recognition; it then triggers listening behaviors from the listener, using probabilistic rules based on the co-occurrence of the same input and output behaviors in the database. We have finally assessed our system in terms of the homogeneity of the database annotation, as well as regarding the perceived quality of the feedback provided by the ECA/robot.
DOWNLOAD PAPER: Proceedings of eNTERFACE 2008: Project 7
Reference event: eNTERFACE Summer School 2008,  Paris 4-29 August 2008
By I.Buonazia [SNS] M.Bertoncini [ENG]
Emotional Interfaces in Performing Arts: The CALLAS Project
ABSTRACT: The CALLAS project aims at designing and developing an integrated multimodal architecture able to include emotional aspects to support applications in the new media business scenario, following an “ambient intelligence” paradigm. The project is structured in three main areas: the "Shelf", collecting multimodal affective components (speech, facial expression and gesture recognition); the "Framework", a software infrastructure enabling the cooperation of multiple components with an easy interface addressed to final users; and three "Showcases" addressing the main fields of the new media domain: AR art, Entertainment and Digital Theatre, Interactive Installations in public spaces, and Next Generation Interactive TV.
Reference Event: 22-24/Jul/08: CALLAS at EVA2008, London, UK

By A.Osherenko[UOA]
Deducing a Believable Model for Affective Behavior from Perceived Emotional Data
ABSTRACT: Recently, there has been considerable interest in building lifelike computer systems that manage a believable model for affective behavior. In this paper, we investigate an approach to deducing a probabilistic model for affective behavior from perceived emotional data that can be used to build such systems. In addition, we discuss the obtained results and show potential opportunities for future work.
Reference Event: ECAI 2008, Workshop on Computational aspects of affectual and emotional interaction (CAFFEi 2008)

By G.Caridakis,O.Diamanti, P.Maragos, K. Karpouzis [ICCS]
Automatic Sign Language Recognition: vision based feature extraction and probabilistic recognition scheme from multiple cues
ABSTRACT: This work focuses on two of the research problems comprising automatic sign language recognition: robust computer vision techniques for consistent hand detection and tracking that preserve the hand-shape contour (useful for extracting handshape-related features), and a novel classification scheme incorporating Self-Organizing Maps, Markov chains and Hidden Markov Models. Geodesic Active Contours enhanced with skin colour and motion information are employed for hand detection and the extraction of the hand silhouette, while the extracted features describe hand trajectory, region and shape. The extracted features are used as input to separate classifiers, forming a robust and adaptive architecture whose main contribution is the optimal utilization of the neighbouring characteristic of the SOM during the decoding stage of the Markov chain representing the sign class.
Reference Event: 15-19/Jul/08: CALLAS at PETRA2008, Athens, Greece
By M.Niiranen, J.Vehkaperä, S.Mäkelä, J.Peltola, T.Räty [VTT]
Fusion of Sound Source Localization and Face Detection for Supporting Human Behavior Analysis
ABSTRACT: This paper describes a demonstrated concept implementation that combines sound source localization and face detection from a video stream for supporting human behavior analysis. The system monitors a space containing multiple persons using a microphone array and a video camera. The aim is to detect which person in the scene is producing the sound received by the microphones. For this task, the microphone array localizes the sound in the environment. Simultaneously, face detection is performed on the video signal produced by the monitoring camera. If a face is detected in the bearing of the sound, the system may decide that the sound is produced by the person whose face was detected. Preliminary results indicate that the fusion may give useful information for human behavior analysis in a space containing multiple persons.
Reference Event: 7-9/July/08 CALLAS at MobiMedia2008, Oulu, Finland
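The fusion step described above — deciding whether a detected face lies in the bearing of the localized sound — can be sketched as a nearest-bearing match. The tolerance value and function name below are hypothetical illustrations, not the authors' implementation.

```python
def speaker_for_sound(sound_bearing_deg, face_bearings_deg,
                      tolerance_deg=10.0):
    """Return the index of the detected face closest to the localized
    sound bearing, or None if no face lies within the tolerance.

    Bearing differences are computed with wrap-around, so 359 degrees
    and 1 degree are treated as 2 degrees apart.
    """
    best, best_diff = None, tolerance_deg
    for i, bearing in enumerate(face_bearings_deg):
        diff = abs((bearing - sound_bearing_deg + 180) % 360 - 180)
        if diff <= best_diff:
            best, best_diff = i, diff
    return best
```

In the described system, the face bearings would come from the camera's face detector and the sound bearing from microphone-array localization, evaluated once per analysis frame.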
By T.Vogt, E.Andre, N.Bee [UOA]
EmoVoice – A framework for online recognition of emotions from voice
ABSTRACT: We present EmoVoice, a framework for emotional speech corpus and classifier creation and for offline as well as real-time online speech emotion recognition. The framework is intended to be used by non-experts and therefore comes with an interface to create one's own personal or application-specific emotion recogniser. Furthermore, we describe some applications and prototypes that already use our framework to track online emotional user states from voice information.
DOWNLOAD PAPER: Published in Proceedings of the Workshop on Perception and Interactive Technologies for Speech-Based Systems, 2008
Reference Event: 16-18/Jun/08: CALLAS at PIT2008, Kloster Irsee,Germany
By L.A.Liikkanen, J Giacucci, E.Huvio, T.Laitinen  [TKK], E. Andre [UOA]
Exploring Emotions and Multimodality in Digitally Augmented Puppeteering
ABSTRACT: Recently, multimodal and affective technologies have been adopted to support expressive and engaging interaction, bringing up a plethora of new research questions. Among the challenges, two essential topics are 1) how to devise truly multimodal systems that can be used seamlessly for customized performance and content generation, and 2) how to utilize the tracking of emotional cues and respond to them in order to create affective interaction loops. We present PuppetWall, a multi-user, multimodal system intended for digitally augmented puppeteering. This application allows natural interaction to control puppets and manipulate playgrounds comprising background, props, and puppets. PuppetWall utilizes hand movement tracking, a multi-touch display and emotion speech recognition input for interfacing. Here we document the technical features of the system and an initial evaluation. The evaluation involved two professional actors and also aimed at exploring naturally emerging expressive speech categories. We conclude by summarizing challenges in tracking emotional cues from acoustic features and their relevance for the design of affective interactive systems.
DOWNLOAD PAPER: published in Proceedings of the AVI 2008 Conference on Advanced Visual Interfaces, ACM Press
Reference Event: 28-30/May/08: CALLAS at AVI2008, Napoli, Italy
By L.A.Liikkanen, E.Huvio, R.Samperio [TKK], T.Seppänen, E.Väyrynen [Univ. of Oulu]
Developing Affective Intelligence For An Interactive Installation: Insights From A Design Process
ABSTRACT: This paper documents a case study from the development of an affective application called PuppetWall, an interactive installation built upon the puppeteering metaphor. It is designed to react to user expressions and visualize them on a large multitouch screen. We present an outline of the system and a review of comparable applications. We describe our initial design efforts in implementing emotion recognition from speech and a novel way of using affective information to control the application. Based on an initial user test, we show how users try to exploit the system by eliciting various vocal expressions. We conclude by examining the lessons learned from this design iteration, focusing on the auditory cues available and the implementation of interactive features.
DOWNLOAD PAPER: Published in proceedings of workshop on Corpora for Research on Emotion and Affect, pages 104-107
Reference Event: 26-30/May/08: CALLAS at LREC2008, Marrakech, Morocco
By M.Rehm, T.Vogt, M.Wissner, N.Bee [UOA]
Dancing the Night Away — Controlling a Virtual Karaoke Dancer by Multimodal Expressive Cues
ABSTRACT: In this article, we propose an approach to nonverbal interaction with virtual agents that controls the agents’ behavioral expressivity by extracting and combining acoustic and gestural features. The goal of this approach is twofold: (i) expressing individual features such as situated arousal and personal style, and (ii) transmitting this information in an immersive 3D environment by suitable means.
By N.Bee and E.André [UOA]
Cultural gaze behavior to improve the appearance of virtual agents
ABSTRACT: Finding cultural dependencies in conversational eye gaze behavior, in order to derive general rules that hold across cultures, is crucial. On this basis we aim to build a gaze awareness model that provides visual feedback to users interacting with virtual agents. This work gives an overview of the literature dealing with eye gaze and culture. In addition, we argue that appropriate eye gaze behavior is important for virtual agents, and we describe methods for measuring users' eye gaze.

YEAR 2007

By F.Charles, S.Lemercier, M.Cavazza [TEES], T.Vogt, N.Bee, E.André [UOA], M.Mancini, C.Pelachaud [PAR8], J.Urbain [FPMS] and M.Price [BBC]
Affective Interactive Narrative in the CALLAS Project
ABSTRACT: Interactive Narrative relies on the ability of the user (and spectator) to intervene in the course of events so as to influence the unfolding of the story. This influence is obviously different depending on the Interactive Narrative paradigm being implemented, i.e. the user being a spectator or taking part in the action herself as a character. If we consider the case of an active spectator influencing the narrative, most systems implemented to date have been based on the direct intervention of the user, either on physical objects staged in the virtual narrative environment or on the characters themselves via natural language input. While this certainly empowers the spectator, there may be limitations as to the realism of that mode of interaction if we were to transpose Interactive Narrative to a vast audience.
DOWNLOAD PAPER: Demo paper in Proceedings of the 4th International Conference on Virtual Storytelling, 2007
Reference Event: 5-7/Dec/07: CALLAS at ICVS2007, Saint Malo, France
By S.W. Gilroy, M.Cavazza, R.Chaignon [TEES], S.Mäkelä, M.Niiranen [VTT], T.Vogt, E.André [UOA], M.Billinghurst, H.Seichter [HITNZ] and M.Benayoun [artist]
An Emotionally Responsive AR Art Installation
ABSTRACT: In this paper, we describe a novel method of combining emotional input and an Augmented Reality (AR) tracking/display system to produce dynamic interactive art that responds to the perceived emotional content of viewer reactions and interactions. As part of the CALLAS project, our aim is to explore multimodal interaction in an Arts and Entertainment context. The approach we describe has been implemented as part of a prototype “showcase” in collaboration with a digital artist designed to demonstrate how affective input from the audience of an interactive art installation can be used to enhance and enrich the aesthetic experience of the artistic work. We propose an affective model for combining emotionally-loaded participant input with aesthetic interpretations of interaction, together with a mapping which controls properties of dynamically generated digital art.
Reference Event: 13-16/Nov/07: CALLAS at ISMAR2007, Nara, Japan
By JL Lugrin, R.Chaignon, M.Cavazza [TEES]
A High-level Event System for Augmented Reality
ABSTRACT: 3D graphics systems increasingly rely on sophisticated event systems derived from collision detection mechanisms, which support the discretisation of Physics as well as high-level programming and scripting. By contrast, Augmented Reality systems have not yet adopted this approach. We describe the development of a high-level event system on top of the ARToolkit environment incorporating the ODE Physics engine. We first define a typology of events encompassing interactions between virtual objects as well as interactions involving markers. We then describe how these events can be recognised in real-time from elementary collisions detected by the ODE Physics engine. We conclude by discussing examples of high-level event recognitions and how they can support the development of applications.
By M. Mancini, C. Pelachaud [PAR8]
Dynamic Behavior Qualifiers for Conversational Agents
ABSTRACT: We aim at defining conversational agents that exhibit qualitatively distinctive behaviors. To this aim we provide a small set of parameters to allow one to define behavior profiles and then leave to the system the task of animating the agent. Our approach is to manipulate the behavior tendency of the agent depending on its communicative intention and emotional state. In this paper we will define the concepts of Baseline and Dynamicline. The Baseline of an agent is defined as a set of fixed parameters that represent the personalized agent behavior, while the Dynamicline, is a set of parameters values that derive both from the Baseline and the current communicative goals and emotional state.
By E. Bevacqua, M.Tellier, C.Pelachaud[PAR8] D.Heylen [University of Twente, The Netherlands]
Searching for Prototypical Facial Feedback Signals
ABSTRACT: Embodied conversational agents should be able to provide feedback on what a human interlocutor is saying. We are compiling a list of facial feedback expressions that signal attention and interest, grounding and attitude. As expressions need to serve many functions at the same time and most of the component signals are ambiguous, it is important to get a better idea of the many to many mappings between displays and functions. We asked people to label several dynamic expressions as a probe into this semantic space. We compare simple signals and combined signals in order to find out whether a combination of signals can have a meaning on its own or not, i. e. the meaning of single signals is different from the meaning attached to the combination of these signals. Results show that in some cases a combination of signals alters the perceived meaning of the backchannel.
By R.Niewiadomski, C. Pelachaud [PAR8]
Model of facial expressions management for an embodied conversational agent
ABSTRACT: In this paper we present a model of facial behaviour encompassing interpersonal relations for an Embodied Conversational Agent (ECA). Although previous solutions of this problem exist in ECA's domain, in our approach a variety of facial expressions (i.e. expressed, masked, inhibited, and fake expressions) is used for the first time. Moreover, our rules of facial behaviour management are consistent with the predictions of politeness theory as well as the experimental data (i.e. annotation of the video-corpus). Knowing the affective state of the agent and the type of relations between interlocutors the system automatically adapts the facial behaviour of an agent to the social context. We present also the evaluation study we have conducted of our model. In this experiment we analysed the perception of interpersonal relations from the facial behaviour of our agent.
Reference Event: 12-14/Sept/07: CALLAS at ACII2007, Lisbon, Portugal

By J.Wagner, T.Vogt, E.André [UOA]
A systematic comparison of different HMM designs for emotion recognition from acted and spontaneous speech
ABSTRACT: In this work we elaborate the use of hidden Markov models (HMMs) for speech emotion recognition as a dynamic alternative to static modelling approaches. Since previous work in this field does not yet define a clear line as to which HMM design should be prioritised for this task, we run a systematic analysis of different HMM configurations. Furthermore, experiments are carried out on an acted and a spontaneous emotions corpus, since little is known about the suitability of HMMs for spontaneous speech. Additionally, we consider two different segmentation levels, namely words and utterances. Results are compared with the outcome of a support vector machine trained on global statistics. While similar performance was observed at utterance level for both databases, the HMM-based approach outperformed static classification at word level. However, setting up general guidelines as to which kind of network is best suited appeared rather difficult.
DOWNLOAD PAPER published in Lecture Notes in Computer Science, Vol. 4738: Proceedings of the 2nd International Conference on Affective Computing and Intelligent Interaction
Reference Event: 12-14/Sept/07: CALLAS at ACII2007, Lisbon, Portugal
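For readers unfamiliar with HMM-based classification, the scheme compared against static modelling can be sketched as follows: train one HMM per emotion class, then assign a test sequence to the class whose model yields the highest likelihood (forward algorithm). The discrete toy models below are illustrative only; the paper works with continuous speech features and the configurations it compares are not reproduced here.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm (no scaling; fine for short
    toy sequences)."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [emit[s][o] * sum(alpha[p] * trans[p][s]
                                  for p in range(n_states))
                 for s in range(n_states)]
    return math.log(sum(alpha))

def classify(obs, models):
    """Assign obs to the class whose HMM gives the highest likelihood.

    `models` maps class name -> (start, trans, emit) parameter tuple,
    one HMM trained per emotion class.
    """
    return max(models, key=lambda c: forward_log_likelihood(obs, *models[c]))
```

With one such model per emotion class, the word- versus utterance-level segmentation studied in the paper simply changes the length of the observation sequence fed to `classify`.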

By A.Osherenko, E.André [UOA]
Lexical Affect Sensing: Are Affect Dictionaries Necessary to Analyze Affect?
ABSTRACT: Recently, there has been considerable interest in the automated recognition of affect from written and spoken language. In this paper, we investigate how information on a speaker’s affect may be inferred from lexical features using statistical methods. Dictionaries of affect offer great promise to affect sensing since they contain information on the affective qualities of single words or phrases that may be employed to estimate the emotional tone of the corresponding dialogue turn. We investigate to what extent such information may be extracted from general-purpose dictionaries in comparison to specialized dictionaries of affect. In addition, we report on results obtained for a dictionary that was tailored to our corpus.
DOWNLOAD PAPER: from SpringerLink
Reference Event: 12-14/Sept/07: CALLAS at ACII2007, Lisbon, Portugal
By M.Bertoncini [ENG], M. Cavazza [TEES]
Emotional Multimodal Interfaces for Digital Media: The CALLAS Challenge
ABSTRACT: Emotional multimodal interfaces aim at achieving the highest level of naturalness in human-computer interaction. One of the main challenges for the CALLAS European R&D project is to implement the concept of affective emotional input for interactive media rather than within a traditional interface paradigm. Affective and emotional interfaces are generally concerned with the real-time identification of user emotions to determine system response. They rely most often on Ekmanian emotions such as joy, fear or anger. However, interaction with new media such as interactive narratives, digital theatre or digital arts involves different ranges of emotions on the user’s side, some of which correspond to responses to aesthetic properties of the media, or characterise the user experience itself in terms of enjoyment and entertainment. To identify these, more complex articulations of modalities are required. Such key aspects are currently investigated within the CALLAS project in the specific area of Art and Entertainment applications.
Reference Event: 22-27/Jul/07: CALLAS at HCI2007 Beijing, China
By M.Rehm, B.Endrass and M.Wissner [UOA]
Integrating the User in the Social Group Dynamics of Agents
ABSTRACT: This paper introduces the Virtual Beergarden as a virtual meeting place for agents and users. The agents' behavior is controlled by a behavior control component, which allows testing different theories of social group dynamics. Agents interact via natural language that is generated by a statistical language component and takes into account the social interaction categories and the social relationships between agents. The user can freely navigate and interact with the other agents, relying on the above-mentioned components. An evaluation shows whether the user can really be integrated in the agents’ social group dynamics.
By M.Bertoncini, A.Pandozy [ENG]
Tomorrow’s Media and Emotional Interfaces: the CALLAS Project
ABSTRACT: Emotional and multimodal interfaces aim at achieving the highest level of naturalness in human-computer interaction. A major trend in multimodal interface research in recent years has been the investigation and development of affective interfaces, which are able to analyse and render emotions as part of interactive systems. These have been developed as an extension of multimodal interfaces, in particular agent-based interfaces in which the user engages in “social” communication with digital characters. As a consequence, early affective interfaces have mostly involved simple emotional models that are able solely to detect and/or animate the six basic emotion categories.
Reference Event: 26-30/March/07: CALLAS at EVA 2007, Florence, Italy
Last Updated on Monday, 05 July 2010 15:28