The CALLAS Shelf provides a dynamic pool of multimodal interface technologies: software components either selected for their proven efficiency and robustness or newly developed within CALLAS, so as to guarantee consistent performance across many contexts and scenarios.
These components deal with:
- processing and interpreting signals in terms of emotional and affective categories
- rendering emotions through music, emotional language and virtual humanoid representations
- integration, mapping and fusion for multimodal emotion recognition
Suggested reading (see Public deliverables):
- Identification and Selection of Modules: update October 2007
- Shelf Selection of new models: update October 2008
- Shelf Components 1st Release: update October 2007
- Emotional Natural Language generator: update October 2007
- Specification for Model of Awareness: update October 2008
- Integrated Model of expressive and attentive capabilities: update October 2008
- Affective Music Synthesis: update October 2008
- Final Report on Multimodal Components: update April 2010
- Final Report on ECAs for affective output: update April 2010
CALLAS components processing signals from microphones, cameras, haptic devices, mobile phones, the Wiimote, and audio and video streams:
- Multikeyword Spotting: a component that recognizes when one of a pre-defined set of utterances occurs in speech, useful for selecting different paths in an application or for evaluating the user's feelings, indirectly driving application changes. It is speaker-independent and can run in automatic or push-to-talk mode. The list of words to be recognized, as well as the language, can be changed at runtime.
- Real-time emotion recognition from speech: a framework for building an emotion classifier and recognizing emotions in real time. It extracts from the speech signal a vector of emotion-relevant acoustic features (e.g. derived from pitch, energy, voice quality, pauses and spectral information) and then uses a statistical classifier, trained on examples, to assign emotion labels (a minimal sketch of this feature-plus-classifier pattern follows this list).
- Emotional text analyser: a component that uses linguistic information relevant to lexical affect sensing to recognize emotions in text through statistical or semantic analysis (see the lexicon-based sketch after this list).
- Audio Feature Extraction: taking input from live audio, it classifies audio streams into different sound classes such as speech, music, silence, constant and variable sounds, clapping, whistling and applause, and outputs the corresponding audio class for each audio frame.
- Video Feature Extraction: extracting faces from a video sequence or live camera feed to derive information about emotional state, content and context; it keeps track of how many people are looking towards the camera, yielding useful cues about the audience's level of interest. The component also acts as a video player, playing video files and capturing live feed from a camera.
- Human Glove Wearable Interface for Motion Capture: based on a data glove device as the sensing unit, the component captures motion data from sensors to record full-body motion; it is integrated with an inertial platform (consisting of accelerometers, gyroscopes and magnetometers), making it suitable for emotion extraction.
- Video-Based Gesture Expressivity Features Extraction: a video-based component that detects and tracks the user's hands to extract and transmit expressivity feature values such as overall activation, spatial extent, temporal extent, fluidity and power.
- WiiGLE: a component that classifies hand movements in 3D space by analysing acceleration data captured from a Nintendo Wiimote controller. It relies on a corpus of arbitrary gestures used to train classifiers, which are then applied to online gesture recognition (see the gesture-classification sketch after this list).
- Gesture recognition from mobile phones: a component that uses a mobile phone with accelerometers as a sensor, mapping types of movement defined by expressivity parameters (e.g. graceful/fast tempo) to different emotions.
- Gaze detection and Head Pose estimation: a component that estimates the head movements (yaw, pitch, roll) and gaze direction of a user in front of a computer monitor, deriving information about the user's state: attentive, distracted or nervous.
- Facial feature detection: detecting and tracking different facial features (such as eye centres, eye corners, and upper and lower eyelids) based on facial geometry and prototypes of natural human motion.
- Facial Expression Recognition: recognizing facial expressions in real time by localizing and tracking facial feature movements, based on the appearance of a person's expression while interacting with a camera, and providing emotion recognition feedback in terms of dimensional or Ekmanian emotions.
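To make the feature-plus-classifier pattern behind the speech emotion recognizer concrete, the following minimal Python sketch extracts a tiny acoustic feature vector (energy and pause statistics only) and trains a statistical classifier on placeholder data. It assumes numpy and scikit-learn and is not the CALLAS implementation.

```python
# Minimal sketch of "acoustic features + statistical classifier" for speech emotion
# recognition. NOT the CALLAS component: the feature set is deliberately tiny and
# the training data below is random placeholder data.
import numpy as np
from sklearn.svm import SVC

def acoustic_features(signal: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Return a small emotion-relevant feature vector: energy and pause statistics."""
    frame = sr // 100                                  # 10 ms frames
    frames = signal[: len(signal) // frame * frame].reshape(-1, frame)
    energy = np.sqrt((frames ** 2).mean(axis=1))       # short-time energy per frame
    pauses = (energy < 0.1 * energy.max()).mean()      # fraction of low-energy frames
    return np.array([energy.mean(), energy.std(), pauses])

# Placeholder corpus: 40 random "utterances" with invented binary labels.
rng = np.random.default_rng(0)
X = np.stack([acoustic_features(rng.normal(scale=s, size=16000))
              for s in rng.uniform(0.2, 1.0, size=40)])
y = (X[:, 0] > np.median(X[:, 0])).astype(int)         # 0 = "calm", 1 = "aroused" (toy labels)

clf = SVC(probability=True).fit(X, y)                  # statistical classifier trained on examples
print(clf.predict_proba(acoustic_features(rng.normal(size=16000))[None, :]))
```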
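The emotional text analyser's lexical affect sensing can be illustrated in the same hedged spirit: the sketch below looks words up in a small hand-made valence/arousal lexicon and averages their scores. The lexicon entries are invented examples, not the resource or analysis method used by the component.

```python
# Minimal sketch of lexical affect sensing on text: words are looked up in a
# small valence/arousal lexicon and the sentence score is their average.
AFFECT_LEXICON = {            # word -> (valence, arousal), both in [-1, 1]; invented entries
    "wonderful": (0.9, 0.5),
    "happy":     (0.8, 0.4),
    "boring":    (-0.5, -0.6),
    "terrible":  (-0.9, 0.6),
}

def sentence_affect(text: str):
    hits = [AFFECT_LEXICON[w] for w in text.lower().split() if w in AFFECT_LEXICON]
    if not hits:
        return (0.0, 0.0)                               # neutral if no affective words found
    valence = sum(v for v, _ in hits) / len(hits)
    arousal = sum(a for _, a in hits) / len(hits)
    return (valence, arousal)

print(sentence_affect("What a wonderful and happy surprise"))
print(sentence_affect("That talk was terrible"))
```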
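Similarly, the WiiGLE approach of training classifiers on a recorded gesture corpus and then recognizing gestures online can be sketched as follows; the feature choices, the k-nearest-neighbour classifier and the fake gesture data are assumptions for illustration only.

```python
# Minimal sketch of WiiGLE-style gesture classification: summarize a 3-axis
# acceleration trace into a fixed-length feature vector and classify it with a
# model trained on a small gesture corpus. Not the WiiGLE implementation.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def accel_features(trace: np.ndarray) -> np.ndarray:
    """trace: (n_samples, 3) acceleration; returns per-axis mean/std plus mean magnitude."""
    magnitude = np.linalg.norm(trace, axis=1)
    return np.concatenate([trace.mean(axis=0), trace.std(axis=0), [magnitude.mean()]])

rng = np.random.default_rng(1)

def fake_gesture(kind: str) -> np.ndarray:              # placeholder recordings of two gesture types
    amp = 0.3 if kind == "circle" else 1.5               # the "shake" gesture is more energetic
    return rng.normal(scale=amp, size=(100, 3))

X = np.stack([accel_features(fake_gesture(k)) for k in ["circle", "shake"] * 20])
y = ["circle", "shake"] * 20

model = KNeighborsClassifier(n_neighbors=3).fit(X, y)    # trained offline on the gesture corpus
print(model.predict(accel_features(fake_gesture("shake"))[None, :]))  # online recognition
```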
CALLAS components rendering emotions in terms of speech, music, laughter and animated ECAs:
- Emotional Natural Language Generator: understanding the emotional state of a speaker from "what" is said and "how" it is linguistically expressed. It is based on an annotated corpus of sentences presenting typical expressions used in conversation.
- Affective Music Synthesis: rendering the user's emotive state through real-time generation of affective music. Key characteristics of the music are altered in response to the user's changing mood, psychologically correlated and expressed in the PAD model (an illustrative PAD-to-music mapping follows this list).
- Acoustic Awareness: analysing and reacting appropriately to laughter, allowing an ECA to join its conversational partners' laughter.
- Emotional Attentive ECA: interacting with a user through a rich palette of verbal and nonverbal behaviours of a real-time 3D female agent. The communicative intentions of the listener are rendered by talking while simultaneously showing facial expressions, gestures, gaze and head movements.
- Augmented Reality Output Component: an end-user application for Augmented Reality visualization that allows scripting and fast assembly of AR interactions.
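How a PAD mood estimate can drive musical characteristics is illustrated by the sketch below; the parameter ranges and mapping rules are assumptions chosen for readability, not the rules used by the Affective Music Synthesis component.

```python
# Illustrative mapping from a PAD (Pleasure-Arousal-Dominance) mood estimate to
# musical control parameters. Ranges and rules are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class MusicParams:
    tempo_bpm: float   # faster for higher arousal
    mode: str          # major for positive valence, minor for negative
    loudness: float    # 0..1, louder for more dominant moods

def pad_to_music(pleasure: float, arousal: float, dominance: float) -> MusicParams:
    """All PAD inputs are expected in [-1, 1]."""
    tempo = 90 + 50 * arousal                  # 40..140 bpm across the arousal range
    mode = "major" if pleasure >= 0 else "minor"
    loudness = 0.5 + 0.4 * dominance           # 0.1..0.9
    return MusicParams(tempo_bpm=tempo, mode=mode, loudness=loudness)

# A relaxed, positive, slightly submissive mood -> slow, major, fairly soft music.
print(pad_to_music(pleasure=0.6, arousal=-0.4, dominance=-0.2))
```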
CALLAS components devoted to multimodal emotion recognition:
- Low Level Multimodal Fusion: applying machine learning to the output of the feature-extraction components to provide unimodal emotion recognition, and combining features from the individual modalities to support early fusion and multimodal emotion recognition (see the early-fusion sketch after this list).
- Smart sensor integration: featuring integration of single or multiple sensors into multimedia applications. It allows a developer to quickly turn standard sensors, such as a microphone, camera or Wiimote, into "smart" sensors that present information in a form meeting the application's requirements as effectively as possible.
- Ad hoc multimodal semantic fusion components: combining affective results from individual components into a dimensional model based on PAD (Pleasure-Arousal-Dominance) for an overall affective representation of user interactions (see the PAD fusion sketch below).
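As an illustration of early (feature-level) fusion of the kind performed by the Low Level Multimodal Fusion component, the sketch below concatenates feature vectors from two modalities before training a single classifier; the features, labels and classifier choice are placeholders, not CALLAS output.

```python
# Minimal sketch of early (feature-level) fusion: feature vectors from individual
# modalities are concatenated into one vector before a single classifier is trained.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 60
audio_features = rng.normal(size=(n, 4))      # e.g. energy/pitch statistics per utterance
video_features = rng.normal(size=(n, 6))      # e.g. facial-expression descriptors per clip
labels = rng.integers(0, 2, size=n)           # placeholder emotion labels

fused = np.hstack([audio_features, video_features])    # early fusion: one joint feature space
clf = LogisticRegression().fit(fused, labels)
print(clf.predict(np.hstack([rng.normal(size=(1, 4)), rng.normal(size=(1, 6))])))
```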
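Semantic fusion into PAD can likewise be sketched as a confidence-weighted combination of per-component estimates; the weighting scheme below is an assumption for illustration, not the ad hoc fusion rule used in CALLAS.

```python
# Illustrative semantic fusion into the PAD space: each component reports a
# (pleasure, arousal, dominance) estimate plus a confidence, and the fused state
# is their confidence-weighted average.
import numpy as np

def fuse_pad(estimates):
    """estimates: list of ((pleasure, arousal, dominance), confidence) pairs."""
    pads = np.array([p for p, _ in estimates], dtype=float)
    weights = np.array([c for _, c in estimates], dtype=float)
    return (weights[:, None] * pads).sum(axis=0) / weights.sum()

speech  = ((0.2, 0.8, 0.1), 0.9)    # aroused-sounding speech, high confidence
face    = ((0.5, 0.3, 0.0), 0.6)    # mildly positive facial expression, medium confidence
gesture = ((-0.1, 0.6, 0.4), 0.4)   # energetic gesture, lower confidence

print(fuse_pad([speech, face, gesture]))   # one overall affective state in PAD space
```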