Research areas

The (very) long-term research challenges that motivate HOTLab research can be summarised in the following questions:

  • Can machines accurately capture and analyse human behavior, up to the level of emotional and social signals, in a fully automatic and non-intrusive way?
  • Can we automatically simulate realistic human behavior, including believable and useful interaction of virtual with real humans?

Full answers to these questions still lie in the fairly distant future, so the practical research issues deal with steps that we can realistically take in this direction, as well as near-term applications of the technologies that are developed or within reach. Specific areas that we work on include:

Face Tracking

In collaboration with partners at Linköping University and Visage Technologies AB we are continuing to develop a monocular, high-performance, multi-platform face and facial feature tracking system. The main distinguishing characteristic of our approach is that the tracker is fully automatic and works with any face without a manual initialization step. It is robust, resistant to rapid changes in pose and facial expression, does not suffer from drifting, has modest computational cost, does not require prior training for successful tracking, recovers quickly from any losses and is capable of switching between different faces within the same tracking session. Its modest computational cost makes it ideal for achieving high performance across platforms. The tracker works on Windows and Linux as well as on the leading mobile platforms, iOS and Android. We have performed a series of tests in different environments and under various conditions, using the most common consumer devices as well as professional hardware such as thermal cameras.
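The combination of automatic initialization and quick recovery from losses can be pictured as a small detect/track state machine. The sketch below is our own illustration of that idea, not the actual tracker implementation; the per-frame confidence scores stand in for a real detector and tracker.

```python
from enum import Enum

class State(Enum):
    DETECT = 0   # search the whole frame for a face
    TRACK = 1    # follow the face found in a previous frame

class AutoTracker:
    """Toy state machine: starts in detection, switches to tracking
    once a face is found, and falls back to detection whenever the
    tracking confidence drops (i.e. the tracker recovers from losses
    without any manual re-initialization)."""
    def __init__(self, conf_threshold=0.5):
        self.state = State.DETECT
        self.conf_threshold = conf_threshold

    def step(self, detect_conf, track_conf):
        # detect_conf / track_conf stand in for the real detector's
        # and tracker's per-frame confidence scores.
        if self.state == State.DETECT:
            if detect_conf >= self.conf_threshold:
                self.state = State.TRACK    # face found: start tracking
        else:
            if track_conf < self.conf_threshold:
                self.state = State.DETECT   # tracking lost: re-detect
        return self.state
```

Because detection runs again whenever confidence drops, the same loop also handles switching between different faces in one session.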


Face Analysis

In collaboration with partners at Linköping University and Visage Technologies AB we are continuing to develop face analysis, comprising age, gender and emotion estimation. We use the output of the face tracker as the basis for the analysis algorithms. The analyser's modest computational cost makes it ideal for achieving high performance across platforms; it works on Windows and Linux as well as on the leading mobile platforms, iOS and Android. We have performed a series of tests in different environments and under various conditions.
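The way the tracker output feeds the analysis stage can be sketched as a small pipeline: the tracked face region is cropped once and handed to each estimator. The function and estimator names below are ours, purely for illustration; the image is a plain 2-D list standing in for a real frame.

```python
def analyse_face(frame, face_box, estimators):
    """Pipeline sketch (hypothetical names, not the actual API):
    crop the tracked face region from the frame, then run every
    registered estimator (age, gender, emotion, ...) on the crop."""
    x, y, w, h = face_box                             # tracker output
    crop = [row[x:x + w] for row in frame[y:y + h]]   # 2-D list as a stand-in image
    return {name: fn(crop) for name, fn in estimators.items()}
```

In a real system each estimator would be a trained model; here any callable taking a crop will do, which keeps the sketch runnable.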


Face Recognition

In collaboration with partners at Linkoping University and Visage Technologies AB we are continuing to develop Face Recognition.


Local descriptors learning for image matching

Local descriptors are a widely used tool in computer vision and pattern recognition. Example applications include object/scene recognition and retrieval, face verification, face alignment, image stitching, 3D shape estimation and 3D model retrieval/matching. The current best local descriptors are learned on a large dataset of matching and non-matching keypoint pairs. However, data of this kind is not always available, since detailed keypoint correspondences can be hard to establish. On the other hand, we can often obtain labels for pairs of keypoint bags. For example, keypoint bags extracted from two images of the same object under different views form a matching pair, and keypoint bags extracted from images of different objects form a non-matching pair. On average, matching pairs should contain more corresponding keypoints than non-matching pairs. We attempt to construct effective algorithms for using such data to obtain discriminative local descriptors.
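One minimal way to exploit such weak bag-level labels is to define a bag-to-bag distance from nearest-descriptor matches and penalise it with a hinge loss. The sketch below is our own simplification of that idea (the functions and the margin value are illustrative, not the lab's actual algorithm):

```python
import math

def l2(a, b):
    # Euclidean distance between two descriptor vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def bag_distance(bag_a, bag_b):
    """Average, over descriptors in bag_a, of the distance to the
    nearest descriptor in bag_b. Matching bags should score lower,
    since they share more corresponding keypoints on average."""
    return sum(min(l2(d, e) for e in bag_b) for d in bag_a) / len(bag_a)

def hinge_pair_loss(bag_a, bag_b, match, margin=1.0):
    """Weak-label objective: pull matching bags together, push
    non-matching bags at least `margin` apart."""
    d = bag_distance(bag_a, bag_b)
    return d if match else max(0.0, margin - d)
```

In an actual learning setup this loss would be minimised over the parameters of the descriptor itself (e.g. a projection or network applied to raw patches); the sketch only shows the bag-level supervision signal.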

Selected Past Achievements

Facial Motion Cloning

Facial Motion Cloning is a method for automatically copying facial motion from one 3D face model to another, while preserving the motion's compliance with the MPEG-4 FBA standard. It offers dramatic time savings to artists producing morph targets for facial animation or MPEG-4 Facial Animation Tables. [More information is available in the paper.]
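At its core, cloning transfers the source model's neutral-to-expression displacement onto the target's neutral pose. The sketch below shows only that per-vertex step, under the strong (and unrealistic) assumption that the two models already share vertex correspondence, which the real method must first establish:

```python
def clone_motion(src_neutral, src_expr, tgt_neutral):
    """Naive per-vertex cloning sketch (our simplification, not the
    full method): add the source displacement (src_expr - src_neutral)
    to each corresponding target vertex. Vertices are [x, y, z] lists."""
    return [
        [t + (e - n) for t, e, n in zip(tv, ev, nv)]
        for tv, ev, nv in zip(tgt_neutral, src_expr, src_neutral)
    ]
```

Applying this to every morph target of the source model yields the full set of cloned targets for the target model.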

Activities continue on improving video-driven facial animation, as demonstrated in this video:


RealActor: A Multimodal Behavior Realizer

Applications with intelligent Embodied Conversational Agents (ECAs) seek to bring human-like abilities into machines and establish natural human-computer interaction. This research focuses on one of the crucial components that make ECAs believable and expressive in interaction with humans: the animation system which realizes multimodal behaviors. Our goal is to devise a realization system that renders complex communicative utterances in real time, is applicable to a wide range of domains and is publicly available for the benefit of the research community. The first prototype of the system, called RealActor, has been developed and experimentally tested in our research lab. RealActor employs a novel solution for synchronizing gestures and speech using neural networks, as well as an adaptive face animation model based on the Facial Action Coding System (FACS) which synthesizes facial expressions. [More details are available in the paper.]


Generic ECA Framework

In collaboration with the University of Kyoto, Graduate School of Informatics, Dept. of Intelligence Science and Technology, Group of Applied Intelligence Information Processing, we are developing GECA, a generic backbone framework that connects a set of reusable, modular ECA software components. The purpose of our work is to enable rapid building of ECA systems and to prevent redundant effort and wasted resources. So far, the framework has been tested in several ECA systems.

The first serious ECA application based on the GECA framework was developed during the eNTERFACE '06 workshop in Dubrovnik. In this system, an ECA agent presents sightseeing information about Dubrovnik to its visitors through verbal and non-verbal interaction in English, Japanese and Croatian, with appropriate gestures. [View the demonstration video]. A second, more ambitious application is a multiuser tour guide system developed at the eNTERFACE '08 workshop in Paris. The system combines different technologies to detect and address its users and draw their attention. In experimental interactions it showed that it can react to a user's appearance, departure and decreased level of interest, and identify the user's conversational role.


Automatic Gesturing Behavior for Embodied Agents

The goal of this work is to achieve real-time, on-the-fly, fully automatic gesturing behavior corresponding to the speech spoken by the virtual character, regardless of whether the speech is generated by speech synthesis or captured from a human speaker. We propose and use HUman GEsturing (HUGE), a software architecture for producing and using statistical models of facial gestures based on any kind of inducement signal that is correlated with the facial gestures, e.g. the text that is spoken, the audio signal of the speech, bio-signals, emotions etc. The correlation between the inducement signal and the facial gestures is used first to build the statistical model, which is then used in real time to trigger the agent's facial gestures from the raw inducement signal. This universal architecture is useful for experimenting with various kinds of potential inducement signals and their features, and for exploring the correlation of such signals or features with gesturing behaviour. It has successfully been used for real-time, automatic generation of full facial gesturing from plain text and for speech-based animation.
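The two phases of the architecture (offline model building, then real-time triggering) can be sketched as a simple conditional-frequency model. The class and the feature/gesture names below are ours, chosen only to illustrate the idea; HUGE itself supports far richer statistical models.

```python
from collections import Counter, defaultdict

class GestureModel:
    """Sketch of the HUGE two-phase idea (hypothetical names, not the
    actual API): learn P(gesture | inducement feature) from aligned
    training data, then trigger gestures from raw features at runtime."""
    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, pairs):
        # pairs: iterable of (feature, gesture) observed together,
        # e.g. extracted from annotated speaker recordings
        for feature, gesture in pairs:
            self.counts[feature][gesture] += 1

    def trigger(self, feature):
        # real-time phase: most likely gesture for this feature, if seen
        dist = self.counts.get(feature)
        return dist.most_common(1)[0][0] if dist else None
```

A production model would sample from the learned distribution (and condition on timing) rather than always taking the mode; the argmax here just keeps the sketch deterministic.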

Text-based automatic facial gesturing

Using the HUGE architecture we have successfully achieved real-time, on-the-fly, fully automatic facial gesturing behavior corresponding to the speech spoken by the virtual character using speech synthesis. The input to the system is plain English text. We use lexical analysis and statistical models of behavior to produce fully automatic animation [more details are available in the paper].


Speech-based automatic facial gesturing

Our speech-based animation approach proposes a hybrid method for the real-time correlation of the speech signal with facial gestures. The hybrid statistical method combines a data-driven module, in which facial gestures are correlated with prominent parts of the speech, with a rule-based module consisting of a set of rules, which among others include rules for punctuation and prolonged pauses.
To check how believable virtual characters are when animated with facial gestures obtained by the proposed method, we implemented a system for speech-driven facial gesturing and used it for a perceptual evaluation. The system has been placed in the context of virtual news presenters used in networked environments, since that is the direction in which future research on this system is headed. [More details are available in the paper.]
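The rule-based module can be illustrated with two of the rule families mentioned above, punctuation and prolonged pauses. The thresholds and gesture names in this sketch are our own illustrative choices, not the values used in the actual system:

```python
def rule_based_gestures(text, pause_ms):
    """Toy version of the rule-based module: map sentence-final
    punctuation and prolonged pauses to facial gestures.
    (Gesture names and the 500 ms threshold are illustrative.)"""
    gestures = []
    stripped = text.rstrip()
    if stripped.endswith("?"):
        gestures.append("eyebrow_raise")   # questions often raise the brows
    elif stripped.endswith((".", "!")):
        gestures.append("head_nod")        # sentence-final emphasis
    if pause_ms > 500:
        gestures.append("blink")           # a prolonged pause triggers a blink
    return gestures
```

In the hybrid method these rule firings are merged with the gestures proposed by the data-driven module, which handles prominence in the speech signal itself.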


Personalised Virtual Characters on mobile platforms

In a final-year student project, a group of motivated students developed a prototype application for animating personalised virtual characters on mobile platforms. Camera phones, Java MIDlets, MMS and GPRS are some of the technologies used to bring these personalised animations to mobile phones. Prototype full-3D face animation players have been developed on the Symbian (SonyEricsson P800) and iPaq platforms. Some of the results have been presented in an interview for the science programme "Trenutak spoznaje" on Croatian Television. The 3D virtual characters are created by taking a picture with the mobile phone camera and adjusting a mask over the face in the picture directly on the phone. The server then creates a personalised 3D face model resembling the person in the picture. It can be animated using speech synthesis or speech analysis (lip sync). The animations can be delivered via MMS, or full 3D animation can be played on Symbian smartphones. Additionally, the whole service can be made available through the web.


Virtual Reality in treatment of psychological disorders

Postgraduate student Sanja Mrdeža has been working on applying Virtual Reality techniques to the treatment of patients with Post-Traumatic Stress Disorder (PTSD), with emphasis on the particular needs of PTSD patients in Croatia (see paper). The aim is to develop an application that may help treat such patients and to evaluate it in a clinical trial.