This paper presents a localization and tracking system
integrating multiple sensors. Object localization
results from local sensor systems are fused using a
decentralized Kalman filter. The approach is evaluated on an
audiovisual speaker tracking system that combines
a video-based face tracker with a microphone array.
A quantitative analysis shows that the presented bimodal tracking
system can deliver more robust and reliable results than
either modality alone.
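
As an illustration of the fusion idea only, the following minimal sketch combines local position estimates by inverse-variance (information-form) weighting, which is the static core of a decentralized Kalman update. It is a simplification and not the system evaluated here: it assumes independent sensor errors and per-axis (diagonal) variances, and it omits the motion model and the shared prior. The function name `fuse_estimates` and the example readings are hypothetical.

```python
from typing import List, Tuple

# A local estimate: (position per axis, variance per axis). Assumed layout.
Estimate = Tuple[List[float], List[float]]


def fuse_estimates(estimates: List[Estimate]) -> Estimate:
    """Fuse local position estimates by inverse-variance weighting.

    Assumes independent sensor errors and diagonal covariances.
    """
    if not estimates:
        raise ValueError("need at least one local estimate")
    dim = len(estimates[0][0])
    info = [0.0] * dim       # accumulated information (1 / variance)
    info_pos = [0.0] * dim   # accumulated information-weighted position
    for position, variance in estimates:
        for k in range(dim):
            info[k] += 1.0 / variance[k]
            info_pos[k] += position[k] / variance[k]
    fused_var = [1.0 / i for i in info]
    fused_pos = [p * v for p, v in zip(info_pos, fused_var)]
    return fused_pos, fused_var


# Hypothetical one-dimensional readings (e.g. speaker azimuth in degrees):
# the face tracker is assumed more certain than the microphone array.
video = ([10.0], [1.0])
audio = ([14.0], [4.0])
fused_pos, fused_var = fuse_estimates([video, audio])
```

Under these assumptions the fused position lies closer to the more certain reading, and the fused variance is smaller than that of either local estimate.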