Information fusion algorithms have been successful in many vision tasks such as stereo, motion estimation, registration, and robot localization. Stereo and motion image analysis are intimately connected and can provide complementary information for obtaining robust estimates of scene structure and motion. We present an information-fusion-based approach to multi-camera, multi-body structure and motion that combines bottom-up and top-down knowledge of scene structure and motion. The only assumption we make is that all scene motion is rigid. We present experimental results on synthetic and real data sets, demonstrating excellent performance compared with state-of-the-art binocular approaches to structure and motion.
Detection of motion patterns in video data can be significantly simplified by abstracting away from pixel intensity values towards representations that explicitly and compactly capture movement across space and time. A novel representation that captures the spatiotemporal distributions of motion across regions of interest, called the "Direction Map," abstracts video data by assigning a two-dimensional vector, representative of local direction of motion, to quantized regions in space-time. Methods are presented for recovering direction maps from video, constructing direction map templates (defining target motion patterns of interest) and comparing templates to newly acquired video (for pattern detection and localization). These methods have been successfully implemented and tested (with real-time considerations) on over 6300 frames across seven surveillance/traffic videos, detecting potential targets of interest as they traverse the scene in specific ways. Results show an overall recognition rate of approximately 91% hits vs. 8% false positives.
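The core representation above, a grid of per-region direction vectors compared against templates, can be illustrated with a minimal sketch. The function names, the block-averaging layout, and the cosine-similarity matching are assumptions for illustration; the paper's actual recovery and comparison procedures are not reproduced here, and a dense flow field is taken as given input.

```python
import numpy as np

def direction_map(flow, block):
    """Quantize a dense flow field of shape (H, W, 2) into a coarse grid
    of unit direction vectors, one per block x block region (hypothetical
    layout; a stand-in for the paper's direction-map recovery)."""
    H, W, _ = flow.shape
    gh, gw = H // block, W // block
    dmap = np.zeros((gh, gw, 2))
    for i in range(gh):
        for j in range(gw):
            v = flow[i*block:(i+1)*block, j*block:(j+1)*block].reshape(-1, 2).mean(axis=0)
            n = np.linalg.norm(v)
            dmap[i, j] = v / n if n > 1e-6 else 0.0  # unit direction, zero if static
    return dmap

def match_score(template, observed):
    """Mean cosine similarity between corresponding cell directions
    (cells with zero vectors contribute zero)."""
    return float(np.mean(np.sum(template * observed, axis=-1)))

# Synthetic flow field: everything moves to the right
flow = np.zeros((8, 8, 2))
flow[..., 0] = 1.0
dmap = direction_map(flow, block=4)   # 2x2 grid of unit vectors
print(match_score(dmap, dmap))        # 1.0 for a perfect match
```

Thresholding such a score against templates built from example clips would then flag candidate motion patterns in new video.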
Analysis of human gait has attracted considerable interest in recent computer vision research. So far, however, contributions to this topic have dealt exclusively with the tasks of person identification or activity recognition. In this paper, we consider a different application of gait analysis and examine its use as a means of deducing the physical well-being of people. Casting the detection of unusual movement patterns as a two-class problem suggests using support vector machines for classification. We present a homeomorphism between 2D lattices and binary shapes that provides a robust vector space embedding of segmented body silhouettes. Experimental results demonstrate that feature vectors obtained from this scheme are well suited to detecting abnormal gait. Wavering, faltering, and falling can be detected reliably across individuals without tracking or recognizing limbs or body parts.
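The two-class formulation above can be sketched with a minimal linear SVM trained by the Pegasos subgradient method; this is a generic stand-in, not the paper's classifier, and the lattice-based silhouette embedding is assumed to have already produced the feature vectors. The toy "normal" vs. "abnormal" data below is synthetic.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Minimal linear SVM via the Pegasos subgradient method (illustrative
    stand-in for the paper's SVM). X: (n, d) features, y: labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            if y[i] * (X[i] @ w) < 1:      # hinge-loss margin violated
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:
                w = (1 - eta * lam) * w    # regularization shrinkage only
    return w

# Synthetic stand-ins for embedded silhouette feature vectors
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (50, 4)),   # "normal gait" class
               rng.normal(+2, 0.5, (50, 4))])  # "abnormal gait" class
y = np.r_[-np.ones(50), np.ones(50)]
w = train_linear_svm(X, y)
acc = np.mean(np.sign(X @ w) == y)
print(acc)   # expect ~1.0 on this well-separated toy set
```

A kernelized SVM would typically replace the linear model when the silhouette embedding is not linearly separable.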
Subspace manifold learning represents a popular class of techniques in statistical image analysis and object recognition. Recent research in the field has focused on nonlinear representations; locally linear embedding (LLE) is one such technique that has recently gained popularity. We present and apply a generalization of LLE that introduces sample weights. We demonstrate the application of the technique to face recognition, where a model exists to describe each face's probability of occurrence. These probabilities are used as weights in the learning of the low-dimensional face manifold. Results of face recognition using this approach are compared against standard unweighted LLE and PCA. A significant improvement in recognition rates is realized using weighted LLE on a data set where face occurrences follow the modeled distribution.
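The abstract does not specify where the sample weights enter; one plausible sketch weights each sample's contribution to the embedding cost, replacing the standard LLE matrix (I−W)ᵀ(I−W) with (I−W)ᵀ diag(p) (I−W). The function name, the regularizer, and the toy data below are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def weighted_lle(X, k, d, sample_w):
    """Sketch of LLE with per-sample weights (hypothetical formulation).
    X: (n, D) data, k: neighbors, d: target dim, sample_w: (n,) weights."""
    n = X.shape[0]
    # Pairwise squared distances -> k nearest neighbors, excluding self
    D2 = ((X[:, None] - X[None]) ** 2).sum(-1)
    np.fill_diagonal(D2, np.inf)
    nbrs = np.argsort(D2, axis=1)[:, :k]
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                 # neighbors centered on x_i
        C = Z @ Z.T
        C += np.eye(k) * 1e-3 * np.trace(C)   # regularize the local Gram matrix
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()           # reconstruction weights, sum to 1
    # Sample weights enter the embedding cost here (assumed placement)
    M = (np.eye(n) - W).T @ np.diag(sample_w) @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d + 1]                   # skip the constant eigenvector

# Toy data: points on a noisy 1-D curve embedded in 3-D, uniform weights
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 1, 40))
X = np.c_[np.cos(2 * t), np.sin(2 * t), t] + 1e-3 * rng.standard_normal((40, 3))
Y = weighted_lle(X, k=6, d=1, sample_w=np.ones(40))
print(Y.shape)   # (40, 1)
```

With uniform weights this reduces to standard LLE; face-occurrence probabilities would be passed as `sample_w` to bias the manifold toward frequently occurring faces.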
An approach to recognizing human hand gestures from a monocular temporal sequence of images is presented. Of concern is the representation and recognition of hand movements used in single-handed American Sign Language (ASL). The approach exploits previous linguistic analyses of manual languages that decompose dynamic gestures into their static and dynamic components. The first level of decomposition is in terms of three sets of primitives: hand shape, location, and movement. Further levels of decomposition involve the lexical and sentence levels and are beyond the scope of the present paper. We propose, and subsequently demonstrate, that given a monocular gesture sequence, kinematic features can be recovered from the apparent motion that provide distinctive signatures for 14 primitive movements of ASL. The approach has been implemented in software and evaluated on a database of 592 gesture sequences, with an overall recognition rate of 86% for fully automated processing and 97% for manually initialized processing.
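The idea of kinematic signatures distinguishing movement primitives can be illustrated with a toy example: a few simple features of a 2-D hand trajectory (mean speed, accumulated turning, straightness) already separate a straight movement from a circular one. These particular features and names are assumptions for illustration, not the paper's feature set.

```python
import numpy as np

def kinematic_features(traj):
    """Toy kinematic signature of a 2-D trajectory of shape (T, 2):
    [mean speed, total turning angle, straightness] (illustrative only)."""
    v = np.diff(traj, axis=0)                      # frame-to-frame motion
    speed = np.linalg.norm(v, axis=1)
    ang = np.arctan2(v[:, 1], v[:, 0])
    turn = np.abs(np.diff(np.unwrap(ang))).sum()   # accumulated turning angle
    chord = np.linalg.norm(traj[-1] - traj[0])
    straightness = chord / max(speed.sum(), 1e-9)  # 1.0 for a straight path
    return np.array([speed.mean(), turn, straightness])

# A straight movement vs. a circular movement of the hand centroid
line = np.c_[np.linspace(0, 1, 30), np.zeros(30)]
t = np.linspace(0, 2 * np.pi, 30)
circle = np.c_[np.cos(t), np.sin(t)]
f_line, f_circle = kinematic_features(line), kinematic_features(circle)
print(f_line[2] > f_circle[2])   # the straight path is straighter: True
```

A nearest-signature or threshold rule over such features is the kind of mechanism by which apparent-motion measurements could index a small set of movement primitives.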