Visual Attention

TarzaNN

Even though attention is a pervasive phenomenon in primate vision, surprisingly little agreement exists on its definition, role and mechanisms, due at least in part to the wide variety of investigative methods. As elsewhere in neuroscience, computational modeling has an important role to play by being the only technique that can bridge the gap between these methods and provide answers to questions that are beyond the reach of current direct investigative methods.

   A number of computational models of primate visual attention have appeared over the past two decades. While all models share several fundamental assumptions, each is based on a unique hypothesis and method. Each seems to provide a satisfactory explanation for several experimental observations. However, a detailed comparative analysis of the existing models, that is, a comparison with each of the models subjected to the same input data set in order to both verify the published performance and to push the models to their limits, has never been undertaken. Such an analysis would be invaluable: comparative, computational testing procedures would be established for the first time, successful modeling ideas would be confirmed, weaknesses identified, and new directions for development discovered. The goal would be to validate the models by testing them against existing knowledge of the primate attentional system; the experimental stimuli and task definitions which led to that knowledge would form the basis for the development of the test data sets.

   To facilitate this analysis and to provide the research community with a common software platform, we have developed a general-purpose, extensible neural network simulator geared towards the computational modeling of visual attention. The simulator allows for the distributed execution of models in a heterogeneous environment. Its associated tools allow non-programmers to develop and test computational models, and a common model description format facilitates the exchange of models between research groups. The simulation results can be presented in a variety of formats, from activation maps to the equivalent of single-unit recordings and fMRI.


 

Tsotsos, J.K., Liu, Y., Martinez-Trujillo, J., Pomplun, M., Simine, E., Zhou, K., Attending to Visual Motion, Computer Vision and Image Understanding, Vol. 100, 1-2, p 3 - 40, Oct. 2005.

Visual motion analysis has focused on decomposing image sequences into their component features. There has been little success at re-combining those features into moving objects. Here, a novel model of attentive visual motion processing is presented that addresses both decomposition of the signal into constituent features as well as the re-combination, or binding, of those features into wholes. A new feed-forward motion-processing pyramid is presented motivated by the neurobiology of primate motion processes. On this structure the Selective Tuning (ST) model for visual attention is demonstrated. There are three main contributions: (1) a new feed-forward motion processing hierarchy, the first to include a multi-level decomposition with local spatial derivatives of velocity; (2) examples of how ST operates on this hierarchy to attend to motion and to localize and label motion patterns; and (3) a new solution to the feature binding problem sufficient for grouping motion features into coherent object motion. Binding is accomplished using a top-down selection mechanism that does not depend on a single location-based saliency representation.


Rothenstein, A., Rodriguez-Sanchez, A., Simine, E., Tsotsos, J.K., Visual Feature Binding within the Selective Tuning Attention Framework, Int. J. Pattern Recognition and Artificial Intelligence - Special Issue on Brain, Vision and Artificial Intelligence, 22(5), 2008, p. 861-881.

 

We present a biologically plausible computational model for solving the visual feature binding problem, based on recent results regarding the time course and processing sequence in the primate visual system. The feature binding problem appears due to the distributed nature of visual processing in the primate brain, and the gradual loss of spatial information along the processing hierarchy. This paper puts forward the proposal that by using multiple passes of the visual processing hierarchy, both bottom-up and top-down, and using task information to tune the processing prior to each pass, we can explain the different recognition behaviors that primate vision exhibits. To accomplish this, four different kinds of binding processes are introduced and are tied directly to specific recognition tasks and their time course. The model relies on the reentrant connections so ubiquitous in the primate brain to recover spatial information, and thus allow features represented in different parts of the brain to be integrated in a unitary conscious percept. We show how different tasks and stimuli have different binding requirements, and present a unified framework within the Selective Tuning model of visual attention.


Rodriguez-Sanchez, A.J., Simine, E., Tsotsos, J.K., Attention And Visual Search, Int. J. Neural Systems, 2007 Aug;17(4):275-88.

 



Bruce, N.D.B., Tsotsos, J.K., Saliency, Attention, and Visual Search: An Information Theoretic Approach, Journal of Vision 9:3, p1-24, 2009, http://journalofvision.org/9/3/5/, doi:10.1167/9.3.5

 

A proposal for saliency computation within the visual cortex is put forth based on the premise that localized saliency computation serves to maximize information sampled from one’s environment. The model is built entirely on computational constraints but nevertheless results in an architecture with cells and connectivity reminiscent of that appearing in the visual cortex. It is demonstrated that a variety of visual search behaviors appear as emergent properties of the model and therefore basic principles of coding and information transmission. Experimental results demonstrate greater efficacy in predicting fixation patterns across two different data sets as compared with competing models.
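As a rough illustration of the premise that salient locations are the informative ones, the sketch below scores each pixel by the self-information, -log2 p(f), of its feature value f, with p estimated from the image's own histogram. It is a toy and not the model from the paper: the use of quantized raw intensities as features, the histogram estimate, and the name `self_information_map` are all assumptions made for illustration.

```python
import math
from collections import Counter


def self_information_map(image, bins=4, max_value=255):
    """Return -log2 p(bin) per pixel, with p taken from the image's histogram.

    Hypothetical helper: raw intensities stand in for learned features.
    """
    def to_bin(value):
        # Quantize an intensity in [0, max_value] into one of `bins` bins.
        return min(value * bins // (max_value + 1), bins - 1)

    binned = [[to_bin(v) for v in row] for row in image]
    counts = Counter(b for row in binned for b in row)
    total = sum(counts.values())
    # Rare feature values carry more information, hence higher saliency.
    return [[-math.log2(counts[b] / total) for b in row] for row in binned]


# A uniform dark patch with a single bright outlier.
image = [
    [10, 12, 11, 13],
    [12, 250, 10, 11],
    [11, 13, 12, 10],
    [10, 11, 13, 12],
]
saliency = self_information_map(image)
```

In this example the bright pixel is the only one in its bin, so it receives the highest score, while the common dark values score close to zero.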


Martinez-Trujillo, J.C., Cheyne, D., Gaetz, W., Simine, E., Tsotsos, J.K., Activation of area MT/V5 and the right inferior parietal cortex during the discrimination of transient direction changes in translational motion, Cerebral Cortex, 2007 Jul;17(7):1733-9. Epub 2006 Sep 29.

 

The perception of changes in the direction of objects that translate in space is an important function of our visual system. Here we investigate the brain electrical phenomena underlying such a function by using a combination of magnetoencephalography (MEG) and magnetic resonance imaging. We recorded MEG-evoked responses in 9 healthy human subjects while they discriminated the direction of a transient change in a translationally moving random dot pattern presented either to the right or to the left of a central fixation point. We found that responses reached their maximum in 2 main regions corresponding to motion processing area middle temporal (MT)/V5 contra-lateral to the stimulated visual field, and to the right inferior parietal lobe (rIPL). The activation latencies were very similar in both regions (~135 ms) following the direction change onset. Our findings suggest that area MT/V5 provides the strongest sensory signal in response to changes in the direction of translational motion, whereas area rIPL may be involved either in the sensory processing of transient motion signals or in the processing of signals related to orienting of attention.


Martinez-Trujillo, J.C., Tsotsos, J.K., Simine, E., Pomplun, M., Wildes, R., Treue, S., Heinze, H.-J., Hopf, J.-M., Selectivity for Speed Gradients in Human Area MT/V5, NeuroReport 16(5):435-438, 2005 Apr 4.

 

Cortical area MT/V5 in the human occipito-temporal cortex is activated by visual motion. In this study, we use functional imaging to demonstrate that a subregion of MT/V5 is more strongly activated by unidirectional motion with speed gradients than by other motion patterns. Our results suggest that like the monkey homolog middle temporal area (MT), human MT/V5 contains neurons selective for the processing of speed gradients. Such neurons may constitute an intermediate stage of processing between neurons selective for the average speed of unidirectional motion and neurons selective for different combinations of speed gradient and different motion directions such as expanding optical flow patterns.


Loach, D., Frischen, A., Bruce, N., Tsotsos, J.K., An attentional mechanism for selecting appropriate actions afforded by graspable objects, Psychological Science 19(12), p 1253-1257, 2008.

 

An object may afford a number of different actions. In this article, we show that an attentional mechanism inhibits competing motor programs that could elicit erroneous actions. Participants made a speeded key press to categorize the second of two successively presented door handles that were rotated at varying orientations relative to one another. Their responding hand was compatible or incompatible with the graspable part of the door handles (rightward or leftward facing). Compatible responses were faster than incompatible responses if the two handles shared an identical orientation, but they were slower if the two handles were aligned at slightly dissimilar orientations. Such suppressive surround effects are hallmarks of attentional processing in the visual domain, but they have never been observed behaviorally in the motor domain. This finding delineates a common mechanism involved in two of the most important functions of the brain: processing sensory data and preparing actions based on that information.


Tsotsos, J.K., Rodriguez-Sanchez, A., Rothenstein, A., Simine, E., Different Binding Strategies for the Different Stages of Visual Recognition, Brain Research, Available online 23 May 2008

 

Many think that visual attention needs an executive to allocate resources. Although the cortex exhibits substantial plasticity, dynamic allocation of neurons seems outside its capability. Suppose instead that the visual processing architecture is fixed, but can be ‘tuned’ dynamically to task requirements: the only remaining resource that can be allocated is time. How can this fixed, yet tunable, structure be used over periods of time longer than one feed-forward pass? With the goal of developing a computational theory and model of vision and attention that has both biological predictive power as well as utility for computer vision, this paper proposes that by using multiple passes of the visual processing hierarchy, both bottom-up and top-down, and using task information to tune the processing prior to each pass, we can explain the different recognition behaviors that human vision exhibits. By examining in detail the basic computational infrastructure provided by the Selective Tuning model and using its functionality, four different binding processes – Convergence Binding and Partial, Full and Iterative Recurrence Binding – are introduced and tied to specific recognition tasks and their time course. The key is a provable method to trace neural activations through multiple representations from higher order levels of the visual processing network down to the early levels.
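The idea of tracing activations from the top of the hierarchy back down to the early levels can be pictured with a small sketch. The code below builds a one-dimensional pyramid in a feed-forward pass and then, starting from the strongest top-level unit, repeatedly picks the strongest unit inside the current winner's receptive field one level down. This is a simplification for illustration only: max pooling, the fixed fan-in of two, and the function names `build_pyramid` and `trace_top_down` are assumptions, not details taken from the Selective Tuning model.

```python
def build_pyramid(inputs, fan_in=2):
    """Feed-forward pass: each unit takes the max of `fan_in` adjacent units below.

    Max pooling is an assumed stand-in for the model's actual units.
    """
    levels = [list(inputs)]
    while len(levels[-1]) > 1:
        below = levels[-1]
        levels.append(
            [max(below[i:i + fan_in]) for i in range(0, len(below), fan_in)]
        )
    return levels


def trace_top_down(levels, fan_in=2):
    """Top-down pass: follow the winner through each receptive field.

    Returns the winning index at every level, from the top to the input.
    """
    top = levels[-1]
    winner = max(range(len(top)), key=top.__getitem__)
    path = [winner]
    for below in reversed(levels[:-1]):
        start = winner * fan_in
        field = range(start, min(start + fan_in, len(below)))
        # Winner-take-all restricted to the current winner's inputs.
        winner = max(field, key=below.__getitem__)
        path.append(winner)
    return path


levels = build_pyramid([0.1, 0.3, 0.2, 0.9, 0.4, 0.1, 0.0, 0.5])
path = trace_top_down(levels)
```

Here the last entry of `path` is the index of the input unit that drove the top-level response, recovered without searching the whole input layer.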


Tombu, M., Tsotsos, J.K., Attending to Orientation Results in an Inhibitory Surround in Orientation Space, Perception & Psychophysics, 2008, 70 (1), 30-35.

 

Subjects were required to attend to an orientation and make judgments about the stripes on briefly presented disks. Stripe orientation was varied so that they could be at, near, or far from the attended orientation. According to the selective-tuning model (Tsotsos, 1990; Tsotsos et al., 1995), attending to an orientation results in an inhibitory surround for nearby orientations, but not for orientations farther away. In line with this prediction, the results revealed an inhibitory surround. As in the spatial domain, attending to a point in orientation space results in an inhibitory surround for nearby orientations.


F. Cutzu, J.K. Tsotsos, The selective tuning model of visual attention: Testing the predictions arising from the inhibitory surround mechanism, Vision Research pp. 205 - 219, Jan. 2003.

 

The selective tuning model [Artif. Intell. 78 (1995) 507] is a neurobiologically plausible neural network model of visual attention. One of its key predictions is that to simultaneously solve the problems of convergence of neural input and selection of attended items, the portions of the visual neural network that process an attended stimulus must be surrounded by inhibition. To test this hypothesis, we mapped the attentional field around an attended location in a matching task where the subjects' attention was directed to a cued target while the distance of a probe item to the target was varied systematically. The main result was that accuracy increased with inter-target separation. The observed pattern of variation of accuracy with distance provided strong evidence in favor of the critical prediction of the model that attention is actively inhibited in the immediate vicinity of an attended location.
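The shape of such an inhibitory surround can be pictured with a simple difference-of-Gaussians profile: a narrow facilitatory peak at the attended location minus a broader, weaker suppressive Gaussian. This is only an illustrative stand-in for the qualitative pattern, not the mechanism by which the model produces its surround, and the function name `attentional_gain` and all parameter values are arbitrary assumptions rather than values fitted to the reported data.

```python
import math


def attentional_gain(distance, center_sigma=1.0, surround_sigma=3.0,
                     surround_weight=0.6):
    """Narrow excitatory Gaussian minus a broader, weaker inhibitory one.

    All parameters are arbitrary illustration values (assumed).
    """
    center = math.exp(-distance ** 2 / (2 * center_sigma ** 2))
    surround = surround_weight * math.exp(
        -distance ** 2 / (2 * surround_sigma ** 2)
    )
    return center - surround


# Gain at increasing distances from the attended location (arbitrary units).
profile = [attentional_gain(d) for d in range(13)]
```

With these values the gain is positive at the attended location, dips below zero in its immediate vicinity, and climbs back toward zero as distance increases, which mirrors the reported rise in accuracy with separation.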