L Itti C Koch Computational Modelling of Visual Attention Nature Reviews Neuroscience 2001

John G. Tsotsos, Dept. of Computer Science & Engineering, and Centre for Vision Research, York University, Canada
Albert Rothenstein, Dept. of Computer Science and Engineering and Centre for Vision Research, York University

Figure 1: A Venn diagram of Computational Models organized by the hypotheses that influence them. The outer large circle represents all possible models while the inner 4 ovals represent the four major hypotheses discussed in the paper.

A Model of Visual Attention addresses the observed and/or predicted behavior of human and non-human primate visual attention. Models can be descriptive, mathematical, algorithmic or computational and try to mimic, explain and/or predict some or all of visual attentive behavior. A Computational Model of Visual Attention not only includes a process description for how attention is computed, but also can be tested by providing image inputs, similar to those an experimenter might present to a subject, and then seeing how the model performs by comparison.

1 Introduction
- 1.1 What is a model of visual attention?
- 1.2 What is a computational model of visual attention?
2 A taxonomy of models
- 2.1 The computer vision branch
  - 2.1.1 Interest point operations
  - 2.1.2 Perceptual organization
  - 2.1.3 Active vision
  - 2.1.4 Predictive methods
- 2.2 The biological vision branch
  - 2.2.1 Descriptive models
  - 2.2.2 Data-fitting models
  - 2.2.3 Algorithmic models
- 2.3 The computational branch
  - 2.3.1 The selective routing hypothesis
  - 2.3.2 The saliency map hypothesis
  - 2.3.3 The temporal tagging hypothesis
  - 2.3.4 The emergent attention hypothesis
- 2.4 The computational models that are instances of the four major hypotheses
3 Functional elements
4 Evaluating a model
5 Acknowledgement
6 References
7 Additional reading
8 Links to relevant Scholarpedia articles

Introduction

This article presents an overview of a wide variety of models of visual attention that have been presented over the course of the past few decades. A number of model classes will be defined within an organizational taxonomy in an effort to organize a literature that is rapidly growing and with a view towards guiding future research. The taxonomy will reflect the differing schools of thought as well as the different modeling strategies. Further, it is important to keep in mind that not all models were developed with the same goals and that modelers do not always follow only one school of thought or strategy. Motivations for all models come from two sources. The first is the interest in understanding the human perceptual capability to select, process and act upon parts of one's sensory experience differentially from the rest. The second is the need to reduce the quantity of sensory information processed by a perceptual system (see Computational Foundations for Attentive Processes).

This article focuses on models whose goal is to provide an understanding of all or part of human or non-human primate visual attention. The majority of models that focus primarily on the development of artefacts for computer vision or robotic systems will not be mentioned, even if they might include significant biological inspiration. Biological relevance is the key here, that is, research that attempts to model a particular set of experimental observations and simultaneously makes predictions that would extend that set and could be verified by future experiments. We try not to judge any model but rather to provide factual information about modeling in general, about kinds of models (or modeling 'camps'), and about the kinds of functions different models cover. Interested readers can draw their own conclusions.

An important class of models is not covered here solely because of the emphasis on models that make claims about explaining the biology of attention. There are many efforts to use aspects of attentive processing in applied settings, in robotics, for surveillance and other applications. Fortunately, a recent excellent survey exists for those interested (Frintrop et al. 2010).

What is a model of visual attention?

A Model of Visual Attention is a description of the observed and/or predicted behavior of human and non-human primate visual attention. Models can employ natural language, system block diagrams, mathematics, algorithms or computations as their embodiment and attempt to mimic, explain and/or predict some or all of visual attentive behavior. Of importance are the accompanying assumptions, the set of statements or principles devised to provide the explanation, and the extent of the facts or phenomena that are explained. These cannot all be laid out here due to the resulting article length, but the reader is encouraged to follow the citations provided. Models must be tested by experiments, and such experiments replicated, both with respect to their explanations of existing phenomena and to test their predictive validity.

What is a computational model of visual attention?

A Computational Model of Visual Attention is an instance of a model of visual attention, and not only includes a formal description for how attention is computed, but also can be tested by providing image inputs, similar to those an experimenter might present to a subject, and then seeing how the model performs by comparison. The bulk of this article will focus on computational models. It should be pointed out that this definition differs from the usual, almost casual, use of the term 'computational' in the area of neurobiological modeling. It has come to mean almost any model that includes a mathematical formulation of some kind. Mathematical equations can be solved and/or simulated on a computer, and thus the term computational has seemed appropriate to many authors. Marr's levels of analysis (Marr 1982) provide a different view. He specified three levels of analysis: the computational level (a formal statement of the problems that must be overcome), the algorithmic level (the strategy that may be used), and the implementation level (how the task is actually performed in the brain or in a computer, solving the problems laid out at the computational level, using the strategies of the algorithmic level and adding in the details required for their implementation). Our use of the term 'computational model' is intended to capture models that specify all three of Marr's levels in a testable manner. Our description of the functional elements of attention in Section 3 corresponds to Marr's first level of analysis, the problems that must be addressed. The terms 'descriptive', 'data-fitting' and 'algorithmic' as used here describe three different methodologies for specifying Marr's algorithmic level of analysis. Section 2 will provide definitions and further discussion on the model classification strategy used here.

Models of attention are complex, providing mechanisms and explanations for a number of functions all tied together with a control system; this is basically the specification at Marr's 'computational level' of analysis. More detail on each of these tasks is provided in Visual Attention. Due to their complexity, model evaluation is not a simple matter and objective conclusions are still elusive.

The point of the next section is to create a context for such models; this enables one to see their scientific heritage, to distinguish models on the basis of their modeling strategy, and to situate new models appropriately to enable comparisons and evaluations.

A taxonomy of models

We present a taxonomy of models with computational models clearly lying in the intersection of how the biological community and how the computer vision community view attentive processes. See Figure 2. There are two main roots in this lattice: one for the uses of attentive methods in computer vision and one for the development of attention models in the biological vision community. Although both have proceeded independently, and indeed, the use of attention appears in the computer vision literature before most biological models, the major point of intersection is the class of computational models (using the definition given above).

It is quite clear that the motivations for all the modeling efforts come from two sources. The first is the deep interest to understand the perceptual capability that has been observed for centuries, that is, the ability to select, process and act upon parts of one's sensory experience differentially from the rest. The second is the need to reduce the quantity of sensory information entering any system, biological or otherwise, by selecting or ignoring parts of the sensory input. Although the motivations seem distinct, the conclusion is the same, and in reality the motivation for attention in any system is to reduce the quantity of information to process in order to complete some task (see Computational Foundations for Attentive Processes). But depending on one's interest, modeling efforts do not always have the same goals. That is, one may be trying to model a particular set of experimental observations, one may be trying to build a robotic vision system in which attention is used to select landmarks for navigation, one may have an interest in eye movements, or in the executive control function, or in any one or more of the functional elements described in Visual Attention. As a consequence, comparing models is not straightforward, fair, or useful. Comparing pieces that represent the same functionality is more relevant, but there are so many of these combinations that it would be an exercise beyond the scope of this overview.

Figure 2: Model Taxonomy

The computer vision branch

The use of attentive methods has pervaded the computer vision literature, demonstrating their importance for reducing the amount of information to be processed. It is important to note that several early analyses of the extent of the information load issue appeared (Uhr 1972, Feldman and Ballard 1982, Tsotsos 1987) with converging suggestions for its solution, those convergences appearing in a number of the models below (especially those of Burt 1988 or Tsotsos 1990). Specifically, the methods can be grouped into four categories. Within modern computer vision, there are many variations and combinations of these themes because, regardless of the impressive rapid increases in power in modern computers, the inherent difficulty of processing images demands attentional processes (see Computational Foundations for Attentive Processes).

Interest point operations

One way to reduce the amount of an image to be processed is to concentrate on the points or regions that are most interesting or relevant for the next stage of processing (such as for recognition or action). The idea is that perhaps 'interestingness' can be computed in parallel across the whole image and then those interesting points or regions can be processed in more depth serially. The first of these methods is due to Moravec (1981) and since then a large number of different kinds of 'interest point' computations have been used. It is interesting to note the parallel here with the Saliency Map Hypothesis described below.
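The parallel-then-serial idea can be illustrated with a small sketch in the spirit of Moravec's operator: an interest value is computed at every pixel as the minimum sum of squared differences between a local window and its shifted copies, and only the top-scoring points are handed on for further processing. The function names, the 3x3 window and the four shift directions are illustrative assumptions, not a reproduction of the 1981 implementation.

```python
def moravec_interest(image, window=1):
    """Moravec-style interest map: min over shifts of the windowed SSD.

    `image` is a list of rows of numbers. Flat regions and straight edges
    score 0 because at least one shift leaves the window unchanged.
    """
    h, w = len(image), len(image[0])
    shifts = [(0, 1), (1, 0), (1, 1), (1, -1)]  # assumed set of directions
    margin = window + 1  # keep shifted windows inside the image
    interest = [[0.0] * w for _ in range(h)]
    for r in range(margin, h - margin):
        for c in range(margin, w - margin):
            ssds = []
            for dr, dc in shifts:
                ssd = 0.0
                for i in range(-window, window + 1):
                    for j in range(-window, window + 1):
                        diff = image[r + i + dr][c + j + dc] - image[r + i][c + j]
                        ssd += diff * diff
                ssds.append(ssd)
            interest[r][c] = min(ssds)
    return interest


def top_interest_points(interest, k):
    """Serial stage: return the k highest-scoring (row, col) positions."""
    scored = [(v, r, c)
              for r, row in enumerate(interest)
              for c, v in enumerate(row) if v > 0]
    scored.sort(reverse=True)
    return [(r, c) for _, r, c in scored[:k]]


# An 8x8 image containing a bright square whose corner sits at (4, 4).
image = [[1.0 if r >= 4 and c >= 4 else 0.0 for c in range(8)] for r in range(8)]
interest = moravec_interest(image)
print(top_interest_points(interest, 1))
```

In this toy image the corner of the square receives the highest interest value, while points along the straight edges score zero, which is the behavior that makes such operators useful for selecting a few locations for deeper analysis.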

Perceptual organization

The computational load is not only due to the large number of image locations (this is not so large a number as to cause much difficulty for modern computers), but rather it is due to the combinatorial nature of combinations of positions or regions. In perceptual psychology, how the brain might organize items is a major concern, pioneered by the Gestaltists (Wertheimer 1923). Thus, computer vision has used grouping strategies following Gestalt principles in order to limit the possible subsets of combinatorially defined items to consider. The first such use appeared in Muerle and Allen (1968) in the context of object segmentation.

Active vision

Human eyes move, and humans move around their world in order to acquire visual information. Active vision in computer vision uses intelligent control strategies applied to the data acquisition process depending on the current state of data interpretation (Bajcsy 1985, Tsotsos 1992). A variety of methods have appeared following this idea; perhaps the earliest one most relevant to this discussion is the robotic binocular camera system of Clark and Ferrier (1988), featuring a salience-based fixation control mechanism.

Predictive methods

The application of domain and task knowledge to guide or predict processing is a powerful tool for limiting processing, a fact that has been formally proved (Tsotsos 1989; Parodi et al. 1998). The first use was for oriented line location in a face-recognition task (Kelly 1971). The first instance of temporal window prediction was in a motion recognition task (Tsotsos et al. 1980).

The biological vision branch

Clearly, in this class, the major motivation has always been to provide explanations for the characteristics of biological, especially human, vision. Typically, these have been developed to explain a particular body of experimental observations. This is a strength; the authors usually are the ones who have done some or all of the experiments and thus completely understand the experimental methods and conclusions. Simultaneously, however, this is also a weakness because the models are often difficult to extend to a broader class of observations. Along the biological vision branch, the three classes identified here are:

Descriptive models

These models are described primarily using natural language and/or block diagrams. Their value lies in the explanation they provide of certain attentional processes; the abstractness of explanation is also their major problem because it is typically open to interpretation. Classic models, even though they were motivated by experiments in auditory attention, have been very influential. Early Selection (Broadbent 1958), Late Selection (Deutsch & Deutsch 1963, Moray 1969, Norman 1968), and Attenuator Theory (Treisman 1964) are all descriptive models. Others such as Feature Integration Theory (Treisman and Gelade 1980), Guided Search (Wolfe et al. 1989), Animate Vision (Ballard 1991), Biased Competition (Desimone and Duncan 1995), FeatureGate (Cave 1999), Feature Similarity Gain Model (Treue & Martinez-Trujillo 1999), RNA (Shipp 2004), and the model of Knudsen (2007) are also considered descriptive. The Biased Competition Model has garnered many followers, mostly due to the conceptual aspect of it combining competition with top-down bias, concepts that actually appeared in earlier models (such as Grossberg 1982 or Tsotsos 1990). These are conceptual frameworks, ways of thinking about the problem of attention. Many have played important, indeed foundational, roles in how the field has developed.

Data-fitting models

These models are mathematical and are developed to capture parameter variations in experimental data in as compact and parsimonious a form as possible. Their value lies primarily in how well they provide a fit to experimental data, and in interpolation or extrapolation of parameter values to other experimental scenarios. Good examples are the Theory of Visual Attention (Bundesen 1990) and the set of models that use normalization as a basic processing element. An early one is the model of Reynolds et al. (1999) that proposed a quantification of the Biased Competition model. Subsequently, this was refined further into the Normalization Model of Attention, a union of divisive normalization with biased competition (Reynolds & Heeger 2009). At the same time a further normalization model appeared due to Lee & Maunsell (2009), the Normalization Model of Attentional Modulation, showing how attention changes the gain of responses to individual stimuli and why attentional modulation is more than a gain change when multiple stimuli are present in a receptive field.
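The core idea behind the normalization family can be sketched in a few lines: each unit's stimulus drive is multiplied by an attentional gain, and the result is divided by a pooled suppressive drive plus a constant. The sketch below is a deliberately reduced illustration under stated assumptions; the function name, the use of a plain average as the suppressive pool, and the value of `sigma` are all hypothetical simplifications, whereas the published models pool over space and feature with specific kernels.

```python
def normalized_responses(drive, attention_gain, sigma=0.1):
    """Divisive normalization with a multiplicative attention field.

    drive          : excitatory stimulus drive of each unit
    attention_gain : attentional gain applied to each unit (1.0 = neutral)
    sigma          : semi-saturation constant (assumed value)
    """
    weighted = [a * e for a, e in zip(attention_gain, drive)]
    # Assumption: the suppressive drive is the plain average over all units.
    suppressive = sum(weighted) / len(weighted)
    return [x / (suppressive + sigma) for x in weighted]


# Two equally strong stimuli sharing one normalization pool.
neutral = normalized_responses([1.0, 1.0], [1.0, 1.0])
attended = normalized_responses([1.0, 1.0], [2.0, 1.0])  # attend to stimulus 0
print(neutral, attended)
```

With two stimuli in the pool, raising the gain on one of them increases its response and, through the shared denominator, lowers the response to the other, which is the sense in which attentional modulation becomes more than a simple gain change when several stimuli compete.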

Algorithmic models

These models provide mathematics and algorithms that govern their performance and as a consequence present a process by which attention might be computed and deployed. They, however, do not provide sufficient detail or methodology so that the model might be tested on real stimuli. These models often provide simulations to demonstrate their actions. In a real sense they are a combination of descriptive and data-fitting models; they provide more detail on descriptions so they may be simulated while showing good comparison to experimental data at qualitative levels (and perhaps also quantitative). The best known of these models is the Saliency Map Model (Koch and Ullman 1985; defined in Section 2.3.2); it has given rise to many subsequent models. It is interesting to note that the Saliency Map Model is strongly related to the Interest Point Operations on the other side of this taxonomy. Other algorithmic models include Adaptive Resonance Theory (Grossberg 1982), Temporal Tagging (Niebur et al. 1993; Usher and Niebur 1996), Shifter Circuits (Anderson and Van Essen 1987), Visual Routines (Ullman 1984), CODAM (Taylor and Rogers 2002), and a SOAR-based model (Wiesmeyer & Laird 1990).

The computational branch

As mentioned earlier, the point of intersection between the computer vision and biological vision communities is represented by the set of computational models in the taxonomy. Computational Models not only include a process description for how attention is computed, but also can be tested by providing image inputs, similar to those an experimenter might present to a subject, and then seeing how the model performs by comparison. The biological connection is key and pure computer vision efforts are not included here. Under this definition, computational models generally provide more complete specifications and permit more objective evaluations as well. This greater level of detail is a strength but also a weakness because there are more details that require experimental validation.

Many models have elements from more than one class so the separation is not a strict one. Computational models necessarily are Algorithmic Models and often also include Data-Fitting elements. Nevertheless, in recent years four major schools of thought have emerged, schools that will be termed 'hypotheses' here since each has both supporting and detracting evidence. In what follows, an attempt is made to provide the intellectual antecedents for each of these major hypotheses. The taxonomy is completed in Section 2.4 when several instances of each of the classes are added.

The selective routing hypothesis

This hypothesis focuses on how attention solves the problems associated with stimulus selection and then transmission through the visual cortex. The issues of how signals in the brain are transmitted to ensure correct perception appear, in part, in a number of works. Milner (1974), for example, mentions that attention acts in part to activate feedback pathways to the early visual cortex for precise localization, implying a pathway search problem. The complexity of the brain's network of feed-forward and feedback connectivity highlights the physical problems of search, transmission and finding the right path between input and output (see Felleman and Van Essen 1991). Anderson and Van Essen's Shifter Circuits proposal (Anderson & Van Essen 1987) was presented primarily to solve these physical routing and transmission problems using control signals to each layer of processing that shift selected inputs from one path to another. The routing problems, described in Tsotsos et al. (1995), are: 1) A single unit at the top of the visual processing network receives input from a sub-network of converging inputs, and thus from a large portion of the visual field (the Context Problem - see Figure 3a); 2) A single event at the input will affect a large number of units in the network due to a diverging feed-forward signal, resulting in a loss of localization information (the Blurring Problem - see Figure 3b); 3) Two separate visual events in the visual field will activate two overlapping sub-networks of units and connections, whose region of overlap will contain units whose activity is a function of both events. Thus, each event interferes with the interpretation of other events in the visual field (the Cross-Talk Problem - see Figure 3c).

Any model that uses a biologically plausible network of neural processing units needs to address these problems. One class of solutions is that of an attentional 'beam' through the processing network as shown in Figure 3d.
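A toy one-dimensional network makes the routing problems and the effect of a beam concrete. In the sketch below each unit simply sums three adjacent units of the layer beneath it, and the 'beam' is a set of hand-chosen multiplicative gates, one list per layer. The layer sizes, the summation rule and the gate values are assumptions made purely for illustration; this is not the mechanism of Selective Tuning or of any other cited model, where the gates would be computed by the model itself.

```python
def feed_forward(layer, gate=None):
    """One converging layer: each output sums a window of 3 adjacent inputs.

    `gate`, if given, multiplies each input before convergence (1 = pass,
    0 = suppress) and stands in for the attentional beam at this layer.
    """
    if gate is not None:
        layer = [x * g for x, g in zip(layer, gate)]
    return [sum(layer[i:i + 3]) for i in range(len(layer) - 2)]


def top_response(inputs, gates=None):
    """Propagate to the single unit at the top of the pyramid."""
    layer, level = inputs, 0
    while len(layer) > 1:
        layer = feed_forward(layer, gates[level] if gates else None)
        level += 1
    return layer[0]


# Blurring: one input event reaches every unit of the next layer.
blurred = feed_forward([0.0, 0.0, 1.0, 0.0, 0.0])

# Context and cross-talk: the top unit mixes the target (index 1)
# with a distractor (index 4).
stimulus = [0.0, 1.0, 0.0, 0.0, 0.5]
mixed = top_response(stimulus)

# Hand-set beam that passes only the pathway from input index 1.
beam = [[0, 1, 0, 0, 0], [1, 0, 0]]
routed = top_response(stimulus, beam)
print(blurred, mixed, routed)
```

Without gating, the top unit's value depends on both stimuli; with the beam in place it reflects only the selected input, and changing the distractor no longer alters it.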

Models that fall into the Selective Routing class include Pyramid Vision (Burt 1988), Olshausen et al. (1993), Selective Tuning (Tsotsos et al. 1995; Zaharescu et al. 2004; Tsotsos et al. 2005; Rodriguez-Sanchez et al. 2007; Rothenstein et al. 2008), NeoCognitron (Fukushima 1986), and SCAN (Postma et al. 1997).

Figure 3: Illustrating the signal routing problems in a neural network (the bottom layer is the input and the top the highest level of processing). a) The Context Problem - In this feed-forward scenario, it is easy to see that many neurons in the input layer affect each single neuron in the highest layer. If the 'attended' stimulus is the one highlighted by the arrow, then there is no neuron in the highest layer that 'sees' only it; they all 'see' the desired stimulus within the context of other input stimuli. (adapted from Tsotsos et al. 1995) b) The Blurring Problem - In a different feed-forward scenario, a single stimulus can affect the response of the entire set of highest layer neurons. Although the stimulus is well localized within the input layer, any localization information is blurred across the highest layer if no remedial processing is added. (adapted from Tsotsos et al. 1995) c) The Cross-Talk Problem - This is also a feed-forward scenario but with two stimulus elements in the input layer, one in blue, covering only a single neuron, and the other, in red, larger and covering two neurons. The sets of feed-forward connections they activate overlap, shown by the purple coloured neurons. The overlapping signals interfere with one another and this corrupted signal covers most of the highest layer. (adapted from Tsotsos et al. 1995) d) A solution to these three problems involves an attentional 'beam' that modulates all layers of the network to allow the selected items to pass through while suppressing stimuli in the context that might interfere with the processing of the selected stimulus. (adapted from Tsotsos 1990) e) The modulatory action of the beam strategy shown in d) causes changes in the configuration shown in c). The selected (attended) neuron in the highest layer is indicated by the arrow. The recurrent modulation of the attentional beam leads to the selected neurons shown in black. The pathways that are suppressed (and resulting neurons deprived of input) are in grey. (adapted from Tsotsos et al. 1995)

The saliency map hypothesis

This hypothesis has its roots in Feature Integration Theory (Treisman and Gelade 1980) and appears first in the class of algorithmic models above (Koch and Ullman 1985). It includes the following elements (see Figure 4): (i) an early representation composed of a set of feature maps, computed in parallel, permitting separate representations of several stimulus characteristics; (ii) a topographic saliency map where each location encodes the combination of properties across all feature maps as a conspicuity measure; (iii) a selective mapping into a central non-topographic representation, through the topographic saliency map, of the properties of a single visual location; (iv) a winner-take-all (WTA) network implementing the selection process based on one major rule: conspicuity of location (minor rules of proximity or similarity preference are also suggested); and (v) inhibition of this selected location that causes an automatic shift to the next most conspicuous location. Feature maps code conspicuity within a particular feature dimension. The saliency map combines information from each of the feature maps into a global measure where points corresponding to one location in a feature map project to single units in the saliency map. Saliency at a given location is determined by the degree of difference between that location and its surround. The models of Clark & Ferrier (1988), Sandon (1990; the first implementation of the Koch & Ullman model), Itti et al. (1998), Itti & Koch (2000), Walther et al. (2002), Navalpakkam & Itti (2005), Itti & Baldi (2006), SERR (Humphreys & Müller 1993), Zhang et al. (2008), and Bruce & Tsotsos (2009) are all in this class. The drive to discover the best representation of saliency or conspicuity is a major current activity; whether or not a single such representation exists in the brain remains an open question with evidence supporting many potential loci (summarized in Tsotsos et al. 2005).
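Elements (i), (ii), (iv) and (v) above can be strung together in a short sketch: conspicuity is computed within each feature map as the difference between a location and its surround, the maps are summed into a saliency map, the most salient location is selected, and that location is then inhibited so that selection shifts to the next most conspicuous one. Everything specific here is an assumption for illustration: the 3x3 surround, the unweighted sum across maps, the use of a plain maximum in place of a WTA network, and the square inhibition region. It is not the implementation of any of the models cited above.

```python
def center_surround(feature_map):
    """Conspicuity: |value - mean of its (up to 8) neighbours| at each cell."""
    h, w = len(feature_map), len(feature_map[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            neigh = [feature_map[rr][cc]
                     for rr in range(max(0, r - 1), min(h, r + 2))
                     for cc in range(max(0, c - 1), min(w, c + 2))
                     if (rr, cc) != (r, c)]
            out[r][c] = abs(feature_map[r][c] - sum(neigh) / len(neigh))
    return out


def saliency_map(feature_maps):
    """Sum the conspicuity of every feature map, location by location."""
    consp = [center_surround(m) for m in feature_maps]
    h, w = len(consp[0]), len(consp[0][0])
    return [[sum(m[r][c] for m in consp) for c in range(w)] for r in range(h)]


def scanpath(saliency, n_fixations, radius=1):
    """Pick the maximum (stand-in for WTA), inhibit around it, repeat."""
    s = [row[:] for row in saliency]
    h, w = len(s), len(s[0])
    cells = [(r, c) for r in range(h) for c in range(w)]
    path = []
    for _ in range(n_fixations):
        r, c = max(cells, key=lambda p: s[p[0]][p[1]])
        path.append((r, c))
        # Inhibition of the selected location and its immediate surround.
        for rr in range(max(0, r - radius), min(h, r + radius + 1)):
            for cc in range(max(0, c - radius), min(w, c + radius + 1)):
                s[rr][cc] = 0.0
    return path


# Two hypothetical 5x5 feature maps, each with one odd-man-out item.
intensity = [[0.0] * 5 for _ in range(5)]
orientation = [[0.0] * 5 for _ in range(5)]
intensity[1][1] = 1.0
orientation[3][3] = 0.5
fixations = scanpath(saliency_map([intensity, orientation]), 2)
print(fixations)
```

The stronger singleton is selected first and, once its neighbourhood has been inhibited, the weaker singleton in the other feature map becomes the next focus.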

Figure 4: The Saliency Map Model as originally conceived by Koch & Ullman 1985. (figure adapted from Koch & Ullman 1985)

The temporal tagging hypothesis

The earliest conceptualization of this idea seems to be due to Grossberg who, between 1973 and 1980, presented ideas and theoretical arguments regarding the relationship among neural oscillations, visual perception and attention (see Grossberg 1980). His work led to the ART model that provided details on how neurons may reach stable states given both top-down and bottom-up signals and play roles in attention and learning (Grossberg 1982). Milner also suggested that the unity of a figure at the neuronal level is defined by synchronized firing activity (Milner 1974). von der Malsburg (1981) wrote that neural modulation is governed by correlations in the temporal structure of signals and that timing correlations signal objects. He defined a detailed model of how this might be accomplished, including neurons with dynamically modifiable synaptic strengths that became known as von der Malsburg synapses. Crick & Koch (1990) later proposed that an attentional mechanism binds together all those neurons whose activity relates to the relevant features of a single visual object. This is done by generating coherent semi-synchronous oscillations in the 40-70 Hz range. These oscillations then activate a transient short-term memory. Models subscribing to this hypothesis typically consist of pools of excitatory and inhibitory neurons connected as shown in Figure 5. The actions of these neuron pools are governed by sets of differential equations; it is a dynamical system. Strong support for this view appears in a nice summary by Sejnowski and Paulsen (2006). The model of Hummel & Biederman (1992) and those from Deco's group - Deco & Zihl (2001), Corchs & Deco (2001), Deco, Pollatos & Zihl (2002) - are within this class. A number of other models exist but do not conform to our definition of computational model; they are mathematical models that only provide simulations of their performance. As such, we cannot include them here but do provide these citations because of the intrinsic interest in this model class (Niebur et al. 1993; Usher & Niebur 1996; Kazanovich & Borisyuk 1999; Wu & Guo 1999). Clearly, there is room for expansion of these models into computational form. This hypothesis remains controversial (see Shadlen and Movshon 1999).

Figure 5: Typical neural connectivity pattern for attentional models focusing on oscillatory behavior. The model network consists of a fully connected set of excitatory and inhibitory neurons. Each excitatory and inhibitory neuron also receives a constant driving current, I. (figure adapted from Buia and Tiesinga 2006; further discussion can be found there).

The emergent attention hypothesis

The emergent attention hypothesis proposes that attention is a property of large assemblies of neurons involved in competitive interactions (of the kind mediated by lateral connections) and selection is the combined result of local dynamics and top-down biases (see Figure 6). In other words, there is no explicit selection process of any kind. The mathematics of the dynamical system of equations leads through its evolution alone to single peaks of response that represent the focus of attention. Duncan (1979) provided an early discussion of properties of attention having an emergent quality in the context of divided attention. Grossberg's 1982 ART (Adaptive Resonance Theory) model played a formative role here. Such an emergent view took further root with work on the role of emergent features in attention by Pomerantz and Pristach (1989) and Treisman and Paterson (1984). Later, Styles (1997) suggested that attentional behaviour emerges as a result of the complex underlying processing in the brain. Shipp's review (2004) concludes that this is the most likely hypothesis. The models of Heinke and Humphreys, SAIM (1997, 2003), Hamker (1999; 2000; 2004; 2005; 2006), Spratling (2008), Deco and Zihl (2001), and Corchs and Deco (2001) belong in this class among others. Clearly, there must be mechanisms that support the process behind this; Hamker's model provides a good view of how this might be accomplished and shows, for example, how interactions between hierarchical representations are employed. Desimone and Duncan (1995) view their biased competition model as a member of this class, writing "attention is an emergent property of slow, competitive interactions that work in parallel across the visual field". In turn, many of the models in this class are also strongly based on Biased Competition.
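How a single peak can arise from dynamics alone, with no explicit selection step, can be seen in a minimal rate model: each unit excites itself, inhibits all others, and receives its stimulus input plus a small top-down bias. The equations, weights, time step and the rectification nonlinearity below are illustrative assumptions chosen so that the outcome is easy to follow; they are not taken from Hamker's model or any other model cited here.

```python
def compete(inputs, bias, steps=500, dt=0.1, w_self=0.5, w_inh=1.0):
    """Euler-integrate dx_i/dt = -x_i + max(0, I_i + b_i + w_self*x_i - w_inh*sum_{j!=i} x_j).

    inputs : bottom-up stimulus drive for each unit
    bias   : top-down bias for each unit
    All parameter values are assumed for illustration only.
    """
    x = [0.0] * len(inputs)
    for _ in range(steps):
        new_x = []
        for i, xi in enumerate(x):
            others = sum(xj for j, xj in enumerate(x) if j != i)
            drive = inputs[i] + bias[i] + w_self * xi - w_inh * others
            new_x.append(xi + dt * (-xi + max(0.0, drive)))
        x = new_x
    return x


# Three equally strong stimuli; a small top-down bias favours unit 2.
final = compete([1.0, 1.0, 1.0], [0.0, 0.0, 0.2])
print([round(v, 3) for v in final])
```

Although the bias is small relative to the stimulus drive, the mutual inhibition amplifies it until only the biased unit remains active; moving the bias to another unit moves the peak, without any component that explicitly picks a winner.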

Figure 6: The concept of competitive interactions that form the basis of the Emergent Attention Models. Shown is an example of the concept from Hamker (2005). Each of the visual representations (the rectangles) cooperates and competes with several other representations. Within each representation, additional local competitions help define the contents. No separate attentive mechanisms are provided.

The computational models that are instances of the four major hypotheses

A number of models have appeared over the years that borrow from the major attentional hypotheses and, as noted earlier, many borrow from more than one. This section will classify a number of computational models conforming to the definition presented earlier. The directory of models follows while Figure 1 groups them according to their foundational ideas.

Model Directory:

AIM        Bruce & Tsotsos (2005; 2009)
ART        Grossberg (1975; 1982), Carpenter et al. (1998)
ClaFer     Clark & Ferrier (1988)
DraLio     Draper & Lionelle (2005)
FastGBA    Sharma (2016)
Hamker     Hamker (1999; 2000; 2004; 2005; 2006)
HumBie     Hummel & Biederman (1992)
LanDen     Lanyon & Denham (2004)
LeeBux     Lee et al. (2003)
LiZ        Li (2001)
MORSEL     Mozer (1991)
NeoCog     Fukushima (1986)
NeurDyn    Deco & Zihl (2001), Corchs & Deco (2001), Deco, Pollatos & Zihl (2002)
NowSej     Nowlan & Sejnowski (1995)
OliTor     Oliva et al. (2003)
OlshAn     Olshausen et al. (1993)
PC/BC-DIM  Spratling (2008)
PyrVis     Burt (1988)
SAIM       Heinke & Humphreys (1997; 2003)
Sandon     Sandon (1990)
SCAN       Postma et al. (1997)
SERR       Humphreys & Müller (1993)
SM         Itti et al. (1998)
SMOC       Itti & Koch (2000)
SMSurp     Itti & Baldi (2006)
SMTask     Navalpakkam & Itti (2005)
ST         Tsotsos et al. (1995)
STActive   Zaharescu et al. (2004)
STBind     Tsotsos et al. (2008), Rothenstein et al. (2008)
STFeature  Rodriguez-Sanchez et al. (2007)
STRec      Tsotsos et al. (2005)
SUN        Zhang et al. (2008)
SunFish    Sun et al. (2008)
UshNie     Usher & Niebur (1996)
vaHeGi     van de Laar et al. (1997)
VISIT      Ahmad (1992)
WalItt     Walther et al. (2002)

Figure 1 makes clear that the Saliency Map hypothesis seems most popular. Further, it is evident that few of the possible combinations of hypotheses seem to have been explored. We would suggest that those empty joint classes are potentially valuable avenues of exploration because it is clear that no single hypothesis covers the full breadth of attentional behavior, as was argued in Section 2 and is further discussed in Section 3.

Functional elements

What are the functional elements of attention that a complete modeling effort must include? This is a difficult question and there have been several previous papers that attempt to address it. Itti and Koch (2001), for example, review the state of attentional modeling, but from the point of view that assumes attention is primarily a bottom-up process based largely on their notion of saliency maps. Knudsen (2007) provides a more recent review; his perspective favors an early selection model. He provides a number of functional components fundamental to attention: working memory, competitive selection, top-down sensitivity control, and filtering for stimuli that are likely to be behaviorally important (salience filters). In his model, the world is first filtered by the salience filters in a purely bottom-up fashion, creating the various neural representations on which competitive selection is based. The role of top-down information and control is to compute sensitivity control affecting the neural representations by incorporating the results of selection, working memory and gaze. A third functional structure is that of Hamker (1999), whose work is an excellent example of the neuro-dynamical approach. The focus is on excitatory and inhibitory neural pools, the ordering of their effects as well as the neural sites affected, and top-down bias is really a simple bias arising from area IT. 'What' and 'where' functions are separated: features are computed and represented in the ventral stream and spatial location in the dorsal. A review by Rothenstein & Tsotsos (2008) presents a classification of models with details on the functional elements each includes. Finally, Shipp (2004) provides yet another useful overview where he compares several different models along the dimension of how they map onto system-level circuits in the brain. He presents his Real Neural Architecture (RNA) model for attention, integrating several different modes of operation (parallel or serial, bottom-up or top-down, pre-attentive or attentive) found in cognitive models of attention for visual search.
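The interplay of Knudsen's four components can be made concrete with a small sketch. The code below is only an illustration under strong simplifying assumptions: the stimulus is a one-dimensional list of intensities, the salience filter is a plain deviation-from-mean measure, competitive selection is a winner-take-all, and working memory holds a single target location. All function names and the `boost` parameter are hypothetical and are not taken from Knudsen (2007).

```python
from typing import List, Optional


def salience_filter(stimulus: List[float]) -> List[float]:
    """Bottom-up stage: respond to how much each location differs from the mean."""
    mean = sum(stimulus) / len(stimulus)
    return [abs(s - mean) for s in stimulus]


def competitive_selection(representation: List[float]) -> int:
    """Winner-take-all: return the index of the strongest response."""
    return max(range(len(representation)), key=lambda i: representation[i])


def sensitivity_control(gains: List[float], target: int, boost: float = 0.5) -> List[float]:
    """Top-down stage: raise the gain of the location held in working memory."""
    return [g + boost if i == target else g for i, g in enumerate(gains)]


def attend(stimulus: List[float],
           working_memory_target: Optional[int] = None,
           steps: int = 3) -> int:
    """Run the filter -> gain -> selection loop and return the selected location."""
    gains = [1.0] * len(stimulus)
    winner = 0
    for _ in range(steps):
        if working_memory_target is not None:
            # Top-down bias accumulates on every iteration (an assumption of this sketch).
            gains = sensitivity_control(gains, working_memory_target)
        salience = salience_filter(stimulus)
        representation = [g * s for g, s in zip(gains, salience)]
        winner = competitive_selection(representation)
    return winner


stimulus = [0.2, 0.2, 1.0, 0.2, 0.8]
bottom_up_choice = attend(stimulus)                          # most salient location wins
biased_choice = attend(stimulus, working_memory_target=4)    # top-down gain shifts the winner
```

With no working-memory target the most salient location is selected; with a target, the accumulated gain eventually lets a less salient location win the competition, which is the qualitative behavior the paragraph describes.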

It would seem that there is value in providing an additional perspective, namely, one that is orthogonal to the neural correlates of function and that is independent of model and modeling strategy. This alternate functional decomposition is presented in Visual Attention and covers the breadth of visual attention from information reduction, to representations, to control, to external manifestations of attentional behavior. It is fair to say that a complete model should account for each; it is also fair to say that no model yet comes close. These functional elements are listed below. We invite modelers to annotate each with a brief description of how their model provides the functionality listed; those details are beyond the scope of this article. The primary elements of attention are now given. They are detailed further in Visual Attention, where one can also see all the appropriate citations and biological evidence.

An important point here is that models of visual attention should be able to deal with each of these. It would be in the best interests of the readers of this article that each modeler provide some annotation through this article (perhaps through the use of a SUB-PAGE) on how their model incorporates these attentional elements. It would form a major contribution to the comparison of models.

Evaluating a model

The above lists of elements are unlikely to be complete or to be the optimal partitioning of the problem, but they are representative of most current thinking. The effectiveness of any model, regardless of type as laid out in Section 2, is determined by how well it provides explanations for what is known about as many of the above functional elements as possible. Equally important, models must be falsifiable, that is, they must make testable predictions regarding new behaviors or functions not yet observed (behaviors that are counterintuitive and not easily deduced from current knowledge) that would enable one to support or reject the model. To test all the models on these criteria is beyond the scope of this article but is a necessary task for anyone wishing to answer the question "Which is the best model of visual attention?"

Nevertheless, several authors are making strong attempts at comparative evaluation using large databases of images and providing executable code that others can use. Primarily, these evaluations are for models that focus on representations of saliency that drive fixation models in the Saliency Map Hypothesis class. Itti's Neuromorphic Vision Toolkit was the first; more recently others, such as Bruce, Draper and Lionelle, and Zhang et al., show serious evaluations and provide public databases for others to use. We add that Draper & Lionelle (2003) laid out the first steps for a principled comparative evaluation. This is very positive even though the statistical validity of databases and the relevant comparative dimensions remain issues needing more work.
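To illustrate what such a comparison against fixation data can look like, the following sketch scores a saliency map by the area under the ROC curve, computed in its equivalent pairwise form: the probability that a fixated pixel receives a higher saliency value than a non-fixated one. The choice of this measure is an assumption made here for illustration; the article does not prescribe a metric, and the function name `fixation_auc` and the (row, column) fixation format are hypothetical.

```python
from typing import List, Sequence, Tuple


def fixation_auc(saliency: Sequence[Sequence[float]],
                 fixations: List[Tuple[int, int]]) -> float:
    """Probability that a fixated pixel outscores a non-fixated one (ties count half)."""
    fixated = set(fixations)
    pos: List[float] = []  # saliency values at fixated pixels
    neg: List[float] = []  # saliency values at all other pixels
    for r, row in enumerate(saliency):
        for c, value in enumerate(row):
            (pos if (r, c) in fixated else neg).append(value)
    if not pos or not neg:
        raise ValueError("need at least one fixated and one non-fixated pixel")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy 2x2 saliency map with a single fixation at row 0, column 1.
toy_map = [[0.1, 0.9],
           [0.2, 0.3]]
score = fixation_auc(toy_map, [(0, 1)])
```

A score near 1 means fixated locations are consistently ranked above the rest of the image, 0.5 corresponds to chance, and values below 0.5 indicate that the map ranks fixated locations lower than others. The pairwise loop is quadratic in the number of pixels, so it is suited to small examples only.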

Acknowledgement

We thank Mazyar Fallah, Heather Jordan, Fred Hamker and an anonymous reviewer for their comments on earlier drafts.

References

Ahmad, S. (1992). VISIT: a neural model of covert visual attention, in Advances in Neural Information Processing Systems, edited by J.E. Moody, et al., 4:420-427, San Mateo, CA: Morgan Kaufmann.
Anderson, C. and D. Van Essen (1987). Shifter Circuits: a computational strategy for dynamic aspects of visual processing. Proc. Natl. Acad. Sci. USA 84, p6297-6301.
Bajcsy, R. (1985). Active perception vs passive perception. In Proc. IEEE Workshop on Computer Vision: Representation and Control, Oct., Bellaire, Mich., p55–62.
Ballard, D. (1991). Animate vision. Artificial Intelligence, 48, p57–86.
Broadbent, D. (1958). Perception and communication, Pergamon Press, NY.
Bruce, N.D.B., Tsotsos, J.K. (2009). Saliency, Attention, and Visual Search: An Information Theoretic Approach, Journal of Vision 9:3, p1-24.
Bruce, N.D.B., Tsotsos, J.K. (2005). Saliency Based on Information Maximization, Proc. NIPS 2005, Vancouver, BC.
Buia, C., Tiesinga, P. (2006). Attentional modulation of firing rate and synchrony in a model cortical network, Journal of Computational Neuroscience 20(3), p1573-6873.
Bundesen, C. (1990). A theory of visual attention, Psychological Review, 97, p523-547.
Burt, P. (1988). Attention mechanism for vision in a dynamic world, Proc. 9th Int. Conf. on Pattern Recognition, p977–987.
Carpenter, G.A., Grossberg, S., Lesher, G.W. (1998). The what-and-where filter: A spatial mapping neural network for object recognition and image understanding, Computer Vision and Image Understanding, 69, p1-22.
Cave, K. (1999). The FeatureGate model of visual selection, Psychological Res. 62, p182-194.
Clark, J.J., Ferrier, N. (1988). Modal control of an attentive vision system. Proc. ICCV, Tarpon Springs Florida, p514–523.
Corchs, S., Deco, G. (2001). A neurodynamical model for selective visual attention using oscillators, Neural Networks 14, p981-990.
Crick, F., Koch, C. (1990). Some reflections on visual awareness. Cold Spring Harbor Symp. Quant. Biol. 55, p953–962.
Deco, G., Zihl, J. (2001). A neurodynamical model of visual attention: feedback enhancement of spatial resolution in a hierarchical system, J Comput Neurosci 10(3), p231-53.
Deco, G., Pollatos, O., Zihl, J. (2002). The time course of selective visual attention: theory and experiments, Vision Research 42, p2925–2945
Desimone, R., Duncan, J. (1995). Neural mechanisms of selective visual attention, Ann. Rev. of Neuroscience 18, p193-222.
Deutsch, J., Deutsch, D. (1963). Attention: Some theoretical considerations, Psych. Review 70, p80-90.
Draper, B., Lionelle, A. (2005). Evaluation of Selective Attention under Similarity Transforms, Computer Vision and Image Understanding 100(1-2), p152-171.
Duncan, J. (1979). Divided attention: the whole is more than the sum of its parts, J Exp Psychol Hum Percept Perform 5(2), p216-28.
Feldman, J. & Ballard, D. (1982). Connectionist models and their properties, Cognitive Science 6, p205 - 254.
Felleman, D., Van Essen, D. (1991). Distributed hierarchical processing in the primate visual cortex. Cerebral Cortex 1, p1–47.
Frintrop, S., Rome, E. and Christensen, H.I. (2010). Computational Visual Attention Systems and their Cognitive Foundation: A Survey, ACM Transactions on Applied Perception (TAP), 7(1)
Frith, C. (2005). The Top in Top-Down Attention, in Neurobiology of Attention, ed. by Itti, Rees, Tsotsos, Elsevier Press, p105-108
Fukushima, K. (1986). A neural network model for selective attention in visual pattern recognition, Biological Cybernetics 55(1), p5 - 15.
Grossberg, S. (1980). Biological Competition: Decision Rules, pattern formation and oscillations, PNAS 77 p2338-2342.
Grossberg, S. (1982). A psychophysiological theory of reinforcement, drive, motivation, and attention. Journal of Theoretical Neurobiology, 1, p286-369.
Grossberg, S. (1975). A neural model of attention, reinforcement, and discrimination learning, International Review of Neurobiology 18, p263-327.
Hamker, F.H. (1999). The role of feedback connections in task-driven visual search. In: D. Heinke, G. W. Humphreys & A. Olson (eds.) Connectionist Models in Cognitive Neuroscience, Proc. of the 5th Neural Computation and Psychology Workshop (NCPW'98). London: Springer Verlag, 252-261.
Hamker, F.H. (2000). Distributed competition in directed attention, Proceedings in Artificial Intelligence, Vol. 9. Dynamische Perzeption, Workshop der GI-Fachgruppe 1.0.4 Bildverstehen. Hrsg. von G. Baratoff, H. Neumann. Berlin: AKA, Akademische Verlagsgesellschaft, p39-44.
Hamker, F.H. (2004). A dynamic model of how feature cues guide spatial attention, Vision Research 44, p501-521.
Hamker, F.H. (2005). The emergence of attention by population-based inference and its role in distributed processing and cognitive control of vision, Computer Vision and Image Understanding, 100, p64-106.
Hamker, F.H., Zirnsak, M. (2006). V4 receptive field dynamics as predicted by a systems-level model of visual attention using feedback from the frontal eye field. Neural Networks 19, p1371-1382.
Hanson, A.R., Riseman, E.M. (1978). Computer Vision Systems, Academic Press.
Heinke, D., Humphreys, G.W. (1997). SAIM: A Model of Visual Attention and Neglect, 7th International Conference on Artificial Neural Networks, Lausanne, Switzerland, Springer Verlag.
Heinke, D., Humphreys, G.W. (2003). Attention, Spatial Representation, and Visual Neglect: Simulating Emergent Attention and Spatial Memory in the Selective Attention for Identification Model (SAIM). Psychological Review, 110(1), p29-87.
Hummel, J.E., & Biederman, I. (1992). Dynamic binding in a neural network for shape recognition, Psychological Review 99, p480–517.
Humphreys, G., Müller, H., (1993). Search via Recursive Rejection (SERR): A Connectionist Model of Visual Search, Cognitive Psychology, 25, p45 - 110.
Itti, L., Baldi, P. (2006). Bayesian Surprise Attracts Human Attention. Advances in Neural Information Processing Systems 18, p547–554.
Itti, L. (2005). Models of Bottom-up Attention and Saliency, in Neurobiology of Attention, ed. by Itti, Rees and Tsotsos, p576-582.
Itti, L., Koch, C. (2000). A saliency-based search mechanism for overt and covert shifts of visual attention, Vision Res 40(10-12), p1489-506
Itti, L., C. Koch, et al. (1998). A model of saliency-based visual attention for rapid scene analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence 20(11), p1254-1259.
Kazanovich, Y. B., Borisyuk, R. M. (1999). Dynamics of neural networks with a central element, Neural Networks, 12, p441-454.
Kelly, M. (1971). Edge detection in pictures by computer using planning, Machine Intell. 6, p397-409.
Koch, C., Ullman, S. (1985). Shifts in selective visual attention: Towards the underlying neural circuitry. Hum. Neurobiology 4, p219–227.
Knudsen, E. (2007). Fundamental Components of Attention, Annu. Rev. Neurosci. 30, p57–78
Lanyon, L.J., Denham, S.L. (2004). A model of active visual search with object-based attention guiding scan paths, Neural Networks 17, p873–897.
Lee, K. W., H. Buxton, et al. (2003). Selective attention for cue-guided search using a spiking neural network. International Workshop on Attention and Performance in Computer Vision, Graz, Austria.
Lee, J., Maunsell, J.H. (2009). A Normalization Model of Attentional Modulation of Single Unit Responses, PLoS ONE 4: e4651.
Li, Z. (2002). A saliency map in primary visual cortex, Trends in Cognitive Sciences Vol. 6, No. 1, Jan. 2002, p9-16.
Li, Z. (2001). Computational design and nonlinear dynamics of a recurrent network model of the primary visual cortex, Neural Computation 13/8, p1749-1780
Macmillan, N.A., Creelman, C.D. (2004). Detection Theory: A User's Guide, Routledge.
Marr, D. (1982). Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Henry Holt and Co., New York
Milner, P. (1974). A model for visual shape recognition, Psych. Rev. 81, p521-535.
Moravec, H. (1981). Rover visual obstacle avoidance, IJCAI, Vancouver, BC, p785-790.
Moray, N. (1969). Attention: Selective Processes in Vision and Hearing, Hutchinson, London.
Mozer, M. C. (1991). The perception of multiple objects: a connectionist approach, Cambridge, Mass., MIT Press.
Muerle, J., Allen, D. (1968). Experimental Evaluation of Techniques for Automatic Segmentation of Objects in a Complex Scene, in G. Cheng et al., Eds. Pictorial Pattern Recognition, Thompson, Washington DC, p3 - 13
Navalpakkam, V., Itti, L. (2005). Modeling the influence of task on attention, Vision Res. 45(2), p205-31.
Niebur, E., Koch, C., Rosin, C. (1993). An oscillation-based model for the neural basis of attention, Vision Research 33, p2789-2802.
Niebur, E., Koch, C. (1994). A model for the neuronal implementation of selective visual attention based on temporal correlation among neurons, J. Comput. Neuroscience 1(1), p141-158.
Norman, D. (1968). Toward a theory of memory and attention, Psych. Review 75, p522-536.
Nowlan, S. and Sejnowski, T. (1995). A selection model for motion processing in area MT of primates. The Journal of Neuroscience 15(2), p1195–1214.
Oliva, A., A. Torralba, et al. (2003). Top-Down control of visual attention in object detection. IEEE International Conference on Image Processing, Barcelona, Spain.
Olshausen, B. A., C. H. Anderson, et al. (1993). A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. J Neurosci 13(11), p4700-19.
Parodi, P., Lanciwicki, R., Vijh, A., Tsotsos, J.K. (1998). Empirically-Derived Estimates of the Complexity of Labeling Line Drawings of Polyhedral Scenes, Artificial Intelligence 105, p47 - 75.
Parkhurst, D., Law, K., Niebur, E. (2002). Modeling the role of salience in the allocation of overt visual attention, Vision Research 42, p107–123
Pomerantz, J.R., Pristach, E.A. (1989). Emergent Features, Attention, and Perceptual Glue in Visual Form Perception, Journal of Experimental Psychology: Human Perception and Performance, 15(4), p635-649
Postma, E. O. et al. (1997). SCAN: A scalable model of attentional selection. Neural Networks 10(6), p993-1015.
Reynolds, J., Chelazzi, L., Desimone, R. (1999). Competitive Mechanisms Subserve Attention in Macaque Areas V2 and V4, J. Neurosci. 19(5), p1736–1753.
Reynolds, J., Heeger, D. (2009). The Normalization Model of Attention, Neuron 61, p168 - 185.
Riesenhuber, M., Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nat. Neurosci. 2, p1019–25
Rodriguez-Sanchez, A.J., Simine, E., Tsotsos, J.K. (2007). Attention And Visual Search, Int. J. Neural Systems 17(4), p275-88.
Rosenblatt, F. (1961). Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Washington, DC: Spartan Books.
Roskies, A. (1999). The binding problem - introduction. Neuron 24, p7–9.
Rothenstein, A.L., Rodriguez-Sanchez, A.J., Simine, E., Tsotsos, J.K. (2008). Visual Feature Binding within the Selective Tuning Attention Framework, Int. J. Pattern Recognition and Artificial Intelligence - Special Issue on Brain, Vision and Artificial Intelligence, p861-881.
Sandon, P. (1990). Simulating visual attention, J. Cognitive Neuroscience 2, p213-231.
Sejnowski, T., Paulsen, O. (2006). Network Oscillations: Emerging Computational Principles, The Journal of Neuroscience 26(6), p1673-1676.
Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., Poggio, T. (2007). Recognition with Cortex-like Mechanisms, IEEE Transactions on Pattern Analysis and Machine Intelligence 29(3), p411-426.
Sharma, P. (2016). Modeling Bottom-Up Visual Attention Using Dihedral Group D4. Symmetry, 8(8):79, p1-14.
Shadlen, M., Movshon, A. (1999). Synchrony Unbound: A Critical Evaluation of the Temporal Binding Hypothesis, Neuron 24, p67–77.
Shipp, S. (2004). The brain circuitry of attention, Trends in Cognitive Sciences 8(5), p223-230.
Spratling (2008). Predictive coding as a model of biased competition in visual attention. Vision Research 48(12), p1391-408.
Styles, E. (1997). The Psychology of Attention, Psychology Press.
Sun, Y., Fisher, R., Wang, F., Gomes, H. (2008). A computer vision model for visual-object-based attention and eye movements, Computer Vision and Image Understanding 112(2), p126-142.
Taylor, J.G., Rogers, M. (2002). A control model of the movement of attention. Neural Networks 15, p309-326
Treisman, A. (1964). The effect of irrelevant material on the efficiency of selective listening, American J. Psychology 77, p533-546.
Treisman, A., Gelade, G. (1980). A feature integration theory of attention, Cognitive Psychology 12, p97-136.
Treisman, A., Paterson, R. (1984). Emergent features, attention, and object perception. Journal of Experimental Psychology: Human Perception and Performance, 10, p12-31.
Treue, S., Martinez-Trujillo, J. (1999). Feature-Based Attention Influences Motion Processing Gain in Macaque Visual Cortex, Nature 399(6736), p575-9.
Tsotsos, J.K. (1987). A 'complexity level' analysis of vision, Proceedings of International Conference on Computer Vision, London, England.
Tsotsos, J.K. (1989). The complexity of perceptual search tasks, Proc. Int. J. Conf. Artif. Intell. Detroit p1571–1577.
Tsotsos, J.K. (1990). A Complexity Level Analysis of Vision, Behavioral and Brain Sciences 13, p423 – 445.
Tsotsos, J., Mylopoulos, J., Covvey, H., Zucker, S. (1980). A framework for visual motion understanding, IEEE Patt. Anal. Machine Intell. 2, p563-573.
Tsotsos, J.K. (1992). On the Relative Complexity of Passive vs Active Visual Search, International Journal of Computer Vision 7-2, p127 - 141.
Tsotsos, J. K., S. M. Culhane, et al. (1995). Modeling Visual-Attention Via Selective Tuning. Artificial Intelligence 78(1-2), p507-545.
Tsotsos, J.K., Liu, Y., Martinez-Trujillo, J., Pomplun, M., Simine, E., Zhou, K. (2005). Attending to Visual Motion, Computer Vision and Image Understanding 100(1-2), p3 - 40.
Tsotsos, J.K., Rodriguez-Sanchez, A.J., Rothenstein, A.L., Simine, E. (2008). Different Binding Strategies for the Different Stages of Visual Recognition, Brain Research 1225, p119-132.
Tsotsos, J.K., Itti, L., Rees, G. (2005). A Brief and Selective History of Attention, in Neurobiology of Attention, Editors Itti, Rees & Tsotsos, Elsevier Press, 2005
Uhr, L. (1972). Layered `recognition cone' networks that preprocess, classify and describe, IEEE Transactions on Computers, p758-768.
Usher, M., Niebur, E. (1996). Modeling the temporal dynamic of IT neurons in visual search: A mechanism for top-down selective attention, J. Cognitive Neuroscience 8:4, p311-327.
van de Laar, P., Heskes, T., Gielen, S. (1997). Task-dependent learning of attention, Neural Networks 10, p981-992.
von der Malsburg, C. (1981). The correlation theory of brain function, Internal Rpt. 81-2, Dept. of Neurobiology, Max-Planck-Institute for Biophysical Chemistry, Göttingen, Germany.
Wolfe, J., Cave, K., Franzel, S. (1989). Guided search: An alternative to the feature integration model for visual search, J. Exp. Psychology: Human Perception and Performance 15, p419-433.
Ullman, S. (1984). Visual routines, Cognition 18, p97–159
Walther, D., L. Itti, et al. (2002). Attentional selection for object recognition - A gentle way. Biologically Motivated Computer Vision, Proceedings 2525, p472-479.
Wertheimer, M. (1923). Untersuchungen zur Lehre von der Gestalt. II Psychologische Forschung, 4 301-350.
Wiesmeyer, M., Laird, J. (1990). A Computer Model of 2D Visual Attention, Proceedings of the 12th Annual Conference of the Cognitive Science Society, p582 - 589.
Wixson, L. (1994). Gaze selection for visual search. Rochester, N.Y., University of Rochester Dept. of Computer Science.
Wu, A., Guo, A. (1999). Selective visual attention in a neurocomputational model of phase oscillators, Biol. Cybern. 80, p205-214.
Zaharescu, A., Rothenstein, A.L., et al. (2004). Towards a Biologically Plausible Active Visual Search Model. in Attention and Performance in Computational Vision: Second International Workshop, WAPCV 2004, Revised Selected Papers, Lecture Notes in Computer Science Volume 3368 / 2005, Springer-Verlag Heidelberg, p133-147.
Zhang, L., Tong, M. H., Marks, T.K., Shan, H., & Cottrell, G.W. (2008). SUN: A Bayesian framework for saliency using natural statistics. Journal of Vision, 8(7):32, p1–20.

Additional reading

Itti, L., Rees, G., Tsotsos, J.K. (Editors) (2005) Neurobiology of Attention, Elsevier Press.
Tsotsos, J.K., Itti, L., Rees, G. (2005). A Brief and Selective History of Attention, in Neurobiology of Attention, Editors Itti, Rees & Tsotsos, Elsevier Press, 2005
Itti, L., Koch, C. (2001). Computational modeling of visual attention, Nature Reviews Neuroscience 2, p1-11.
Rothenstein, A.L., Tsotsos, J.K. (2008). Attention Links Sensing with Perception, Image & Vision Computing Journal, Special Issue on Cognitive Vision Systems (ed. H. Buxton), 26(1), p114-126.
Tsotsos, J.K. (2011). A Computational Perspective on Visual Attention, MIT Press, Cambridge MA.

Links to relevant Scholarpedia articles

Computational Models of Attention

Biased competition theory
Inhibition of return
Visual search
Oscillatory models of attention
Binding problem

Biology (general link to the category Vision)

Visual search
Eye movements
Attention#Covert orienting
Synchronization
Saccade
Attentional blink
Priming of popout

Source: http://www.scholarpedia.org/article/Computational_models_of_visual_attention