To what extent can matching algorithms based on direct outputs of spatial filters account for human object recognition?

József Fiser, Irving Biederman*, Eric E. Cooper

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

Abstract

A number of recent successful models of face recognition posit only two layers, an input layer consisting of a lattice of spatial filters and a single subsequent stage by which those descriptor values are mapped directly onto an object representation layer by standard matching methods such as stochastic optimization. Is this approach sufficient for modeling human object recognition? We tested whether a highly efficient version of such a two-layer model would manifest effects similar to those shown by humans when given the task of recognizing images of objects that had been employed in a series of psychophysical experiments. System accuracy was quite high overall, but was qualitatively different from that evidenced by humans in object recognition tasks. The discrepancy between the system's performance and human performance is likely to be revealed by all models that map filter values directly onto object units. These results suggest that human object recognition (as opposed to face recognition) may be difficult to approximate by models that do not posit hidden units for explicit representation of intermediate entities such as edges, viewpoint invariant classifiers, axes, shocks and object parts.
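To make the two-layer architecture described in the abstract concrete, here is a minimal sketch in Python. It is not the authors' system. It assumes a small bank of Gabor filters sampled on a rigid, regular lattice as the input layer, and it replaces the stochastic-optimization matching mentioned in the abstract with a plain mean cosine similarity between corresponding lattice nodes. All function names, parameter values, and the synthetic grating images are hypothetical and chosen only for illustration.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel: Gaussian envelope times a complex sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Coordinate along the direction in which the carrier wave varies
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = np.exp(1j * 2.0 * np.pi * xr / wavelength)
    kernel = envelope * carrier
    # Subtract the mean so that uniform regions give a near-zero response
    return kernel - kernel.mean()

def filter_bank(size=15, wavelengths=(4.0, 8.0), n_orient=4):
    """Assumed bank: 2 scales x 4 orientations (illustrative values)."""
    return [gabor_kernel(size, w, k * np.pi / n_orient, sigma=0.5 * w)
            for w in wavelengths for k in range(n_orient)]

def jets(image, bank, step=8):
    """Layer 1: filter-response magnitudes sampled on a regular lattice."""
    half = bank[0].shape[0] // 2
    out = []
    for r in range(half, image.shape[0] - half, step):
        for c in range(half, image.shape[1] - half, step):
            patch = image[r - half:r + half + 1, c - half:c + half + 1]
            out.append([abs(np.sum(patch * k)) for k in bank])
    return np.array(out)  # shape: (lattice nodes, filters)

def similarity(jets_a, jets_b, eps=1e-12):
    """Layer 2: mean cosine similarity between corresponding jets."""
    num = np.sum(jets_a * jets_b, axis=1)
    den = np.linalg.norm(jets_a, axis=1) * np.linalg.norm(jets_b, axis=1) + eps
    return float(np.mean(num / den))

def recognize(probe, gallery, bank):
    """Map filter values directly onto stored object units (no hidden layer)."""
    probe_jets = jets(probe, bank)
    scores = {name: similarity(probe_jets, jets(img, bank))
              for name, img in gallery.items()}
    return max(scores, key=scores.get), scores

# Synthetic stand-ins for object images: two oriented gratings
coords = np.arange(64)
vertical = np.tile(np.sin(2.0 * np.pi * coords / 8.0), (64, 1))
horizontal = vertical.T
rng = np.random.default_rng(0)
probe = vertical + 0.1 * rng.standard_normal(vertical.shape)

bank = filter_bank()
best, scores = recognize(probe, {"vertical": vertical, "horizontal": horizontal}, bank)
print(best, scores)
```

The point of the sketch is structural: the stored templates are compared to the probe using raw filter magnitudes alone, with no intermediate units for edges, axes, or parts, which is the class of model the abstract argues is insufficient for human object recognition.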

Original language: English
Pages (from-to): 237-271
Number of pages: 35
Journal: Spatial Vision
Volume: 10
Issue number: 3
State: Published - 1996
Externally published: Yes
