
Behnke S. Hierarchical Neural Networks for Image Interpretation

Springer, 2003. 244 p.
It is my pleasure and privilege to write the foreword for this book, whose results I have been following and awaiting for the last few years. This monograph represents the outcome of an ambitious project oriented towards advancing our knowledge of the way the human visual system processes images, and of the way it combines high-level hypotheses with low-level inputs during pattern recognition. The model proposed by Sven Behnke, carefully exposed in the following pages, can now be applied by other researchers to practical problems in the field of computer vision, and it also provides clues for reaching a deeper understanding of the human visual system.
This book arose out of dissatisfaction with an earlier project: back in 1996, Sven wrote one of the handwritten digit recognizers for the mail sorting machines of the Deutsche Post AG. The project was successful because the machines could indeed recognize the handwritten ZIP codes, at a rate of several thousand letters per hour. However, Sven was not satisfied with the amount of expert knowledge that was needed to develop the feature extraction and classification algorithms. He wondered whether the computer could extract meaningful features by itself and use them for classification. His experience in the project told him that forward computation alone would be incapable of improving the results already obtained. From his knowledge of the human visual system, he postulated that only a two-way system could work, one that could advance a hypothesis by focussing the attention of the lower layers of a neural network on it. He spent the next few years developing a new model for tackling precisely this problem.
The main result of this book is the proposal of a generic architecture for pattern recognition problems, called Neural Abstraction Pyramid (NAP). The architecture is layered, pyramidal, competitive, and recurrent. It is layered because images are represented at multiple levels of abstraction. It is recurrent because backward projections connect the upper to the lower layers. It is pyramidal because the resolution of the representations is reduced from one layer to the next. It is competitive because in each layer units compete against each other, each trying to classify the input best. The main idea behind this architecture is letting the lower layers interact with the higher layers. The lower layers send simple features to the upper layers; the upper layers recognize more complex features and bias the computation in the lower layers. This in turn improves the input to the upper layers, which can refine their hypotheses, and so on. After a few iterations the network settles on the best interpretation. The architecture can be trained in supervised and unsupervised modes.
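To make this interplay concrete, the following is a minimal sketch in Python of such an iterative, pyramidal update loop. It is only an illustration of the idea described above, not Behnke's actual equations: the layer sizes, the weight matrices (W_fwd, W_back, W_lat), the ReLU activation, and the synchronous update rule are all assumptions made for the sake of the example.

    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    rng = np.random.default_rng(0)

    # Three layers of decreasing width (pyramidal): 64 -> 32 -> 16 units.
    sizes = [64, 32, 16]
    W_fwd = [rng.normal(0, 0.1, (sizes[i + 1], sizes[i])) for i in range(2)]   # bottom-up
    W_back = [rng.normal(0, 0.1, (sizes[i], sizes[i + 1])) for i in range(2)]  # top-down
    W_lat = [rng.normal(0, 0.05, (s, s)) for s in sizes]                       # lateral

    x = rng.normal(0, 1, sizes[0])        # flattened input "image"
    acts = [np.zeros(s) for s in sizes]   # current activations per layer

    for _ in range(10):                   # a few refinement iterations
        new_acts = []
        for l in range(3):
            bottom_up = x if l == 0 else W_fwd[l - 1] @ acts[l - 1]
            top_down = W_back[l] @ acts[l + 1] if l < 2 else 0.0
            lateral = W_lat[l] @ acts[l]
            new_acts.append(relu(bottom_up + top_down + lateral))
        acts = new_acts   # the network gradually settles on an interpretation

Each iteration combines the bottom-up signal from the layer below, the top-down bias from the layer above, and the lateral influence within the layer, which is the two-way refinement the foreword describes.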
Here, I should mention that there have been many proposals of recurrent architectures for pattern recognition. Over the years we have tried to apply them to non-trivial problems. Unfortunately, many of the proposals advanced in the literature break down when confronted with non-toy problems. Therefore, one of the first advantages of Behnke’s architecture is that it actually works, even when the problem is difficult and genuinely interesting for commercial applications. The structure of the book reflects the road taken by Sven to tackle the problem of combining top-down processing of hypotheses with bottom-up processing of images. Part I describes the theory and Part II the applications of the architecture. The first two chapters motivate the problem to be investigated and identify the features of the human visual system which are relevant for the proposed architecture: retinotopic organization of feature maps, local recurrence with excitation and inhibition, hierarchy of representations, and adaptation through learning.
Chapter 3 gives an overview of several models proposed in recent years and provides a gentle introduction to the next chapter, which describes the NAP architecture. Chapter 5 deals with a special case of the NAP architecture, in which only forward projections are used and features are learned in an unsupervised way. With this chapter, Sven came full circle: the digit classification task he had solved for mail sorting, using a hand-designed structural classifier, was now outperformed by an automatically trained system. This is a remarkable result, since much expert knowledge went into the design of the hand-crafted system.
Four applications of the NAP constitute Part II. The first application is the recognition of meter values (printed postage stamps), the second is the binarization of matrix codes (also used for postage), the third is the reconstruction of damaged images, and the last is the localization of faces in complex scenes. Of these tasks, the image reconstruction problem is my favorite. A complete NAP is used, with all its lateral, feed-forward, and backward connections. In order to infer the original images from degraded ones, the network must learn models of the objects present in the images and combine them with models of typical degradations.
I think that it is interesting how this book started from a general inspiration about the way the human visual system works, how Sven then extracted some general principles underlying visual perception, and how he applied them to the solution of several vision problems. The NAP architecture is what the Neocognitron (a layered model proposed by Fukushima in the 1980s) aspired to be. It is the Neocognitron gotten right. The main difference between the two is the recursive nature of the NAP. Combining the bottom-up with the top-down approach allows for iterative interpretation of ambiguous stimuli.
I can only encourage the reader to work his or her way through this book. It is very well written and provides solutions for some technical problems as well as inspiration for neurobiologists interested in common computational principles in human and computer vision. The book is like a road that will lead the attentive reader to a rich landscape, full of new research opportunities.
Part I. Theory
Neurobiological Background
Related Work
Neural Abstraction Pyramid Architecture
Unsupervised Learning
Supervised Learning
Part II. Applications
Recognition of Meter Values
Learning Iterative Image Reconstruction
Face Localization
Summary and Conclusions