SCALE-SPACE THEORY IN COMPUTER VISION

Tony Lindeberg

Royal Institute of Technology
Stockholm, Sweden

SHORT DESCRIPTION

We perceive objects in the world as having structures both at coarse
and fine scales.  A tree, for instance, may appear as having a roughly
round or cylindrical shape when seen from a distance, even though it
is built up from a large number of branches.  At a closer look,
individual leaves become visible, and we can observe that they in turn
have texture at an even finer scale.  This fact that objects in the
world appear in different ways depending upon the scale of observation
has important implications when analysing measured data, such as
images, with automatic methods.

"Scale-Space Theory in Computer Vision" describes a formal framework,
called _scale-space representation_, for handling the notion of scale
in image data.  It gives an introduction to the general foundations of
the theory and shows how it applies to essential problems in computer
vision such as computation of image features and cues to surface
shape.  The subjects range from the mathematical underpinning to
practical computational techniques.  The power of the methodology is
illustrated by a rich set of examples.

"This approach will certainly turn out to be part of the foundations
of the theory and practice of machine vision ...  the author has no
doubt performed an excellent service to many in the field of both
artificial and biological vision."                 Jan Koenderink

FOREWORD

The problem of _scale_ pervades both the natural sciences and the
visual arts. The earliest scientific discussions concentrate on visual
perception (much like today!) and occur in Euclid's (c. 300 B.C.)
"Optics" and Lucretius' (c. 100--55 B.C.)  "On the Nature of the
Universe". A very clear account in the spirit of modern "scale-space
theory" is presented by Boscovitz (in 1758), with wide-ranging
applications to mathematics, physics and geography. Early applications
occur in the cartographic problem of "generalization", the central
idea being that a _map_ in order to be useful has to be a
"generalized" (coarse grained) representation of the actual terrain
(Miller and Voskuil 1964).  Broadening the scope asks for progressive
summarizing. Very much the same problem occurs in the (realistic)
artistic rendering of scenes.  Artistic generalization has been
analyzed in surprising detail by John Ruskin (in his "Modern Painters"),
who even describes some of the more intricate generic "scale-space
singularities" in detail: Where the ancients considered only the 
merging of blobs under blurring, Ruskin discusses the case where a 
blob splits off another one when the resolution is decreased, a case 
that has given rise to confusion even in the modern literature.

It is indeed clear that _any_ physical observation of some extended
quantity such as mass density or surface irradiance presupposes a
scale-space setting due to the inherent graininess of nature on the
small scale and its capricious articulation on the large scale. What
is the "right scale" does indeed depend on the problem, _i.e._,
whether one needs to see the forest, the trees or the leaves. (Of
course this list could be extended indefinitely towards the
microscopic as well as the mesoscopic domains, as has been done in
the popular film "Powers of Ten" (Morrison and Morrison 1984)).  The
physicist almost invariably manages to pick the right scale for the
problem at hand _intuitively_. However, in many modern applications
the "right scale" need not be obvious at all, and one really needs a
principled mathematical analysis of the scale problem.

In applications such as _vision_ the front end system has to process
the radiance function blindly (since no meaning resides in the photons
as such) and the problem of finding the right scale becomes especially
acute.  This is true for biological and artificial vision systems
alike. Here a principled theory is mandatory and can _a priori_ be
expected to yield important insights and lead to mechanistic models.
The modern scale-space theory has indeed led to an increased
understanding of the low level operations and novel handles on ways to
design algorithms for problems in machine vision.

In this book the author presents a commendably lucid outline of the
theory of scale-space, the structure of low level operations in a
scale-space setting and algorithmic schemes to use these structures
so as to solve important problems in computer vision. The subjects
range from a mathematical underpinning, over issues in implementation
(discrete scale-space structures) to more open ended algorithmic
methods for computer vision problems. The latter methods seem to me to
point a way to a range of potentially very important applications.
This approach will certainly turn out to be part of the foundations of
the theory and practice of machine vision.

It was about time for somebody to write a monograph on the subject of
scale-space structure and scale-space based methods, and the author
has no doubt performed an excellent service to many in the field of
both artificial and biological vision.

Utrecht, October 4th, 1993
Jan Koenderink


PREFACE

We perceive objects in the world as having structures both at coarse
and fine scales.  A tree, for instance, may appear as having a roughly
round or cylindrical shape when seen from a distance, even though it
is built up from a large number of branches.  At a closer look,
individual leaves become visible, and we can observe that the leaves
in turn have texture at an even finer scale.

This fact that objects in the world appear in different ways depending
upon the scale of observation has important implications when
analysing measured data, such as images, with automatic methods.  A
straightforward way of exemplifying this is to note that every
operation on image data must be carried out on a window, whose size
can range from a single point to the whole image.  The type of
information we can get from such an operation is largely determined by
the relation between structures in the image and the size of the
window.  Hence, without prior knowledge about what we are looking for,
there is no reason to favour any particular scale.  We should
therefore try them all and operate at all window sizes.

These insights are not completely new in computer vision.  Multi-scale
representations of images in terms of pyramids were already developed
around 1970.  A main motivation then was to achieve computational
efficiency by coarse-to-fine strategies.  This approach was also
supported by findings in neurophysiology about the primate visual
system.  However, it was soon discovered that relating structures from
different levels in the multi-scale representation was far from
trivial.  Structures at coarse levels could sometimes not be assigned
any direct interpretation, since they were hard to trace to finer
scales.  Despite considerable efforts to develop techniques for
matching between scales, a theoretical foundation was missing.

In 1983, Witkin proposed that scale could be considered as a
continuous parameter, thereby generalizing the existing notion of
Gaussian pyramids.  He noted the relation to the diffusion equation
and hence found a well-founded way of relating image structures
between different scales.  Koenderink soon furthered the approach,
which has been developed into what we now know as scale-space theory.
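
As a minimal sketch of this idea (not taken from the book), the
following Python fragment treats scale as a continuous parameter t,
the variance of a Gaussian kernel, and smooths a one-dimensional
signal at several values of t.  Convolving with a Gaussian of variance
t corresponds to solving the diffusion equation dL/dt = 1/2 d^2L/dx^2
with the signal as initial condition.  The sampled and truncated
kernel, the border handling, the test signal and all function names
are illustrative assumptions.

```python
import math

def gaussian_kernel(t):
    """Sampled, normalised 1-D Gaussian with variance t (assumed truncation at 4 sigma)."""
    radius = max(1, int(math.ceil(4 * math.sqrt(t))))
    weights = [math.exp(-(x * x) / (2.0 * t)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, t):
    """Convolve the signal with the Gaussian kernel, replicating the border samples."""
    kernel = gaussian_kernel(t)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp the index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out

def count_local_extrema(values):
    """Count sign changes of the first difference, ignoring near-zero differences."""
    diffs = [b - a for a, b in zip(values, values[1:]) if abs(b - a) > 1e-12]
    return sum(1 for a, b in zip(diffs, diffs[1:]) if a * b < 0)

# Hypothetical test signal: a coarse oscillation plus a fine-scale one.
signal = [math.sin(0.1 * i) + 0.5 * math.sin(1.3 * i) for i in range(200)]

# One smoothed copy of the signal per scale level t.
scale_space = {t: smooth(signal, t) for t in (1.0, 4.0, 16.0, 64.0)}
extrema = {t: count_local_extrema(v) for t, v in scale_space.items()}
```

In this example the fine-scale oscillation is suppressed as t grows,
so the coarse levels contain far fewer local extrema than the fine
ones, while each extremum that remains can be followed back through
the intermediate levels.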

Since that work, we have seen the theory develop in many ways, and
also realized that it provides a framework for early visual
computations of a more general nature.  The aim of this book is to
provide a coherent overview of this recently developed theory, and to
make material, which has earlier existed only in terms of research
papers, available to a larger audience.  The presentation provides an
introduction to the general foundations of the theory and shows how
it applies to essential problems in computer vision such as
computation of image features and cues to surface shape.  The subjects
range from the mathematical foundation to practical computational
techniques.  The power of the methodology is illustrated by a rich set
of examples.

I hope that this work can serve as a useful introduction, reference,
and inspiration for fellow researchers in computer vision and related
fields such as image processing, signal processing in general,
photogrammetry, and medical image analysis.  Whereas the book is
mainly written in the form of a research monograph, the level of
presentation has been adapted so that it can be used as a basis for
advanced courses in these fields.

The presentation is organized in a logical bottom-up way, following
the ordering of the processing modules in an imagined vision system.
It is, however, not necessary to read the book in such a sequential
manner.  Several of the chapters are relatively self-contained, and it
should be possible to read them independently.  A guide to the reader
describing the mutual dependencies is given in section 1.7 (page 22).  
I wish the reader a pleasant tour into this highly stimulating and 
challenging subject.

Stockholm, September 1993,
Tony Lindeberg


ABSTRACT

The presentation starts with a philosophical discussion about computer
vision in general.  The aim is to put the scope of the book into its
wider context, and to emphasize why the notion of _scale_ is crucial
when dealing with measured signals, such as image data.  An overview
of different approaches to multi-scale representation is presented,
and a number of special properties of scale-space are pointed out.

Then, it is shown how a mathematical theory can be formulated for
describing image structures at different scales.  By starting from a
set of axioms imposed on the first stages of processing, it is
possible to derive a set of canonical operators, which turn out to be
derivatives of Gaussian kernels at different scales.
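
To make the notion of Gaussian derivative operators concrete, here is
a small hedged sketch in Python.  It samples the Gaussian
g(x; t) = exp(-x^2 / (2t)) / sqrt(2 pi t) together with its first and
second derivatives, and applies them to a step edge.  The sampling,
the truncation radius, the border handling and the function names are
assumptions made for illustration, not the book's implementation.

```python
import math

def gaussian_derivative_kernel(t, order):
    """Sampled Gaussian of variance t (order 0) or its first/second derivative."""
    radius = max(1, int(math.ceil(4 * math.sqrt(t))))
    xs = range(-radius, radius + 1)
    g = [math.exp(-(x * x) / (2.0 * t)) / math.sqrt(2.0 * math.pi * t) for x in xs]
    if order == 0:
        return g
    if order == 1:
        return [-(x / t) * gx for x, gx in zip(xs, g)]               # g'  = -(x/t) g
    if order == 2:
        return [((x * x - t) / (t * t)) * gx for x, gx in zip(xs, g)]  # g'' = ((x^2 - t)/t^2) g
    raise ValueError("only orders 0, 1 and 2 are sketched here")

def convolve(signal, kernel):
    """True convolution sum_x kernel(x) * signal(i - x), with replicated borders."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i - (k - radius), 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

# A step edge located between samples 49 and 50.
step = [0.0] * 50 + [1.0] * 50
t = 4.0
first = convolve(step, gaussian_derivative_kernel(t, 1))   # smoothed first derivative
second = convolve(step, gaussian_derivative_kernel(t, 2))  # smoothed second derivative

# The first-derivative response peaks at the edge.
edge_position = max(range(len(first)), key=lambda i: first[i])
```

The second-derivative response changes sign across the same location,
which is the kind of differential structure that edge detectors in
this framework build on.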

The problem of applying this theory computationally is extensively
treated.  A _scale-space theory_ is formulated for _discrete signals_,
and it is demonstrated how this representation can be used as a _basis_
for expressing a large number of _visual operations_.  Examples are
smoothed derivatives in general, as well as different types of
detectors for image features, such as edges, blobs, and junctions.  In
fact, the resulting scheme for feature detection induced by the
presented theory is very simple, both conceptually and in terms of
practical implementations.
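
This announcement gives no formulas, so the following is only a sketch
under stated assumptions.  One construction known from the scale-space
literature for discrete signals is the discrete analogue of the
Gaussian kernel, T(n; t) = exp(-t) I_n(t), where I_n denotes the
modified Bessel functions of integer order.  The Python fragment
below evaluates this kernel with a truncated power series and checks
numerically that smoothing to scale 1 followed by smoothing to scale 2
agrees with direct smoothing to scale 3.  The helper names, the
truncation radius and the number of series terms are hypothetical
choices.

```python
import math

def bessel_i(n, t, terms=40):
    """Modified Bessel function I_n(t), integer n >= 0, via its power series."""
    half = t / 2.0
    return sum(half ** (2 * m + n) / (math.factorial(m) * math.factorial(m + n))
               for m in range(terms))

def discrete_gaussian(t, radius):
    """T(n; t) = exp(-t) * I_|n|(t) for n = -radius .. radius."""
    return [math.exp(-t) * bessel_i(abs(n), t) for n in range(-radius, radius + 1)]

def convolve_full(a, b):
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

radius = 15
k1 = discrete_gaussian(1.0, radius)
k2 = discrete_gaussian(2.0, radius)
k3 = discrete_gaussian(3.0, 2 * radius)

# Cascade smoothing: scale 1 followed by scale 2, compared with scale 3.
cascade = convolve_full(k1, k2)
semigroup_error = max(abs(a - b) for a, b in zip(cascade, k3))

# The kernel sums to one and its variance equals the scale parameter t.
total = sum(k3)
variance = sum(n * n * w for n, w in zip(range(-2 * radius, 2 * radius + 1), k3))
```

The series evaluation is adequate only for small t; for larger scale
values a numerically stable Bessel routine would be a better choice.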

Typically, an object contains structures at many different scales, but
locally it is not unusual that some of these "stand out" and seem to
be more significant than others.  A problem that we give special
attention to concerns how to find such locally stable scales, or
rather how to generate hypotheses about interesting structures for
further processing.  It is shown how the scale-space theory, based on
a representation called the _scale-space primal sketch_, allows us to
extract _regions of interest_ from an image without prior information
about what the image can be expected to contain.  Such regions,
combined with knowledge about the scales at which they occur,
constitute _qualitative information_, which can be used for
_guiding and simplifying_ other low-level processes.

Experiments on different types of real and synthetic images
demonstrate how the suggested approach can be used for different
visual tasks, such as image segmentation, edge detection, junction
detection, and focus-of-attention.  This work is complemented by a
mathematical treatment showing how the behaviour of different types of
image structures in scale-space can be analysed theoretically.

It is also demonstrated how the suggested scale-space framework can be
used for computing direct cues to _three-dimensional surface
structure_, using in principle only the same types of _visual
front-end_ operations that underlie the computation of image features.

Although the treatment is concerned with the analysis of visual data,
the general notion of scale-space representation is of much wider
generality and arises in several contexts where measured data are to
be analyzed and interpreted automatically.


-------------------------------------ORDER FORM------------------------------

Ref:  ftpser

Please send me: 
Scale-Space Theory in Computer Vision, by Tony Lindeberg
_____copy(ies) HB, ISBN 0-7923-9418-6,  Dfl 275.00, $ 130.00, GBP 97.50

  Payment enclosed to the amount of ___________________________

* Please invoice me 

* Please charge my credit card 

  Name of Card Holder: ______________________________________  

  Card. no.: ________________________________________________

  Expiry Date:______________________________________________

     Am. Ex.*          Visa*           Diners Club*           Mastercard*

Delivery address: 

Name: ___________________________________________________________________

Address: ________________________________________________________________

         ________________________________________________________________

         ________________________________________________________________

         ________________________________________________________________


Date:________________     Signature:_______________________________

To be sent to:


Outside North America                         In USA and Canada

KLUWER ACADEMIC PUBLISHERS GROUP              KLUWER ACADEMIC PUBLISHERS 
Order Dept.                                   Order Dept.
P.O. Box 322                                  101 Philip Drive
3300 AH Dordrecht, The Netherlands            Norwell, MA 02061
Tel: +31-78-524400                            Tel: 617-871-6600
Fax: +31-78-524474                            Fax: 617-871-6528
email:  vanderlinden@wkap.nl                  email: kluwer@world.std.com

Orders from individuals accompanied by payment or authorization to
charge a credit card account will ensure prompt delivery. Postage and
handling charges will be absorbed by the Publisher on all such orders.
Payment will be accepted in any convertible currency. Please check the
rate of exchange at your bank. For sales within the Netherlands please
add 6% VAT (BTW). Prices are subject to change without notice.

* Delete those that do not apply.