SCALE-SPACE THEORY IN COMPUTER VISION
Tony Lindeberg
Royal Institute of Technology
Stockholm, Sweden

SHORT DESCRIPTION

We perceive objects in the world as having structures both at coarse and fine scales. A tree, for instance, may appear as having a roughly round or cylindrical shape when seen from a distance, even though it is built up from a large number of branches. At a closer look, individual leaves become visible, and we can observe that they in turn have texture at an even finer scale. This fact that objects in the world appear in different ways depending upon the scale of observation has important implications when analysing measured data, such as images, with automatic methods.

"Scale-Space Theory in Computer Vision" describes a formal framework, called _scale-space representation_, for handling the notion of scale in image data. It gives an introduction to the general foundations of the theory and shows how it applies to essential problems in computer vision such as computation of image features and cues to surface shape. The subjects range from the mathematical underpinning to practical computational techniques. The power of the methodology is illustrated by a rich set of examples.

"This approach will certainly turn out to be part of the foundations of the theory and practice of machine vision ... the author has no doubt performed an excellent service to many in the field of both artificial and biological vision."
Jan Koenderink

FOREWORD

The problem of _scale_ pervades both the natural sciences and the visual arts. The earliest scientific discussions concentrate on visual perception (much like today!) and occur in Euclid's (c. 300 B.C.) "Optics" and Lucretius' (c. 100--55 B.C.) "On the Nature of the Universe".
A very clear account in the spirit of modern "scale-space theory" is presented by Boscovitz (in 1758), with wide-ranging applications to mathematics, physics and geography. Early applications occur in the cartographic problem of "generalization", the central idea being that a _map_, in order to be useful, has to be a "generalized" (coarse-grained) representation of the actual terrain (Miller and Voskuil 1964). Broadening the scope asks for progressive summarizing. Very much the same problem occurs in the (realistic) artistic rendering of scenes. Artistic generalization has been analyzed in surprising detail by John Ruskin (in his "Modern Painters"), who even describes some of the more intricate generic "scale-space singularities" in detail: where the ancients considered only the merging of blobs under blurring, Ruskin discusses the case where a blob splits off another one when the resolution is decreased, a case that has given rise to confusion even in the modern literature. It is indeed clear that _any_ physical observation of some extended quantity such as mass density or surface irradiance presupposes a scale-space setting, due to the inherent graininess of nature on the small scale and its capricious articulation on the large scale. What the "right scale" is does indeed depend on the problem, _i.e._, whether one needs to see the forest, the trees or the leaves. (Of course this list could be extended indefinitely towards the microscopic as well as the mesoscopic domains, as has been done in the popular film "Powers of Ten" (Morrison and Morrison 1984).) The physicist almost invariably manages to pick the right scale for the problem at hand _intuitively_. However, in many modern applications the "right scale" need not be obvious at all, and one really needs a principled mathematical analysis of the scale problem.
In applications such as _vision_, the front-end system has to process the radiance function blindly (since no meaning resides in the photons as such), and the problem of finding the right scale becomes especially acute. This is true for biological and artificial vision systems alike. Here a principled theory is mandatory and can _a priori_ be expected to yield important insights and lead to mechanistic models. The modern scale-space theory has indeed led to an increased understanding of the low-level operations and to novel handles on ways to design algorithms for problems in machine vision. In this book the author presents a commendably lucid outline of the theory of scale-space, the structure of low-level operations in a scale-space setting, and algorithmic schemes that use these structures so as to solve important problems in computer vision. The subjects range from a mathematical underpinning, over issues in implementation (discrete scale-space structures), to more open-ended algorithmic methods for computer vision problems. The latter methods seem to me to point a way to a range of potentially very important applications. This approach will certainly turn out to be part of the foundations of the theory and practice of machine vision. It was about time for somebody to write a monograph on the subject of scale-space structure and scale-space based methods, and the author has no doubt performed an excellent service to many in the field of both artificial and biological vision.

Utrecht, October 4th, 1993
Jan Koenderink

PREFACE

We perceive objects in the world as having structures both at coarse and fine scales. A tree, for instance, may appear as having a roughly round or cylindrical shape when seen from a distance, even though it is built up from a large number of branches. At a closer look, individual leaves become visible, and we can observe that the leaves in turn have texture at an even finer scale.
This fact that objects in the world appear in different ways depending upon the scale of observation has important implications when analysing measured data, such as images, with automatic methods. A straightforward way of exemplifying this is to note that every operation on image data must be carried out on a window, whose size can range from a single point to the whole image. The type of information we can get from such an operation is largely determined by the relation between structures in the image and the size of the window. Hence, without prior knowledge about what we are looking for, there is no reason to favour any particular scale. We should therefore try them all and operate at all window sizes. These insights are not completely new in computer vision. Multi-scale representations of images in terms of pyramids were developed already around 1970. A main motivation then was to achieve computational efficiency by coarse-to-fine strategies. This approach was also supported by findings in neurophysiology about the primate visual system. However, it was soon discovered that relating structures from different levels in the multi-scale representation was far from trivial. Structures at coarse levels could sometimes not be assigned any direct interpretation, since they were hard to trace to finer scales. Despite considerable efforts to develop techniques for matching between scales, a theoretical foundation was missing. In 1983, Witkin proposed that scale could be considered as a continuous parameter, thereby generalizing the existing notion of Gaussian pyramids. He noted the relation to the diffusion equation and hence found a well-founded way of relating image structures between different scales. Koenderink soon furthered the approach, which has been developed into what we now know as scale-space theory. 
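For readers who want the formula behind Witkin's continuous scale parameter and its link to the diffusion equation, the standard continuous formulation can be written as below. The notation ($f$ for the image, $L$ for its scale-space representation, $t$ for the scale parameter, $N$ for the dimension) is our choice for illustration, not quoted from the book.

```latex
L(x; t) = \int_{\xi \in \mathbb{R}^N} g(\xi; t)\, f(x - \xi)\, d\xi,
\qquad
g(x; t) = \frac{1}{(2\pi t)^{N/2}}\, e^{-\|x\|^2/(2t)},
\\[4pt]
\partial_t L = \tfrac{1}{2}\nabla^2 L = \tfrac{1}{2}\sum_{i=1}^{N} \partial_{x_i x_i} L,
\qquad
L(x; 0) = f(x).
```

Here $t = \sigma^2$ is the variance of the Gaussian, so $\sigma = \sqrt{t}$ plays the role of the window size mentioned earlier in the preface; letting $t$ vary continuously sweeps through all window sizes rather than only the discrete levels of a pyramid.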
Since that work, we have seen the theory develop in many ways, and also realized that it provides a framework for early visual computations of a more general nature. The aim of this book is to provide a coherent overview of this recently developed theory, and to make material, which has earlier existed only in terms of research papers, available to a larger audience. The presentation provides an introduction to the general foundations of the theory and shows how it applies to essential problems in computer vision such as computation of image features and cues to surface shape. The subjects range from the mathematical foundation to practical computational techniques. The power of the methodology is illustrated by a rich set of examples. I hope that this work can serve as a useful introduction, reference, and inspiration for fellow researchers in computer vision and related fields such as image processing, signal processing in general, photogrammetry, and medical image analysis. Whereas the book is mainly written in the form of a research monograph, the level of presentation has been adapted so that it can be used as a basis for advanced courses in these fields. The presentation is organized in a logical bottom-up way, following the ordering of the processing modules in an imagined vision system. It is, however, not necessary to read the book in such a sequential manner. Several of the chapters are relatively self-contained, and it should be possible to read them independently. A guide to the reader describing the mutual dependencies is given in section 1.7 (page 22). I wish the reader a pleasant tour into this highly stimulating and challenging subject.

Stockholm, September 1993
Tony Lindeberg

ABSTRACT

The presentation starts with a philosophical discussion about computer vision in general. The aim is to put the scope of the book into its wider context, and to emphasize why the notion of _scale_ is crucial when dealing with measured signals, such as image data.
An overview of different approaches to multi-scale representation is presented, and a number of special properties of scale-space are pointed out. Then, it is shown how a mathematical theory can be formulated for describing image structures at different scales. By starting from a set of axioms imposed on the first stages of processing, it is possible to derive a set of canonical operators, which turn out to be derivatives of Gaussian kernels at different scales. The problem of applying this theory computationally is extensively treated. A _scale-space theory_ is formulated for _discrete signals_, and it is demonstrated how this representation can be used as a _basis_ for expressing a large number of _visual operations_. Examples are smoothed derivatives in general, as well as different types of detectors for image features, such as edges, blobs, and junctions. In fact, the resulting scheme for feature detection induced by the presented theory is very simple, both conceptually and in terms of practical implementations. Typically, an object contains structures at many different scales, but locally it is not unusual that some of these "stand out" and seem to be more significant than others. A problem that we give special attention to concerns how to find such locally stable scales, or rather how to generate hypotheses about interesting structures for further processing. It is shown how the scale-space theory, based on a representation called the _scale-space primal sketch_, allows us to extract _regions of interest_ from an image without prior information about what the image can be expected to contain. Such regions, combined with knowledge about the scales at which they occur, constitute _qualitative information_, which can be used for _guiding and simplifying_ other low-level processes.
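As a small illustration of smoothing and smoothed derivatives on _discrete signals_, the following Python sketch works on a 1-D signal. It assumes the discrete analogue of the Gaussian kernel, T(n; t) = e^{-t} I_n(t), where I_n is the modified Bessel function of integer order. To our understanding, this is the kernel associated with the discrete scale-space theory for one-dimensional signals. The function names (`bessel_i`, `discrete_gaussian_kernel`, `smooth`, `central_difference`, `edge_points`), the kernel truncation radius, the replicated borders and the relative threshold are all hypothetical choices made for this sketch. The feature detector is only a simple 1-D edge detector (local maxima of the smoothed first-derivative magnitude), and the book's detectors for edges, blobs and junctions in 2-D are more elaborate.

```python
import math
from typing import List


def bessel_i(n: int, t: float) -> float:
    """Modified Bessel function I_n(t) for integer n >= 0, t >= 0 (power series)."""
    half = t / 2.0
    term = 1.0
    for j in range(1, n + 1):              # first series term: (t/2)^n / n!
        term *= half / j
    total, k = 0.0, 0
    while True:
        total += term
        k += 1
        term *= half * half / (k * (k + n))  # ratio of consecutive series terms
        if term <= 1e-17 * total:
            return total


def discrete_gaussian_kernel(t: float) -> List[float]:
    """Samples T(n; t) = exp(-t) I_|n|(t) for |n| <= radius, renormalized to sum 1.

    Assumption: truncation at about 5 standard deviations (sqrt(t)); moderate t only.
    """
    radius = int(math.ceil(5.0 * math.sqrt(t))) + 1
    weights = [math.exp(-t) * bessel_i(abs(n), t) for n in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal: List[float], t: float) -> List[float]:
    """Discrete scale-space representation L(.; t) of a 1-D signal."""
    kernel = discrete_gaussian_kernel(t)
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), last)  # replicate the border samples
            acc += w * signal[idx]
        out.append(acc)
    return out


def central_difference(values: List[float]) -> List[float]:
    """First-derivative approximation (L(n+1) - L(n-1)) / 2, borders replicated."""
    last = len(values) - 1
    return [(values[min(i + 1, last)] - values[max(i - 1, 0)]) / 2.0
            for i in range(len(values))]


def edge_points(signal: List[float], t: float, rel_threshold: float = 0.1) -> List[int]:
    """Indices where |L_x| at scale t has a local maximum above rel_threshold * max |L_x|."""
    lx = [abs(v) for v in central_difference(smooth(signal, t))]
    peak = max(lx)
    return [i for i in range(1, len(lx) - 1)
            if lx[i] > lx[i - 1] and lx[i] >= lx[i + 1] and lx[i] >= rel_threshold * peak]


if __name__ == "__main__":
    step = [0.0] * 20 + [1.0] * 20
    print(edge_points(step, t=4.0))  # a single index next to the step (19 or 20)
```

Smoothing first and differentiating afterwards is one way of computing a "smoothed derivative". Because the kernel has variance t, changing `t` changes the scale at which the edge is measured without changing anything else in the pipeline.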
Experiments on different types of real and synthetic images demonstrate how the suggested approach can be used for different visual tasks, such as image segmentation, edge detection, junction detection, and focus-of-attention. This work is complemented by a mathematical treatment showing how the behaviour of different types of image structures in scale-space can be analysed theoretically. It is also demonstrated how the suggested scale-space framework can be used for computing direct cues to _three-dimensional surface structure_, using in principle only the same types of _visual front-end_ operations that underlie the computation of image features. Although the treatment is concerned with the analysis of visual data, the general notion of scale-space representation is of much wider generality and arises in several contexts where measured data are to be analyzed and interpreted automatically.

-------------------------------------ORDER FORM------------------------------

Ref: ftpser

Please send me:
Scale-Space Theory in Computer Vision, by Tony Lindeberg
_____ copy(ies) HB, ISBN 0-7923-9418-6, Dfl 275.00, $ 130.00, GBP 97.50

Payment enclosed to the amount of ___________________________
* Please invoice me
* Please charge my credit card

Name of Card Holder: ______________________________________
Card no.: _________________________________________________
Expiry Date: ______________________________________________
Am. Ex.* Visa* Diners Club* Mastercard*

Delivery address:
Name: ___________________________________________________________________
Address: ________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________

Date: ________________ Signature: _______________________________

To be sent to:

Outside North America:
KLUWER ACADEMIC PUBLISHERS GROUP
Order Dept.
P.O. Box 322
3300 AH Dordrecht, The Netherlands
Tel: +31-78-524400
Fax: +31-78-524474
email: vanderlinden@wkap.nl

In USA and Canada:
KLUWER ACADEMIC PUBLISHERS
Order Dept.
101 Philip Drive
Norwell, MA 02016
Tel: 617-871-6600
Fax: 617-871-6528
email: kluwer@world.std.com

Orders from individuals accompanied by payment or authorization to charge a credit card account will ensure prompt delivery. Postage and handling charges will be absorbed by the Publisher on all such orders. Payment will be accepted in any convertible currency. Please check the rate of exchange at your bank. For sales within the Netherlands please add 6% VAT (BTW). Prices are subject to change without notice.

* Delete those that do not apply.