Tuesday 4 November 2014

Intuition Behind the Derivative Filters for Edges, Corners and Blobs

This semester I'm TAing the graduate computer vision course CSC2503 at UofT. Last week, while giving a tutorial on SIFT, a student asked an interesting question that I could not answer on the spot; it took some thought afterwards to find a satisfactory answer.

The context of the question was the following: to introduce the idea of an "interest point" that is locally distinctive, I asked the students what criteria one should look at. One student suggested looking for strong edges, so I told them that edge response does not localize well along the direction of the edge, and that one should instead look for corners or blobs. Later on, when I talked about the scale-normalized Laplacian of Gaussian and the difference of Gaussians, a student asked: given how we motivated interest points, we should be looking for points with large first-derivative magnitude in both directions, but the Laplacian is a second derivative, so how do the two relate?

One way to answer would be to describe how the Laplacian of Gaussian finds blobs, and how blobs are well localized in both location and scale. However, the student clearly wanted an intuitive answer about the relation and difference to edge detection.

I think the following is a better answer:

First-order derivatives can reveal edges; but to detect corners, one needs to consider how the first derivatives change (why not the magnitude of the first derivative? because it is not robust to contrast variation). This can be accomplished by looking at the second moment matrix of the first derivatives in the vicinity of the point (the Harris corner detector), or by looking at second derivatives (the Laplacian of Gaussian).
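To make this concrete, here is a minimal sketch of a Harris-style corner response in Python, assuming a grayscale image stored as a 2-D NumPy array; the smoothing scales and the constant k are illustrative choices, not values prescribed in the course.

# A minimal sketch of the Harris corner response via the second moment
# matrix of first derivatives; sigma and k are illustrative choices.
import numpy as np
from scipy import ndimage

def harris_response(image, sigma=1.0, k=0.05):
    # First derivatives via Gaussian derivative filters.
    Ix = ndimage.gaussian_filter(image, sigma, order=(0, 1))
    Iy = ndimage.gaussian_filter(image, sigma, order=(1, 0))
    # Second moment matrix entries, averaged over a local neighbourhood.
    Sxx = ndimage.gaussian_filter(Ix * Ix, 2 * sigma)
    Syy = ndimage.gaussian_filter(Iy * Iy, 2 * sigma)
    Sxy = ndimage.gaussian_filter(Ix * Iy, 2 * sigma)
    # Harris measure: large when both eigenvalues are large, i.e. when
    # the gradient varies strongly in both directions.
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2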

That being said, while corners localize well in location, they do not localize well in scale, unlike blobs. It turns out that the Laplacian of Gaussian has maximal response for blobs of the correct scale, which are regions where gradients change a lot. So LoG (and its approximation by DoG) is really more of a blob detector than a corner detector, although in practice it will also find some corners.
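For contrast, here is a hedged sketch of DoG blob detection: build a small difference-of-Gaussians stack and keep the local extrema across space and scale. The scale grid and threshold are illustrative, and for simplicity this omits the octave structure and sub-pixel refinement that SIFT uses.

# A sketch of DoG blob detection: local extrema of a DoG stack across
# space and scale. Scale grid and threshold are illustrative only.
import numpy as np
from scipy import ndimage

def dog_extrema(image, sigmas=(1.0, 1.6, 2.56, 4.1), thresh=0.01):
    # Gaussian stack at fixed resolution (no octaves, for simplicity).
    stack = np.stack([ndimage.gaussian_filter(image, s) for s in sigmas])
    dog = stack[1:] - stack[:-1]   # DoG approximates the (scale-normalized) LoG
    # Keep a point if it is a max or min of its 3x3x3 neighbourhood
    # in (scale, y, x) and its response is large enough.
    maxima = (dog == ndimage.maximum_filter(dog, size=3))
    minima = (dog == ndimage.minimum_filter(dog, size=3))
    keep = (maxima | minima) & (np.abs(dog) > thresh)
    return np.argwhere(keep)       # rows of (scale index, y, x)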

A slightly tangential but easy-to-confuse issue is the use of the second derivative in edge detection. Because thresholding the norm of the first derivative is not robust to contrast variation, and also does not localize well, a better way is to look at the zero-crossings of the second derivative, which are extrema of the first derivative. This is different (almost opposite, in a sense) from the extrema of the Laplacian that we seek with SIFT. From the perspective of image processing, DoG zero-crossings typically form curves on the image plane, so individual zero-crossing points are not localizable.
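For completeness, here is a minimal Marr-Hildreth-style sketch of the zero-crossing idea, assuming the same NumPy image format as above: edges are marked wherever the Laplacian of Gaussian changes sign within a 3x3 neighbourhood.

# Edges as zero-crossings of the Laplacian of Gaussian, i.e. extrema
# of the first derivative; sigma is an illustrative choice.
import numpy as np
from scipy import ndimage

def log_zero_crossings(image, sigma=2.0):
    log = ndimage.gaussian_laplace(image, sigma)
    # A pixel lies on a zero-crossing if its 3x3 neighbourhood contains
    # both positive and negative LoG values.
    local_min = ndimage.minimum_filter(log, size=3)
    local_max = ndimage.maximum_filter(log, size=3)
    return (local_min < 0) & (local_max > 0)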


Another intuitive explanation from first principles (due to the course instructor):

What kinds of definitions yield isolated keypoints? You could look at extrema of intensity, but these are clearly not invariant to illumination, etc. You could look at extrema of first derivatives, but these correspond to zero-crossings of second derivatives, and those are localizable only where zero-crossing curves intersect, which they almost never do.

So DoG extrema have the advantages that (a) they correspond to neighbourhoods where image intensities vary, (b) they are invariant to absolute intensity, (c) they can be localized within their neighbourhoods, and (d) there is a reasonably dense set of them in typical images.



Monday 12 August 2013

Notes on the theory behind Mapper

Mapper [1] is a topological data analysis procedure that infers a simplicial complex of a topological space $X$ from a point cloud sampled from it. The simplicial complex can reveal various topological or geometric properties of $X$, such as the number of connected components, the number of holes, and flares, either by visualization or by computational algebraic tools. In this note, I will describe my understanding of the theory behind the method.

A typical output of Mapper on a point cloud is a graph like the plot below. The procedure is statistical by nature, but it tries to mimic a rigorous topological construction described in [1] and [2]. I will first outline the topological construction, then describe the statistical one that is Mapper.
Figure: A 1D simplicial complex returned by Mapper. Each node represents a region that is estimated to be homotopy equivalent to a point (contractible); the number of observed data points in that region is given by the number on the node and the size of the node; and the colour of the node corresponds to the average value of the function $f$ over points in that region ($f$ is explained below).


Ideas behind the topological construction


Given a finite open covering of $X$ (a finite collection of open subsets of $X$ whose union is the whole space $X$), the index set of the covering defines a simplicial complex, called the nerve of the covering: a collection of indices spans a simplex whenever the corresponding open sets have a nonempty common intersection. So the key is to come up with a finite open covering of $X$, but this might be hard if $X$ is very exotic. Instead, if we have a continuous function $f:X \rightarrow Z$, where $Z$ is a more familiar space such as $\mathbb{R}^n$, then an open covering $\mathit{U} = \{\mathit{U}_\alpha\}$ of $Z$ yields an open covering of $X$. This is because each $f^{-1}(\mathit{U}_\alpha)$ is open, and every point in $X$ is mapped to some point in $Z$, hence contained in some $f^{-1}(\mathit{U}_\alpha)$. Each $f^{-1}(\mathit{U}_\alpha)$ might be made of multiple path-connected components, so this open covering of $X$ consists of all the path-connected components of all the $f^{-1}(\mathit{U}_\alpha)$. Two important properties of this construction (a toy sketch of the nerve follows the list):

  • The dimensionality of $Z$ determines the dimensionality of the simplicial complex that can be recovered. So if $Z$ is $\mathbb{R}$, then the highest dimension of simplices is 1, i.e. the edges in a graph like the one in the figure above. The resulting graph is also a generalization of the Reeb graph, which tracks changes in the topology of the level sets of $f$.
  • The construction can reveal structure at multiple resolutions, in the sense that if there are two coverings $\{\mathit{U}_\alpha\}$ and $\{\mathit{V}_\beta\}$ of $Z$ at different resolutions, such that every $\mathit{U}_\alpha$ is contained in some $\mathit{V}_\beta$, then the resulting pre-image coverings of $X$ are also at different resolutions, because $f^{-1}(\mathit{U}_\alpha) \subseteq f^{-1}(\mathit{V}_\beta)$.
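Here is the promised toy sketch of the nerve construction in Python, restricted to the 1-skeleton: one node per cover element, and an edge for every nonempty pairwise intersection. The covering and point ids are hypothetical inputs.

# Toy nerve construction (1-skeleton): connect two cover elements
# whenever their point sets intersect.
from itertools import combinations

def nerve_edges(cover):
    """cover: dict mapping an index to a set of point ids."""
    edges = []
    for a, b in combinations(cover, 2):
        if cover[a] & cover[b]:   # nonempty intersection -> 1-simplex
            edges.append((a, b))
    return edges

# Example: three overlapping sets covering {0, ..., 5}.
cover = {"U1": {0, 1, 2}, "U2": {2, 3, 4}, "U3": {4, 5}}
print(nerve_edges(cover))          # [('U1', 'U2'), ('U2', 'U3')]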

The statistical version

Now, given a point cloud $\{x_i\}$ from $X$, to mimic the above topological construction, Mapper first needs a continuous function $f$. The choice of this function dictates what kind of structure can be recovered, so it should reflect the intended purpose of the study. For example, if we are interested in the intrinsic geometric structure of $X$, we might take $f$ to be some sort of distance to a reference point; if we are interested in the probabilistic structure over $X$ that generates the point cloud, we might take $f$ to be a kernel density estimate; finally, if what we have is a regression dataset, so that for each $x_i$ there is a corresponding $y_i$, and we know that the underlying regression function is continuous, then we do not need to choose some other $f$.
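As an illustration, here are two candidate filter functions in Python, assuming the point cloud is stored as an (n, d) NumPy array; the function names and the bandwidth parameter are my own, purely illustrative choices.

# Two illustrative filter functions f for Mapper.
import numpy as np

def f_distance(points, reference):
    # Geometric filter: distance to a chosen reference point.
    return np.linalg.norm(points - reference, axis=1)

def f_density(points, bandwidth=1.0):
    # Probabilistic filter: a simple (unnormalized) Gaussian KDE.
    sq_dists = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2)).mean(axis=1)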

Having chosen $f$, to build the covering of $X$ by pre-images, we first cover the range of $f$ using overlapping open sets. If the target space of $f$ is Euclidean, we can just use overlapping intervals/squares/cubes/hyper-cubes as bins. In the topological construction, we would now have connected components in the pre-images of these bins, but with a point cloud, all we have are the subsets of points that are mapped to the same bin. The main idea in the statistical version is to use a clustering algorithm to group the points in each pre-image that are likely to be in the same connected component. Mapper uses single-linkage clustering, which does not require predetermining the number of clusters. After clustering, Mapper omits all the structure inside each connected component, assumes that it is homotopy equivalent to a point (contractible), and represents it as a single node in the resulting graph (as in the figure above). Finally, if two connected components from the pre-images of different bins share at least one point (which can happen because the bins overlap), their corresponding nodes are linked by an edge.
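To make the procedure concrete, here is a hedged sketch of Mapper's core loop for a scalar-valued $f$, again assuming an (n, d) NumPy point cloud; the number of bins, the overlap fraction, and the fixed single-linkage cut-off eps are simplifying assumptions of mine (the actual procedure chooses the cut-off more carefully).

# A sketch of Mapper for scalar f: overlapping interval bins,
# single-linkage clustering per pre-image, one node per cluster,
# and an edge whenever two clusters share a point.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def mapper_graph(points, fvals, n_bins=10, overlap=0.5, eps=1.0):
    lo, hi = fvals.min(), fvals.max()
    width = (hi - lo) / n_bins
    nodes, edges = [], []
    for i in range(n_bins):
        # Overlapping bin: stretch the interval by the overlap fraction.
        a = lo + i * width - overlap * width
        b = lo + (i + 1) * width + overlap * width
        idx = np.where((fvals >= a) & (fvals <= b))[0]
        if len(idx) < 2:
            if len(idx) == 1:
                nodes.append(set(idx))
            continue
        # Single-linkage clusters stand in for connected components.
        labels = fcluster(linkage(points[idx], method="single"),
                          t=eps, criterion="distance")
        for c in np.unique(labels):
            nodes.append(set(idx[labels == c]))
    # Link two nodes if their point sets intersect (bins overlap).
    for u in range(len(nodes)):
        for v in range(u + 1, len(nodes)):
            if nodes[u] & nodes[v]:
                edges.append((u, v))
    return nodes, edges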

Furthermore, by controlling the size of the bins and how much they overlap, we control the resolution of the covering, as previously discussed in the topological construction. This allows recovering structure at multiple resolutions from the point cloud.

The Mapper procedure can also work with a higher-dimensional $f$, which results in higher-dimensional simplicial complexes, but these are harder to visualize. With a scalar-valued function, typical resulting graphs look like the figure above: the values of $f$ (assuming the target space is $\mathbb{R}$) are shown as the colour of the nodes, and the number of points in each connected component is reflected by the size of the node and the number on it.

One thing we have omitted in this discussion is the affinity measure used by the clustering algorithm to partition pre-images into connected components. Different affinity measures reflect different notions of "neighbourhood", so they result in different partitionings of the pre-images. Fortunately, if the affinity measures are similar, Mapper is robust to the choice of affinity. In particular, if the affinity measure is a metric and $X$ is finite-dimensional, then by the equivalence of norm-induced topologies on finite-dimensional normed spaces, we should recover the same structure if the point cloud is sampled densely enough or if the covering resolution is coarse enough. I have done some preliminary experiments suggesting that Mapper indeed has this property. I'll blog about it once I have more comprehensive results.

References:
[1] Singh, Gurjeet, Facundo Mémoli, and Gunnar E. Carlsson. "Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition." In SPBG, pp. 91-100. 2007.

[2] Carlsson, Gunnar. "Topology and data." Technical report, 2008.

[3] Lum, P. Y., G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan, J. Carlsson, and G. Carlsson. "Extracting insights from the shape of complex data using topology." Scientific reports 3 (2013).