Jafar: KLT theory

Where to find information

The basics of the method can be found on the homepage of the KLT library documentation: http://www.ces.clemson.edu/~stb/klt/.

A comparison between the KLT corner detector and the Harris corner detector can also be found in this course from the University of Western Australia: http://undergraduate.csse.uwa.edu.au/units/CITS4240/Lectures/tracking.pdf.

An explanation of the KLT multi-resolution pyramid can be found here: http://ilab.cs.ucsb.edu/publications/KolschBook05.pdf.

How it works

In brief, KLT selects features representing corners, using a method close to the Harris detector, and then tracks them by maximizing the correlation with a Newton-Raphson technique that operates on a multiresolution pyramid. Finally, a consistency check of the features using an affine mapping can optionally be performed.
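The multiresolution pyramid mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: images are stored as lists of rows (`img[y][x]`), each level is obtained by plain 2x2 box averaging (a real implementation would normally smooth with a Gaussian before subsampling, so this is a simplification), and the function names are hypothetical. Tracking runs coarse to fine: a displacement found at a coarse level is doubled and used as the starting estimate at the next finer level.

```python
def downsample(img):
    """Halve the resolution by averaging each 2x2 block (an odd last row/column is dropped)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * y][2 * x] + img[2 * y][2 * x + 1]
             + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(w)
        ]
        for y in range(h)
    ]


def build_pyramid(img, levels):
    """Return [level 0 (full resolution), level 1 (half size), ...]."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def to_finer_level(d):
    """A displacement estimated at level l is doubled to initialise level l - 1."""
    return (2.0 * d[0], 2.0 * d[1])
```

For an 8x8 image, `build_pyramid(img, 3)` yields levels of size 8x8, 4x4 and 2x2.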

Features extraction

Tomasi proposed a feature selection criterion based on corner detection, like the Harris detector, but which is, according to the author, "optimal by construction because it is based on how the tracker works".

As in the Harris detector, corner detection is based on a local structure matrix, but summed over a large area (a $(2d+1)\times(2d+1)$ neighbourhood $\mathcal R$, which provides some smoothing):

$C_{\textrm{KLT}}(x,y)=\left[ \begin{array}{cc} \sum \sum_{\mathcal R} \left(\frac{\partial I}{\partial x} \right)^2 & \sum \sum_{\mathcal R} \frac{\partial I}{\partial x} \frac{\partial I}{\partial y} \\ \sum \sum_{\mathcal R} \frac{\partial I}{\partial y} \frac{\partial I}{\partial x} & \sum \sum_{\mathcal R} \left(\frac{\partial I}{\partial y} \right)^2 \end{array} \right]$
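The matrix can be computed directly from its definition. The sketch below assumes a grayscale image stored as a list of rows (`img[y][x]`), approximates $\frac{\partial I}{\partial x}$ and $\frac{\partial I}{\partial y}$ with central differences (a real implementation would normally smooth the image first, so this is a simplification), and returns the three distinct entries of the symmetric matrix. The function names are hypothetical.

```python
def gradients(img, x, y):
    """Central-difference gradients (Ix, Iy) at an interior pixel (x, y) of img[y][x]."""
    ix = (img[y][x + 1] - img[y][x - 1]) / 2.0
    iy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return ix, iy


def structure_matrix(img, x, y, d):
    """Sum gradient products over the (2d+1)x(2d+1) neighbourhood R centred on (x, y).

    Returns (gxx, gxy, gyy), i.e. C = [[gxx, gxy], [gxy, gyy]].
    Assumes R plus a one-pixel margin lies inside the image.
    """
    gxx = gxy = gyy = 0.0
    for v in range(y - d, y + d + 1):
        for u in range(x - d, x + d + 1):
            ix, iy = gradients(img, u, v)
            gxx += ix * ix
            gxy += ix * iy
            gyy += iy * iy
    return gxx, gxy, gyy
```

On a 9x9 test image whose bottom-right quadrant (x >= 4 and y >= 4) is set to 100 and the rest to 0, `structure_matrix(img, 4, 4, 2)` returns `(15000.0, 2500.0, 15000.0)`: both diagonal sums are large, as expected near a corner.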

Let us now consider the two eigenvalues $\lambda_1$ and $\lambda_2$ of the matrix $C$. Since $C$ is symmetric and positive semi-definite, both eigenvalues are non-negative. At the location of a corner, both eigenvalues are large, with $\lambda_1\geq\lambda_2>0$.
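For a symmetric $2\times 2$ matrix the eigenvalues have a closed form, which is convenient when $\lambda_2$ must be evaluated at every pixel. Writing $g_{xx}=\sum \sum_{\mathcal R} \left(\frac{\partial I}{\partial x}\right)^2$, $g_{xy}=\sum \sum_{\mathcal R} \frac{\partial I}{\partial x}\frac{\partial I}{\partial y}$ and $g_{yy}=\sum \sum_{\mathcal R} \left(\frac{\partial I}{\partial y}\right)^2$ (shorthand introduced here for the entries of $C_{\textrm{KLT}}$):

$\lambda_{1,2} = \frac{g_{xx}+g_{yy}}{2} \pm \sqrt{\left(\frac{g_{xx}-g_{yy}}{2}\right)^2 + g_{xy}^2}$

In a flat region both eigenvalues are close to 0; along a straight edge $\lambda_1$ is large but $\lambda_2\approx 0$; only at a corner is $\lambda_2$ also large. For example, $g_{xx}=g_{yy}=15000$ and $g_{xy}=2500$ give $\lambda_1=17500$ and $\lambda_2=12500$.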

The KLT algorithm compares the smaller eigenvalue $\lambda_2$ with a threshold $\lambda_{\textrm{min}}$ and, if $\lambda_2$ is greater, saves $(x,y)$ in a list $L$ of potential corners.
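The thresholding step can be sketched as follows, using the closed-form smaller eigenvalue of the symmetric matrix. As an assumption for illustration, the per-pixel matrices are passed as a dictionary mapping `(x, y)` to `(gxx, gxy, gyy)`; the function names are hypothetical.

```python
import math


def min_eigenvalue(gxx, gxy, gyy):
    """Smaller eigenvalue lambda_2 of the symmetric matrix [[gxx, gxy], [gxy, gyy]]."""
    half_trace = (gxx + gyy) / 2.0
    root = math.sqrt(((gxx - gyy) / 2.0) ** 2 + gxy ** 2)
    return half_trace - root


def candidate_corners(matrices, lambda_min):
    """Build the potential corner list L as (lambda_2, x, y) tuples.

    `matrices` maps (x, y) -> (gxx, gxy, gyy); a pixel is kept only
    when its lambda_2 is strictly greater than lambda_min.
    """
    L = []
    for (x, y), (gxx, gxy, gyy) in matrices.items():
        lam2 = min_eigenvalue(gxx, gxy, gyy)
        if lam2 > lambda_min:
            L.append((lam2, x, y))
    return L
```

A pure edge such as `(10000.0, 0.0, 0.0)` has $\lambda_2 = 0$ and is rejected by any positive threshold, while the corner matrix `(15000.0, 2500.0, 15000.0)` gives $\lambda_2 = 12500$.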

It then sorts $L$ in decreasing order of $\lambda_2$ and scans the sorted list from top to bottom, selecting points in sequence and removing any point that falls inside the neighbourhood $\mathcal R$ of an already selected point, until the required number of features has been selected. Removing these points keeps neighbourhoods from overlapping, since overlapping neighbourhoods are probably due to the same corner.
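The greedy selection can be sketched as below. It assumes $L$ holds `(lambda_2, x, y)` tuples and implements the test described in the text literally: a point is discarded when it lies inside the $(2d+1)\times(2d+1)$ neighbourhood of a selected point, i.e. within $d$ pixels along both axes. If strictly non-overlapping neighbourhoods are wanted, passing `2 * d` instead is one option. The function name is hypothetical.

```python
def select_features(L, d, n_features):
    """Greedy feature selection from a list L of (lambda_2, x, y) candidates.

    Candidates are visited in decreasing order of lambda_2; a candidate is
    accepted only if it lies outside the (2d+1)x(2d+1) neighbourhood of every
    point already selected. Stops once n_features points have been chosen.
    """
    selected = []
    for _lam2, x, y in sorted(L, key=lambda t: t[0], reverse=True):
        if len(selected) >= n_features:
            break
        if all(abs(x - sx) > d or abs(y - sy) > d for sx, sy in selected):
            selected.append((x, y))
    return selected
```

With candidates `[(10, 0, 0), (50, 5, 5), (40, 6, 5), (30, 20, 20)]` and `d = 2`, the point `(6, 5)` is discarded because it falls inside the neighbourhood of the stronger point `(5, 5)`.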

Tracking

The tracking is done by an algorithm that applies Newton-Raphson to the image correlation and works under affine image transformations.

There are two models for image motion: translation, which is the simpler one, and affine motion. KLT uses translation for frame-to-frame tracking. It uses affine motion to compare the pattern with the first image, in order to verify that the feature has not drifted too far.
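The two motion models can be written as point mappings. In this sketch, coordinates are assumed to be taken relative to the centre of the feature window, the $2\times 2$ matrix $A$ is given as a pair of rows, and the function names are hypothetical. With $A$ equal to the identity, the affine model reduces to pure translation.

```python
def translate(point, d):
    """Translation model: x -> x + d."""
    x, y = point
    return (x + d[0], y + d[1])


def affine(point, A, d):
    """Affine model: x -> A x + d, with A = ((a11, a12), (a21, a22))."""
    x, y = point
    (a11, a12), (a21, a22) = A
    return (a11 * x + a12 * y + d[0], a21 * x + a22 * y + d[1])
```

The affine model has six parameters (four in $A$, two in $d$) against two for translation. This makes it more expressive over long time spans, but harder to estimate reliably between two consecutive frames.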

Birchfield uses a definition of the dissimilarity between two windows, one in image $I$ and one in image $J$, which has the particularity of being symmetric (swapping $I$ and $J$ only changes the sign of $d$):

$\epsilon = \int\int_W \left[ J\left(x+\frac d2\right) - I\left(x-\frac d2\right) \right]^2 w(x)\,dx$

where $x=[x,y]^T$, $d=[d_x,d_y]^T$ is the displacement, and the weighting function $w(x)$ is usually set to the constant 1 (more information can be found at http://www.ces.clemson.edu/~stb/klt/birchfield-klt-derivation.pdf).
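A discrete version of this dissimilarity can be sketched as follows. The integral is replaced by a sum over a square window, and the half-pixel positions $x\pm\frac d2$ are evaluated with bilinear interpolation. This is an illustration under assumptions (images stored as lists of rows, a window of half-size `half`, hypothetical function names), not code from the KLT library.

```python
import math


def bilinear(img, x, y):
    """Bilinearly interpolated value of img (img[row][col]) at real coordinates (x, y).

    Assumes (x, y) lies at least one pixel inside the right and bottom borders.
    """
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x0 + 1]
    bottom = (1 - ax) * img[y0 + 1][x0] + ax * img[y0 + 1][x0 + 1]
    return (1 - ay) * top + ay * bottom


def dissimilarity(I, J, cx, cy, half, d, w=lambda u, v: 1.0):
    """Discrete symmetric dissimilarity between windows of I and J.

    Sums [J(x + d/2) - I(x - d/2)]^2 * w(x) over the (2*half+1)^2 window W
    centred on (cx, cy); d = (dx, dy) is the displacement from I to J.
    """
    dx, dy = d
    eps = 0.0
    for v in range(cy - half, cy + half + 1):
        for u in range(cx - half, cx + half + 1):
            j = bilinear(J, u + dx / 2.0, v + dy / 2.0)
            i = bilinear(I, u - dx / 2.0, v - dy / 2.0)
            eps += (j - i) ** 2 * w(u, v)
    return eps
```

If $J$ is $I$ shifted by 2 pixels along $x$, the dissimilarity vanishes for $d=(2,0)$ and grows as $d$ moves away from it.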

Expanding $J$ and $I$ in Taylor series truncated to the linear term leads to a $2\times 2$ matrix equation, which is solved iteratively with Newton-Raphson.
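To make the iteration concrete, the following sketch implements the translation-only case. Linearizing the symmetric dissimilarity around the current estimate and setting its derivative to zero gives the system $Z\delta = e$, with $Z=\sum_W g\,g^T$, $e = 2\sum_W \left[I\left(x-\frac d2\right)-J\left(x+\frac d2\right)\right] g$ and $g = \nabla I\left(x-\frac d2\right) + \nabla J\left(x+\frac d2\right)$ (taking $w(x)=1$). The estimate is updated with $d \leftarrow d + \delta$ until the step becomes small. This is a single-resolution sketch that assumes images stored as lists of rows, bilinear interpolation and central-difference gradients. It omits the pyramid, the gradient smoothing and the affine consistency check. The function names and the parameters `max_iter` and `tol` are hypothetical.

```python
import math


def bilinear(img, x, y):
    """Bilinearly interpolated value of img (img[row][col]) at real coordinates (x, y)."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x0 + 1]
    bottom = (1 - ax) * img[y0 + 1][x0] + ax * img[y0 + 1][x0 + 1]
    return (1 - ay) * top + ay * bottom


def grad(img, x, y):
    """Central-difference gradient of the interpolated image at (x, y)."""
    gx = (bilinear(img, x + 1, y) - bilinear(img, x - 1, y)) / 2.0
    gy = (bilinear(img, x, y + 1) - bilinear(img, x, y - 1)) / 2.0
    return gx, gy


def track_translation(I, J, cx, cy, half, max_iter=20, tol=0.01):
    """Estimate the displacement d from I to J of the window centred on (cx, cy).

    Each iteration solves Z delta = e over the (2*half+1)^2 window and sets
    d += delta. Returns (dx, dy), or None if Z is singular or |d| exceeds the
    window half-size (feature considered lost). The image must extend at least
    half + half/2 + 2 pixels around (cx, cy).
    """
    dx = dy = 0.0
    for _ in range(max_iter):
        zxx = zxy = zyy = ex = ey = 0.0
        for v in range(cy - half, cy + half + 1):
            for u in range(cx - half, cx + half + 1):
                xi, yi = u - dx / 2.0, v - dy / 2.0  # sample position in I
                xj, yj = u + dx / 2.0, v + dy / 2.0  # sample position in J
                gix, giy = grad(I, xi, yi)
                gjx, gjy = grad(J, xj, yj)
                gx, gy = gix + gjx, giy + gjy
                diff = bilinear(I, xi, yi) - bilinear(J, xj, yj)
                zxx += gx * gx
                zxy += gx * gy
                zyy += gy * gy
                ex += 2.0 * diff * gx
                ey += 2.0 * diff * gy
        det = zxx * zyy - zxy * zxy
        if abs(det) < 1e-12:
            return None
        # Solve the 2x2 system with the explicit inverse of the symmetric matrix Z.
        ddx = (zyy * ex - zxy * ey) / det
        ddy = (zxx * ey - zxy * ex) / det
        dx += ddx
        dy += ddy
        if abs(dx) > half or abs(dy) > half:
            return None
        if math.hypot(ddx, ddy) < tol:
            break
    return dx, dy
```

With two synthetic 25x25 Gaussian blobs whose centres differ by 2 pixels along $x$, `track_translation(I, J, 10, 10, 5)` should return a displacement close to $(2, 0)$.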

Technical details about the additional layer

Here are some details about the implementation of the additional layer:

When part of the processing area lies outside the image, that part is treated as black. This makes it possible to work closer to the edge of the image (closer than the normal borders allow) without modifying the borders.
Three heuristics are used to eliminate features belonging to the landscape. The first is based on the displacements along the x and y axes of all features since the previous frame: the mean and the standard deviation are computed, and features whose displacement differs too much from the mean are removed. The second heuristic is based on the norms of these displacements, which give slightly different and complementary results. The last one is based on the distance of each feature to the barycenter, in order to remove features that are obviously too far away.
The size of the object is represented by its "scope", which is the maximum distance from a feature to the barycenter. The processing area is then repositioned: it is centered on the new barycenter of the tracked features, but uses the scope from the previous frame. Moreover, the object's scope can only increase or decrease by 10% at each frame.
A feature table normally records the history of feature positions. Here it has been set to a fixed size of 96, and positions are recorded modulo 96, so that the algorithm can keep running indefinitely while the last 96 feature positions remain recorded.
By default, the algorithm uses the sequential mode of the KLT library, which avoids computing the pyramids and gradients twice (and therefore speeds up the algorithm), but which normally prevents the size of the processed area from changing. Code has therefore been added to detect changes in the size of the processed area automatically and to recompute everything that is needed. The size of the processed area can thus be changed, but each change makes the computation approximately twice as long. This possibility has not been fully tested, because it is not currently used: the confidence area can only change if dt changes, and as long as the size of the processed area does not change, the algorithm runs in constant time, so dt stays constant.
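The landscape-rejection heuristics and the scope update described in the list above can be sketched as follows. This is a minimal illustration, not the layer's actual code. Features are assumed to be `(x, y, dx, dy)` tuples, where `(dx, dy)` is the displacement since the previous frame. "Too different" is interpreted as "more than k population standard deviations from the mean". The thresholds `k_axis`, `k_norm` and `max_dist` are hypothetical parameters, since the text does not give their values. Only the 10% limit on scope changes comes from the text.

```python
import math
from statistics import mean, pstdev


def _within(values, k):
    """Mask of values lying within k population standard deviations of their mean."""
    m, s = mean(values), pstdev(values)
    return [abs(v - m) <= k * s for v in values]


def barycenter(feats):
    """Mean position of a non-empty list of (x, y, dx, dy) features."""
    return (mean(f[0] for f in feats), mean(f[1] for f in feats))


def reject_background(feats, k_axis=2.0, k_norm=2.0, max_dist=None):
    """Apply the three heuristics to a list of (x, y, dx, dy) features.

    1. dx and dy compared with their mean and standard deviation,
    2. norm of (dx, dy) compared with its mean and standard deviation,
    3. distance to the barycenter of the remaining features (only if max_dist is given).
    """
    if len(feats) < 2:
        return list(feats)
    ok_x = _within([f[2] for f in feats], k_axis)
    ok_y = _within([f[3] for f in feats], k_axis)
    ok_n = _within([math.hypot(f[2], f[3]) for f in feats], k_norm)
    kept = [f for f, a, b, c in zip(feats, ok_x, ok_y, ok_n) if a and b and c]
    if max_dist is not None and kept:
        bx, by = barycenter(kept)
        kept = [f for f in kept if math.hypot(f[0] - bx, f[1] - by) <= max_dist]
    return kept


def update_scope(feats, previous_scope, max_change=0.10):
    """Scope = max distance from a feature to the barycenter, limited to +/-10% per frame."""
    bx, by = barycenter(feats)
    scope = max(math.hypot(f[0] - bx, f[1] - by) for f in feats)
    low = previous_scope * (1 - max_change)
    high = previous_scope * (1 + max_change)
    return min(max(scope, low), high)
```

For five features moving by $(1, 0)$ and one background feature moving by $(-10, 0)$, the first two heuristics remove the background feature. Note that the mean/standard-deviation test can only reject a single outlier when enough features are tracked: with $n$ features, one outlier is at most $(n-1)/\sqrt{n}$ standard deviations from the mean.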
