Home Papers Reports Projects Code Fragments Dissertations Presentations Posters Proposals Lectures given Course notes

Correlation between the inproduct and the sum of absolute differences is -0.8485 for uniform sampled signals on [-1:1]

Werner Van Belle1* - werner@yellowcouch.org, werner.van.belle@gmail.com

1- Yellowcouch;

Abstract :  While investigating relations between various comparators and the inproduct we found that the inproduct correlates strongly towards the absolute difference when the domain from which the values are taken come from a uniform distribution on [-1:1]. This useful relation might help to speed up block comparison in content databases and can be used as a valuable tool to estimate the inproduct based on the absolute difference and vice versa

Keywords:  sum of absolute differences, sum of inproduct, landmark tracking, feature tracking
Reference:  Werner Van Belle; Correlation between the inproduct and the sum of absolute differences is -0.8485 for uniform sampled signals on [-1:1]; Signal Processing; YellowCouch Scientific; November 2006

1 Introduction

To compare two data-blocks (of size $ n$ ), two easy implementable techniques are widely used. The first is the summed inproduct (SIP), defined as $ \sum_{i=0}^{n-1}x_{i}y_{i}$ . The second one, termed the sum of absolute differences (SAD), is defined as $ \sum_{i=0}^{n-1}\vert x_{i}-y_{i}\vert$ . Depending on the application behind the comparison one might be favored over the other.

The absolute difference is a well known operator for block comparison within images and is one of the basic blocks used in MPEG coding [1]. It is often used for motion estimation by means of landmark detection [2, 3, 4, 5] and many different hardware implementations of the absolute difference exist [6, 7, 8, 9]. In general statistics, the absolute difference is also often used as a robust measure [10], with specialized application such as model evaluation of protein 3D structures [11].

In audio applications, the absolute difference could also be used, but for historical reasons, the standard inproduct seems to be favored, mainly because multiplication is a basic operation in digital filters [12]. The inproduct, and specifically convolution, has an interesting application in fuzzy pattern matching. There it is often used to find sub-patterns in a haystack of information. Convolution will then assess the correlation between every position and the pattern-to-be-found [13, 14, 15]. Since convolution itself can be efficiently implemented through a Fourier transform (or a sliding sliding window Fourier transform when the datasets are large), pattern matching becomes a faster operation than finding similar fragment by means of the absolute difference. To go somewhat into the details, we denote the Fourier transform as $ \mathcal{F}$ . On a block of size $ n$ it takes time complexity $ \mathcal{O}(n.$log$ _{2}n)$ [16]. The inproduct of two signals expressed in their frequency domain is the same as convolution of their information in the time domain [12]. As such, the function $ \mathcal{F}^{-1}(\mathcal{F}(A)\times\mathcal{F}(B)^{*})$ , in which $ \times$ denotes the inproduct and $ *$ denotes conjugation, will return a vector of which element $ j$ describes the correlation between signal $ A$ and signal $ B$ shifted $ j$ positions. This operation itself has time complexity $ \mathcal{O}\left(3n.\mbox{log}_{2}n+n\right)$ , which is much faster than the same process using an iteration of absolute differences, which would yield a time complexity of $ n^{2}$ .

As such, the two operations both have their advantages and disadvantages. The absolute difference can be quickly calculated, but it is difficult to optimize for many comparisons. The inproduct on the other hand is slower on individual comparisons, but has very efficient implementations for large volumes.

Aside from these trade-offs, both metrics behave very different. The sum of absolute differences will be small for similar blocks, while the inproduct instead will be large. Some other subtitles arise as well: the absolute difference is independent of any constant added to both vectors since $ \left\vert(x+c)-(y+c)\right\vert=\left\vert x-y\right\vert$ . The inproduct does not have such a property: $ (x+c)(y+c)$ and will provide different results for any two same values: $ a.a\neq b.b$ when $ a\neq b$ . In other words the strength of the reported similarity depends on the strength of the values themselves. Actually, the inproduct itself is strictly speaking not a metric since $ d(X,X)\neq0$ for most values of $ X$ .

Notwithstanding their completely different mathematical behavior, both are often used in very similar contexts and therefore, we became interested to understand the relation between a number of well known comparators such as the summed inproduct and the sum of absolute differences. This could make it possible to predict one in function of the other.

2 Empirical Testing

To study the relations between various comparators we created $ m$ data-blocks, each containing two vectors $ X_{j}$ and $ Y_{j}$ . The variables $ X$ and $ Y$ , both of size $ n$ and element from $ \mathbb{R}^{n}$ , were randomly sampled from the same multi-variate distribution. Based on such test vectors we measured the spearman rank order and linear pearson correlation. $ x_{i}$ and $ y_{i}$ annotate the elements of $ X$ and $ Y$ , with the variables $ x$ and $ y\in[l,h]\cap\mathbb{R}$ . americanFor 8-bit signed integers, $ x$ and $ y$ ranges within [-127,128], and for 16 bit signed integers the values come from [-32767,32768]. americanWe do not take into account the discrete nature of integers and will work with real numbers taken from $ \mathbb{R}$ . americanAll $ x$ and $ y$ values were drawn from the same distribution, which could be normal or uniform distributed.

All comparators are set up as a summation of some underlying operator $ \theta:\mathbb{R}^{2}\rightarrow\mathbb{R}$, which could be one of the following 6:

$\displaystyle \theta_{\left\vert-\right\vert}(x,y)=\vert x-y\vert$ (1)
$\displaystyle \theta_{\times}(x,y)=x.y$ (2)
$\displaystyle \theta_{da}(x,y)=\vert x\vert-\vert y\vert$ (3)
$\displaystyle \theta_{-}(x,y)=x-y$ (4)
$\displaystyle \theta_{\left\vert da\right\vert}(x,y)=\left\vert\left\vert x\right\vert-\left\vert y\right\vert\right\vert$ (5)
$\displaystyle \theta_{en}(x,y)=xy+(1-x)(1-y)$ (6)

The first operator $ \theta_{\left\vert-\right\vert}$ is the absolute difference. The second operator $ \theta_{\times}$ is the multiplication. The third operator $ \theta_{-}$ will provide us with insight on the relation between the absolute difference and the basic difference. The fourth operator $ \theta_{da}$ investigates the impact of signed to unsigned conversion in a comparison process and the last operator. $ \theta_{en}$ takes the inverse of the signal and puts it back into the equation as an attempt to balance the inequality that occurs when comparing identical couples with a different value.

The multidimensional comparator $ \Theta_{\theta}$ , based on $ \theta$ is defined as

$\displaystyle \Theta_{\theta}:\mathbb{R}^{n}\times\mathbb{R}^{n}\rightarrow\mathbb{R}\qquad\Theta_{\theta}(X,Y)=\sum_{i=0}^{n-1}\theta(x_{i},y_{i})$

We tested the relations between every pair of comparators using $ m=10^{4}$ sequences each of $ n=10^{3}$ samples. The values for $ x$ and $ y$ were first drawn from a uniform distribution between $ -1$ and $ 1$ and then from a normal distribution. The random generators are ran2 and gasdev from [17]. In all cases, both the linear Pearson and spearman rank order correlations were similar. The programs to measure the correlations can be found in section appendix.

Table 1: The correlation coefficient between various block comparators, using $ 10^{4}$ vectors, each with $ 10^{3}$ values. Left) values uniformly sampled from $ [-1:1]$ . Right) values sampled from a normal distribution with mean 0 and width $ 1$ .
  uniform distribution normal distribution
$ r$ $ \Theta_{-}$ $ \Theta_{da}$ $ \Theta _{\left \vert-\right \vert}$ $ \Theta_{\left\vert da\right\vert}$ $ \Theta _{\times }$ $ \Theta_{en}$ $ \Theta_{-}$ $ \Theta_{da}$ $ \Theta _{\left \vert-\right \vert}$ $ \Theta_{\left\vert da\right\vert}$ $ \Theta _{\times }$ $ \Theta_{en}$
$ \Theta_{-}$ 1 0 0 0 0 0 1 0 0 0 0 0
$ \Theta_{da}$   1 0 0 0 0   1 0 0 0 0
$ \Theta _{\left \vert-\right \vert}$     1 -0.23 -0.83 -0.53     1 -0.53 -0.65 -0.53
$ \Theta_{\left\vert da\right\vert}$       1 0 0       1 0 0
$ \Theta _{\times }$         1 0.63         1 0.82
$ \Theta_{en}$           1           1

Table 1 presents our findings: none of the correlations are useful to predict one in function of the other, except one: when working with uniform distributed samples, the inproduct anti-correlates strongly towards to absolute difference. This should make it possible to estimate one in function of the other, however before we do so, we would like to verify whether this correlation is correct and if so, what its exact value is.

3 Correlation estimate between the inproduct and the sum of absolute differences

We now assume that both vectors $ X$ and $ Y$ are uniform distributed within range $ [l,h]$ , with $ h>l$ and $ h>0$ . $ t$ denotes the size of the interval ($ h-l$ ). The comparators $ \Theta _{\times }$ and $ \Theta _{\left \vert-\right \vert}$ , both depending on $ X$ and $ Y$ , will now be considered to be variables themselves. The correlation between them, $ r\left(\Theta_{\times},\Theta_{\left\vert-\right\vert}\right)$ , is given by

$\displaystyle \frac{\mathbb{E}[\Theta_{\times}.\Theta_{\left\vert-\right\vert}]...
...{\left\vert-\right\vert}^{2}]-\mathbb{E}[\Theta_{\left\vert-\right\vert}]^{2}}}$ (7)

Under the assumption that the various sample positions are independent from each other, one can easily show that $ r\left(\Theta_{\times},\Theta_{\left\vert-\right\vert}\right)=r\left(\theta_{\times},\theta_{\left\vert-\right\vert}\right)$ .

$\displaystyle \frac{\mathbb{E}[\theta_{\times}\theta_{\left\vert-\right\vert}]-...

This reduces the 5 estimates we need to $ \mathbb{E}[x.y]$ , $ \mathbb{E}[\vert x-y\vert]$ , $ \mathbb{E}[\left(x.y\right)^{2}]$, $ \mathbb{E}[(x-y)^{2}]$ and $ \mathbb{E}[x.y.\vert x-y\vert]$. We will now determine those 5 values.

3.1 Estimate of $ \vert x-y\vert$

Estimates of a variable $ T$ can be calculated from probability functions as follows ($ p$ is the probability distribution function of the variable $ T$)

$\displaystyle \mathbb{E}[T]=\int_{-\infty}^{+\infty}x.p(x)dx$

To calculate the probability distribution function of $ \vert x-y\vert$ with both $ x$ and $ y$ uniform distributions over $ [l,h]$ , we must first obtain the probability function of $ x-y$ , which is given by [18]:

   PDF$\displaystyle _{x-y}(u)=\begin{cases}
\frac{t-u}{t^{2}} & \mbox{for }u\in[0,t]\\
\frac{t+u}{t^{2}} & \mbox{for }u\in[-t,0]\end{cases}$

Since this probability distribution is symmetrical around $ u=0$ (the means of the two uniform distribution has been subtracted), we would get an expected value of 0, but then we would forget to take into account the absolute difference. englishThe absolute difference will flip the probability of all values below zero over the vertical axis and add them there. This will result in an increased slope to the right and a probability density of 0 for negative values [19].

Figure 1: PDF of $ \vert x-y\vert$ when $ l=-1$ and $ h=2$

PDF$\displaystyle _{\vert x-y\vert}(u)=\begin{cases}2\frac{t-u}{t^{2}} & \mbox{for }u\in[0,t]\\ 0 & \mbox{otherwise}\end{cases}$ (8)

Figure 1 plots eq.8. englishThe expectancy can now be calculated

$\displaystyle \mathbb{E}[\left\vert x-y\right\vert]=\int_{0}^{h-l}2u\frac{t-u}{t^{2}}du=\frac{t}{3}$ (9)

Testing this empirically gives the following values: $ \left[0:256\right]\rightarrow85.2197$ , $ \left[0:65536\right]\rightarrow21841$, $ \left[-128,128\right]\rightarrow85.5316$ , $ \left[-1,1\right]\rightarrow0.666846$, $ \left[-0.25,0.6\right]\rightarrow0.282795$and $ \left[-32768,32768\right]\rightarrow21851.8$. Which is consistent to the above formula.

3.2 Estimate of $ \vert x-y\vert^{2}$

Given that for any function $ g:\mathbb{R}\rightarrow\mathbb{R}$

$\displaystyle \mathbb{E}[g(x)]=\int_{-\infty}^{+\infty}g(x)f(x)dx$ (10)

we can calculate $ \mathbb{E}[\vert x-y\vert^{2}]$ based on the previous probability density function (8) as

$\displaystyle \mathbb{E}[\vert x-y\vert^{2}]=\int_{0}^{h-l}2u^{2}\frac{t-u}{t^{2}}du=\frac{t^{2}}{6}$ (11)

Testing this empirically gives the following values: $ \left[0:256\right]\rightarrow10891.4$ , $ \left[0:65536\right]\rightarrow7.17436.10^{8}$, $ \left[-128,128\right]\rightarrow10918.2$ , $ \left[-1,1\right]\rightarrow0.665828$, $ \left[-0.25,0.6\right]\rightarrow0.120234$ and $ \left[-32768,32768\right]\rightarrow7.14998.10^{8}$ . Which is close to the results as given by eq. 11.

3.3 Estimate of $ x.y$

To determine $ \mathbb{E}[x.y]$ we will calculate the probability distribution function of $ x.y$ , which might appear as taking the long road when the probability distribution of uniform products on $ [0:1]$ is already known [20]. However, we will later on needs this probability distribution on $ [l:h]$ to determine $ \mathbb{E}[\left(x.y\right)^{2}]$ and $ \mathbb{E}[x.y\vert x-y\vert]$ and we can now introduce the technique we will also use in section 3.5.2.

The probability density function of $ x.y$ is defined as the differential of the cumulative probability function, which in turn is defined as

   CDF$\displaystyle _{x.y}(z)=Pr[x.y<z]$
Figure 2: 3D plot of $ x.y$ . The left plot has a cutoff value of $ z=1$ . The middle and right plots are cut off at $ z<0.5$ and $ z<0.25$
\includegraphics[width=0.3\textwidth]{xyz1} \includegraphics[width=0.3\textwidth]{xyz2} \includegraphics[width=0.3\textwidth]{xyz3}

Figure 2 illustrates how to find the number of values (of $ x.y$ ) located below a fixed $ z$ . The surface area that is not cut off at $ z$ directly reflects the probability that a value below $ z$ can be chosen because both $ x$ and $ y$ are uniformly distributed. Since the isoline of $ z$ is located at $ y=\frac{z}{x}$ , various forms of $ \int\frac{z}{x}dx$ can be combined. $ \int\frac{z}{x}dx$ suffers from a discontinuity at 0 when $ l.h<0$ and its calculation in this particular context is further complicated with a sudden quadrant-change of the isoline when $ z$ becomes negative. In which case quadrant 1 and 3 no longer contain the isoline. Instead, quadrant 2 and 4 will contain it (see figure 3 for an illustration). To approach this problem, we divided the calculation of CDF$ _{x.y}$ into three branches. The first with $ l>0$ ; The second with $ l<0\wedge z>0$ and the last with $ l<0\wedge z<0$ .

Figure 3: 2D plot of the isoline with $ xy=z$ Left) $ z=1$ , when $ z>0$ we are interested in the area below the curve (towards the x-axis). Right) $ z=-1$ ; when $ z<0$ we are interested in the area outside the curve (farthest away from the x-axis). Only there will we find values that are smaller than $ -1$ .
\includegraphics[width=0.5\textwidth]{xy1} \includegraphics[width=0.5\textwidth]{xy2}

Aside from these standard complications, we must also limit our $ x$ and $ y$ values to the range $ [l:h]$ . This restricts the maximal surface area we can find and requires a piecewice approach. We start out with a general solutions for rectangles originating in $ \left(0,0\right)$ and ending at a specific point. We have the following 4 sections:

S_{1}:\, x,y\in[0:h,0:h] & S_{2}:\, x,y\in...
S_{3}:\, x,y\in[0:l,0:l] & S_{4}:\, x,y\in[0:h,0:l]\end{array}\end{displaymath}
Figure 4: Decomposition of the area $ [l:h,l:h]$ in 4 sections. Left) when $ l>0$ . Right) when $ l<0$ .

Every section forms the two dimensional bounds (on $ x$ and $ y$ ) for the integral $ \int\frac{z}{x}dx$ , and depending on the sign of $ l.h$ they wil be added or substracted to/from each other. Figure 4 presents this graphically. When $ l.h$ is positive we need to subtract $ S_{2}$ and $ S_{4}$ , otherwise we need to add them. $ S'_{x}$ is used to denoted the size of area $ S_{x}$ intersected with the area below the isoline. The cumulative distribution function is now defined as

CDF$\displaystyle _{x.y}=\begin{cases}S_{1}'+S'_{3}-S'_{2}-S'_{4} & \mbox{for }l.h\geq0\\ S'_{1}+S'_{2}+S'_{3}+S'_{4} & \mbox{for }l.h<0\end{cases}$ (12)
Given that area $ S_{2}$ and $ S_{4}$ will be the same, even when the isoline cuts it, ( $ \frac{z}{x}$ is symmetrical over the line $ x=y$), we also have
CDF$\displaystyle _{x.y}=\begin{cases}S_{1}'+S'_{3}-2S'_{2} & \mbox{for }l.h\geq0\\ S'_{1}+S'_{3}+2S'_{2} & \mbox{for }l.h<0\end{cases}$ (13)
To calculate the intersection between the surface area below (or above) $ \frac{z}{x}$ and one of the above sections, we will create a general solution, relying on a variable $ c$ , forming the maximum value on the x-axis and $ m$ , forming the maximum value on the y-axis. Both $ c$ and $ m$ are assumed to be positive. Later on we will replace $ c$ with either $ l$ or $ h$ depending on the situation called for. Notation-wise: $ S'_{c,m}$ is the area below the isoline limited by $ c$ (horizontally) and $ m$ (vertically).

$\displaystyle S'_{c,m}(z)=\int_{0}^{c}$min$\displaystyle \{\frac{z}{x},m\} dx$
Figure 5: Limiting the possible $ y$ value leads to a piecewise continue function.

$ S'_{c,m}$ can be determined by first obtaining the x-point where the hyperbole crosses the top line of our box, which happens at position $ x=\frac{z}{m}$. When $ z$ and $ m$ are positive, this point will always be larger than 0. When $ z>c.m$ , the intersection will happen outside $ c$ , in which case the surface area will be $ c.m$ . When $ z<c.m$ we need to integrate $ \int_{0}^{\frac{z}{m}}mdx$ and $ \int_{\frac{z}{m}}^{c}\frac{z}{x}dx$ (See figure 5 for an illustration), which gives

$\displaystyle S'_{c,m}(z)=\begin{cases}z(1+\mbox{ln}\frac{c.m}{z}) & \mbox{for }z<c.m\\ c.m & \mbox{for }z\geq c.m\end{cases}$ (14)
(14) provides the surface area of positions with a multiplied value $ x.y$ below a specific value $ z$ , when $ x\in[0:c]$ and $ y\in[0:m]$ . We now use (14) to obtain cumulative probabilities for $ l.h$ positive and negative.

3.3.1 Cumulative probability when $ l.h$ is positive

The cumulative probability function (13) requires us to calculate the different sections based on (14) with $ S'_{1}=S'_{h,h}$ ; $ S'_{2}=S'_{l,h}$ and $ S'_{3}=S'_{l,l}$ . Since (14) appears to be discontinue at position $ c.m$ we split the cumulative probability function in 4 segments.

When $ z<l^{2}$ the cumulative probability is 0 because there can be no values $ x$ and $ y$ , both larger than $ l$ that can lead to a multiple that is smaller than $ l^{2}$ .

When $ z\in[l^{2},lh]$ we have

CDF$\displaystyle _{x.y}(z)$ $\displaystyle =$ $\displaystyle z+z.$ln$\displaystyle \frac{h^{2}}{z}+l^{2}-2z-2z.$ln$\displaystyle \frac{h.l}{z}$  
  $\displaystyle =$ $\displaystyle l^{2}+z($ln$\displaystyle z-2$ln$\displaystyle l-1)$ (15)
Equation 15, as expected, evaluates to 0 when $ z=l^{2}$ . If $ z=lh$ , it evaluates to $ l(l+h.$ln$ h-$ln$ l-h)$ english. When $ z\in[lh,h^{2}]$ we have
CDF$\displaystyle _{x.y}(z)=z(1+2$ln$\displaystyle h-$ln$\displaystyle z)+l^{2}-2hl$ (16)
When $ z=lh$ (16) evaluates to $ l(l+h.$ln$ h-$ln$ l-h)$ , which is consistent with eq. 15.

When $ z>h^{2}$ we have

CDF$\displaystyle _{x.y}(z)=h^{2}+l^{2}-2hl=t^{2}$ (17)
which is indeed the surface area of the square $ [l:h,l:h]$ .
Figure 6: $ PDF_{xy}(z)$Left) with $ l=1$ and $ h=3$ . Right) with $ l=-1$ and $ h=3$

(15), (16) and (17) are the non-normalized cumulative distribution functions americanwhen $ l.h>0$. Given the maximum surface area (17), we can divide and differentiate them all to provide us with the proper probability distribution (illustrated in figure 6). We conclude that, when $ l.h>0$ :

PDF$\displaystyle _{x.y}(z)=\begin{cases}\frac{\mbox{ln}z-2\mbox{ln}l}{t^{2}} & \mb...
...mbox{ln}z}{t^{2}} & \mbox{for }z\in[lh,h^{2}]\\ 0 & \mbox{otherwise}\end{cases}$ (18)

3.3.2 Cumulative probability when $ lh$ is negative

When $ l<0\wedge h>0$ the result is somewhat more complex. Instead of subtracting $ 2S_{2}$ , we need to add them. The interesting thing is that neither $ S_{2}$ nor $ S_{4}$ will intersect with $ \frac{z}{x}$ , when $ z>0$ . However, as soon as $ z<0$ the situation is different. (depicted in fig. 3). Therefore we further branch the calculation. z>0

When $ z\geq0$ we have $ -lh$ as surface area for $ S_{2}$ and $ S_{4}$ .

$\displaystyle S'_{2,4}(z)=-l.h$   for $\displaystyle z\geq0$ (19)
The surface area for $ S_{1}$ is
$\displaystyle S'_{1}(z)=\begin{cases}z(1+2\mbox{ln}h-\mbox{ln}z) & \mbox{for }z\in[0,h²]\\ h^{2} & \mbox{for }z>h^{2}\end{cases}$ (20)
The surface area for $ S_{3}$ is
$\displaystyle S'_{3}(z)=\begin{cases}z(1+2\mbox{ln}(-l)-\mbox{ln}z) & \mbox{for }z\in[0,l²]\\ l² & \mbox{for }z>l^{2}\end{cases}$ (21)
The logarithm in (21) uses $ -l$ because $ l$ is negative and the general form in (14) requires a positive sign. This does however not change the surface area. z<0

When $ z<0$ we are interested in the outside of the function $ \frac{z}{x}$ , which is to be found in $ S_{2}$ and $ S_{4}$ . Since the area below the curve matches that as if $ l$ , $ h$ and $ z$ were positive, we get surface area $ -z(1+ln\,\frac{-lh}{-z})$ (The minus signs have been added to make $ z$ and $ l$ positive). As such, the remaining surface area must then be

$\displaystyle S'_{2,4}(z)=\begin{cases}-l.h+z(1+ln\frac{lh}{z}) & \mbox{for }z\in[lh,0]\\ -l.h & \mbox{for }z<l.h\end{cases}$ (22)
Because $ x.y$ can never be smaller than 0 in quadrants $ S_{1}$ and $ S_{3}$ , we get 0 for them.
$\displaystyle S'_{1,3}(z)=0$   for $\displaystyle z\in]-\infty,0]$ (23)
Combining (19) to (23) gives the non-normalized cumulative probability distribution:
CDF$\displaystyle _{x.y}(z)=\begin{cases}-2lh & z<l.h\\ 2(-lh+z(1+\mbox{ln}\frac{l....
...+\mbox{ln}\frac{h^{2}}{z}) & z\in[l^{2},h^{2}]\\ l²-2lh+h² & z>h^{2}\end{cases}$ (24)
Differentiating (24) gives the probability distribution function when $ l<0$ and $ h>0$ . We again divided the CDF with the total expected area, being $ t^{2}$ , so when $ l.h<0$ we have the following probability distribution function:
PDF$\displaystyle _{x.y}(z)=\begin{cases}\frac{2\mbox{ln}h+2\mbox{ln}l-2\mbox{ln}z}...
...x{ln}h-\mbox{ln}z}{t^{2}} & z\in[l^{2},h^{2}]\\ 0 & \mbox{otherwise}\end{cases}$ (25)
Figure 6 shows an example of such distribution. When working out the various cases we made the silent assumption that $ \vert l\vert<h$ . The result however does not change at all. americanEq. 25 has some overlapping cases, so the right subset needs to be chosen depending on whether $ \vert l\vert>h$ or whether $ \vert l\vert<h$ .

3.3.3 The estimate of $ x.y$

The estimate of $ x.y$ is calculated as

$\displaystyle \mathbb{E}[x.y]=\int_{-\infty}^{+\infty}z.$PDF$\displaystyle _{xy}(z)dz$ when $ l.h>0$

$\displaystyle \mathbb{E}[x.y]$ $\displaystyle =$ $\displaystyle \int_{l^{2}}^{lh}z\frac{\mbox{ln}z-2\mbox{ln}l}{(h-l)^{2}}dz$ (26)
    $\displaystyle +\int_{lh}^{h^{2}}z\frac{2\mbox{ln}h-\mbox{ln}z}{(h-l)^{2}}dz$ (27)
  $\displaystyle =$ $\displaystyle \left(\frac{h+l}{2}\right)^{2}$ (28) when $ l.h<0$

$\displaystyle \mathbb{E}[x.y]$ $\displaystyle =$ $\displaystyle \int_{lh}^{0}z\frac{2\mbox{ln}h+2\mbox{ln}l-2\mbox{ln}z}{(h-l)^{2}}dz$  
    $\displaystyle +\int_{0}^{l^{2}}z\frac{2\mbox{ln}h+2\mbox{ln}(-l)-2\mbox{ln}z}{(h-l)^{2}}dz$  
    $\displaystyle +\int_{l^{2}}^{h^{2}}z\frac{2\mbox{ln}h-\mbox{ln}z}{(h-l)^{2}}dz$  
  $\displaystyle =$ $\displaystyle -\frac{h^{2}l^{2}}{2(h-l)^{2}}+\frac{l^{4}(1+2\mbox{ln}h-2\mbox{ln}l)}{2(h-l)^{2}}$  
    $\displaystyle +\frac{h^{4}-l^{4}+4l^{4}(\mbox{ln}l-\mbox{ln}h)}{4(h-l)^{2}}$  
  $\displaystyle =$ $\displaystyle \left(\frac{h+l}{2}\right)^{2}$ (29)
Since (28)==(29) we have in the general case:
$\displaystyle \mathbb{E}[x.y]=\left(\frac{h+l}{2}\right)^{2}$ (30)

3.4 Estimate of $ \left (x.y\right )^{2}$

The probability density function of $ x.y$ is given in (18) and (25).

3.4.1 When $ lh>0$

$\displaystyle \mathbb{E}[\left(x.y\right)^{2}]$ $\displaystyle =$ $\displaystyle \int_{l^{2}}^{h^{2}}\frac{z^{2}}{t^{2}}\begin{cases}
ln\frac{z}{l^{2}} & z\in[l^{2},lh]\\
ln\frac{h²}{z} & z\in[lh,h^{2}]\end{cases}d_{z}$  
  $\displaystyle =$ $\displaystyle \int_{l^{2}}^{lh}\frac{\mbox{ln}z-2\mbox{ln}l}{t^{2}}z^{2}d_{z}$  
    $\displaystyle +\int_{lh}^{h^{2}}\frac{2\mbox{ln}h-\mbox{ln}z}{t^{2}}z^{2}d_{z}$  
  $\displaystyle =$ $\displaystyle \frac{l^{6}-2l^{3}h^{3}+h^{6}}{9(h-l)^{2}}$  
  $\displaystyle =$ $\displaystyle \left(\frac{h^{3}-l^{3}}{3t}\right)^{2}$ (31)

3.4.2 When $ lh<0$

The situation when $ lh<0$ is discussed below. We first start with the case where $ \vert l\vert<h$ . This gives us

$\displaystyle \mathbb{E}[\left(x.y\right)^{2}]$ $\displaystyle =$ $\displaystyle \int_{l.h}^{h^{2}}2\frac{z^{2}}{t^{2}}\begin{cases}
...ac{l.h}{z} & z\in[0,l^{2}]\\
ln\frac{l}{z} & z\in[l^{2},h^{2}]\end{cases}d_{z}$  
  $\displaystyle =$ $\displaystyle \int_{lh}^{0}z^{2}\frac{-2\mbox{ln}lh+2\mbox{ln}z}{t^{2}}d_{z}$  
    $\displaystyle +\int_{0}^{l^{2}}z^{2}\frac{2\mbox{ln}l-2\mbox{ln}z+2\mbox{ln}h}{t^{2}}d_{z}$  
    $\displaystyle +\int_{l^{2}}^{h^{2}}z^{2}\frac{2\mbox{ln}h-\mbox{ln}z}{t^{2}}d_{z}$  
  $\displaystyle =$ $\displaystyle \frac{-2l^{3}h^{3}+l^{6}+h^{6}}{9t^{2}}$  
  $\displaystyle =$ $\displaystyle \left(\frac{h^{3}-l^{3}}{3t}\right)^{2}$ (32)
When $ \vert l\vert>h$ we have the same result.

Since (31)==(32), we conclude that

$\displaystyle \mathbb{E}\left[(x.y)^{2}\right]=\left(\frac{h^{3}-l^{3}}{3t}\right)^{2}$ (33)
Testing this empirically gives the following values: $ \left[0,1\right]\rightarrow0.110476$ , $ \left[0:256\right]\rightarrow4.774.10^{8}$, $ \left[0:65536\right]\rightarrow2.0536.10^{18}$, $ \left[-128,128\right]\rightarrow2.9834.10^{7}$ , $ \left[-1,1\right]\rightarrow0.11124$ , $ \left[64,320\right]\rightarrow1791433841.777777778$, $ \left[-0.25,0.6\right]\rightarrow0.00830775$and $ \left[-32768,32768\right]\rightarrow1.2893.10^{17}$.

3.5 Estimate of $ x.y\left \vert x-y\right \vert$

If we define $ a$ and $ b$ to be the translation and scaling of $ x$ and $ y$ as follows

$\displaystyle a=\frac{x-l}{t}\qquad b=\frac{y-l}{t}$

then $ a$ and $ b$ are uniform distributed over $ [0:1]$ . americanThe variable $ x$ (and $ y$ ) can also be defined in function of $ a$ as

$\displaystyle x=t.a+l\qquad y=t.b+l$ (34)

This substitution will make it possible to rewrite $ \mathbb{E}[x.y\vert x-y\vert]$

    $\displaystyle \mathbb{E}[x.y\vert x-y\vert]$  
  $\displaystyle =$ $\displaystyle \mathbb{E}[(t.a+l)(t.b+l)\vert t.a-t.b\vert]$  
  $\displaystyle =$ $\displaystyle t\mathbb{E}[(t²a.b+t.a.l+t.b.l+l²)\vert a-b\vert]$  
The estimates for $ t.a.l$ and $ t.b.l$ must be the same since they both are drawn from the same distribution.
$\displaystyle \mathbb{E}[x.y\vert x-y\vert]$ $\displaystyle =$ $\displaystyle t³\mathbb{E}[a.b\vert a-b\vert]$  
    $\displaystyle +2t^{2}l\mathbb{E}[a\vert a-b\vert]$  
    $\displaystyle +tl²\mathbb{E}[\vert a-b\vert]$ (35)
The estimate for $ \vert a-b\vert$ is $ \frac{1}{3}$ , as given in (9) (with $ h$ and $ l$ equal to 1 and 0 respectively). $ \mathbb{E}\left[a.b.\left\vert a-b\right\vert\right]$ and $ \mathbb{E}\left[a\left\vert a-b\right\vert\right]$ must be determined further.

3.5.1 Estimating $ \mathbb{E}[a\vert a-b\vert]$

$ \mathbb{E}[a\vert a-b\vert]$ becomes

$\displaystyle \mathbb{E}[a\vert a-b\vert]=\mathbb{E}[a^{2}]-\mathbb{E}[ab]$   with$\displaystyle \, a>b$

$ a$ and $ b$ are uniform distributions, as such $ \mathbb{E}[a^{2}]=\mathbb{E}[b^{2}]=\frac{1}{3}$ . The estimates for $ a-b$ (under condition that $ a>b$ ) and $ b-a$ (under condition that $ b>a)$ will be the same:

$\displaystyle \mathbb{E}[a.b\vert a>b]=\int_{0}^{1}-z$ln$\displaystyle z\, dz=\frac{1}{4}$ (36)

Without the condition that $ a>b$ we need to double the estimate of (36)

$\displaystyle \mathbb{E}[a\vert a-b\vert]=2(\frac{1}{3}-\frac{1}{4})=\frac{1}{6}$

(35) becomes

$\displaystyle \mathbb{E}\left[x.y\left\vert x-y\right\vert\right]=t³\mathbb{E}[a.b.\vert a-b\vert]+\frac{t^{2}l+tl^{2}}{3}$ (37)
The remaining estimate is $ \mathbb{E}[a.b.\vert a-b\vert]$

3.5.2 Estimating $ \mathbb{E}[a.b.\vert a-b\vert]$

The factors $ a.b$ and $ \vert a-b\vert$ are not independent, therefore we cannot rely on the standard rule for uniform products [20]. In order to resolve this we divide the space of possible values in those which $ a>b$ and those with $ a<b$ . This gives, due to linearity of $ \mathbb{E}$ the following:

$\displaystyle \mathbb{E}[a.b\vert a-b\vert]=\begin{cases}\mathbb{E}[a²b]-\mathb...
... & \mbox{for }a>b\\ \mathbb{E}[ab²]-\mathbb{E}[a²b] & \mbox{for }b>a\end{cases}$ (38)
Then we look into a general solution for $ \mathbb{E}[a²b]$ with $ a>b$ by first determining the CDF of $ a^{2}b$
CDF$\displaystyle _{a^{2}b}(z)=Pr[a^{2}b<z]$ (39)
Figure 7: 3D plot of $ a^{2}b$ . Left) not clipped. Middle) cut off at 0.5, Right) cut off at 0.25.
\includegraphics[width=0.3\textwidth]{xxy3d1} \includegraphics[width=0.3\textwidth]{xxy3d2} \includegraphics[width=0.3\textwidth]{xxy3d3}

Plotting $ a^{2}b$ (figure 7) reveals the areas that we are interested in. The plots have been clipped at a specific value of $ z$ . All positions where $ a^{2}b$ was larger than $ z$ have been removed. So, the surface area of the plots that is not clipped forms the cumulative probability that a specific combination of $ a^{2}b$ will be lower than $ z$ . Of course, we are not only interested in understanding the size of the area, but we also need to take into account the extra condition that $ a>b$ .

Figure 8: The isoline of the function $ a²b$ at value $ z=0.25$ . Left) The area below the curve is the surface area of $ \left (x,y\right )$ -positions that are not cut off in figure 7c, whilst respecting $ a>b$ . Right) The area between the curves is the surface area of $ \left (x,y\right )$ -positions that are not cut off in figure 7c, whilst respecting the condition that $ b>a$ and $ b<1$ .


For a fixed value of $ z$ we can calculate the isoline as (taking into account that $ a>b$ ) min$ (\frac{z}{a^{2}},a)$ . Figure 8 presents the shape of this function. The surface area under this curve is the area where $ a^{2}b<z$ while still respecting that $ b\leq a$ . Integrating this equation provides us with the cumulative distribution function (39). To do so, we need to know where the linear 45 degree line cuts the function $ \frac{z}{a^{2}}$ .

$\displaystyle \frac{z}{a^{2}}=a\Rightarrow a=\sqrt[3]{z}$ (40)

This now allows us to write down the surface area below as a couple of integrals

CDF$\displaystyle _{a^{2}b\vert a>b}(z)$ $\displaystyle =$ $\displaystyle \int_{0}^{\sqrt[3]{z}}ada+\int_{\sqrt[3]{z}}^{1}\frac{z}{a^{2}}da$  
  $\displaystyle =$ $\displaystyle \frac{3z^{2/3}}{2}-z$  
Figure: PDF$ _{a^{2}b}(z)$ for values of $ a$ and $ b$ in the range $ [0:1]$ with Left) the extra condition that $ a>b$. Right) the extra condition that $ b>a$ .


This function has a maximum of 0.5 at 1, so we need to multiply it by two to get a proper cumulative distribution function. Differentiating it afterward gives the probability density function of $ a^{2}b$ (left as calculated; right by sapping variables $ a$ and $ b$ ). Figure 9a plots (41).

\begin{displaymath}\begin{array}{l} \mbox{PDF}_{a^{2}b\vert b<a}(z)=\frac{2}{\sq...
...ox{PDF}_{b^{2}a\vert a<b}(z)=\frac{2}{\sqrt[3]{z}}-2\end{array}\end{displaymath} (41)

Since we cannot rely on a symmetry around the 45 degree angle, we also need to calculate the surface area when $ b>a$ . The isoline is given as min$ (1,$max$ (\frac{z}{a^{2}},a))$ . Figure 8 provides more insight in the shape of this function. The function appears piecewise, with first a constant line (1) and then a hyperbolic piece up to its crossing with the 45 degree slope. Under these two segments we also have a linear piece that we need to subtract from the volume. For a fixed $ z$ we determine the crossing of 1 with $ \frac{z}{a^{2}}$ and the crossing between $ f(y)=x$ and $ \frac{z}{a^{2}}$ . The first crossing occurs at position $ a=\sqrt{z}$ . The second crossing occurs at $ \sqrt[3]{z}$ (40).

We can now calculate the surface area as

CDF$\displaystyle _{a^{2}b\vert b>a}(z)$ $\displaystyle =$ $\displaystyle \int_{0}^{\sqrt{z}}1da+\int_{\sqrt{z}}^{\sqrt[3]{z}}\frac{z}{a^{2}}da$  
    $\displaystyle -\int_{0}^{\sqrt[3]{z}}ada$ (42)
  $\displaystyle =$ $\displaystyle 4\sqrt{z}-3z^{\frac{2}{3}}$ (43)
This is still the non-normalized cumulative probability. Differentiating its normalized form (divide by $ \frac{1}{2}$ ) gives us the probability density function (left as calculated; right, after swapping variables). Figure 9b plots (44).
\begin{displaymath}\begin{array}{l} \mbox{PDF}_{a^{2}b\vert b>a}(z)=\frac{2}{\sq...
...ert a>b}(z)=\frac{2}{\sqrt{z}}-\frac{2}{\sqrt[3]{z}}\end{array}\end{displaymath} (44)

Combining (41) and (44) leads to the following estimates for $ a.b\left\vert a-b\right\vert$ (38)

$\displaystyle \mathbb{E}[ab^{2}\vert b>a]$   $\displaystyle =$  
$\displaystyle \mathbb{E}[a²b\vert a>b]$ $\displaystyle =$ $\displaystyle \int_{0}^{1}z.(\frac{2}{\sqrt[3]{z}}-2)dz$  
  $\displaystyle =$ $\displaystyle \frac{3}{15}$  
$\displaystyle \mathbb{E}[ab^{2}\vert a>b]$   $\displaystyle =$  
$\displaystyle \mathbb{E}[a²b\vert b>a]$ $\displaystyle =$ $\displaystyle \int_{0}^{1}z(\frac{2}{\sqrt{z}}-\frac{2}{\sqrt[3]{z}})dz$  
  $\displaystyle =$ $\displaystyle \frac{2}{15}$  
which gives finally

$\displaystyle \mathbb{E}[a.b\vert a-b\vert]=\left\{ \begin{array}{cc}
...] & a>b\\
\mathbb{E}[ab²]-\mathbb{E}[a²b] & b>a\end{array}\right.=\frac{1}{15}$

3.5.3 Estimating $ x.y\vert x-y\vert$

Knowing the value for $ \mathbb{E}[a.b$ $ \left\vert a-b\right\vert]$ makes it possible to finish (37) :

$\displaystyle \mathbb{E}[x.y\left\vert x-y\right\vert]$ $\displaystyle =$ $\displaystyle \frac{t^{3}}{15}+\frac{t²l+tl²}{3}$  
  $\displaystyle =$ $\displaystyle \frac{t^{3}+5t^{2}l+5tl^{2}}{15}$ (45)

Testing this empirically gives the following values: $ \left[0,1\right]\rightarrow0.06666...$ , $ \left[0:256\right]\rightarrow1'118'481$, $ \left[0:65536\right]\rightarrow1.87184.10^{13}$ , $ \left[-128,128\right]\rightarrow-276972$, $ \left[-1,1\right]\rightarrow-0.133374$ , $ \left[-0.25,0.6\right]\rightarrow-0.00163109$ and $ \left[-32768,32768\right]\rightarrow-4.70642.10^{12}$ . Which is correct with the above formula.

3.6 The Correlation Coëfficient

The correlation coefficient between $ \Theta _{\times }$ and $ \Theta _{\left \vert-\right \vert}$ (7) is given as:

$\displaystyle r(\Theta_{\times},\Theta_{\left\vert-\right\vert})=\frac{\mathbb{...
...rt x-y\vert]}{\sigma_{\theta_{\times}}\sigma_{\theta_{\left\vert-\right\vert}}}$ (46)
$\displaystyle \sigma_{\theta_{\times}}$ $\displaystyle =$ $\displaystyle \sqrt{\mathbb{E}[(x.y)^{2}]-\mathbb{E}[x.y]^{2}}$  
$\displaystyle \sigma_{\theta_{\left\vert-\right\vert}}$ $\displaystyle =$ $\displaystyle \sqrt{\mathbb{E}[\vert x-y\vert^{2}]-\mathbb{E}[\vert x-y\vert]^{2}}$  
$ \mathbb{E}[\vert x-y\vert]$ is defined in (9). $ \mathbb{E}[\vert x-y\vert^{2}]$ is defined in (11). $ \mathbb{E}[x.y]$ is given in (30). $ \mathbb{E}[(x.y)^{2}]$ is given in (33) and $ \mathbb{E}[x.y.\vert x-y\vert]$ is given in (45). So then, after the various substitutions, the correlation coefficient (46) becomes

$\displaystyle r(\Theta_{\times},\Theta_{\left\vert-\right\vert})=-\frac{3\sqrt{2}(h-l)}{5\sqrt{7h²+10hl+7l^{2}}}$
Figure 10: The absolute correlation value between $ \Theta _{\times }$ and $ \Theta _{\left \vert-\right \vert}$ . Horizontally, the $ l$ value is set out. In depth the $ h$ value. Vertically, the absolute correlation coefficient. The actual correlation values are negative.

Figure fig:correlationipadplot shows the correlation strength in function of $ l$ and $ h$ . Surprisingly the correlation for any point with $ l=0$ is $ -\frac{3\sqrt{2}}{5\sqrt{7}}=-0.32..$ . The maximal correlation is found with $ h=-l$ , which is then

$\displaystyle r(\Theta_{\times},\Theta_{\left\vert-\right\vert})=-\frac{3\sqrt{2}}{5}=-0.8485...$   for $\displaystyle h=-l$ (47)

4 Estimating $ \Theta _{\left \vert-\right \vert}$ in function of $ \Theta _{\times }$ and vice versa

Figure 11: Scatter plot of $ \Theta _{\left \vert-\right \vert}$ (horizontal) and $ \Theta _{\times }$ (vertical) when drawing samples from a uniform distribution between -1 and 1. Sequence length ($ n$ ) is 65536. Number of relations tested in this plot = $ 10^{4}$.

Figure 11 illustrates the correlation between $ \Theta _{\left \vert-\right \vert}$ and $ \Theta _{\times }$ with $ h=-l=1$ . Since every block comparison is the summation of variables of the same distribution we get in the end a Gaussian distribution [21]with means equal to the means of the underlying operators $ \theta$ and a standard deviation that should be divided by $ \sqrt{n}$ . The standard deviations of $ \theta_{\times}$ and $ \theta_{\left\vert-\right\vert}$ are

$\displaystyle \sigma_{\theta_{\times}}=\frac{h-l}{12}\sqrt{7h^{2}+10h.l+7l^{2}}\qquad\sigma_{\theta_{\left\vert-\right\vert}}=\frac{h-l}{3\sqrt{2}}$

Which means that following the central limit theorem [21], we get, when $ h=-l$ , the following standard deviations for the block operators $ \Theta _{\times }$ and $ \Theta _{\left \vert-\right \vert}$ .

$\displaystyle \sigma_{\Theta_{\times}}=\frac{h^{2}}{3\sqrt{n}}\qquad\sigma_{\Theta_{\left\vert-\right\vert}}=\frac{h\sqrt{2}}{3\sqrt{n}}$ (48)
Based on the knowledge that a negative correlation exists, we can link $ \Theta _{\times }$ and $ \Theta _{\left \vert-\right \vert}$ by subtracting the mean from both and then dividing them by their standard deviation. This gives the equality:

$\displaystyle \frac{\Theta_{\times}-\mathbb{E}\left[\Theta_{\times}\right]}{\si...

Taking out $ \Theta _{\times }$ allows us to predict it based on the measurement of $ \Theta _{\left \vert-\right \vert}$ . We call the predictions respectively $ \Theta'_{\times}$ and $ \Theta'_{\left\vert-\right\vert}$ .

$\displaystyle \Theta'_{\times}=-\frac{\sigma_{\Theta_{\times}}}{\sigma_{\Theta_...
...a_{\left\vert-\right\vert}\right]\right)+\mathbb{E}\left[\Theta_{\times}\right]$ (49)
$\displaystyle \Theta'_{\left\vert-\right\vert}=-\frac{\sigma_{\Theta_{\left\ver...
...a_{\times}\right]\right)+\mathbb{E}\left[\Theta_{\left\vert-\right\vert}\right]$ (50)
Substituting the standard deviations from (48) and the estimates from (30) and (8) further gives the final predictions:

$\displaystyle \Theta'_{\times}=\frac{-h}{\sqrt{2}}\Theta_{\left\vert-\right\ver...

The expected difference between $ \Theta_{\theta}$ and $ \Theta'_{\theta}$ is 0 as shown below:

    $\displaystyle \mathbb{E}\left[\Theta'_{\times}-\Theta_{\times}\right]$  
  $\displaystyle =$ $\displaystyle \mathbb{E}\left[-\frac{\left(\Theta_{\left\vert-\right\vert}-\mat...
  $\displaystyle =$ $\displaystyle -\mathbb{E}\left[\Theta_{\left\vert-\right\vert}-\mathbb{E}\left[...
    $\displaystyle +\mathbb{E}\left[\mathbb{E}\left[\Theta_{\times}\right]\right]-\mathbb{E}\left[\Theta_{\times}\right]$ (51)
  $\displaystyle =$ $\displaystyle -\left(\mathbb{E}\left[\Theta_{\left\vert-\right\vert}\right]-\ma...
    $\displaystyle +\mathbb{E}\left[\Theta_{\times}\right]-\mathbb{E}\left[\Theta_{\times}\right]$ (52)
  $\displaystyle =$ 0 (53)
A similar proof holds for $ \Theta _{\left \vert-\right \vert}$ . The next thing we need to know is, when predicting one in function of the other, how good this prediction is and what standard deviation of the error will be. If $ a$ refers to the measurement and $ a'$ refers to the prediction based on a variable $ b$ , as described above ((49) and (50)), then we find that the standard deviation of the prediction error ($ a'-a$ ) is given as
$\displaystyle \sigma_{a'-a}=\sigma_{a}\sqrt{2(1+r)}$ (54)

(54) can be deduced as follows:

$\displaystyle \sigma_{a'-a}=\sqrt{\mathbb{E}\left[\left(a'-a\right)^{2}\right]-\mathbb{E}\left[a'-a\right]^{2}}$

Since the second term is known to be zero (53), we have

$\displaystyle \sigma_{a'-a}^{2}$ $\displaystyle =$ $\displaystyle \mathbb{E}\left[\left(a'-a\right)^{2}\right]$ (55)
  $\displaystyle =$ $\displaystyle \mathbb{E}\left[a'^{2}\right]-2\mathbb{E}\left[a.a'\right]+\mathbb{E}\left[a^{2}\right]$ (56)
Working out the three terms, with the knowledge that the $ \mathbb{E}\left[b-\mathbb{E}\left[b\right]\right]$ equals 0 and $ \mathbb{E}\left[\left(b-\mathbb{E}\left[b\right]\right)^{2}\right]=\sigma_{b}^{2}$ gives
$\displaystyle \mathbb{E}\left[a'^{2}\right]$ $\displaystyle =$ $\displaystyle \mathbb{E}\left[\left(-\frac{\left(b-\mathbb{E}\left[b\right]\right)\sigma_{a}}{\sigma_{b}}+\mathbb{E}\left[a\right]\right)^{2}\right]$  
  $\displaystyle =$ $\displaystyle \frac{\sigma_{a}^{2}}{\sigma_{b}^{2}}\mathbb{E}\left[\left(b-\mathbb{E}\left[b\right]\right)^{2}\right]$  
    $\displaystyle -2\frac{\sigma_{a}\mathbb{E}\left[a\right]}{\sigma_{b}}\mathbb{E}\left[b-\mathbb{E}\left[b\right]\right]+\mathbb{E}\left[a\right]^{2}$  
  $\displaystyle =$ $\displaystyle \sigma_{a}^{2}+\mathbb{E}\left[a\right]^{2}$ (57)
The last term of (56) to calculate is $ \mathbb{E}\left[a.a'\right]$
    $\displaystyle \mathbb{E}\left[a.a'\right]$  
  $\displaystyle =$ $\displaystyle \mathbb{E}\left[-\frac{\left(b-\mathbb{E}\left[b\right]\right)\sigma_{a}a}{\sigma_{b}}+a\mathbb{E}\left[a\right]\right]$  
  $\displaystyle =$ $\displaystyle -\frac{\sigma_{a}}{\sigma_{b}}\mathbb{E}\left[\left(b-\mathbb{E}\left[b\right]\right)a\right]+\mathbb{E}\left[a\mathbb{E}\left[a\right]\right]$  
  $\displaystyle =$ $\displaystyle -\frac{\sigma_{a}}{\sigma_{b}}\left(\mathbb{E}\left[ab\right]-\mathbb{E}\left[b\right]\mathbb{E}\left[a\right]\right)+\mathbb{E}\left[a\right]^{2}$ (58)
Since the correlation between $ a$ and $ b$ can be written as

$\displaystyle r_{a,b}=\frac{\mathbb{E}\left[ab\right]-\mathbb{E}\left[a\right]\mathbb{E}\left[b\right]}{\sigma_{a}\sigma_{b}}$

we have that $ r_{a,b}\sigma_{a}\sigma_{b}=\mathbb{E}\left[ab\right]-\mathbb{E}\left[a\right]\mathbb{E}\left[b\right]$ and thus (58) becomes

$\displaystyle \mathbb{E}\left[a.a'\right]=-\sigma_{a}^{2}r+\mathbb{E}\left[a\right]^{2}$ (59)
which gives, by substituting (57) and (59) into the standard variation of the prediction mismatch (56):
$\displaystyle \sigma_{a'-a}^{2}$ $\displaystyle =$ $\displaystyle \sigma_{a}^{2}+\mathbb{E}\left[a\right]^{2}+2\sigma_{a}^{2}r-2\mathbb{E}\left[a\right]^{2}+\mathbb{E}\left[a^{2}\right]$  
  $\displaystyle =$ $\displaystyle \sigma_{a}^{2}(2+2r)$  
$\displaystyle \sigma_{a'-a}$ $\displaystyle =$ $\displaystyle \sigma_{a}\sqrt{2+2.r_{a,b}}$  
Going back to our $ \Theta _{\times }$ operator (with $ h=1$ ), this results in

$\displaystyle \sigma_{\Theta'_{\times}-\Theta_{\times}}=\frac{h^{2}}{3\sqrt{n}}\sqrt{2-\frac{6\sqrt{2}}{5}}\sim\frac{0.18}{\sqrt{n}}$

with $ \Theta _{\left \vert-\right \vert}$ this becomes

$\displaystyle \sigma_{\Theta'_{\left\vert-\right\vert}-\Theta_{\left\vert-\righ...

Both are interesting because they allow us to determine a confidence interval for the estimation based on the block size.

For a length $ n=1000$ we get a standard deviation of the prediction error of the $ \Theta'_{\times}$ -mismatch as $ 0.00580176$ . If we would work on the interval $ \left[-128,128\right]$ , we would get a prediction mismatch of $ 95.056$ . On the interval $ \left[-32768,32768\right]$ , the prediction mismatch would be $ 6.2.10^{6}$ . This seems large but the expected results for $ \Theta _{\times }$ are large as well if we work with $ h>1$ . For length $ n=10000$ , $ h=1$ we get a standard deviation of $ 0.00183468$ . For a length of $ n=100$ we have a standard deviation of $ 0.0183468$ . The standard deviation of the $ \left\vert-\right\vert$ prediction based on $ \times$ for length $ n=100$ is $ 0.0259463$ . For $ n=1000$ this becomes $ 0.00820493$ and $ n=10000$ gives $ 0.00259463$ .

5 Discussion

5.1 Bias and non-correlation

It is easy to recognize that a baseline added to any of the two comparator blocks, will not influence the correlation value, it will merely increase the variance of the inproduct, while it will increase the value of the SAD. This results in predictions that might be 'off by a baseline'. Therefore, when using this technique in a sliding window setup, one might consider appropriate techniques to remove baselines, such as low pass filtering, multirate signal processing and so on [12, 17].

A second observation of importance is that zeros, in either block, will influence the reliability of the correlation. The more zeros, the more the SAD and the inproduct will differ. The inproduct essentially skips zeros and always returns a 'minimal match', irrelevant of the other value. The absolute difference will not be influenced by zeros, and will treat them as any other value, and thus still distinguish properly.

5.2 Auto-Comparison

As a practical application we investigated the similarity between auto-correlation and the lagging sum of absolute differences, respectively defined as $ r$ and $ s$ :

$\displaystyle r\left(j\right)=\sum_{i}x_{i}x_{i+j}\qquad s\left(j\right)=\sum_{i}\left\vert x_{i}-x_{i+j}\right\vert$
Figure 12: A plot showing the measured auto-correlation (red), the measured lagging sum of absolute differences (green), the predicted auto-correlation (purple) and the predicted lagging sum of absolute differences (blue)
Image autocor

The input dataset were the first $ 6.10^{4}$ samples of a song, which were then equalized and scaled to the range $ \left[-1,1\right]$ . On this dataset we calculated $ r$ and $ s$ , plotted in figure 12 in red and green respectively. The predicted values of $ r$ and $ s$ in function of the other are respectively purple and blue. We find that the predictions are fairly accurate and allow the recognition of the same peaks as one goes along. Near the end of the dataset $ s$ appears biased due to the window size, which becomes smaller as the lag increases. $ r$ reflects this in its variance. The similarity between $ r$ and $ s$ has already led to an application in meta data extraction, in which an algorithmic approach, using the SAD, assesses music-tempo slightly faster and with less memory requirements than conventional auto-correlation. This has been implemented in the DJ-program BpmDj [22, 23].

5.3 Block Matching

Figure 13: Finding a pattern of 1000 bytes in a haystack of 100000 byes. The standard inproduct comparator is colored red. The SAD is colored green. The predicted inproduct is colored purple and the predicted S.A.D is colored blue. Left) The needle consists of a random uniform sample. Right) The needle is taken from the haystack at position 50000.
Image needle1 Image needle2

A second test we conducted was finding a pattern (called the needle) in a larger data set (the haystack). As a haystack we used the first $ 6.10^{5}$ equalized samples of a song. We generated two needles. The first one was a random white noise pattern, the second one was a random block taken from the haystack. Both needles were $ 10^{3}$ samples long. Using a sliding window block comparator we plotted the inproduct and the SAD, as well as their predicted values (Figure 13). As observed before, when the inproduct increases, the sad decreases and vice versa. Both plots start with an unexpected large sad and noisy inproduct because the song starts with some ambiance, which are many values near zero. Therefore in this area, the inproduct had a smaller variance than the rest of the song while the sad was larger than expected. The needle we looked for in the second test was taken from the middle of the haystack. At this position we indeed find a peaking of all values.

5.4 Local matching

In auto-comparison and block searching we find the same peaks back and, skipping the baseline, at a local level the inproduct behaves similar to the SAD. This makes clear that whatever technique one uses to find a global match, both the inproduct and SAD are equally valid to find local minima. This has an important application in landmark tracking, which can be done equally well with Fourier transformations as with the widely used SAD.

5.5 Equalization of the amplitudes

The equalization of amplitudes can be easily conducted for sound waves or color images. There one simply counts the number of occurrences of specific amplitudes, which is an $ \mathcal{O}\left(n\right)$ operation. Afterwards , a cumulative distribution of this binned histogram can be made. The total cumulative value can then form the divisor for all the binned numbers.

If $ B[v]$ is the number of time the value $ v$ occurred in the audio-stream, then we define $ C[v]$ as

$\displaystyle C[v]=\sum_{i=0}^{v}B[i]$

The maximal value is found at $ C[m]$ . ($ m$ standing for maximum amplitude). The equalization of the histogram now consists of assigning a floating point value to each of the original inputs and using those values. The lookup table will be called $ M$

$\displaystyle M[v]=\frac{2.C[v]}{C[m]}-1$

Translating the original signal through $ M$ will make it uniform distributed as

$\displaystyle X'_{i}=M[X_{i}]$

This operation is $ \mathcal{O}\left(n\right)$ . It has however some memory lookups that might be unattractive for long sequences. The next question is whether on real signals, how well the correlation holds. Can we for instance find the

6 Summary

If two data sequences are uniformly sampled from -1:1 then both the summed inproduct (SIP) and the sum of absolute differences (SAD) behave similarly, with a correlation coefficient of -0.8485. This useful relationship allows the prediction of one operator in function of the other as follows:

SIP$\displaystyle '\left(X,Y\right)\sim\frac{-h}{\sqrt{2}}$SAD$\displaystyle \left(X,Y\right)+\frac{2h^{2}}{3\sqrt{2}}$
SAD$\displaystyle '\left(X,Y\right)\sim-\frac{\mbox{SIP}(X,Y)\sqrt{2}}{h}+\frac{2h}{3}$
The standard deviation of the expected error is given by

$\displaystyle \sigma_{\mbox{SAD}'-\mbox{SAD}}=\frac{0.18.h^{2}}{\sqrt{n}}\qquad\sigma_{\mbox{SIP}'-\mbox{SIP}}=\frac{0.259.h}{\sqrt{n}}$

Applications of this relation might, among others, include optimizations in the areas of sound analysis, pattern-matching, digital image processing and compression algorithms (such as rsync).

The use of absolute difference was attractive because it offers a way to compare signals such that mismatches could not cancel each other out thereby providing the same result independent of any constant added to the signal. Convolution did not behave as such and in fact provided, in our view, not the best manner of comparing two signals: it was dependent on the constant added to the signal (which can happen due to very low frequency oscillation that cannot be captured in the window). On the other hand, when many comparisons are necessary, the inproduct can be implemented very efficiently by means of Fourier transformation.

Regarding the proof and formulas used in this document: it is probably the worst piece of math I've ever been involved with. It is tedious, complicated and at the moment I have no clue how I could simplify it. Probably there are some black magic tricks mathematicians could use, but not being a native mathematician, I'm stuck with what I have.


1.Building Blocks for MPEG Stream Processing Stephan Wong, Jack Kester, Michiel Konstapel, Ricardo Serra, Otto Visser institution: Computer Engineering Laboratory, Electrical Engineering Department, Delft University of Technology; 2002
2.Fast Sum of Absolute Differences Visual Landmark Detector Craig Watman, David Austin, Nick Barnes, Gary Overett, Simon Thompson Proceedings of IEEE Conference on Robotics and Automation; April; 2004
3.The Sum-Absolute-Difference Motion Estimation Accelerator S. Vassiliadis, E.A. Hakkennes, J.S.S.M. Wong, G.G. Pechanek EUROMICRO 98 Proceedings of the 24th EUROMICRO Conference, Vasteras, Sweden, pp. 559-566, Aug 1998, IEEE Computer Society, ISBN: 0-8186-8646-4.; 1998
4.Real-time Area Correlation Tracker Implementation based on Absolute Difference Algorithm Weichao Zhou, Zenhan Jiang, Mei Li, Chunhong Wang, Changhui Rao Optical Engineering; volume: 42; number: 9; pages: 2755-2760; September; 2003
5.Employing Difference Image in Affine Tracking G. Jing, D. Rajan, C.E.Siong Signal and Image Processing; 2006
6.Absolute Difference Generator for use in Display Systems David F. McManigal howpublished: United States Patent 4218751; March 7; 1979
7.Arithmetic circuit for calculating the absolute value of the difference between a pair of input signals Masaaki Yasumoto, Tadayoshi Enomoto, Masakazu Yamashina howpublished: United States Patent 4,849,921; June; 1989
8.Calculating the absolute difference of two integer numbers in a single instruction cycle Roney S. Wong howpublished: United States Patent 5835389; April; 1996
9.Adaptive Bit-Reduce Mean Absolute Difference criterion for block-matchin algorithm and its VLSI Design Hwang-Seok Oh, Yunju Baek, Heung-Kyu Lee Opt. Eng.; volume: 37; number: 12; pages: 3272-3281; December; 1998
10.Numerical Recipes in C++ William T. Veterling, Brian P. Flannery Cambridge University Press; editor: William H. Press and Saul A. Teukolsky; chapter: 15.7; edition: 2nd; February; 2002
11.Do Aligned Sequences Share the Same Fold ? Ruben A. Abagyan, Serge Batalov Journal of Molecular Biology; 1997
12.Discrete-Time Signal Processing Alan V. Oppenheim, Ronald W. Schafer Prentice Hall; editor: John R. Buck; 1989
13.Juggling with Pattern Matching Jean Cardinal, Steve Kremer, Stefan Langerman institution: Universite Libre De Bruxelles; 2005
14.A Randomized Algorithm for Approximate String Matches Mikhail J. Atallah, Frederic Chyzak, Philippe Dumas Algorithmica; volume: 29; number: 3; pages: 468-486; 2001
15.Scientific Visualization: The Visual Extraction of Knowledge from Data Julia Ebling, Gerik Scheuermann Berlin Springer; editor: G.P. Bonneau and T. Ertl and G.M. Nielson; chapter: Clifford Convolution And Pattern Matching On Irregular Grids; pages: 231-248; 2005
16.The Art of Computer Programming Donald Knuth Addison-Wesley; chapter: 1.2.11: Asymptotic Representations; pages: 107-123; volume: 1; edition: 3th; 1997
17.Numerical Recipes in C++ William T. Veterling, Brian P. Flannery Cambridge University Press; editor: William H. Press and Saul A. Teukolsky; chapter: 10; edition: 2nd; February; 2002
18.Uniform Difference Distribution. Eric. W. Weisstein From MathWorld - A Wolfram Web Resource; July; 2003 http://mathworld.wolfram.com/UniformDifferenceDistribution.html
19.Probability Generating Functions of Absolute Difference of Two Random Variables Prem S. Puri Proceeding of the National Academy of Sciences of the United States of America; volume: 56; pages: 1059-1061; 1966
20.Uniform Product Distribution. Eric. W. Weisstein From MathWorld - A Wolfram Web Resource.; July; 2003 http://mathworld.wolfram.com/UniformProductDistribution.html
21.Foundations of Modern Probability O. Kallenberg New York: Springer-Verlag; 1997
22.BPM Measurement of Digital Audio by Means of Beat Graphs & Ray Shooting Werner Van Belle December 2000 http://werner.yellowcouch.org/Papers/bpm04/
23.DJ-ing under Linux with BpmDj Werner Van Belle Published by Linux+ Magazine, Nr 10/2006(25), October 2006 http://werner.yellowcouch.org/Papers/bpm06/

A. Source Code: Measuring Correlation Between Operators

\noindent \texttt{\small\char93 include \char\lq \uml {}helpers...
...har\lq \uml {},rs);}{\small\par
\noindent \texttt{\small\}}\end{singlespace}