Correlation between the inproduct and the sum of absolute differences is -0.8485 for uniformly sampled signals on [-1:1]

Werner Van Belle - werner@yellowcouch.org, werner.van.belle@gmail.com

Abstract: While investigating relations between various comparators and the inproduct, we found that the inproduct correlates strongly with the absolute difference when the values are taken from a uniform distribution on [-1:1]. This useful relation might help to speed up block comparison in content databases and can be used as a valuable tool to estimate the inproduct based on the absolute difference and vice versa.

Keywords: sum of absolute differences, sum of inproduct, landmark tracking, feature tracking
To compare two data-blocks (of size $n$), two easily implementable techniques are widely used. The first is the summed inproduct (SIP), defined as $\mathrm{SIP}(x,y)=\sum_{j=1}^{n} x_j y_j$. The second, termed the sum of absolute differences (SAD), is defined as $\mathrm{SAD}(x,y)=\sum_{j=1}^{n} |x_j - y_j|$. Depending on the application behind the comparison, one might be favored over the other.
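As a minimal sketch (added here for illustration, not taken from the paper's appendix), both comparators are a single pass over the blocks:

```cpp
#include <cmath>
#include <cstddef>

// Summed inproduct (SIP) of two equally sized blocks.
double sip(const double* x, const double* y, std::size_t n) {
    double sum = 0;
    for (std::size_t j = 0; j < n; ++j) sum += x[j] * y[j];
    return sum;
}

// Sum of absolute differences (SAD) of two equally sized blocks.
double sad(const double* x, const double* y, std::size_t n) {
    double sum = 0;
    for (std::size_t j = 0; j < n; ++j) sum += std::fabs(x[j] - y[j]);
    return sum;
}
```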
The absolute difference is a well-known operator for block comparison within images and is one of the basic blocks used in MPEG coding [1]. It is often used for motion estimation by means of landmark detection [2, 3, 4, 5], and many different hardware implementations of the absolute difference exist [6, 7, 8, 9]. In general statistics, the absolute difference is also often used as a robust measure [10], with specialized applications such as model evaluation of protein 3D structures [11].
In audio applications, the absolute difference could also be used, but for historical reasons the standard inproduct seems to be favored, mainly because multiplication is a basic operation in digital filters [12]. The inproduct, and specifically convolution, has an interesting application in fuzzy pattern matching, where it is often used to find sub-patterns in a haystack of information. Convolution will then assess the correlation between every position and the pattern-to-be-found [13, 14, 15]. Since convolution itself can be efficiently implemented through a Fourier transform (or a sliding-window Fourier transform when the datasets are large), pattern matching becomes a faster operation than finding similar fragments by means of the absolute difference. To go somewhat into the details, we denote the Fourier transform as $\mathcal{F}$. On a block of size $n$ it has time complexity $O(n \log n)$ [16].
The inproduct of two signals expressed in their frequency domain is the same as convolution of their information in the time domain [12]. As such, the function $\mathcal{F}^{-1}\!\left(\mathcal{F}(x)\cdot\overline{\mathcal{F}(y)}\right)$, in which $\cdot$ denotes the element-wise inproduct and the bar denotes complex conjugation, will return a vector of which element $d$ describes the correlation between signal $x$ and signal $y$ shifted $d$ positions. This operation itself has time complexity $O(n \log n)$, which is much faster than the same process using an iteration of absolute differences, which would yield a time complexity of $O(n^2)$.
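The following sketch shows this frequency-domain correlation; it assumes the FFTW library, which the paper does not prescribe (any FFT implementation will do):

```cpp
#include <complex>
#include <vector>
#include <fftw3.h>

// Runs an in-place complex FFT over v (sign = FFTW_FORWARD or FFTW_BACKWARD).
static void fft(std::vector<std::complex<double>>& v, int sign) {
    fftw_plan p = fftw_plan_dft_1d(static_cast<int>(v.size()),
        reinterpret_cast<fftw_complex*>(v.data()),
        reinterpret_cast<fftw_complex*>(v.data()),
        sign, FFTW_ESTIMATE);
    fftw_execute(p);
    fftw_destroy_plan(p);
}

// Cyclic correlation of x against y for every shift d, computed as
// F^{-1}( F(x) . conj(F(y)) ) / n -- an O(n log n) operation.
std::vector<double> correlate_fft(const std::vector<double>& x,
                                  const std::vector<double>& y) {
    const std::size_t n = x.size();
    std::vector<std::complex<double>> a(n), b(n);
    for (std::size_t j = 0; j < n; ++j) { a[j] = x[j]; b[j] = y[j]; }
    fft(a, FFTW_FORWARD);
    fft(b, FFTW_FORWARD);
    // Element-wise inproduct with the conjugated spectrum.
    for (std::size_t j = 0; j < n; ++j) a[j] *= std::conj(b[j]);
    fft(a, FFTW_BACKWARD);
    std::vector<double> out(n);
    for (std::size_t j = 0; j < n; ++j)
        out[j] = a[j].real() / n;  // FFTW leaves the inverse unnormalized
    return out;
}
```

Element $d$ of the result holds the inproduct of $x$ with $y$ shifted over $d$ positions (cyclically).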
As such, the two operations both have their advantages and disadvantages. The absolute difference can be quickly calculated, but it is difficult to optimize for many comparisons. The inproduct on the other hand is slower on individual comparisons, but has very efficient implementations for large volumes.
Aside from these trade-offs, both metrics behave very differently. The sum of absolute differences will be small for similar blocks, while the inproduct instead will be large. Some other subtleties arise as well: the absolute difference is independent of any constant added to both vectors, since $|(x+c)-(y+c)| = |x-y|$. The inproduct does not have such a property: $(x+c)(y+c) \neq xy$, and it will provide different results for any two identical couples: $x \cdot x \neq y \cdot y$ when $x \neq y$. In other words, the strength of the reported similarity depends on the strength of the values themselves. Actually, the inproduct itself is strictly speaking not a metric, since $x \cdot x \neq 0$ for most values of $x$.
Notwithstanding their completely different mathematical behavior, both are often used in very similar contexts. We therefore became interested in understanding the relation between a number of well-known comparators, such as the summed inproduct and the sum of absolute differences. This could make it possible to predict one as a function of the other.
To study the relations between various comparators we created data-blocks, each containing two vectors $x$ and $y$. The variables $x$ and $y$, both of size $n$ and with elements indexed by $j$, were randomly sampled from the same multi-variate distribution. Based on such test vectors we measured the Spearman rank-order and the linear Pearson correlation. $x_j$ and $y_j$ denote the elements of $x$ and $y$, with $j$ ranging from $1$ to $n$.
For 8-bit signed integers, $x_j$ and $y_j$ range within $[-128,127]$, and for 16-bit signed integers the values come from $[-32768,32767]$. We do not take into account the discrete nature of integers and will work with real numbers taken from an interval $[m,M]$. All $x$ and $y$ values were drawn from the same distribution, which could be normally or uniformly distributed.
All comparators are set up as a summation of some underlying operator $f$, which could be one of the following 6:
$$f(x_j, y_j) = |x_j - y_j| \qquad (1)$$

$$f(x_j, y_j) = x_j \, y_j \qquad (2)$$

$$f(x_j, y_j) = x_j - y_j \qquad (3)$$

[equations (4), (5) and (6): operator definitions not recoverable from the source images]
The first operator is the absolute difference. The second operator is the multiplication. The third operator will provide us with insight into the relation between the absolute difference and the basic difference. The fourth operator investigates the impact of signed-to-unsigned conversion in a comparison process, and the last operator takes the inverse of the signal and puts it back into the equation, as an attempt to balance the inequality that occurs when comparing identical couples with a different value.
The multidimensional comparator $F$, based on $f$, is defined as $F(x,y) = \sum_{j=1}^{n} f(x_j, y_j)$.
We tested the relations between every pair of comparators using randomly generated sequences. The values for $x$ and $y$ were first drawn from a uniform distribution between $-1$ and $1$ and then from a normal distribution. The random generators are ran2 and gasdev from [17]. In all cases, both the linear Pearson and the Spearman rank-order correlations were similar. The programs to measure the correlations can be found in the appendix.
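A minimal stand-in for such a measurement program is sketched below; it uses std::mt19937 and std::uniform_real_distribution instead of the ran2/gasdev generators of [17], which does not affect the outcome:

```cpp
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// Pearson correlation between two equally long series.
double pearson(const std::vector<double>& a, const std::vector<double>& b) {
    const double n = static_cast<double>(a.size());
    double sa = 0, sb = 0, saa = 0, sbb = 0, sab = 0;
    for (std::size_t j = 0; j < a.size(); ++j) {
        sa += a[j]; sb += b[j];
        saa += a[j] * a[j]; sbb += b[j] * b[j];
        sab += a[j] * b[j];
    }
    const double cov = sab / n - (sa / n) * (sb / n);
    const double va  = saa / n - (sa / n) * (sa / n);
    const double vb  = sbb / n - (sb / n) * (sb / n);
    return cov / std::sqrt(va * vb);
}

int main() {
    std::mt19937 gen(42);
    std::uniform_real_distribution<double> uni(-1.0, 1.0);
    const int blocks = 100000, n = 64;  // illustrative sizes
    std::vector<double> sads(blocks), sips(blocks);
    for (int b = 0; b < blocks; ++b) {
        double sad = 0, sip = 0;
        for (int j = 0; j < n; ++j) {
            const double x = uni(gen), y = uni(gen);
            sad += std::fabs(x - y);
            sip += x * y;
        }
        sads[b] = sad; sips[b] = sip;
    }
    // Expected around -0.8485 for uniform samples on [-1:1].
    std::printf("r(SAD,SIP) = %f\n", pearson(sads, sips));
    return 0;
}
```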
We now assume that both vectors $x$ and $y$ are uniformly distributed within a range $[m,M]$, with $m < 0$ and $M > 0$. $s$ denotes the size of the interval ($s = M - m$). The comparators SAD and SIP, both depending on $x$ and $y$, will now be considered to be variables themselves. The correlation between them, $r(\mathrm{SAD},\mathrm{SIP})$, is given by
$$r = \frac{E(\mathrm{SAD} \cdot \mathrm{SIP}) - E(\mathrm{SAD})\,E(\mathrm{SIP})}{\sqrt{E(\mathrm{SAD}^2) - E(\mathrm{SAD})^2}\;\sqrt{E(\mathrm{SIP}^2) - E(\mathrm{SIP})^2}} \qquad (7)$$
Under the assumption that the various sample positions are independent from each other, one can easily show that the correlation between the block comparators equals the correlation between the underlying per-element operators $|x-y|$ and $x \cdot y$. This reduces the estimates we need to the five values $E(|x-y|)$, $E(|x-y|^2)$, $E(xy)$, $E((xy)^2)$ and $E(|x-y| \cdot xy)$. We will now determine those 5 values.
Estimates of a variable $v$ can be calculated from probability functions as follows ($p_v$ is the probability distribution function of the variable $v$): $E(v) = \int v \, p_v(v) \, dv$.
To calculate the probability distribution function of $|x-y|$, with both $x$ and $y$ uniform distributions over $[m,M]$, we must first obtain the probability function of the difference $\delta = x - y$, which is given by [18]:

$$p_{\delta}(\delta) = \frac{s - |\delta|}{s^2}, \qquad -s \le \delta \le s \qquad (8)$$
Since this probability distribution is symmetrical around 0 (the means of the two uniform distributions have been subtracted), we would get an expected value of 0; but then we would forget to take into account the absolute difference. The absolute difference will flip the probability of all values below zero over the vertical axis and add them there. This results in an increased slope to the right and a probability density of 0 for negative values [19]. Figure 1 plots eq. 8. The expectancy can now be calculated:
$$E(|x-y|) = \int_0^s \delta \, \frac{2(s - \delta)}{s^2} \, d\delta = \frac{s}{3} \qquad (9)$$
Testing this empirically for various intervals gives values consistent with the above formula.
$$E(|x-y|^2) = E((x-y)^2) \qquad (10)$$

which we can calculate based on the previous probability density function (8) as

$$E((x-y)^2) = \int_{-s}^{s} \delta^2 \, \frac{s - |\delta|}{s^2} \, d\delta = \frac{s^2}{6} \qquad (11)$$
Testing this empirically for various intervals gives results close to those given by eq. 11.
To determine $E(xy)$ we will calculate the probability distribution function of $xy$, which might appear as taking the long road when the probability distribution of uniform products on $[0,1]$ is already known [20]. However, we will later on need this probability distribution on $[m,M]$ to determine $E((xy)^2)$ and $E(|x-y| \cdot xy)$, and we can now introduce the technique we will also use in section 3.5.2. The probability density function of $z = xy$ is defined as the differential of the cumulative probability function, which in turn is defined as $\mathrm{CDF}(z) = P(xy < z)$.
Figure 2 illustrates how to find the number of $(x,y)$ combinations for which $xy$ lies below a fixed $z$. The surface area that is not cut off at $z$ directly reflects the probability that a value below $z$ can be chosen, because both $x$ and $y$ are uniformly distributed. Since the isoline of $xy = z$ is located at $y = z/x$, various forms of the surface calculation can be combined. The isoline suffers from a discontinuity at $x = 0$, and its calculation in this particular context is further complicated by a sudden quadrant-change of the isoline when $z$ becomes negative, in which case quadrants 1 and 3 no longer contain the isoline; instead, quadrants 2 and 4 will contain it (see figure 3 for an illustration). To approach this problem, we divided the calculation of the CDF into three branches: the first with $z = 0$, the second with $z > 0$ and the last with $z < 0$.
Aside from these standard complications, we must also limit our $x$ and $y$ values to the range $[m,M]$. This restricts the maximal surface area we can find and requires a piecewise approach. We start out with a general solution for rectangles originating in the origin and ending at a specific point. We have the following 4 sections: the quadrant between $(0,0)$ and $(M,M)$, between $(0,0)$ and $(m,M)$, between $(0,0)$ and $(m,m)$, and between $(0,0)$ and $(M,m)$. Every section forms the two-dimensional bounds (on $x$ and $y$) for the integral, and depending on the sign of $z$ they will be added or subtracted to/from each other. Figure 4 presents this graphically. When $z$ is positive we need to subtract the mixed-sign quadrants; otherwise we need to add them.
$S(A)$ is used to denote the size of area $A$ intersected with the area below the isoline. The cumulative distribution function is now defined as the signed sum of the four section areas, normalized by the total area:

$$\mathrm{CDF}(z) = \frac{1}{s^2}\left(S(A_1) + S(A_3) \pm S(A_2) \pm S(A_4)\right) \qquad (12)$$

with the signs chosen according to the sign of $z$.
$S$ can be determined by first obtaining the x-position where the hyperbola crosses the top line of our box. For a box reaching from the origin to $(X,Y)$ this happens at position $x = z/Y$. When $z$, $X$ and $Y$ are positive, this point will always be larger than 0. When $z \ge XY$, the intersection happens outside the box, in which case the surface area is simply $XY$. When $z < XY$ we need to integrate the constant part up to $z/Y$ and the hyperbola from there on (see figure 5 for an illustration), which gives

$$S(X,Y,z) = \int_0^{z/Y} Y \, dx + \int_{z/Y}^{X} \frac{z}{x} \, dx = z\left(1 + \ln\frac{XY}{z}\right) \qquad (14)$$
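As a quick check (added here), the unit box $X = Y = 1$ reduces (14) to the known cumulative uniform-product distribution of [20]:

$$S(1,1,z) = z\left(1 + \ln\frac{1}{z}\right) = z - z\ln z = P(uv < z), \qquad 0 < z < 1.$$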
The cumulative probability function (13) requires us to calculate the different sections based on (14), with $X$ and $Y$ taken from $\{M, -m\}$ (minus signs are used where needed to keep the box dimensions positive). Since (14) appears to be discontinuous at position $z = 0$, we split the cumulative probability function into 4 segments. When $z < mM$ the cumulative probability is 0, because there can be no values $x$ and $y$, both larger than $m$, that can lead to a product smaller than $mM$.
$$\mathrm{CDF}_u(z) = -2mM + 2z\left(1 + \ln\frac{mM}{z}\right), \qquad mM \le z < 0 \qquad (15)$$

$$\mathrm{CDF}_u(z) = -2mM + 2z + z\ln\frac{m^2M^2}{z^2}, \qquad 0 \le z < \min(m^2, M^2) \qquad (16)$$

$$\mathrm{CDF}_u\big(\max(m^2, M^2)\big) = (M-m)^2 = s^2 \qquad (17)$$
(15), (16) and (17) are the non-normalized cumulative distribution functions when $mM \le z$. Given the maximum surface area (17), we can divide by it and differentiate them all to provide us with the proper probability distribution (illustrated in figure 6).
We conclude that, when $z \ge 0$:

$$p_{xy}(z) = \frac{1}{s^2}\ln\frac{m^2M^2}{z^2} \qquad (18)$$
When $z < 0$ the result is somewhat more complex. Instead of subtracting the areas of quadrants 2 and 4, we need to add them. The interesting thing is that neither quadrant 2 nor quadrant 4 will intersect with the isoline when $z < mM$; however, as soon as $mM \le z < 0$ the situation is different (depicted in fig. 3). Therefore we further branch the calculation. When $z < mM$ we have 0 as surface area for quadrants 2 and 4:

$$S(A_2) = S(A_4) = 0, \qquad z < mM \qquad (19)$$
When $mM \le z < 0$ we are interested in the outside of the function $y = z/x$, which is to be found in quadrants 2 and 4. Since the area below the curve matches that as if $z$, $m$ and $M$ were positive, we get surface area $S(-m, M, -z)$ (the minus signs have been added to make the quantities positive). As such, the remaining surface area must then be

$$S(A_2) = -mM - S(-m, M, -z) = -mM + z\left(1 + \ln\frac{mM}{z}\right) \qquad (22)$$
The estimate of $E(xy)$ is calculated as $E(xy) = \int z \, p_{xy}(z) \, dz = \left(\frac{m+M}{2}\right)^2$, which (because $x$ and $y$ are independent) equals $E(x)E(y)$. The probability density function of $xy$ is given in (18) and (25).
To determine $E((xy)^2)$ we again treat the cases $z \ge 0$ and $z < 0$ separately, starting with $z \ge 0$. Both branches give the same value; since (31) == (32), we conclude that

$$E((xy)^2) = \left(\frac{m^2 + mM + M^2}{3}\right)^2 \qquad (33)$$
If we define $u$ and $v$ to be the translation and scaling of $x$ and $y$, as follows: $u = (x-m)/s$ and $v = (y-m)/s$, then $u$ and $v$ are uniformly distributed over $[0,1]$. The variable $x$ (and likewise $y$) can also be defined in function of $u$ as

$$x = s\,u + m \qquad (34)$$
This substitution will make it possible to rewrite $E(|x-y| \cdot xy)$:

$$E(|x-y| \cdot xy) = E\big(s|u-v|\,(su+m)(sv+m)\big) = s^3\,E(|u-v|\,uv) + s^2 m\,E\big(|u-v|(u+v)\big) + s\,m^2\,E(|u-v|) \qquad (35)$$
$u$ and $v$ are uniform distributions, as such $E(|u-v|) = \frac{1}{3}$ and $E\big(|u-v|(u+v)\big) = \frac{1}{3}$. The estimates for $(u-v)uv$ (under condition that $u > v$) and $(v-u)uv$ (under condition that $v > u$) will be the same:

$$E\big((u-v)uv \mid u > v\big) = E\big((v-u)uv \mid v > u\big) \qquad (36)$$

Without the condition that $u > v$ we need to double the estimate of (36), and (35) becomes

$$E(|x-y| \cdot xy) = s^3\,E(|u-v|\,uv) + \frac{s^2 m}{3} + \frac{s\,m^2}{3} \qquad (37)$$
The factors $|u-v|$ and $uv$ are not independent, therefore we cannot rely on the standard rule for uniform products [20]. In order to resolve this we divide the space of possible values into those with $u > v$ and those with $u < v$. This gives, due to the linearity of the expectation, the following:

$$E(|u-v|\,uv) = E\big((u-v)uv \mid u > v\big)P(u > v) + E\big((v-u)uv \mid v > u\big)P(v > u) \qquad (38)$$
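The value that this derivation will eventually produce can also be verified by direct integration over the half-square $u > v$ (a check added here for reference):

$$E(|u-v|\,uv) = 2\int_0^1\!\!\int_0^u (u-v)\,uv \; dv\,du = 2\int_0^1 \frac{u^4}{6}\,du = \frac{1}{15}.$$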
Plotting $(u-v)uv$ (figure 7) reveals the areas that we are interested in. The plots have been clipped at a specific value of $c$: all positions where $(u-v)uv$ was larger than $c$ have been removed. So, the surface area of the plots that is not clipped forms the cumulative probability that a specific combination of $u$ and $v$ will be lower than $c$. Of course, we are not only interested in understanding the size of the area; we also need to take into account the extra condition that $u > v$.
For a fixed value of $c$ we can calculate the isoline (taking into account that $u > v$) as the minimum of the clipping boundary and the $u = v$ diagonal. Figure 8 presents the shape of this function. The surface area under this curve is the area where $(u-v)uv < c$ while still respecting that $u > v$. Integrating this equation provides us with the cumulative distribution function (39). To do so, we need to know where the linear 45-degree line cuts the isoline; this crossing position is given by (40). [equation (40): not recoverable from the source image]
This now allows us to write down the surface area below as a couple of integrals. [equations: the CDF written as a sum of integrals; not recoverable from the source images]
This function has a maximum of 0.5 at 1, so we need to multiply it by two to get a proper cumulative distribution function. Differentiating it afterward gives the probability density function (left as calculated; right by swapping the variables $u$ and $v$). Figure 9a plots (41). [equation (41): not recoverable from the source image]
Since we cannot rely on a symmetry around the 45-degree angle, we also need to calculate the surface area when $u < v$. The isoline is now given as the minimum of 1 and the maximum of the clipping boundary and the diagonal. Figure 8 provides more insight into the shape of this function. The function appears piecewise, with first a constant line (1) and then a hyperbolic piece up to its crossing with the 45-degree slope. Under these two segments we also have a linear piece that we need to subtract from the volume. For a fixed $c$ we determine the crossing of 1 with the hyperbolic piece and the crossing between the hyperbolic piece and the diagonal; the latter is again given by (40).
We can now calculate the surface area as before. [equations (42) and (43): the CDF for $u < v$ written as a sum of integrals; not recoverable from the source images]
Combining (41) and (44) leads to the following estimate for (38):

$$E(|u-v|\,uv) = \frac{1}{15}$$
Knowing the value for $E(|u-v|\,uv)$ makes it possible to finish (37):

$$E(|x-y| \cdot xy) = \frac{s^3}{15} + \frac{s^2 m}{3} + \frac{s\,m^2}{3} = \frac{s^3}{15} + \frac{s\,mM}{3} \qquad (45)$$
Testing this empirically for various intervals gives values consistent with the above formula.
The correlation coefficient between $|x-y|$ and $xy$ (7) is now obtained by filling in the five estimates. Writing $\mu = (m+M)/2$ and using $\mu^2 - mM = s^2/4$, the covariance becomes $\frac{s^3}{15} + \frac{s\,mM}{3} - \frac{s}{3}\mu^2 = -\frac{s^3}{60}$, while the variances are $\frac{s^2}{18}$ and $\frac{\mu^2 s^2}{6} + \frac{s^4}{144}$ respectively, so that

$$r = \frac{-s^3/60}{\sqrt{\dfrac{s^2}{18}}\;\sqrt{\dfrac{\mu^2 s^2}{6} + \dfrac{s^4}{144}}} \qquad (46)$$
Figure 10 shows the correlation strength in function of $m$ and $M$. Surprisingly, the correlation for any point with $m = -M$ is the same. The maximal correlation is found with $m = -M$, which is then

$$r = -\frac{3\sqrt{2}}{5} \approx -0.8485 \qquad (47)$$
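For the symmetric case $m = -1$, $M = 1$ (so $s = 2$ and $\mu = 0$) the five estimates give, as a worked check:

$$\mathrm{cov} = -\frac{2^3}{60} = -\frac{2}{15}, \qquad \sigma^2_{|x-y|} = \frac{4}{18} = \frac{2}{9}, \qquad \sigma^2_{xy} = \frac{16}{144} = \frac{1}{9}, \qquad r = \frac{-2/15}{\sqrt{2/9}\,\sqrt{1/9}} = -\frac{3\sqrt{2}}{5}.$$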
Figure 11 illustrates the correlation between SAD and SIP for uniformly sampled blocks.
Since every block comparison is the summation of variables of the same distribution, we get in the end a Gaussian distribution [21] with a mean equal to $n$ times the mean of the underlying operator, and a standard deviation that, relative to that mean, shrinks by a factor $\sqrt{n}$. The standard deviations of the per-element operators $|x-y|$ and $xy$ are $\sqrt{s^2/18}$ and $\sqrt{\mu^2 s^2/6 + s^4/144}$. Following the central limit theorem [21], we get the following standard deviations for the block operators SAD and SIP:

$$\sigma_{\mathrm{SAD}} = s\sqrt{\frac{n}{18}}, \qquad \sigma_{\mathrm{SIP}} = \sqrt{n\left(\frac{\mu^2 s^2}{6} + \frac{s^4}{144}\right)} \qquad (48)$$
Expressing the one comparator in function of the other allows us to predict it based on the measurement of the other. We call the predictions respectively $\widehat{\mathrm{SIP}}$ and $\widehat{\mathrm{SAD}}$. For $m = -1$, $M = 1$ the linear least-squares predictors are

$$\widehat{\mathrm{SIP}} = \frac{2n}{5} - \frac{3}{5}\,\mathrm{SAD}, \qquad \widehat{\mathrm{SAD}} = \frac{2n}{3} - \frac{6}{5}\,\mathrm{SIP} \qquad (49)$$
The expected difference between SIP and $\widehat{\mathrm{SIP}}$ is 0, as shown below (for $m = -1$, $M = 1$):

$$E(\mathrm{SIP} - \widehat{\mathrm{SIP}}) = E(\mathrm{SIP}) - \frac{2n}{5} + \frac{3}{5}E(\mathrm{SAD}) = 0 - \frac{2n}{5} + \frac{3}{5}\cdot\frac{2n}{3} = 0 \qquad (53)$$
The variance of the prediction error (54) can be deduced as follows. Since the second term is known to be zero (53), we have

$$\mathrm{var}(\mathrm{SIP} - \widehat{\mathrm{SIP}}) = E\big((\mathrm{SIP} - \widehat{\mathrm{SIP}})^2\big) = \mathrm{var}(\mathrm{SIP}) + \frac{9}{25}\,\mathrm{var}(\mathrm{SAD}) + \frac{6}{5}\,\mathrm{cov}(\mathrm{SAD},\mathrm{SIP})$$

Since $\mathrm{cov}(\mathrm{SAD},\mathrm{SIP}) = r\,\sigma_{\mathrm{SAD}}\,\sigma_{\mathrm{SIP}}$ and $r^2 = \frac{18}{25}$, (58) reduces to

$$\mathrm{var}(\mathrm{SIP} - \widehat{\mathrm{SIP}}) = (1 - r^2)\,\mathrm{var}(\mathrm{SIP}) = \frac{7}{25}\cdot\frac{n}{9} = \frac{7n}{225} \qquad (59)$$

and, with the roles reversed, this becomes

$$\mathrm{var}(\mathrm{SAD} - \widehat{\mathrm{SAD}}) = (1 - r^2)\,\mathrm{var}(\mathrm{SAD}) = \frac{14n}{225}$$
Both are interesting because they allow us to determine a confidence interval for the estimation based on the block size.
For a block length $n$ on the interval $[-1,1]$ we thus get a standard deviation of the prediction error of the SIP-mismatch of $\sqrt{7n}/15 \approx 0.176\sqrt{n}$; for the SAD-mismatch it is $\sqrt{14n}/15 \approx 0.249\sqrt{n}$, and larger intervals scale these values accordingly. This may seem large, but the expected values of SAD and SIP themselves grow linearly with $n$, so the relative prediction error shrinks as $1/\sqrt{n}$ for longer blocks.
It is easy to recognize that a baseline added to either of the two compared blocks will not influence the correlation value: it will merely increase the variance of the inproduct, while it will increase the value of the SAD. This results in predictions that might be 'off by a baseline'. Therefore, when using this technique in a sliding-window setup, one might consider appropriate techniques to remove baselines, such as low-pass filtering, multirate signal processing and so on [12, 17].
A second observation of importance is that zeros, in either block, will influence the reliability of the correlation. The more zeros, the more the SAD and the inproduct will differ. The inproduct essentially skips zeros and always returns a 'minimal match', irrespective of the other value. The absolute difference is not blinded by zeros: it treats them as any other value, and thus still distinguishes properly.
As a practical application we investigated the similarity between auto-correlation and the lagging sum of absolute differences, respectively defined as $\mathrm{AC}(t) = \sum_j x_j\,x_{j+t}$ and $\mathrm{LSAD}(t) = \sum_j |x_j - x_{j+t}|$.
The input dataset were the first samples of a song, which were equalized and scaled to the range $[-1,1]$. On this dataset we calculated $\mathrm{AC}$ and $\mathrm{LSAD}$, plotted in figure 12 in red and green respectively. The predicted values of each in function of the other are plotted in purple and blue. We find that the predictions are fairly accurate and allow the recognition of the same peaks as one goes along. Near the end of the dataset the auto-correlation appears biased due to the window size, which becomes smaller as the lag increases; the lagging SAD reflects this in its variance. The similarity between $\mathrm{AC}$ and $\mathrm{LSAD}$ has already led to an application in meta-data extraction, in which an algorithmic approach, using the SAD, assesses music tempo slightly faster and with less memory requirements than conventional auto-correlation. This has been implemented in the DJ-program BpmDj [22, 23].
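A naive $O(n^2)$ sketch of both lag profiles, with the window shrinking as the lag grows (as in the experiment above):

```cpp
#include <cmath>
#include <vector>

// Auto-correlation AC(t) and lagging SAD LSAD(t) of a signal, for every
// lag t. The summation window shrinks as the lag increases, which causes
// the bias and variance effects described in the text.
void lag_profiles(const std::vector<double>& x,
                  std::vector<double>& ac, std::vector<double>& lsad) {
    const std::size_t n = x.size();
    ac.assign(n, 0.0);
    lsad.assign(n, 0.0);
    for (std::size_t t = 0; t < n; ++t) {
        for (std::size_t j = 0; j + t < n; ++j) {
            ac[t]   += x[j] * x[j + t];
            lsad[t] += std::fabs(x[j] - x[j + t]);
        }
    }
}
```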
A second test we conducted was finding a pattern (called the needle) in a larger data set (the haystack). As a haystack we used the first equalized samples of a song. We generated two needles: the first one a random white-noise pattern, the second one a random block taken from the haystack; both needles were equally long. Using a sliding-window block comparator we plotted the inproduct and the SAD, as well as their predicted values (figure 13). As observed before, when the inproduct increases, the SAD decreases and vice versa. Both plots start with an unexpectedly large SAD and a noisy inproduct, because the song starts with some ambiance, which consists of many values near zero. In this area the inproduct therefore had a smaller variance than in the rest of the song, while the SAD was larger than expected. The needle we looked for in the second test was taken from the middle of the haystack. At this position we indeed find a peaking of all values.
In auto-comparison and block searching we find the same peaks back and, the baseline aside, at a local level the inproduct behaves similarly to the SAD. This makes clear that whatever technique one uses to find a global match, both the inproduct and the SAD are equally valid to find local minima. This has an important application in landmark tracking, which can be done equally well with Fourier transformations as with the widely used SAD.
The equalization of amplitudes can be easily conducted for sound waves or color images. There one simply counts the number of occurrences of each specific amplitude, which is an $O(n)$ operation. Afterwards, a cumulative distribution of this binned histogram can be made. The total cumulative value can then form the divisor for all the binned numbers.
If $c(v)$ is the number of times the value $v$ occurred in the audio-stream, then we define the cumulative count $C(v)$ as $C(v) = \sum_{w \le v} c(w)$. The maximal value is found at $C(A)$ ($A$ standing for maximum amplitude). The equalization of the histogram now consists of assigning a floating-point value to each of the original inputs and using those values; the lookup table will be called $L$, with $L(v) = C(v)/C(A)$. Translating the original signal through $L$ will make it uniformly distributed. This operation is $O(n)$. It has however some memory lookups that might be unattractive for long sequences. The next question is how well the correlation holds on real signals.
If two data sequences are uniformly sampled from $[-1,1]$, then the summed inproduct (SIP) and the sum of absolute differences (SAD) are strongly related, with a correlation coefficient of $-0.8485$. This useful relationship allows the prediction of one operator in function of the other, for blocks of size $n$, as follows:

$$\mathrm{SIP} \approx \frac{2n}{5} - \frac{3}{5}\,\mathrm{SAD}$$

$$\mathrm{SAD} \approx \frac{2n}{3} - \frac{6}{5}\,\mathrm{SIP}$$
Applications of this relation might, among others, include optimizations in the areas of sound analysis, pattern-matching, digital image processing and compression algorithms (such as rsync).
The use of the absolute difference is attractive because it offers a way to compare signals such that mismatches cannot cancel each other out, thereby providing the same result independent of any constant added to the signal. Convolution does not behave as such and in fact provides, in our view, not the best manner of comparing two signals: it depends on the constant added to the signal (which can arise from a very low frequency oscillation that cannot be captured in the window). On the other hand, when many comparisons are necessary, the inproduct can be implemented very efficiently by means of the Fourier transform.
Regarding the proof and formulas used in this document: it is probably the worst piece of math I've ever been involved with. It is tedious, complicated and at the moment I have no clue how I could simplify it. Probably there are some black magic tricks mathematicians could use, but not being a native mathematician, I'm stuck with what I have.
1. Stephan Wong, Jack Kester, Michiel Konstapel, Ricardo Serra, Otto Visser. Building Blocks for MPEG Stream Processing. Computer Engineering Laboratory, Electrical Engineering Department, Delft University of Technology; 2002.
2. Craig Watman, David Austin, Nick Barnes, Gary Overett, Simon Thompson. Fast Sum of Absolute Differences Visual Landmark Detector. Proceedings of the IEEE Conference on Robotics and Automation; April 2004.
3. S. Vassiliadis, E.A. Hakkennes, J.S.S.M. Wong, G.G. Pechanek. The Sum-Absolute-Difference Motion Estimation Accelerator. Proceedings of the 24th EUROMICRO Conference, Vasteras, Sweden, pp. 559-566; IEEE Computer Society, ISBN 0-8186-8646-4; August 1998.
4. Weichao Zhou, Zenhan Jiang, Mei Li, Chunhong Wang, Changhui Rao. Real-time Area Correlation Tracker Implementation based on Absolute Difference Algorithm. Optical Engineering, volume 42, number 9, pages 2755-2760; September 2003.
5. G. Jing, D. Rajan, C.E. Siong. Employing Difference Image in Affine Tracking. Signal and Image Processing; 2006.
6. David F. McManigal. Absolute Difference Generator for use in Display Systems. United States Patent 4218751; March 7, 1979.
7. Masaaki Yasumoto, Tadayoshi Enomoto, Masakazu Yamashina. Arithmetic circuit for calculating the absolute value of the difference between a pair of input signals. United States Patent 4,849,921; June 1989.
8. Roney S. Wong. Calculating the absolute difference of two integer numbers in a single instruction cycle. United States Patent 5835389; April 1996.
9. Hwang-Seok Oh, Yunju Baek, Heung-Kyu Lee. Adaptive Bit-Reduced Mean Absolute Difference Criterion for Block-Matching Algorithm and its VLSI Design. Optical Engineering, volume 37, number 12, pages 3272-3281; December 1998.
10. William T. Vetterling, Brian P. Flannery (William H. Press, Saul A. Teukolsky, eds.). Numerical Recipes in C++, chapter 15.7, 2nd edition. Cambridge University Press; February 2002.
11. Ruben A. Abagyan, Serge Batalov. Do Aligned Sequences Share the Same Fold? Journal of Molecular Biology; 1997.
12. Alan V. Oppenheim, Ronald W. Schafer (John R. Buck, ed.). Discrete-Time Signal Processing. Prentice Hall; 1989.
13. Jean Cardinal, Steve Kremer, Stefan Langerman. Juggling with Pattern Matching. Université Libre de Bruxelles; 2005.
14. Mikhail J. Atallah, Frederic Chyzak, Philippe Dumas. A Randomized Algorithm for Approximate String Matching. Algorithmica, volume 29, number 3, pages 468-486; 2001.
15. Julia Ebling, Gerik Scheuermann. Clifford Convolution and Pattern Matching on Irregular Grids. In: Scientific Visualization: The Visual Extraction of Knowledge from Data (G.P. Bonneau, T. Ertl, G.M. Nielson, eds.), pages 231-248. Springer, Berlin; 2005.
16. Donald Knuth. The Art of Computer Programming, volume 1, chapter 1.2.11: Asymptotic Representations, pages 107-123, 3rd edition. Addison-Wesley; 1997.
17. William T. Vetterling, Brian P. Flannery (William H. Press, Saul A. Teukolsky, eds.). Numerical Recipes in C++, chapter 10, 2nd edition. Cambridge University Press; February 2002.
18. Eric W. Weisstein. Uniform Difference Distribution. From MathWorld - A Wolfram Web Resource; July 2003. http://mathworld.wolfram.com/UniformDifferenceDistribution.html
19. Prem S. Puri. Probability Generating Functions of Absolute Difference of Two Random Variables. Proceedings of the National Academy of Sciences of the United States of America, volume 56, pages 1059-1061; 1966.
20. Eric W. Weisstein. Uniform Product Distribution. From MathWorld - A Wolfram Web Resource; July 2003. http://mathworld.wolfram.com/UniformProductDistribution.html
21. O. Kallenberg. Foundations of Modern Probability. Springer-Verlag, New York; 1997.
22. Werner Van Belle. BPM Measurement of Digital Audio by Means of Beat Graphs & Ray Shooting. December 2000. http://werner.yellowcouch.org/Papers/bpm04/
23. Werner Van Belle. DJ-ing under Linux with BpmDj. Linux+ Magazine, Nr 10/2006(25); October 2006. http://werner.yellowcouch.org/Papers/bpm06/