Hardware-optimised library for the area correlation algorithm

Area Correlation

The measure of similarity between images is known as their correlation. Computing the correlation is very similar to computing a convolution; only the indexing scheme is different. We can think of the window, that is, the small image containing the feature to be found, as playing a role similar to that of the mask in convolution. The window is a small array of pixels that is positioned over the main image at different places and takes part in a sum-of-products calculation at each one. The resulting intrinsic image is a correlation map. Its pixel values represent how well each small neighborhood of the main image matches the window. A single, strong maximum in the correlation map indicates the location of the desired feature (1).
It is usual to normalize the result of a correlation in order to make the resulting peaks sharper and easier to identify. Consequently, the full equation for a correlation is a little more complex than that for a convolution (2).
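
The sum of products and its normalization can be sketched in C as follows. This is only an illustration, not the library's code: the names corr_sums, correlate_at and normalized_sq are hypothetical, and the normalization used, r = sum(I*W) / sqrt(sum(I*I) * sum(W*W)), is one common choice that may differ from the full equation cited above. To avoid a square root, the sketch returns r squared, which has its maximum at the same position because r is never negative for unsigned pixel values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sums collected for one window position (hypothetical helper type). */
typedef struct {
    uint64_t iw;  /* sum of image * window products: the raw correlation */
    uint64_t ii;  /* sum of squared image pixels under the window */
    uint64_t ww;  /* sum of squared window pixels */
} corr_sums;

/* Place the window with its top-left corner at (x, y) of a row-major
 * 8-bit image and accumulate the three sums. */
corr_sums correlate_at(const uint8_t *img, size_t img_w, size_t img_h,
                       const uint8_t *win, size_t win_w, size_t win_h,
                       size_t x, size_t y)
{
    assert(x + win_w <= img_w && y + win_h <= img_h);
    corr_sums s = {0, 0, 0};
    for (size_t j = 0; j < win_h; ++j) {
        for (size_t i = 0; i < win_w; ++i) {
            uint64_t p = img[(y + j) * img_w + (x + i)];
            uint64_t w = win[j * win_w + i];
            s.iw += p * w;
            s.ii += p * p;
            s.ww += w * w;
        }
    }
    return s;
}

/* Squared normalized coefficient in [0, 1]; 1 means the neighborhood is a
 * scaled copy of the window. Returns 0 if either patch is all zero. */
double normalized_sq(corr_sums s)
{
    if (s.ii == 0 || s.ww == 0)
        return 0.0;
    return ((double)s.iw * (double)s.iw) / ((double)s.ii * (double)s.ww);
}
```

For a 4x2 image with rows {1, 2, 100, 100} and {3, 4, 100, 100} and the 2x2 window {1, 2, 3, 4}, the raw sum is 30 at the exact match and 1000 over the uniformly bright patch, while the normalized value is 1 at the match and about 0.83 over the bright patch. Calling correlate_at for every valid (x, y) fills the correlation map.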

Original image with the selected area of interest (red square).
Unnormalized area correlation coefficient (white is highest).


This result illustrates that normalization is necessary during the calculation.

Normalized area correlation coefficient (white is highest).


With normalization, the result is correct.

SIMD Instructions

SIMD stands for single instruction, multiple data.

SIMD technologies in CPUs:
 • Intel MMX;
 • AMD 3DNow!;
 • Intel SSE;
 • Intel SSE2;
 • Motorola AltiVec.

MMX technology uses the single instruction, multiple data (SIMD) technique to perform arithmetic and logical operations on bytes, words, or doublewords packed into MMX registers. For example, the PADDSW instruction adds 4 signed word integers from one source operand to 4 signed word integers in a second source operand, with signed saturation, and stores the 4 word integer results in the destination operand. (Note that the same MMX register is generally used for the second source and the destination operand.) This SIMD technique speeds up software by carrying out the same operation on multiple data elements in parallel.

The SIMD execution model of MMX technology directly addresses the needs of media, communications, and graphics applications, which often use sophisticated algorithms that perform the same operations on a large number of small data elements (bytes, words, and doublewords). For example, most audio data is represented in 16-bit (word) quantities, and one MMX instruction can operate on 4 words simultaneously. Video and graphics information is commonly represented as palettized 8-bit (byte) quantities; here, one MMX instruction can operate on 8 bytes simultaneously.
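
To make the PADDSW example concrete, here is a scalar C model of what the instruction computes on its four 16-bit lanes. It is a sketch for illustration only: the function names saturate_s16 and paddsw_model are hypothetical, and the loop processes the lanes one after another, whereas the hardware processes all four at once.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate result to the signed 16-bit range. */
static int16_t saturate_s16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Scalar model of PADDSW: dst[i] = saturate(dst[i] + src[i]) for 4 lanes.
 * dst is both the second source and the destination, as in the MMX form. */
void paddsw_model(int16_t dst[4], const int16_t src[4])
{
    for (int i = 0; i < 4; ++i)
        dst[i] = saturate_s16((int32_t)dst[i] + (int32_t)src[i]);
}
```

With dst = {1, 32767, -32768, -5} and src = {2, 1, -1, 5}, the model leaves {3, 32767, -32768, 0} in dst: the two middle lanes saturate instead of wrapping around.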

Single Instruction, Multiple Data (SIMD) Execution Model

Typical architecture of a processor with SIMD.

Block diagram of the AMD K6-III, which contains ten execution pipelines: store, load, integer X ALU, integer Y ALU, MMX ALU (X), MMX ALU (Y), MMX/3DNow! multiplier, 3DNow! ALU, floating-point, and branch.


The X and Y units can operate simultaneously.

This is typical of the Intel Pentium MMX and later, the AMD K6-2 and later, and the VIA C3.

Mapping the MMX registers on the floating-point stack enables backwards compatibility for the register saving that must occur as a result of task switching.

MMX data types

Packed bytes
 • Eight 8-bit bytes packed into 64 bits;
 • Signed integer range: -2^7 to 2^7-1;
 • Unsigned integer range: 0 to 2^8-1.

Packed words
 • Four 16-bit words packed into 64 bits;
 • Signed integer range: -2^15 to 2^15-1;
 • Unsigned integer range: 0 to 2^16-1.

Packed doublewords
 • Two 32-bit doublewords packed into 64 bits;
 • Signed integer range: -2^31 to 2^31-1;
 • Unsigned integer range: 0 to 2^32-1.

Quadword
 • One 64-bit quadword;
 • Signed integer range: -2^63 to 2^63-1;
 • Unsigned integer range: 0 to 2^64-1.
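
The four layouts above are different views of the same 64 bits. A C union is one way to picture this; the type name mmx_value and its member names are hypothetical and are not part of any MMX API.

```c
#include <stdint.h>

/* One 64-bit MMX-style value, viewed as each packed data type. */
typedef union {
    uint8_t  ub[8];   /* eight unsigned bytes:    0 to 2^8-1      */
    int8_t   sb[8];   /* eight signed bytes:     -2^7 to 2^7-1    */
    uint16_t uw[4];   /* four unsigned words:     0 to 2^16-1     */
    int16_t  sw[4];   /* four signed words:      -2^15 to 2^15-1  */
    uint32_t ud[2];   /* two unsigned doublewords                 */
    int32_t  sd[2];   /* two signed doublewords                   */
    uint64_t uq;      /* one unsigned quadword                    */
    int64_t  sq;      /* one signed quadword                      */
} mmx_value;
```

Setting uq to all ones makes every unsigned lane read as its maximum value and every signed lane read as -1, which shows that the data type is chosen by the instruction and is not stored with the data.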

SSE/SSE2 instructions

The Intel Pentium III and AMD Athlon XP provide eight 128-bit SSE registers (XMM0-XMM7).
SSE supports 32-bit (single-precision) floating-point operations.
SSE2, introduced with the Intel Pentium 4 processor, adds support for 8-, 16-, 32-, and 64-bit integer data in the XMM (SSE) registers.


SSE2 128-bit data types:
 • 128-bit packed double-precision floating-point;
 • 128-bit packed byte integers;
 • 128-bit packed word integers;
 • 128-bit packed doubleword integers;
 • 128-bit packed quadword integers.

The AMD Hammer (x86-64) architecture will support:
 • 64-bit virtual addresses (implementations can have fewer bits);
 • Register extensions through a new prefix (REX):
    - adds eight GPRs (R8-R15);
    - widens GPRs to 64 bits;
    - adds eight 128-bit streaming SIMD extension (SSE) registers (XMM8-XMM15);
 • 64-bit instruction pointer (RIP).

The AltiVec technology extends the instruction set architecture (ISA) of the PowerPC architecture. AltiVec technology is a short vector parallel architecture. The AltiVec ISA is based on separate vector/SIMD-style (single instruction stream, multiple data streams) execution units that have high data parallelism.


- IA-32 Intel Architecture Software Developer's Manual, Volume 1: Basic Architecture, Order Number 245470 - 24547004.pdf;
- AMD-K6 MMX Enhanced Processor. Multimedia Technology - 20726.pdf;
- AMD-K6-III Processor Data Sheet - 21918.pdf;
- AltiVec Technology. Programming Environments Manual - altivec_pem.pdf;
- AMD 64-Bit Technology. The AMD x86-64 Architecture. Programmers Overview - x86-64_overview.pdf.



© 2000-2001 by Andrey Yaromenok aka Azazello