Published: 12.07.2012 | Access: free | Students: 355 / 24 | Rating: 4.00 / 4.20 | Duration: 11:07:00
Specialties: Programmer
Lecture 5:

Optimizing compiler. Vectorization


Different data types can be packed into a 128-bit vector register as follows:

Packed data type | Vector length | Bits per element | Data type range
signed bytes | 16 | 8 | -2**7 to 2**7-1
unsigned bytes | 16 | 8 | 0 to 2**8-1
signed words | 8 | 16 | -2**15 to 2**15-1
unsigned words | 8 | 16 | 0 to 2**16-1
signed doublewords | 4 | 32 | -2**31 to 2**31-1
unsigned doublewords | 4 | 32 | 0 to 2**32-1
signed quadwords | 2 | 64 | -2**63 to 2**63-1
unsigned quadwords | 2 | 64 | 0 to 2**64-1
single-precision floating-point | 4 | 32 | 2**-126 to 2**127
double-precision floating-point | 2 | 64 | 2**-1022 to 2**1023

Selecting the appropriate data type for calculations can significantly affect application performance: the narrower the element type, the more elements a single vector instruction processes.
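As a rough illustration of why the element type matters, the sketch below adds two arrays of 16-bit integers with SSE2 intrinsics, so each instruction handles 8 elements; the same loop over 32-bit integers would handle only 4 per instruction. The function name `add_i16` and the requirement that `n` be a multiple of 8 are assumptions made for this example only.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = a[i] + b[i] for n 16-bit elements.
   Assumes n is a multiple of 8 (eight words fit in one 128-bit register). */
void add_i16(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i += 8) {
        /* Unaligned loads, so the arrays may start at any address. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* paddw: eight 16-bit additions at once (wraps on overflow). */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi16(va, vb));
    }
}
```

Calling `add_i16(dst, a, b, 8)` on 8-element arrays performs the whole addition with a single packed add.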

Optimization with Switches

Fig. 5.5. SIMD – SSE, SSE2, SSE3, SSE4.2 Support

Instruction groups

Data movement instructions
Instruction | Suffix | Description
movdqa | | move double quadword aligned
movdqu | | move double quadword unaligned
mova | [ ps, pd ] | move floating-point aligned
movu | [ ps, pd ] | move floating-point unaligned
movhl | [ ps ] | move packed floating-point high to low
movlh | [ ps ] | move packed floating-point low to high
movh | [ ps, pd ] | move high packed floating-point
movl | [ ps, pd ] | move low packed floating-point
mov | [ d, q, ss, sd ] | move scalar data
lddqu | | load double quadword unaligned
mov<d/sh/sl>dup | | move and duplicate
pextr | [ w ] | extract word
pinsr | [ w ] | insert word
pmovmsk | [ b ] | move mask
movmsk | [ ps, pd ] | move mask

An aligned data movement instruction cannot be applied to a memory location that is not aligned on a 16-byte boundary; doing so raises a general-protection exception.
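The alignment rule can be sketched with intrinsics as follows. `_mm_load_ps` corresponds to the aligned form (movaps) and `_mm_loadu_ps` to the unaligned form (movups); the helper names `is_aligned16` and `load4` are made up for this illustration, and a compiler is free to choose a different instruction than the one named in the comments.

```c
#include <emmintrin.h>  /* SSE/SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: nonzero if p lies on a 16-byte boundary. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: loads four floats, using the aligned form
   only when the address allows it. */
__m128 load4(const float *p)
{
    return is_aligned16(p) ? _mm_load_ps(p)    /* movaps: needs 16-byte alignment */
                           : _mm_loadu_ps(p);  /* movups: works for any address */
}
```

In practice it is usually simpler to align the data itself (for example with `_Alignas(16)` in C11) than to test addresses at run time.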

Integer arithmetic instructions
Instruction | Suffix | Description
padd | [ b, w, d, q ] | packed addition (signed and unsigned)
psub | [ b, w, d, q ] | packed subtraction (signed and unsigned)
padds | [ b, w ] | packed addition with saturation (signed)
paddus | [ b, w ] | packed addition with saturation (unsigned)
psubs | [ b, w ] | packed subtraction with saturation (signed)
psubus | [ b, w ] | packed subtraction with saturation (unsigned)
pmins | [ w ] | packed minimum (signed)
pminu | [ b ] | packed minimum (unsigned)
pmaxs | [ w ] | packed maximum (signed)
pmaxu | [ b ] | packed maximum (unsigned)
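The difference between ordinary and saturating packed addition is easiest to see on unsigned bytes. The sketch below is only an illustration: the function names are hypothetical, and each call processes exactly 16 bytes.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* paddb: adds 16 bytes; results wrap around modulo 256. */
void add_wrap_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)dst, _mm_add_epi8(va, vb));
}

/* paddusb: adds 16 unsigned bytes; results are clamped to 255. */
void add_sat_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)dst, _mm_adds_epu8(va, vb));
}
```

With inputs 200 and 100 in every byte, the wrapping version yields 44 (300 modulo 256), while the saturating version yields 255, which is usually what image-processing code wants.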

Floating-point arithmetic instructions
Instruction | Suffix | Description
add | [ ss, ps, sd, pd ] | addition
div | [ ss, ps, sd, pd ] | division
min | [ ss, ps, sd, pd ] | minimum
max | [ ss, ps, sd, pd ] | maximum
mul | [ ss, ps, sd, pd ] | multiplication
sqrt | [ ss, ps, sd, pd ] | square root
sub | [ ss, ps, sd, pd ] | subtraction
rcp | [ ss, ps ] | approximate reciprocal
rsqrt | [ ss, ps ] | approximate reciprocal square root
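To show what "approximate" means for rcp, the sketch below computes reciprocals of four floats both with a full-precision division (divps) and with the approximation (rcpps), which delivers roughly 12 bits of precision. The function names are hypothetical and chosen only for this example.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* divps: full-precision 1/x for four floats. */
void recip_exact(float *dst, const float *x)
{
    _mm_storeu_ps(dst, _mm_div_ps(_mm_set1_ps(1.0f), _mm_loadu_ps(x)));
}

/* rcpps: fast approximate 1/x for four floats (reduced precision). */
void recip_approx(float *dst, const float *x)
{
    _mm_storeu_ps(dst, _mm_rcp_ps(_mm_loadu_ps(x)));
}
```

Whether the approximation is acceptable depends on the application; when more accuracy is needed, the approximate result is commonly refined with a Newton-Raphson step.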

Idiomatic arithmetic instructions
Instruction | Suffix | Description
pavg | [ b, w ] | packed average with rounding (unsigned)
pmulh/pmulhu/pmull | [ w ] | packed multiplication
psad | [ bw ] | packed sum of absolute differences (unsigned)
pmadd | [ wd ] | packed multiplication and addition (signed)
addsub | [ ps, pd ] | floating-point addition/subtraction
hadd | [ ps, pd ] | floating-point horizontal addition
hsub | [ ps, pd ] | floating-point horizontal subtraction
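As one example of an idiomatic instruction, pmaddwd multiplies pairs of signed 16-bit words and adds adjacent products into 32-bit results, which maps directly onto a dot product. The following is a minimal sketch for exactly eight elements; the function name `dot8_i16` is an assumption for this illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: dot product of two 8-element int16 vectors. */
int32_t dot8_i16(const int16_t *a, const int16_t *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* pmaddwd: 8 word products, adjacent pairs summed into 4 doublewords. */
    __m128i pairs = _mm_madd_epi16(va, vb);
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, pairs);
    /* Final horizontal sum done in scalar code for clarity. */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```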

Logical instructions
Instruction | Suffix | Description
pand | | bitwise logical AND
pandn | | bitwise logical AND-NOT
por | | bitwise logical OR
pxor | | bitwise logical XOR
and | [ ps, pd ] | bitwise logical AND
andn | [ ps, pd ] | bitwise logical AND-NOT
or | [ ps, pd ] | bitwise logical OR
xor | [ ps, pd ] | bitwise logical XOR
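Bitwise logical instructions on floating-point data are typically used to manipulate sign bits or apply masks. A small sketch, assuming IEEE 754 single precision: the AND-NOT operation with a mask that has only the sign bits set computes the absolute value of four floats. The name `abs4` is hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: dst[i] = |x[i]| for four floats. */
void abs4(float *dst, const float *x)
{
    /* -0.0f has only the sign bit (bit 31) set in each element. */
    const __m128 sign = _mm_set1_ps(-0.0f);
    /* andnps computes (~sign) & x, i.e. clears the sign bit of each element. */
    _mm_storeu_ps(dst, _mm_andnot_ps(sign, _mm_loadu_ps(x)));
}
```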

Comparison instructions
Instruction | Suffix | Description
pcmp<cc> | [ b, w, d ] | packed compare
cmp<cc> | [ ss, ps, sd, pd ] | floating-point compare

<cc> specifies the comparison operation, for example lt (less than), gt (greater than), eq (equal).
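Packed comparisons do not set flags; they produce a mask with all bits set in each element where the condition holds and all bits clear elsewhere. Combined with the logical instructions, such a mask gives a branch-free select. The sketch below is one common pattern, with a hypothetical function name and a fixed width of four floats.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: dst[i] = (a[i] < b[i]) ? t[i] : f[i], no branches. */
void select_lt4(float *dst, const float *a, const float *b,
                const float *t, const float *f)
{
    /* cmpltps: each element becomes all ones where a < b, else all zeros. */
    __m128 mask = _mm_cmplt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    __m128 vt = _mm_and_ps(mask, _mm_loadu_ps(t));     /* keep t where mask is set */
    __m128 vf = _mm_andnot_ps(mask, _mm_loadu_ps(f));  /* keep f where mask is clear */
    _mm_storeu_ps(dst, _mm_or_ps(vt, vf));
}
```

This is essentially what a vectorizing compiler has to generate for a loop body containing a simple `if`/`else` assignment.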

Conversion instructions
Instruction | Suffix | Description
packss | [ wb, dw ] | pack with saturation (signed)
packus | [ wb ] | pack with saturation (unsigned)
cvt<s2d> | | conversion
cvtt<s2d> | | conversion with truncation
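The difference between the cvt and cvtt forms can be sketched by converting four floats to 32-bit integers both ways. The example assumes the default MXCSR rounding mode (round to nearest); the function names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* cvtps2dq: converts using the current rounding mode (nearest by default). */
void to_int_round(int32_t *dst, const float *x)
{
    _mm_storeu_si128((__m128i *)dst, _mm_cvtps_epi32(_mm_loadu_ps(x)));
}

/* cvttps2dq: converts with truncation toward zero, as a C cast does. */
void to_int_trunc(int32_t *dst, const float *x)
{
    _mm_storeu_si128((__m128i *)dst, _mm_cvttps_epi32(_mm_loadu_ps(x)));
}
```

For the input 2.7 the first function gives 3 and the second gives 2; for -1.7 they give -2 and -1 respectively.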

Shift instructions
Instruction | Suffix | Description
psll | [ w, d, q, dq ] | shift left logical (zero in)
psra | [ w, d ] | shift right arithmetic (sign in)
psrl | [ w, d, q, dq ] | shift right logical (zero in)
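The "sign in" versus "zero in" distinction matters only for negative values. The following sketch shifts four 32-bit integers right by two bits in both ways; the function name is an assumption for this example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: shifts four int32 values right by 2 bits,
   once arithmetically and once logically. */
void shift_right_both(int32_t *arith, int32_t *logical, const int32_t *x)
{
    __m128i v = _mm_loadu_si128((const __m128i *)x);
    /* psrad: vacated high bits are filled with copies of the sign bit. */
    _mm_storeu_si128((__m128i *)arith, _mm_srai_epi32(v, 2));
    /* psrld: vacated high bits are filled with zeros. */
    _mm_storeu_si128((__m128i *)logical, _mm_srli_epi32(v, 2));
}
```

For the input -16 the arithmetic shift yields -4, while the logical shift yields 0x3FFFFFFC because the sign bits are replaced by zeros.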

Shuffle instructions
Instruction | Suffix | Description
pshuf | [ w, d ] | packed shuffle
pshufh | [ w ] | packed shuffle high
pshufl | [ w ] | packed shuffle low
shuf | [ ps, pd ] | shuffle
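A shuffle rearranges elements within a register according to an immediate selector. As a small sketch, pshufd can reverse the order of four 32-bit integers; the function name `reverse4_i32` is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides _MM_SHUFFLE */
#include <stdint.h>

/* Hypothetical helper: dst = {x[3], x[2], x[1], x[0]}. */
void reverse4_i32(int32_t *dst, const int32_t *x)
{
    __m128i v = _mm_loadu_si128((const __m128i *)x);
    /* pshufd: each 2-bit field of the selector names a source element.
       _MM_SHUFFLE(0, 1, 2, 3) lists the sources for result elements 3..0,
       so result element 0 takes source 3, element 1 takes source 2, etc. */
    __m128i r = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));
    _mm_storeu_si128((__m128i *)dst, r);
}
```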

Unpack instructions
Instruction | Suffix | Description
punpckh | [ bw, wd, dq, qdq ] | unpack high
punpckl | [ bw, wd, dq, qdq ] | unpack low
unpckh | [ ps, pd ] | unpack high
unpckl | [ ps, pd ] | unpack low
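Unpack instructions interleave elements from two registers. One frequent use, sketched below, is widening unsigned bytes to words by interleaving them with zeros. The function name is an assumption, and the example relies on the little-endian byte order of x86.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: zero-extends 8 unsigned bytes to 8 unsigned words. */
void widen_u8_to_u16(uint16_t *dst, const uint8_t *src)
{
    /* movq: loads 8 bytes into the low half of the register. */
    __m128i v = _mm_loadl_epi64((const __m128i *)src);
    __m128i zero = _mm_setzero_si128();
    /* punpcklbw: interleaves low bytes as s0,0,s1,0,... so each
       byte pair forms a 16-bit word equal to the source byte. */
    _mm_storeu_si128((__m128i *)dst, _mm_unpacklo_epi8(v, zero));
}
```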

Cacheability control and prefetch instructions
Instruction | Suffix | Description
movnt | [ ps, pd, q, dq ] | move aligned non-temporal
prefetch<hint> | | prefetch with hint
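Non-temporal stores write data while hinting that it should not be kept in the cache, which can help when filling a large buffer that will not be read again soon. A minimal sketch, assuming the destination is 16-byte aligned and `n` is a multiple of 4; the function name `fill_stream` is hypothetical, and whether this is actually faster depends on the buffer size and the processor.

```c
#include <xmmintrin.h>  /* SSE intrinsics */
#include <stddef.h>

/* Hypothetical helper: fills n floats with value using movntps.
   Assumes dst is 16-byte aligned and n is a multiple of 4. */
void fill_stream(float *dst, float value, size_t n)
{
    __m128 v = _mm_set1_ps(value);
    for (size_t i = 0; i < n; i += 4) {
        _mm_stream_ps(dst + i, v);  /* non-temporal store, requires alignment */
    }
    /* sfence: orders the streaming stores before any later stores. */
    _mm_sfence();
}
```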

State management instructions

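These instructions, such as fxsave and fxrstor, which save and restore the floating-point and SIMD register state, are commonly used by the operating system.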
