Published: 12.07.2012 | Access: free | Students: 355 / 24 | Rating: 4.00 / 4.20 | Duration: 11:07:00
Specialties: Programmer
Lecture 6:

Optimizing compiler. Auto parallelization


C/C++ extended array notation

The C/C++ language extension for array notations is an Intel-specific extension that is part of the Intel® Cilk™ Plus feature supported by the Intel® compiler.

The C/C++ extension provides data parallel array notations with the following major benefits:

  • Allows you to use array notation to program parallel operations in a familiar language
  • Achieves predictable performance based on mapping parallel constructs to the underlying multi-threaded and SIMD hardware
  • Enables compiler parallelization and vectorization with less reliance on alias and dependence analysis

When you use the array notations, the Intel® compiler implements them using vector code.

Usage Recommendations

Use the array notations when your algorithm operates on arrays and does not require a specific order of operations among the elements of the array(s).
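As a rough illustration of the "no specific order" requirement, the plain C sketch below (no array notation, so it builds with any C99 compiler) contrasts two kinds of loops. The function names are hypothetical and chosen only for this example. The element-wise sum gives the same result whether it runs forward or backward, so it is a candidate for a statement such as c[:] = a[:] + b[:]. The running sum depends on the element computed just before it and is not a candidate.

```c
#include <assert.h>
#include <stddef.h>

/* Element-wise sum, visiting elements from first to last. */
void add_forward(size_t n, const double *a, const double *b, double *c) {
    assert(a && b && c);
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* The same sum, visiting elements from last to first. */
void add_backward(size_t n, const double *a, const double *b, double *c) {
    assert(a && b && c);
    for (size_t i = n; i-- > 0; )
        c[i] = a[i] + b[i];
}

/* Running (prefix) sum: x[i] needs the already updated x[i-1],
   so the iteration order is part of the algorithm. */
void prefix_sum(size_t n, double *x) {
    assert(x);
    for (size_t i = 1; i < n; i++)
        x[i] += x[i - 1];
}
```

Calling add_forward and add_backward on the same inputs fills c with identical values, which is the property the recommendation asks for.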

To use the array notations in your application, keep the following sequence of steps in mind:

  1. Insert the array notations language extensions into your application source code.
  2. Compile the application at optimization level -O1 or higher to enable vectorization. By default, the compiler generates SIMD vector instructions in the SSE2 instruction set. To generate SIMD vector instructions beyond SSE2, add target/architecture-specific compiler options to the compile command.

By default, the Intel® compiler accepts the array notations language extensions to generate vector and multi-threaded code based on the data parallel constructs in the program.

CEAN (C/C++ Extensions for Array Notations Programming Model)

Array declarations:

  Length          Storage class  Declaration
  Fixed           Static         static int a[16][128];
  Fixed           Auto           void foo(void) { int a[16][128]; }
  Fixed           Parameter      void bar(int a[16][128]);
  Fixed           Heap           int (*p2d)[128];
  Variable (C99)  Auto           void foo(int m, int n) { int a[m][n]; }
  Variable (C99)  Parameter      void bar(int m, int n, int a[m][n]);
  Variable (C99)  Heap           void bar(int m, int n) { int (*p2d)[n]; }

Declaration of the array sections

  section_operator ::= [<lower bound>:<length>:<stride>]
  a[0:3][0:4]
  b[0:2:3]
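To make the triplet concrete: a section [lower:length:stride] selects length elements, starting at lower and stepping by stride. The sketch below spells this out in plain C. The helper section_indices is hypothetical, written only for this illustration, and is not part of the extension.

```c
#include <assert.h>
#include <stddef.h>

/* Write into out[] the indices selected by [lower:length:stride]:
   lower, lower + stride, ..., lower + (length - 1) * stride.
   The second field is an element count, not an upper bound. */
void section_indices(int lower, size_t length, int stride, int *out) {
    assert(out != NULL || length == 0);
    for (size_t k = 0; k < length; k++)
        out[k] = lower + (int)k * stride;
}
```

Read this way, b[0:2:3] refers to b[0] and b[3], and a[0:3][0:4] refers to the 3 x 4 block of a that starts at a[0][0]. When the stride is omitted, it is 1.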

You must use the -std=c99 (Linux and Mac OS) or /Qstd=c99 (Windows) compiler option.

Example:

  typedef int (*p2d)[128];
  p2d p = (p2d) malloc(sizeof(int)*rows*128);
  p[0:rows][:]               // section covering all rows and all 128 columns
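The allocation pattern above can be tried in standard C without the extension. In the sketch below, alloc_rows and fill_rows are hypothetical helpers introduced for illustration; fill_rows uses ordinary loops to play the role of the array-notation statement p[0:rows][:] = value.

```c
#include <assert.h>
#include <stdlib.h>

#define COLS 128

typedef int (*p2d)[COLS];   /* pointer to a row of COLS ints */

/* Allocate a rows x COLS matrix as one contiguous block; NULL on failure. */
p2d alloc_rows(size_t rows) {
    return (p2d)malloc(sizeof(int) * rows * COLS);
}

/* Loop equivalent of the array-notation statement p[0:rows][:] = value. */
void fill_rows(p2d p, size_t rows, int value) {
    assert(p != NULL);
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < COLS; j++)
            p[i][j] = value;
}
```

Because p points to rows of a known length, p[i][j] indexes the heap block exactly like a declared two-dimensional array, and a single free(p) releases it.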

Most C/C++ operators are available for array sections.

a[:]*b[:]                  // element-wise multiplication
a[3:2][2:2] + b[5:2][5:2]  // matrix addition
a[0:4]+c                   // adds scalar to an array section
a[:][:] = b[:][1][:] + c   // array assignment
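For readers without a compiler that supports the extension, the following plain C sketch shows what two of the expressions above compute. The function names, the result arrays and the fixed 8 x 8 array size are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define DIM 8

/* Loop equivalent of r[0:2][0:2] = a[3:2][2:2] + b[5:2][5:2]:
   adds two 2 x 2 blocks taken from different positions. */
void add_blocks(int a[DIM][DIM], int b[DIM][DIM], int r[2][2]) {
    for (size_t i = 0; i < 2; i++)
        for (size_t j = 0; j < 2; j++)
            r[i][j] = a[3 + i][2 + j] + b[5 + i][5 + j];
}

/* Loop equivalent of r[0:4] = a[0:4] + c: the scalar c is added
   to every element of the section. */
void add_scalar(const int a[], int c, int r[4]) {
    assert(a != NULL && r != NULL);
    for (size_t i = 0; i < 4; i++)
        r[i] = a[i] + c;
}
```

In both cases the sections on each side have the same shape (2 x 2, or 4 elements); only the starting positions differ.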

Function prototypes

  Function prototype                 Description
  __sec_reduce(fun, identity, a[:])  Generic reduction function. Reduces fun across the array a[:] using identity as the initial value.
  __sec_reduce_add(a[:])             Built-in reduction function. Adds the elements of the array section.
  __sec_reduce_mul(a[:])             Built-in reduction function. Multiplies the elements of the array section.
  __sec_reduce_all_zero(a[:])        Built-in reduction function. Tests that all array elements are zero.
  __sec_reduce_all_nonzero(a[:])     Built-in reduction function. Tests that all array elements are non-zero.
  __sec_reduce_any_nonzero(a[:])     Built-in reduction function. Tests whether any array element is non-zero.
  __sec_reduce_min(a[:])             Built-in reduction function. Determines the minimum value of the array elements.
  __sec_reduce_max(a[:])             Built-in reduction function. Determines the maximum value of the array elements.
  __sec_reduce_min_ind(a[:])         Built-in reduction function. Determines the index of the minimum array element.
  __sec_reduce_max_ind(a[:])         Built-in reduction function. Determines the index of the maximum array element.
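
As a rough guide to what the built-in reductions return, here are plain C loop equivalents for three of them. The names reduce_add, reduce_min_ind and reduce_all_zero are hypothetical stand-ins, fixed to double for brevity. How the real built-ins break ties for the minimum index is not stated in this lecture; the sketch assumes the first occurrence.

```c
#include <assert.h>
#include <stddef.h>

/* Like __sec_reduce_add(a[0:n]): sum of all elements. */
double reduce_add(size_t n, const double *a) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Like __sec_reduce_min_ind(a[0:n]): index of the smallest element.
   Requires n > 0; ties resolve to the first occurrence in this sketch. */
size_t reduce_min_ind(size_t n, const double *a) {
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (a[i] < a[best])
            best = i;
    return best;
}

/* Like __sec_reduce_all_zero(a[0:n]): 1 if every element is zero, else 0. */
int reduce_all_zero(size_t n, const double *a) {
    for (size_t i = 0; i < n; i++)
        if (a[i] != 0.0)
            return 0;
    return 1;
}
```

Each function collapses a whole section into one scalar, which is why a reduction can appear inside an ordinary scalar expression.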
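
Example:

#include <stdio.h>
#include <stdlib.h>
#define N 2000
typedef double (*p2d)[];  /* pointer to a row of unspecified length */

/* Sets a to 1 and b to -1, then accumulates c += a * b. */
void matrix_mul(int n, double a[n][n],
                double b[n][n], double c[n][n]) {
  int i, j;
  a[:][:] = 1;
  b[:][:] = -1;
  for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
      /* dot product of row i of a and column j of b */
      c[i][j] = c[i][j] + __sec_reduce_add(a[i][:] * b[:][j]);
}

int main(void) {
  p2d a = (p2d)malloc(N * N * sizeof(double));
  p2d b = (p2d)malloc(N * N * sizeof(double));
  /* calloc: matrix_mul adds to c, so c must start at zero */
  p2d c = (p2d)calloc((size_t)N * N, sizeof(double));
  if (a == NULL || b == NULL || c == NULL) {
    fprintf(stderr, "out of memory\n");
    return 1;
  }
  matrix_mul(N, a, b, c);
  free(a);
  free(b);
  free(c);
  return 0;
}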
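
To check results on a compiler without the extension, the same computation can be written with an explicit inner loop. This is a sketch, not the lecture's code: matrix_mul_ref is a hypothetical name, it relies on C99 variable-length array parameters, and unlike the listing above it leaves the initialization of a, b and c to the caller.

```c
#include <assert.h>

/* c += a * b for n x n matrices. The innermost loop is the scalar
   form of __sec_reduce_add(a[i][:] * b[:][j]). */
void matrix_mul_ref(int n, double a[n][n], double b[n][n], double c[n][n]) {
    assert(n > 0);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double dot = 0.0;
            for (int k = 0; k < n; k++)
                dot += a[i][k] * b[k][j];
            c[i][j] += dot;
        }
}
```

With every element of a set to 1 and every element of b set to -1, each call adds -n to every entry of c, which gives a quick sanity check for small n.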
