NOU INTUIT | Introduction to performance optimization using Intel SW tools. Lecture 6: Optimizing compiler. Auto parallelization


Published: 12.07.2012 | Access: free | Students: 355 / 24 | Rating: 4.00 / 4.20 | Duration: 11:07:00

Specialties: Programmer

Tags: basic, basic block, call graph, linux, loop optimization, microprocessor, objective-c, openmp, optimizing compiler, permute, pipelining, prefetcher, register allocation, remark


C/C++ extended array notation

The C/C++ language extension for array notation is an Intel-specific extension that is part of the Intel® Cilk™ Plus feature set supported by the Intel® compiler.

The C/C++ extension provides data-parallel array notation with the following major benefits:

Allows you to use array notation to program parallel operations in a familiar language
Achieves predictable performance based on mapping parallel constructs to the underlying multi-threaded and SIMD hardware
Enables compiler parallelization and vectorization with less reliance on alias and dependence analysis

When you use array notation, the Intel® compiler implements it with vector (SIMD) code.

Usage Recommendations

Use array notation when your algorithm operates on whole arrays or array sections and does not require a specific order of operations among the array elements.
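To illustrate the "no specific order" criterion, here is a plain C sketch (the function names add_elementwise and prefix_sum are hypothetical, chosen for illustration). The first loop has independent iterations and corresponds to the single array-notation statement c[0:n] = a[0:n] + b[0:n]; the second is a running sum in which each element depends on the previous one, so it does not fit a single array-notation assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Order-independent: c[i] depends only on a[i] and b[i], so the iterations
   may execute in any order or as SIMD lanes. In array notation the whole
   loop is  c[0:n] = a[0:n] + b[0:n];  */
void add_elementwise(size_t n, const int *a, const int *b, int *c) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Order-dependent: out[i] needs out[i - 1], which must already be computed,
   so this loop is not a candidate for one array-notation statement. */
void prefix_sum(size_t n, const int *a, int *out) {
    if (n == 0)
        return;
    out[0] = a[0];
    for (size_t i = 1; i < n; i++)
        out[i] = out[i - 1] + a[i];
}
```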

To use array notation in your application, follow these steps:

Insert array notation expressions into your application source code.

Compile the application at optimization level -O1 or higher to enable vectorization. By default, the compiler generates SIMD instructions from the SSE2 instruction set. To generate SIMD instructions beyond SSE2, add target- or architecture-specific options to the compile command (for example, -xHost on Linux* or /QxHost on Windows*).

By default, the Intel® compiler accepts the array notation extensions and generates vector and multi-threaded code from the data-parallel constructs in the program.

CEAN (C/C++ Extensions for Array Notations Programming Model)

Array declarations:
Length	Storage class	Declaration
Fixed	Static	static int a[16][128];
Fixed	Auto	void foo(void) { int a[16][128]; }
Fixed	Parameter	void bar(int a[16][128]);
Fixed	Heap	int (*p2d)[128];
Variable (C99)	Auto	void foo(int m, int n) { int a[m][n]; }
Variable (C99)	Parameter	void bar(int m, int n, int a[m][n]);
Variable (C99)	Heap	void bar(int m, int n) { int (*p2d)[n]; }
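The variable-length (C99) rows of the table are ordinary C99 and can be tried with any C99 compiler, independent of array notation. A minimal sketch, with hypothetical function names fill_vla and sum_heap_vla, of a variable-length parameter and a heap-allocated pointer to variable-length rows:

```c
#include <assert.h>
#include <stdlib.h>

/* Variable-length parameter: the row length n is part of the parameter type,
   so a[i][j] is indexed correctly without manual offset arithmetic. */
void fill_vla(int m, int n, int a[m][n]) {
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            a[i][j] = i * n + j;
}

/* Heap storage: p2d points to rows of n ints, as in the table's last row.
   Returns the sum of all elements, or -1 if allocation fails (sketch only). */
int sum_heap_vla(int m, int n) {
    int (*p2d)[n] = malloc(sizeof(int[m][n]));
    if (p2d == NULL)
        return -1;
    fill_vla(m, n, p2d);
    int total = 0;
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            total += p2d[i][j];
    free(p2d);
    return total;
}
```

The same fill_vla works for an automatic array such as int a[2][3], because the array argument converts to a pointer to its first row of 3 ints.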

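Array section syntax

  section_operator ::= [<lower bound>:<length>:<stride>]
  a[0:3][0:4]   // rows 0..2, columns 0..3 (12 elements)
  b[0:2:3]      // 2 elements with stride 3: b[0] and b[3]

Note that the second field is a length, not an upper bound. The stride is optional and defaults to 1, and [:] denotes the whole extent of a dimension whose size is known.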
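Because the second field of a section is a length rather than an upper bound, it can help to spell out which elements a section touches. A plain C sketch (the helper section_indices is hypothetical and not part of Cilk Plus) that lists the element indices selected by [lower:length:stride]:

```c
#include <assert.h>
#include <stddef.h>

/* Writes the indices selected by the section [lower:length:stride] into idx
   and returns how many were written (always `length`).
   Element k of the section is lower + k * stride, for k = 0 .. length-1. */
size_t section_indices(long lower, size_t length, long stride, long *idx) {
    assert(stride > 0);  /* this sketch only handles positive strides */
    for (size_t k = 0; k < length; k++)
        idx[k] = lower + (long)k * stride;
    return length;
}
```

For example, section_indices(0, 2, 3, idx) yields the indices 0 and 3, matching b[0:2:3].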

You must use the -std=c99 (Linux and Mac OS) or /Qstd=c99 (Windows) compiler option.

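Example:

typedef int (*p2d)[128];                         // pointer to a row of 128 ints
p2d p = (p2d) malloc(sizeof(int) * rows * 128);  // rows x 128 matrix on the heap
p[0:rows][:]                                     // section: all rows, all 128 columns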
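For comparison, a plain C sketch of the same heap layout, where an explicit double loop does what a section assignment such as p[0:rows][:] = value would do. The function alloc_rows and its value parameter are hypothetical, added only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef int (*p2d)[128];   /* pointer to a row of 128 ints */

/* Allocates rows x 128 ints and sets every element to value.
   The double loop is the scalar equivalent of  p[0:rows][:] = value;  */
p2d alloc_rows(size_t rows, int value) {
    p2d p = malloc(rows * sizeof *p);   /* sizeof *p == 128 * sizeof(int) */
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < 128; j++)
            p[i][j] = value;
    return p;
}
```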

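Most C/C++ operators can be applied to array sections. They act element by element; all section operands in one expression must have the same rank and the same length in each dimension, and a scalar operand is applied to every element.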

a[:]*b[:]                  // element-wise multiplication
a[3:2][2:2] + b[5:2][5:2]  // element-wise addition of two 2x2 sub-matrices
a[0:4]+c                   // adds scalar c to each of a[0]..a[3]
a[:][:] = b[:][1][:] + c   // 2D assignment from a 2D slice of the 3D array b, plus c
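The element-wise semantics can be checked against ordinary loops. A plain C sketch with hypothetical names (add_scalar4, add_blocks, and the result arrays r are assumptions, since the expressions above are not stored anywhere) showing what a[0:4]+c and a[3:2][2:2] + b[5:2][5:2] compute:

```c
#include <assert.h>

/* Loop equivalent of  r[0:4] = a[0:4] + c;
   the scalar c is added to each of the 4 elements. */
void add_scalar4(const int a[4], int c, int r[4]) {
    for (int i = 0; i < 4; i++)
        r[i] = a[i] + c;
}

/* Loop equivalent of  r[0:2][0:2] = a[3:2][2:2] + b[5:2][5:2];
   the two 2x2 blocks start at different origins, so element (i, j) of the
   result pairs a[3 + i][2 + j] with b[5 + i][5 + j]. */
void add_blocks(int rows, int cols, int a[rows][cols], int b[rows][cols],
                int r[2][2]) {
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            r[i][j] = a[3 + i][2 + j] + b[5 + i][5 + j];
}
```

The sections only need matching shapes (2x2 here), not matching origins.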

Reduction function prototypes
Function prototype	Description
__sec_reduce(fun, identity, a[:])	Generic reduction function. Reduces fun across the section a[:], using identity as the initial value.
__sec_reduce_add(a[:])	Built-in reduction. Returns the sum of the section elements.
__sec_reduce_mul(a[:])	Built-in reduction. Returns the product of the section elements.
__sec_reduce_all_zero(a[:])	Built-in reduction. Tests whether all section elements are zero.
__sec_reduce_all_nonzero(a[:])	Built-in reduction. Tests whether all section elements are non-zero.
__sec_reduce_any_nonzero(a[:])	Built-in reduction. Tests whether any section element is non-zero.
__sec_reduce_min(a[:])	Built-in reduction. Returns the minimum element value.
__sec_reduce_max(a[:])	Built-in reduction. Returns the maximum element value.
__sec_reduce_min_ind(a[:])	Built-in reduction. Returns the index of the minimum element.
__sec_reduce_max_ind(a[:])	Built-in reduction. Returns the index of the maximum element.
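The reductions can be read as loops that start from an identity value and fold the combining operation over every element. A plain C sketch of three of them (reduce_add, reduce_max_ind and reduce_all_zero are hypothetical names; these are not the Intel built-ins, and the tie-breaking rule in reduce_max_ind is this sketch's own choice):

```c
#include <assert.h>
#include <stddef.h>

/* Loop analogue of __sec_reduce_add(a[0:n]): start from the identity 0
   and combine every element with +. */
double reduce_add(size_t n, const double *a) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Loop analogue of __sec_reduce_max_ind(a[0:n]) for n > 0.
   This sketch returns the first index of the maximum when there are ties. */
size_t reduce_max_ind(size_t n, const double *a) {
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (a[i] > a[best])
            best = i;
    return best;
}

/* Loop analogue of __sec_reduce_all_zero(a[0:n]): 1 if every element is 0. */
int reduce_all_zero(size_t n, const double *a) {
    for (size_t i = 0; i < n; i++)
        if (a[i] != 0.0)
            return 0;
    return 1;
}
```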

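#include <stdio.h>
#include <stdlib.h>

#define N 2000
typedef double (*p2d)[N];   /* pointer to a row of N doubles */

/* c += a * b for n x n matrices; array notation expresses the inner product */
void matrix_mul(int n, double a[n][n], double b[n][n], double c[n][n]) {
    int i, j;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            /* dot product of row i of a and column j of b */
            c[i][j] = c[i][j] + __sec_reduce_add(a[i][0:n] * b[0:n][j]);
}

int main(void) {
    p2d a = (p2d)malloc(N * N * sizeof(double));
    p2d b = (p2d)malloc(N * N * sizeof(double));
    p2d c = (p2d)malloc(N * N * sizeof(double));
    if (!a || !b || !c) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    a[0:N][0:N] = 1;    /* initialize whole matrices with array notation */
    b[0:N][0:N] = -1;
    c[0:N][0:N] = 0;    /* c is accumulated into, so it must start at zero */
    matrix_mul(N, a, b, c);
    printf("c[0][0] = %f\n", c[0][0]);   /* every element equals -N */
    free(a);
    free(b);
    free(c);
    return 0;
}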
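For readers without a compiler that supports array notation, a plain C99 sketch of the same computation (matrix_mul_loops is a hypothetical name). The innermost k loop computes what the __sec_reduce_add call in matrix_mul computes: the dot product of row i of a and column j of b, accumulated into c[i][j].

```c
#include <assert.h>

/* Scalar-loop equivalent of the array-notation matrix_mul:
   c[i][j] += sum over k of a[i][k] * b[k][j]. */
void matrix_mul_loops(int n, double a[n][n], double b[n][n], double c[n][n]) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double dot = 0.0;                /* identity value of the sum */
            for (int k = 0; k < n; k++)
                dot += a[i][k] * b[k][j];    /* row i of a times column j of b */
            c[i][j] += dot;
        }
}
```

Comparing the array-notation version against this loop version on a small matrix is a simple way to check that sections such as b[0:n][j] select the intended column.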
