Published: 12.07.2012 | Access: free | Students: 293 / 18 | Rating: 4.00 / 4.20 | Duration: 11:07:00
Specialties: Programmer
Lecture 6:

Optimizing compiler. Auto parallelization

Abstract: The lecture describes the main features of multiprocessor and multicore computing systems and the pros and cons of multithreaded applications. It presents auto-parallelization as a simple method for creating multi-threaded applications, the related compiler command-line options, some of the language extensions used for parallelization, and manual and automatic prefetching.

Multicore and multiprocessor systems are the de facto standard

The presentation can be downloaded here.

  • Intel® Pentium® Processor Extreme Edition (2005-2007)
    • Dual core introduced
  • Intel® Xeon® Processor 5100, 5300 Series; Intel® Core™2 Processor Family (2006-)
    • Dual-processor and quad-core
  • Intel® Xeon® Processor 5200, 5400, 7400 Series; Intel® Core™2 Processor Family (2007-)
    • 2-4 processors, up to 6 cores
  • Intel® Atom™ Processor Family (2008-)
    • Energy efficiency has high priority
  • Intel® Core™ i7 Processor Family (2008-)
    • Hyperthreading technology. System with non-uniform memory access.

A CPU core is a complete computing unit that shares some of its processing resources with the other cores on the chip.

Fig. 6.1. Dual CPU Core Chip

Multiprocessor systems fall into three classes:

  1. Massively parallel computers, or systems with distributed memory (MPP systems).
    • Each processor is completely autonomous.
    • Processors exchange data through a communications medium.
    • Advantages: good scalability.
    • Disadvantages: slow interprocessor communication.
  2. Shared memory systems (SMP systems).
    • All processors are equidistant from the memory and communicate with it via a common data bus.
    • Advantages: good interprocessor communication.
    • Disadvantages:
      • poor scalability
      • the high cost of cache subsystem synchronization
  3. Systems with non-uniform memory access (NUMA).
    • Memory is physically distributed among processors. A single address space is supported at the hardware level.
    • Advantages: good interprocessor communication and scalability.
    • Disadvantages: different latency for different parts of memory.

Fig. 6.2. Intel QuickPath Architecture

Application performance can be unstable on multiprocessor machines with non-uniform memory access. The OS periodically moves an application from one core to another, and the new core may have a different access speed to the memory the application uses.

Pros and cons of multi-threaded applications

  • Pros:
    • Computational resources grow with the number of cores.
  • Cons:
    • Increased design complexity
    • Thread synchronization overhead
    • Data races and contention for shared resources
    • Thread creation overhead
  • Conclusion:
    • If you are developing business applications, be clearly aware of the goals and the cost of parallelism in your application.
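
The data-race item in the list above can be illustrated with a small sketch. It assumes an OpenMP-capable C compiler, and the function name `sum_array` is hypothetical. It shows one common way to avoid a race on a shared accumulator and is not the only possible approach.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the integers in data[0..n-1].
 *
 * Without the reduction clause, every thread would update the shared
 * variable `total` at the same time, which is a data race. The clause
 * gives each thread a private partial sum and combines the partial sums
 * when the loop ends.
 * If the compiler is not run in OpenMP mode, the pragma is ignored and
 * the loop runs sequentially with the same result. */
long sum_array(const long *data, size_t n)
{
    assert(data != NULL || n == 0);
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; i++) {
        total += data[i];
    }
    return total;
}
```

OpenMP mode usually has to be switched on explicitly, for example with `-fopenmp` in GCC. The per-thread partial sums and the final combination are a small instance of the synchronization overhead listed above.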

Auto-parallelization is a compiler optimization that automatically converts sequential code into multi-threaded code in order to use multiple cores simultaneously. Its purpose is to free the programmer from difficult and tedious manual parallelization.
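
As a rough illustration of what converting a sequential loop into multi-threaded code involves, the sketch below divides a loop's iteration space into contiguous chunks, one per thread. This is a conceptual model only and not the code a particular compiler generates. The names `chunk_bounds` and `add_in_chunks` are hypothetical, and the "threads" run one after another to keep the example self-contained.

```c
#include <assert.h>

/* Compute the half-open range [*lo, *hi) of loop iterations that thread
 * `tid` (0-based) would execute when `n` iterations are divided into
 * contiguous, nearly equal chunks among `nthreads` threads.
 * The first (n % nthreads) threads get one extra iteration. */
void chunk_bounds(int n, int nthreads, int tid, int *lo, int *hi)
{
    assert(n >= 0 && nthreads > 0 && tid >= 0 && tid < nthreads);
    int base = n / nthreads;
    int extra = n % nthreads;
    *lo = tid * base + (tid < extra ? tid : extra);
    *hi = *lo + base + (tid < extra ? 1 : 0);
}

/* Sequential stand-in for the threaded loop: each "thread" processes its
 * own chunk of a[i] = b[i] + c[i]. In real multi-threaded code the chunks
 * would run concurrently; here they run one after another. */
void add_in_chunks(double *a, const double *b, const double *c,
                   int n, int nthreads)
{
    for (int tid = 0; tid < nthreads; tid++) {
        int lo, hi;
        chunk_bounds(n, nthreads, tid, &lo, &hi);
        for (int i = lo; i < hi; i++) {
            a[i] = b[i] + c[i];
        }
    }
}
```

Calling `chunk_bounds(10, 3, tid, &lo, &hi)` for `tid` = 0, 1, 2 yields the ranges [0, 4), [4, 7), and [7, 10), so every iteration is executed by exactly one thread.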


A compiler option enables the auto-parallelizer to generate multi-threaded code for loops that can be safely executed in parallel.
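
What "safely executed in parallel" means can be sketched with two loops. The function names `scale` and `running_sum` are hypothetical. The example only contrasts independent iterations with a loop-carried dependence and does not claim to reproduce any compiler's actual analysis.

```c
#include <assert.h>

/* Safe to parallelize: iteration i reads b[i] and writes a[i] only, so no
 * iteration depends on another. `restrict` promises the compiler that a
 * and b do not overlap. */
void scale(double *restrict a, const double *restrict b, double k, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        a[i] = k * b[i];
    }
}

/* Not safe to parallelize as written: iteration i reads a[i - 1], which
 * iteration i - 1 has just written (a loop-carried dependence). Running
 * the iterations in any other order changes the result. */
void running_sum(double *a, int n)
{
    assert(n >= 0);
    for (int i = 1; i < n; i++) {
        a[i] = a[i] + a[i - 1];
    }
}
```

In the first loop the iterations can be handed to different threads in any order. In the second, each iteration must wait for the previous one, so splitting it across threads without restructuring would produce wrong results.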
