GPGPU

nvcc

gpucc

Usage:

$ clang++ -Wall -Wextra -O3 --cuda-path=/opt/cuda -I/opt/cuda/samples/common/inc -L/opt/cuda/lib64 -lcudart test.cu -o test.out

Open-source GPGPU compiler built on top of LLVM

  • Straight-line scalar optimizations
    • Addressing Mode (base + imm)

    • Injured Redundancy

    • Pointer Arithmetic Reassociation

    • Straight-Line Strength Reduction

    • Global Reassociation

  • Inferring memory spaces
    • PTX load/store state spaces
      • .reg : registers

      • .sreg : special, read-only, platform-specific registers

      • .const : shared, read-only memory

      • .global : global memory, shared by all threads

      • .local : local memory, private to each thread

      • .param : parameters passed to the kernel

      • .shared : memory shared between threads in a block

      • .tex : global texture memory (deprecated)

    • Memory space qualifiers

    • Fixed-point data-flow analysis

  • Loop unrolling and function inlining
    • Higher thresholds than for CPU targets

    • #pragma unroll

    • __forceinline__

  • Memory-space alias analysis

  • Speculative execution

  • Bypassing 64-bit divisions
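
To make the straight-line scalar optimizations above more concrete, here is a minimal sketch of straight-line strength reduction written as plain host-side C++, so it runs without a GPU. The compiler performs this rewrite on LLVM IR, not on source code. The function names and the three-access pattern are hypothetical, chosen only to mimic the index arithmetic that an unrolled loop leaves behind.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive form: each access recomputes (i + k) * n with its own multiply,
// as straight-line code looks right after loop unrolling.
int sum_three_naive(const std::vector<int>& a, std::size_t i, std::size_t n) {
    assert((i + 2) * n < a.size());  // precondition: all three indices in range
    int s = a[(i + 0) * n];
    s += a[(i + 1) * n];
    s += a[(i + 2) * n];
    return s;
}

// Strength-reduced form: (i + 1) * n equals i * n + n, so each later index
// is derived from the previous one with a single add instead of a multiply.
int sum_three_reduced(const std::vector<int>& a, std::size_t i, std::size_t n) {
    assert((i + 2) * n < a.size());  // same precondition as the naive form
    const std::size_t idx0 = i * n;     // the only multiply
    const std::size_t idx1 = idx0 + n;  // (i + 1) * n
    const std::size_t idx2 = idx1 + n;  // (i + 2) * n
    int s = a[idx0];
    s += a[idx1];
    s += a[idx2];
    return s;
}
```

Both functions read the same three elements. The second form only makes explicit the partial redundancy between (i + 1) * n and i * n that the optimizer is meant to find.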

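The idea behind bypassing 64-bit divisions can be sketched at source level as follows. 64-bit integer division is considerably slower than 32-bit division on NVIDIA GPUs, so when both operands fit in 32 bits at run time, the cheaper division can be used instead. The helper name `udiv64_bypass` is hypothetical, and the code is plain host-side C++ that only models the guard. The compiler inserts the equivalent check on LLVM IR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper modelling the run-time check placed in front of a
// 64-bit unsigned division.
std::uint64_t udiv64_bypass(std::uint64_t n, std::uint64_t d) {
    assert(d != 0);  // division by zero is undefined, as for plain '/'

    // If neither operand has a bit set in its upper 32 bits, the quotient
    // of the 32-bit division equals the quotient of the 64-bit division.
    if (((n | d) >> 32) == 0) {
        return static_cast<std::uint32_t>(n) / static_cast<std::uint32_t>(d);
    }

    // Slow path: at least one operand needs the full 64 bits.
    return n / d;
}
```

The result is always the same as `n / d`. The guard pays off only when most operands are small in practice, as is common for 64-bit values used as array indices or sizes.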
Reference