GPGPU

nvcc

Usage:

$ clang++ -Wall -Wextra -O3 --cuda-path=/opt/cuda -I/opt/cuda/samples/common/inc test.cu -o test.out -L/opt/cuda/lib64 -lcudart

Open-source GPGPU compiler built on top of LLVM (clang's CUDA support, an alternative to nvcc)

Straight-line scalar optimizations
- Addressing Mode (base + imm)
- Injured Redundancy
- Pointer Arithmetic Reassociation
- Straight-Line Strength Reduction
- Global Reassociation
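The straight-line optimizations above rewrite LLVM IR, not source code. The following C++ is only a source-level analogue of their combined effect, under simplifying assumptions: no overflow, 8-byte `long` elements as on LP64 platforms, and hypothetical function names chosen for this sketch.

```cpp
// Source-level analogue of straight-line scalar optimizations.
// The compiler applies these to LLVM IR; this C++ only mirrors the effect.

// Before: each index carries its own multiplication.
long strided_before(const long* a, long i, long s) {
    return a[i * s] + a[(i + 1) * s] + a[(i + 2) * s];
}

// After straight-line strength reduction: (i + 1) * s and (i + 2) * s are
// rewritten as the previous product plus s, leaving a single multiply.
long strided_after(const long* a, long i, long s) {
    long x0 = i * s;  // the only multiplication
    long x1 = x0 + s; // was (i + 1) * s
    long x2 = x1 + s; // was (i + 2) * s
    return a[x0] + a[x1] + a[x2];
}

// Before: three separate address computations a + i, a + (i + 1), a + (i + 2).
long window_before(const long* a, long i) {
    return a[i] + a[i + 1] + a[i + 2];
}

// After reassociating the pointer arithmetic: one base pointer p plus constant
// offsets, which fit the base + immediate addressing mode of PTX loads
// (with 8-byte elements: [p], [p+8], [p+16]).
long window_after(const long* a, long i) {
    const long* p = a + i;
    return p[0] + p[1] + p[2];
}
```

Both pairs return the same values; the "after" versions simply need fewer multiplies and address computations.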
Inferring memory spaces
- Emit space-specific PTX loads/stores (e.g. ld.shared, st.global) instead of generic ld/st; PTX state spaces:
  .reg : registers
  .sreg : special, read-only, platform-specific registers
  .const : constant memory, read-only, shared by all threads
  .global : global memory, shared by all threads
  .local : local memory, private to each thread
  .param : kernel parameters (per grid) or function parameters (per thread)
  .shared : memory shared between threads in a block (CTA)
  .tex : global texture memory (deprecated)
- Memory space qualifiers
- Fixed-point data-flow analysis
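The fixed-point analysis can be sketched on a tiny hypothetical IR. The `Space` lattice, `PtrValue` struct, and `infer_spaces` function are all invented for illustration and are not LLVM's actual data structures. Each pointer starts from its declared space, where one is known, and the analysis repeatedly joins the spaces of its operands until nothing changes. Any pointer that stays in a specific space can then use a specific ld/st instead of a generic one.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical lattice: Unknown (nothing seen yet) -> one specific space -> Generic.
enum class Space { Unknown, Global, Shared, Local, Generic };

// Join: Unknown is the identity, equal spaces stay as they are,
// and two different specific spaces collapse to Generic.
Space join(Space a, Space b) {
    if (a == Space::Unknown) return b;
    if (b == Space::Unknown) return a;
    return a == b ? a : Space::Generic;
}

// A pointer value in a hypothetical mini-IR. Roots carry a declared space
// (e.g. the address of a __shared__ array, or Generic for a pointer loaded
// from memory). Derived values list their operands (GEP/cast: one; phi: several).
struct PtrValue {
    Space declared;                     // Unknown for derived values
    std::vector<std::size_t> operands;  // indices of operand values
};

// Iterate until a fixed point. Spaces only move down the lattice, so this terminates.
std::vector<Space> infer_spaces(const std::vector<PtrValue>& values) {
    std::vector<Space> space(values.size());
    for (std::size_t v = 0; v < values.size(); ++v) space[v] = values[v].declared;
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t v = 0; v < values.size(); ++v) {
            Space s = values[v].declared;
            for (std::size_t op : values[v].operands) s = join(s, space[op]);
            if (s != space[v]) {
                space[v] = s;
                changed = true;
            }
        }
    }
    return space;
}
```

In this sketch, a loop pointer defined as `phi(shared_base, p + 1)` resolves to Shared even though it feeds back into itself. A pointer that may come from either a shared or a global value resolves to Generic.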
Loop unrolling and function inlining
- Higher unrolling and inlining thresholds
- #pragma unroll
- __forceinline__
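One reason aggressive unrolling pays off on GPUs: once a loop with a constant trip count is fully unrolled, every index into a small local array becomes a constant. The array can then be replaced by scalars held in registers rather than .local memory. The C++ below is a hand-written sketch of that before/after, with hypothetical function names. It shows the shape of the transformation and is not compiler output.

```cpp
// Before: a small local array indexed by the loop variable. On a GPU, an
// array indexed dynamically like this may be placed in .local memory.
int poly_loop(int x) {
    int c[4] = {1, 2, 3, 4};  // coefficients, highest degree first
    int acc = 0;
    for (int k = 0; k < 4; ++k) {
        acc = acc * x + c[k];  // Horner step
    }
    return acc;
}

// After full unrolling (what #pragma unroll asks for on a constant trip count):
// every c[k] access has a constant index, so the array dissolves into constants
// and no local memory is needed. The first step (0 * x + 1) folds to 1.
int poly_unrolled(int x) {
    int acc = 1;
    acc = acc * x + 2;
    acc = acc * x + 3;
    acc = acc * x + 4;
    return acc;
}
```

Both functions compute x^3 + 2x^2 + 3x + 4, so `poly_loop(2)` and `poly_unrolled(2)` both return 26.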
Memory-space alias analysis
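The idea behind memory-space alias analysis fits in a single predicate. Two pointers known to be in different specific state spaces (e.g. .global vs .shared) cannot refer to the same location, so accesses through them may be reordered. A generic pointer could point anywhere, so it must be treated conservatively. The `AddrSpace` enum and `may_alias` function are hypothetical names for this sketch.

```cpp
// Hypothetical address-space tags for this sketch.
enum class AddrSpace { Generic, Global, Shared, Const, Local };

// Returns false only when the two accesses provably cannot overlap.
bool may_alias(AddrSpace a, AddrSpace b) {
    // A generic pointer may point into any space: be conservative.
    if (a == AddrSpace::Generic || b == AddrSpace::Generic) return true;
    // Distinct specific spaces never overlap. Within the same space, this
    // analysis knows nothing more, so it answers "may alias" and leaves
    // refinement to other alias analyses.
    return a == b;
}
```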
Speculative execution
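Speculative execution hoists cheap, side-effect-free instructions out of conditional blocks so that they execute unconditionally. Branches are costly on GPUs, and hoisted code becomes visible to the straight-line optimizations above. The C++ below is a source-level sketch with hypothetical names, assuming no overflow. In the compiler, the final select would come from later CFG simplification rather than from the speculation step itself.

```cpp
// Before: (i + 1) * s lives in a conditional block, so an optimization that
// works within one block cannot relate it to i * s.
long pick_before(bool c, long i, long s) {
    long x = i * s;
    long y = x;
    if (c) {
        y = (i + 1) * s;
    }
    return y;
}

// After speculation: the multiply is hoisted above the branch. Now in the same
// block as i * s, it can be strength-reduced to x + s, and the remaining
// choice is expressed as a select instead of a branch.
long pick_after(bool c, long i, long s) {
    long x = i * s;
    long t = x + s;    // hoisted (i + 1) * s, after strength reduction
    return c ? t : x;  // select instead of a branch
}
```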
Bypassing 64-bit divisions
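NVIDIA GPUs have no hardware integer divider. Division is emulated in software, and the 64-bit sequence is much longer than the 32-bit one. The bypass adds a runtime check: if both operands fit in 32 bits, it uses the cheap 32-bit division, which yields the same quotient. The sketch below covers unsigned division only, and `udiv64_bypass` is a hypothetical name.

```cpp
#include <cstdint>

// Unsigned 64-bit division with a 32-bit fast path.
std::uint64_t udiv64_bypass(std::uint64_t a, std::uint64_t b) {
    // If neither operand has any bit set above bit 31, both are < 2^32.
    if (((a | b) >> 32) == 0) {
        // Fast path: 32-bit division gives the same quotient.
        return static_cast<std::uint32_t>(a) / static_cast<std::uint32_t>(b);
    }
    return a / b;  // slow path: full 64-bit division
}
```

A single OR plus shift tests both operands at once, so the check costs little compared with the 64-bit division it can skip.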