GPGPU
nvcc
gpucc
Usage:
$ clang++ -Wall -Wextra -O3 --cuda-path=/opt/cuda -I/opt/cuda/samples/common/inc -L/opt/cuda/lib64 -lcudart test.cu -o test.out
Open-source GPGPU compiler built on top of LLVM
- Straight-line scalar optimizations
Addressing Mode (base + imm)
Injured Redundancy
Pointer Arithmetic Reassociation
Straight-Line Strength Reduction
Global Reassociation
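As a rough illustration of straight-line strength reduction, the sketch below hand-applies the rewrite to three index computations that share `base` and `stride`. The function names are hypothetical and the code is plain C++ written for these notes, not gpucc output. It only shows the kind of equivalence the optimization relies on: `(base + 1) * stride` can be rewritten as `base * stride + stride`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Before: every index is recomputed from scratch, one multiply per index.
inline std::array<std::int64_t, 3> indices_naive(std::int64_t base, std::int64_t stride) {
    std::array<std::int64_t, 3> out{};
    out[0] = (base + 0) * stride;
    out[1] = (base + 1) * stride;
    out[2] = (base + 2) * stride;
    return out;
}

// After: later indices are derived from the previous one with an add,
// so a single multiply remains.
inline std::array<std::int64_t, 3> indices_reduced(std::int64_t base, std::int64_t stride) {
    std::array<std::int64_t, 3> out{};
    out[0] = base * stride;
    out[1] = out[0] + stride;
    out[2] = out[1] + stride;
    return out;
}
```

Both functions return the same values for inputs that do not overflow, for example `indices_naive(5, 7) == indices_reduced(5, 7)`.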
- Inferring memory spaces
- PTX load/store instructions and state spaces
.reg : registers
.sreg : special, read-only, platform-specific registers
.const : shared, read-only memory
.global : global memory, shared by all threads
.local : local memory, private to each thread
.param : parameters passed to the kernel
.shared : memory shared between threads in a block
.tex : global texture memory (deprecated)
Memory space qualifiers
Fixed-point data-flow analysis
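The fixed-point data-flow idea behind memory-space inference can be sketched with a toy model. Everything here is an assumption made for illustration (the `Space` lattice, `PtrNode`, and `infer_spaces` are hypothetical names, not LLVM APIs): each pointer starts as `Unknown` unless its space is known up front, and a derived pointer takes the join of its operands until nothing changes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy lattice: Unknown (not yet inferred) < specific space < Generic (conflict).
enum class Space { Unknown, Global, Shared, Local, Generic };

// Join of two lattice values: Unknown is the identity,
// two different specific spaces collapse to Generic.
inline Space join(Space a, Space b) {
    if (a == Space::Unknown) return b;
    if (b == Space::Unknown) return a;
    return a == b ? a : Space::Generic;
}

// One pointer value in a simplified def-use graph (hypothetical model).
struct PtrNode {
    Space seed;                         // known space, or Unknown if derived
    std::vector<std::size_t> operands;  // indices of pointers it is derived from
};

// Re-evaluate every node until no node changes (the fixed point).
inline std::vector<Space> infer_spaces(const std::vector<PtrNode>& nodes) {
    std::vector<Space> space(nodes.size());
    for (std::size_t i = 0; i < nodes.size(); ++i) space[i] = nodes[i].seed;

    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t i = 0; i < nodes.size(); ++i) {
            Space s = nodes[i].seed;
            for (std::size_t op : nodes[i].operands) s = join(s, space[op]);
            if (s != space[i]) {
                space[i] = s;
                changed = true;
            }
        }
    }
    return space;
}
```

In this model a pointer derived only from a `Shared` pointer is inferred as `Shared`, including through a loop-carried cycle, while a pointer that may come from either a `Shared` or a `Global` source ends up `Generic`.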
- Loop unrolling and function inlining
Higher threshold
#pragma unroll
__forceinline__
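A minimal sketch of how the two annotations are typically written. The `FORCE_INLINE` and `UNROLL` macros and the `dot4` function are hypothetical helpers introduced here so the same file also builds as plain C++; in a `.cu` file they expand to `__forceinline__` and `#pragma unroll`.

```cpp
#include <cassert>

// Assumption: under a CUDA compiler use the CUDA qualifiers,
// otherwise fall back to plain `inline` so host-only builds still work.
#if defined(__CUDACC__)
#define FORCE_INLINE __forceinline__ __host__ __device__
#else
#define FORCE_INLINE inline
#endif

// `#pragma unroll` is understood by nvcc and clang; other compilers skip it here.
#if defined(__CUDACC__) || defined(__clang__)
#define UNROLL _Pragma("unroll")
#else
#define UNROLL
#endif

// Small fixed-trip-count loop: a typical candidate for full unrolling,
// inside a small helper that is a typical candidate for inlining.
FORCE_INLINE float dot4(const float* a, const float* b) {
    float sum = 0.0f;
    UNROLL
    for (int i = 0; i < 4; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

The annotations are hints about code shape only; the result of `dot4` is the same with or without them.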
- Memory-space alias analysis
- Speculative execution
- Bypassing 64-bit divisions
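The idea behind bypassing 64-bit divisions can be sketched by hand: when both operands happen to fit in 32 bits, a cheaper 32-bit division gives the same result. The function name `div_bypass` is hypothetical, and this is plain C++ written for these notes rather than what the compiler actually emits.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned 64-bit division with a fast path for operands that fit in 32 bits.
inline std::uint64_t div_bypass(std::uint64_t n, std::uint64_t d) {
    // If the upper 32 bits of both operands are zero, the quotient
    // can be computed with a 32-bit divide.
    if (((n | d) >> 32) == 0) {
        return static_cast<std::uint32_t>(n) / static_cast<std::uint32_t>(d);
    }
    // Otherwise fall back to the full 64-bit divide.
    return n / d;
}
```

The run-time check costs an OR, a shift, and a branch, which pays off when most divisions in practice involve small values. As with ordinary division, `d` must not be zero.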
Reference
- [2015] LLVM Developers’ Meeting: Jingyue Wu, "Optimizing LLVM for GPGPU"
- Wikipedia - Parallel Thread Execution
pseudo-assembly language used in Nvidia’s CUDA
CUDA -> PTX (translated by nvcc) -> binary (compiled by the graphics driver)