Why Do GPUs Use SIMT – Isn’t SIMT Slow With Branching?
Almost all modern GPUs use the Single-Instruction Multiple-Thread (SIMT) architecture. With SIMT, a group of threads (a warp or wavefront) executes the same instruction in lock-step. Apart from making GPUs a pain to program, this also means that when threads in the group take different code paths (e.g., if/else), the GPU must execute each branch one after the other instead of simultaneously. This […]
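To make the branching cost concrete, here is a toy Python model of one warp running an if/else in lock-step. It is only a sketch under simplifying assumptions: `run_warp`, the four-lane warp, and the per-branch "instructions" are hypothetical names and sizes chosen for illustration, and real hardware tracks active lanes with more elaborate reconvergence logic.

```python
def run_warp(values, cond, then_ops, else_ops):
    """Simulate one warp executing `if cond: then_ops else: else_ops` in lock-step.

    Returns (results, issued), where `issued` counts the instruction slots the
    whole warp spent, including slots in which some lanes were masked off.
    """
    results = list(values)
    mask = [cond(v) for v in values]  # per-lane branch outcome
    issued = 0

    # Pass 1: lanes that took the `if` side run; the other lanes sit idle.
    if any(mask):
        for op in then_ops:
            results = [op(r) if m else r for r, m in zip(results, mask)]
            issued += 1

    # Pass 2: lanes that took the `else` side run; the first group sits idle.
    if not all(mask):
        for op in else_ops:
            results = [r if m else op(r) for r, m in zip(results, mask)]
            issued += 1

    return results, issued


then_ops = [lambda x: x * 2, lambda x: x + 1]  # a 2-instruction `if` body
else_ops = [lambda x: x - 1]                   # a 1-instruction `else` body
is_even = lambda x: x % 2 == 0

# Lanes disagree on the condition, so both branches are issued: 2 + 1 slots.
divergent, cost_divergent = run_warp([0, 1, 2, 3], is_even, then_ops, else_ops)

# Every lane takes the `if` side, so the `else` body is skipped: 2 slots.
uniform, cost_uniform = run_warp([0, 2, 4, 6], is_even, then_ops, else_ops)

print(divergent, cost_divergent)  # [1, 0, 5, 2] 3
print(uniform, cost_uniform)      # [1, 5, 9, 13] 2
```

In this model the divergent warp pays for both branch bodies even though each lane only needs one of them, while the uniform warp pays for a single body.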