Archive for February, 2009

CUDA Thread, Warp and SIMT

February 26th, 2009

Coming from a general CPU background, it is important to understand how the thread execution model differs between CPU threads and CUDA threads. The major difference is in how the threads are scheduled. From the software’s point of view, CPU threads (whether they are hyperthreads or vertical threads) execute independently. CUDA threads are scheduled in groups called warps. The threads within a warp execute in a somewhat lock-step fashion called single-instruction multiple-thread (SIMT).

From the NVIDIA Compute PTX ISA 1.2 manual (p. 9):

Individual threads composing a SIMT warp start together at the same program address … A warp executes one common instruction at a time, …. If threads of a warp diverge via a data-dependent conditional branch, the warp serially executes each branch path taken, disabling threads that are not on that path, and when all paths complete, the threads converge back to the same execution path. Branch divergence occurs only within a warp; different warps execute independently regardless of whether they are executing common or disjointed code paths.

Notice that when there is a branch, the execution of the two branch paths (if both are taken) is serialized. Say we have 32 threads in a warp, 16 of them take branch A and the rest take branch B, and the processor chooses to execute A before B. Then none of the 16 threads on branch B will execute until those on branch A complete. Because of this hardware-imposed ordering, one cannot assume the two branches will be executed concurrently!
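To make the 16/16 split concrete, here is a minimal sketch of a kernel whose single warp diverges exactly that way. The kernel name `divergent_paths` and the host helper `run_divergent_paths` are made up for illustration, and the launch assumes one block of 32 threads so that the block is a single warp. The code only shows where the divergence happens; it does not measure or prove the order in which the hardware runs the two paths.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: lanes 0..15 take branch A, lanes 16..31 take branch B.
__global__ void divergent_paths(int *out)
{
    int lane = threadIdx.x;      // assumes a single block of 32 threads (one warp)
    if (lane < 16) {
        out[lane] = lane * 2;    // branch A: the other 16 threads are disabled here
    } else {
        out[lane] = -lane;       // branch B: the first 16 threads are disabled here
    }
    // After both paths complete, the threads converge back to one execution path.
}

// Hypothetical host helper: runs the kernel and copies 32 results into host.
// Returns 0 (cudaSuccess) on success, otherwise the CUDA error code.
int run_divergent_paths(int *host)
{
    int *dev = nullptr;
    cudaError_t err = cudaMalloc(&dev, 32 * sizeof(int));
    if (err != cudaSuccess) return (int)err;

    divergent_paths<<<1, 32>>>(dev);

    // cudaMemcpy waits for the kernel and reports any error it raised.
    err = cudaMemcpy(host, dev, 32 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return (int)err;
}

int main()
{
    int host[32];
    if (run_divergent_paths(host) != 0) {
        printf("CUDA error\n");
        return 1;
    }
    assert(host[15] == 30);      // last thread on branch A
    assert(host[16] == -16);     // first thread on branch B
    for (int i = 0; i < 32; ++i) printf("%d ", host[i]);
    printf("\n");
    return 0;
}
```

Each thread still ends up with its own correct result; what the serialization costs is time, since the warp spends one pass on branch A and another on branch B.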

As a result, programs that try to implement producer/consumer style communication within a warp between the two branches using a busy-waiting loop may hang. For example, if the consumer branch is executed first, the consumer threads will loop forever because the producer threads never get a chance to execute.
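The hang described above can be sketched as follows. All names here (`spin_wait_may_hang`, `produce_then_consume`, `run_produce_then_consume`) are hypothetical, and the code again assumes one block of 32 threads. The first kernel shows the risky pattern and is deliberately never launched; whether it actually hangs depends on which branch the hardware runs first. The second kernel is one possible restructuring, not the only one: the producer step, a barrier, and the consumer step run in sequence, and the barrier sits outside any divergent branch.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Risky pattern, shown for illustration only and NOT launched below.
// Lanes 16..31 (consumers) spin on a flag that lanes 0..15 (producers) set.
// If the warp executes the consumer branch first, the producers stay disabled
// and the flag is never set.
__global__ void spin_wait_may_hang(volatile int *flag, volatile int *data, int *out)
{
    int lane = threadIdx.x;
    if (lane >= 16) {
        while (*flag == 0) { }          // busy-wait on the other branch
        out[lane] = data[lane - 16];
    } else {
        data[lane] = lane + 100;
        __threadfence();                // make the data visible before the flag
        *flag = 1;
    }
}

// Possible restructuring: no thread waits on a thread in the other branch.
__global__ void produce_then_consume(int *out)
{
    __shared__ int data[16];
    int lane = threadIdx.x;             // assumes a single block of 32 threads

    if (lane < 16) data[lane] = lane + 100;   // producer step
    __syncthreads();                          // barrier reached by all 32 threads
    out[lane] = (lane >= 16) ? data[lane - 16] : 0;   // consumer step
}

// Hypothetical host helper: returns 0 (cudaSuccess) on success.
int run_produce_then_consume(int *host)
{
    int *dev = nullptr;
    cudaError_t err = cudaMalloc(&dev, 32 * sizeof(int));
    if (err != cudaSuccess) return (int)err;

    produce_then_consume<<<1, 32>>>(dev);
    err = cudaMemcpy(host, dev, 32 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return (int)err;
}

int main()
{
    int host[32];
    if (run_produce_then_consume(host) != 0) {
        printf("CUDA error\n");
        return 1;
    }
    assert(host[16] == 100);            // consumer lane 16 read producer lane 0
    assert(host[31] == 115);            // consumer lane 31 read producer lane 15
    printf("consumers read the produced values\n");
    return 0;
}
```

The design choice in the second kernel is to replace "wait until the other branch has run" with program order plus a barrier, so correctness no longer depends on which branch the warp happens to execute first.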

Categories: Parallel Programming Tags:

Talking Head

February 5th, 2009

A short video of me talking about compiler and tools support for parallel programming in Sun Studio, recorded while I was at Sun:

Video

Categories: Parallel Programming Tags:

“The Design of OpenMP Tasks”

February 5th, 2009

My co-authored paper “The Design of OpenMP Tasks” is now featured in the March issue of IEEE Transactions on Parallel and Distributed Systems.

Categories: Parallel Programming Tags: