TAU Performance System® is a profiling and tracing toolkit for performance analysis of hybrid parallel programs written in CUDA C, CUDA C++, or OpenCL, or using pyCUDA or OpenACC. TAU gathers performance information about GPU computations and integrates it with other application performance data, through instrumentation of functions, methods, basic blocks, and statements, to capture a performance picture of the resulting application execution.

To address the high-level programming aspect, TAU can be integrated with CUDA and OpenACC compilers. TAU intercepts the runtime library routines and automatically inserts calls to the TAU measurement interfaces in its runtime system and compiler-generated code.
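
The interception idea described above can be illustrated with a minimal Python sketch. It does not use TAU or any GPU runtime: the `Profile` class, the `intercept` helper, and the `launch_kernel` routine are all hypothetical names invented for this illustration. The sketch only shows the general technique of wrapping a library routine so that each call is timed and recorded without changing the calling code.

```python
import time
from collections import defaultdict
from functools import wraps
from types import SimpleNamespace


class Profile:
    """Hypothetical stand-in for a measurement interface (not TAU's API)."""

    def __init__(self):
        self.calls = defaultdict(int)      # routine name -> call count
        self.seconds = defaultdict(float)  # routine name -> accumulated time

    def record(self, name, elapsed):
        self.calls[name] += 1
        self.seconds[name] += elapsed


def intercept(runtime, name, profile):
    """Replace runtime.<name> with a wrapper that times every call."""
    original = getattr(runtime, name)

    @wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Record the call even if the wrapped routine raises.
            profile.record(name, time.perf_counter() - start)

    setattr(runtime, name, wrapper)


# Toy "runtime library" with a single routine; a real GPU runtime is assumed away.
runtime = SimpleNamespace(launch_kernel=lambda n: sum(i * i for i in range(n)))

profile = Profile()
intercept(runtime, "launch_kernel", profile)

# The application code is unchanged: it still calls runtime.launch_kernel.
results = [runtime.launch_kernel(1000) for _ in range(3)]

for name in profile.calls:
    print(f"{name}: {profile.calls[name]} calls, {profile.seconds[name]:.6f} s")
```

The wrapper is installed once and the caller keeps using the same routine name, which is the property that lets measurement be added without modifying application source. How TAU itself performs interception is not shown here.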

Example: TAU profile of GPU-accelerated NAMD

For more information visit: