The VampirTrace performance monitor gives detailed insight into the runtime behavior of accelerators. This enables extensive performance analysis and optimization of hybrid programs written in CUDA, OpenACC, OpenCL, and PyCUDA.

VampirTrace can trace GPU-accelerated applications and generates exact time stamps for all GPU-related events. This information can be used to generate quick profiles or analyzed graphically with Vampir. Vampir allows interactive navigation (zooming, moving) through the timelines of a parallel application's execution, annotated with statistics such as time consumed, number of invocations, message statistics, and performance counter data. The latest addition also allows capturing GPU performance counters.