The VampirTrace performance monitor gives detailed insight into the runtime behavior of accelerators. This enables extensive performance analysis and optimization of hybrid programs written in CUDA, OpenACC, OpenCL, and PyCUDA.
VampirTrace can trace GPU-accelerated applications and generates
exact time stamps for all GPU-related events. The resulting trace
can be used to generate quick profiles or can be analyzed graphically
with Vampir. Vampir allows interactive navigation (zooming, moving)
through the timelines of a parallel application's execution, annotated
with statistics such as time consumed, number of invocations, message
statistics, and performance counter values. A recent addition also
allows GPU performance counters to be captured.
Most projects report a performance increase of at least 20% after
isolating and optimizing the serial or unbalanced portions of their code.
For more information, visit these links: