Julien Demouth

Julien is a Senior Manager in the GPU Architecture group at NVIDIA. He is a co-author of many of the low-level deep learning implementations in cuDNN and TensorRT. Among other things, he wrote the first version of the FFT-based 2D convolutions for cuDNN and a large fraction of the Implicit GEMM convolutions for Maxwell, Pascal, and Volta GPUs, and he is the author of several Winograd implementations. Julien holds a Ph.D. in Computational Geometry from INRIA in France.

Posts by Julien Demouth

CUTLASS: Fast Linear Algebra in CUDA C++

Update May 21, 2018: CUTLASS 1.0 is now available as Open Source software at the CUTLASS repository. CUTLASS 1.0 has changed substantially from our preview… 25 MIN READ

How We Achieved Record Finance Benchmark Performance on Tesla K80

STAC Research develops financial benchmarks in partnership with leading banks and software or hardware vendors. The STAC-A2 suite of benchmarks aims to… 7 MIN READ

CUDA Pro Tip: Minimize the Tail Effect

When I work on the optimization of CUDA kernels, I sometimes see a discrepancy between Achieved and Theoretical Occupancies. The Theoretical Occupancy is the… 3 MIN READ

American Option Pricing with Monte Carlo Simulation in CUDA C++

In finance, an option (or derivative) is the common name for a contract that, under certain conditions, gives a firm the right or obligation to receive or… 10 MIN READ