Julien Demouth

Julien is a Senior Manager in the GPU Architecture group at NVIDIA. He is one of the co-authors of many of the low level implementations for Deep Learning in cuDNN and TensorRT. Among other things, Julien wrote the first version of FFT-based 2D convolutions for cuDNN, he wrote a large fraction of the Implicit GEMM convolutions for Maxwell, Pascal and Volta GPUs, and he is the author of several Winograd implementations. Julien holds a Ph.D. in Computational Geometry from INRIA in France.

Posts by Julien Demouth

Artificial Intelligence

CUTLASS: Fast Linear Algebra in CUDA C++

Update May 21, 2018: CUTLASS 1.0 is now available as Open Source software at the CUTLASS repository. CUTLASS 1.0 has changed substantially from our preview… 25 MIN READ

How We Achieved Record Finance Benchmark Performance on Tesla K80

STAC Research develops financial benchmarks in partnership with leading banks and software or hardware vendors. The STAC-A2 suite of benchmarks aims to… 7 MIN READ
Accelerated Computing

CUDA Pro Tip: Minimize the Tail Effect

When I work on the optimization of CUDA kernels, I sometimes see a discrepancy between Achieved and Theoretical Occupancies. The Theoretical Occupancy is the… 3 MIN READ
Accelerated Computing

American Option Pricing with Monte Carlo Simulation in CUDA C++

In finance, an option (or derivative) is the common name for a contract that, under certain conditions, gives a firm the right or obligation to receive or… 10 MIN READ