Posts by Tony Scudiero
Development & Optimization
Jul 02, 2025
Advanced NVIDIA CUDA Kernel Optimization Techniques: Handwritten PTX
As accelerated computing continues to drive application performance in all areas of AI and scientific computing, there's a renewed interest in GPU optimization...
11 MIN READ
Simulation / Modeling / Design
May 01, 2025
NVIDIA Blackwell and NVIDIA CUDA 12.9 Introduce Family-Specific Architecture Features
One of the earliest architectural design decisions that went into the CUDA platform for NVIDIA GPUs was support for backward compatibility of GPU code. This...
14 MIN READ
Development & Optimization
Mar 12, 2025
Understanding PTX, the Assembly Language of CUDA GPU Computing
Parallel Thread Execution (PTX) is a virtual machine instruction set architecture that has been part of CUDA from its beginning. You can think of PTX as the...
13 MIN READ
Simulation / Modeling / Design
Apr 22, 2014
Separate Compilation and Linking of CUDA C++ Device Code
Managing complexity in large programs requires breaking them down into components that are responsible for small, well-defined portions of the overall program....
13 MIN READ