NVIDIA CUDA Tile

NVIDIA® CUDA® Tile is a tile-based GPU programming model that targets portability for NVIDIA Tensor Cores. CUDA Tile unlocks peak GPU performance with a programming model that simplifies the creation of optimized, tile-based kernels across NVIDIA platforms.
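To make the idea of a tile-based kernel concrete, here is a minimal CPU-only sketch in plain Python. It is not the cuTile API and runs no GPU code; the names `TILE_SIZE`, `add_tile`, and `tiled_vector_add` are hypothetical and exist only for illustration. It shows the general pattern such models are built around: the data is split into fixed-size tiles, and each unit of work loads one tile, computes on it, and stores the result.

```python
# Conceptual CPU sketch of tile-based execution.
# This is NOT the cuTile API; every name below is hypothetical.

TILE_SIZE = 4  # assumed tile width, chosen only for illustration


def add_tile(a, b, out, tile_index, tile_size=TILE_SIZE):
    """Process one tile: load a slice of each input, add, store the result."""
    start = tile_index * tile_size
    stop = min(start + tile_size, len(out))  # the last tile may be partial
    a_tile = a[start:stop]  # "load" one tile of each input
    b_tile = b[start:stop]
    out[start:stop] = [x + y for x, y in zip(a_tile, b_tile)]  # "store"


def tiled_vector_add(a, b, tile_size=TILE_SIZE):
    """Add two equal-length lists one tile at a time."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = [0] * len(a)
    num_tiles = (len(a) + tile_size - 1) // tile_size  # ceiling division
    # Tiles are independent of one another; here they simply run in sequence.
    for tile_index in range(num_tiles):
        add_tile(a, b, out, tile_index, tile_size)
    return out


if __name__ == "__main__":
    print(tiled_vector_add(list(range(10)), [1] * 10))
```

Because each tile touches a disjoint slice of the output, the per-tile work has no ordering dependency, which is the property that lets a tile-based model distribute tiles across parallel hardware.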


Get Started

CUDA Tile is built on the CUDA Tile IR (intermediate representation) specification and tools. cuTile provides the user-facing language support for CUDA Tile IR in Python and, in the future, C++. NVIDIA's Python implementation of this tile-based programming model is cuTile Python.

CUDA Tile IR

Virtual Instruction Set for Tile Programming:

  • Enables native programming of GPUs within the structured high-performance context of the tile programming model

Get Started With CUDA Tile IR

cuTile Python

Python-Native, Tiled Kernel Development:

  • Seamless high-level Python expression of the CUDA Tile programming model

  • Built on the foundation of the Tile IR specification

  • Offers the ability to write, define, and optimize tiled GPU kernels using familiar Python syntax

Get Started With cuTile Python 

CUDA Tile C++

Tile Kernel Development in C++:

  • C++ expression of the CUDA Tile programming model

  • Built on the foundation of the Tile IR specification

  • Offers the ability to write, define, and optimize tiled GPU kernels using familiar C++ syntax

Get Started With CUDA Tile C++ 

Learning Library

Video

Get Started with cuTile Python

cuTile-Python

Jump straight into development. Get a step-by-step walkthrough on setting up and immediately using the cuTile-Python starter kit to write and run your very first tile kernel in Python.

Video

Deep Dive: How to Use cuTile-Python

cuTile-Python

Explore the full capabilities of cuTile-Python, including detailed information and practical examples on implementing advanced tiling techniques and leveraging the framework to optimize and deploy your GPU kernels.

Tech Blog

CUDA Tile: A New Era of GPU Programming

CUDA Tile

Discover CUDA Tile, the revolutionary programming model designed by NVIDIA to fundamentally simplify and optimize parallel computing. Learn how CUDA Tile works and the story behind its creation.

Tech Blog

CUDA Tile Programming in Python With cuTile Python

cuTile Python

Ready to unify GPU performance with Python's simplicity? This blog post will guide you through getting started with cuTile Python, showing you exactly how to begin defining and deploying CUDA Tile kernels using the power and flexibility of the Python language.

OSS (GitHub)

cuTile Python GitHub

cuTile Python

Access the official cuTile Python GitHub page to explore the source code, contribute to the project, and report issues. Get the complete API documentation to ensure you have all the resources needed to implement and optimize your tile kernels.

Documentation

CUDA Tile IR Documentation

CUDA Tile IR

Building CUDA Tile compilers and libraries? Consult the CUDA Tile IR documentation to gain a deep technical understanding of the specification required for implementing and targeting the CUDA Tile programming model.

OSS (GitHub)

CUDA Tile C++ Samples GitHub

CUDA Tile C++

Access the CUDA Tile C++ samples to explore example tile code that illustrates the concepts of writing CUDA Tile C++ in practice.

Documentation

CUDA Tile C++ Documentation

CUDA Tile C++

Visit the CUDA Tile C++ documentation page to explore detailed information on API usage, code examples, and best practices, providing everything you need to effectively leverage CUDA Tile programming in your C++ applications.

OSS (GitHub)

CUDA Tile IR GitHub

CUDA Tile IR

Dive into the official specification for the CUDA Tile IR. Explore the precise technical details and definitions necessary to fully understand the structure, semantics, and constraints of the IR, which is essential for building or targeting the CUDA Tile infrastructure.

Documentation

cuTile Python Comprehensive Documentation

cuTile Python

Visit the complete cuTile Python documentation page to explore detailed information on installation, API usage, code examples, and best practices, providing everything you need to effectively leverage the CUDA Tile programming model within your Python projects.

OSS (GitHub)

TileGym GitHub Repository

TileGym

Head to the GitHub repository for TileGym, a CUDA Tile kernel library that provides a rich collection of kernel tutorials and examples for tile-based GPU programming. It also demonstrates how to integrate CUDA Tile into real-world large language models such as Llama 3 and DeepSeek V2.


More Resources

NVIDIA Developer Program

Join the NVIDIA Developer Program

NVIDIA Training and Certification

Get Training and Certification

NVIDIA CUDA Newsletter

Sign up for the CUDA Newsletter

Get started with CUDA Tile today.

Explore the Quick Start Guide