NVIDIA CUDA Tile

NVIDIA® CUDA® Tile is a tile-based GPU programming model designed for portable programming of NVIDIA Tensor Cores. CUDA Tile helps unlock peak GPU performance with a programming model that simplifies the creation of optimized, tile-based kernels across NVIDIA platforms.
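To illustrate what "tile-based" means, here is a minimal CPU-only sketch in plain Python of a tiled matrix multiply C = A × B. It is not CUDA Tile or cuTile code. Each output tile is computed independently by stepping through the shared dimension one tile at a time and accumulating tile-by-tile products, the kind of block-level matrix multiply-accumulate that Tensor Cores accelerate. The tile size and all function names are hypothetical choices made for illustration.

```python
# CPU-only illustration of tile-based decomposition of C = A @ B.
# Plain Python, not cuTile or CUDA Tile IR code; TILE and all names are hypothetical.

TILE = 2  # hypothetical tile edge length chosen for illustration


def load_tile(m, row0, col0, rows, cols):
    """Copy a rows x cols tile starting at (row0, col0), zero-padding past the edges."""
    n_rows, n_cols = len(m), len(m[0])
    return [
        [m[r][c] if r < n_rows and c < n_cols else 0 for c in range(col0, col0 + cols)]
        for r in range(row0, row0 + rows)
    ]


def tile_matmul_acc(acc, a_tile, b_tile):
    """acc += a_tile @ b_tile: one multiply-accumulate step on whole tiles."""
    for i in range(len(a_tile)):
        for j in range(len(b_tile[0])):
            acc[i][j] += sum(a_tile[i][k] * b_tile[k][j] for k in range(len(b_tile)))
    return acc


def tiled_matmul(a, b, tile=TILE):
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    # Each (ti, tj) pair owns one output tile, i.e. one independent unit of work.
    for ti in range(0, m, tile):
        for tj in range(0, n, tile):
            acc = [[0] * tile for _ in range(tile)]
            # March along the shared K dimension one tile at a time.
            for tk in range(0, k, tile):
                a_tile = load_tile(a, ti, tk, tile, tile)
                b_tile = load_tile(b, tk, tj, tile, tile)
                tile_matmul_acc(acc, a_tile, b_tile)
            # Write back only the in-bounds part of the accumulator tile.
            for i in range(min(tile, m - ti)):
                for j in range(min(tile, n - tj)):
                    c[ti + i][tj + j] = acc[i][j]
    return c


print(tiled_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]]))
# [[58, 64], [139, 154]]
```

Zero-padding the edge tiles keeps every tile the same shape, so the inner step never has to special-case partial tiles; the padded zeros contribute nothing to the sums.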

Get Started

CUDA Tile is built on the CUDA Tile IR (Intermediate Representation) specification and its tools. cuTile provides the user-facing language support for CUDA Tile IR, currently in Python, with C++ support planned for the future. NVIDIA's Python implementation of this tile-based programming model is cuTile Python.

CUDA Tile IR

Virtual Instruction Set for Tile Programming:

Enables native GPU programming within the structured, high-performance tile programming model

Get Started With CUDA Tile IR

cuTile Python

Python-Native, Tiled Kernel Development:

Seamless high-level Python expression of the CUDA Tile programming model
Built on the foundation of the Tile IR specification
Lets you write, define, and optimize tiled GPU kernels using familiar Python syntax
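As a rough, CPU-only analogy for writing tiled kernels in Python, the sketch below shows the general shape of such a kernel: the kernel body is written once in terms of whole tiles, and a launcher runs it for every block index in a grid. This is plain Python with no GPU involved, and it is not the cuTile Python API; `vector_add_kernel`, `load`, `store`, `launch`, and `TILE_SIZE` are hypothetical names. See the cuTile Python documentation for the actual API.

```python
# CPU-only sketch of the tile programming pattern: a "kernel" written in terms of
# whole tiles, launched once per block index over a grid.
# All names here are hypothetical; this is NOT the cuTile Python API.
import math

TILE_SIZE = 4  # hypothetical tile length


def load(array, block_id, tile_size):
    """Read one tile for this block, zero-padding the last, partial tile."""
    start = block_id * tile_size
    tile = array[start:start + tile_size]
    return tile + [0] * (tile_size - len(tile))


def store(array, block_id, tile):
    """Write one tile back, dropping any padded elements past the end of the array."""
    start = block_id * len(tile)
    end = min(start + len(tile), len(array))
    array[start:end] = tile[: end - start]


def vector_add_kernel(block_id, a, b, out):
    a_tile = load(a, block_id, TILE_SIZE)
    b_tile = load(b, block_id, TILE_SIZE)
    out_tile = [x + y for x, y in zip(a_tile, b_tile)]  # elementwise add on a whole tile
    store(out, block_id, out_tile)


def launch(grid, kernel, *args):
    """Run the kernel once per block index (sequentially, in this sketch)."""
    for block_id in range(grid):
        kernel(block_id, *args)


a = list(range(10))
b = [10 * x for x in range(10)]
out = [0] * len(a)
launch(math.ceil(len(a) / TILE_SIZE), vector_add_kernel, a, b, out)
print(out)  # [0, 11, 22, 33, 44, 55, 66, 77, 88, 99]
```

In a tile programming model, the blocks would run in parallel on the GPU, and mapping each tile onto individual threads is left to the compiler rather than written by hand; the sequential loop here only models the indexing.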

Get Started With cuTile Python

Learning Library

Video

Get Started with cuTile Python

cuTile Python

Explore the full capabilities of cuTile Python, including detailed information and practical examples of implementing advanced tiling techniques and using the framework to optimize and deploy your GPU kernels.

Video

Deep Dive: How to Use cuTile-Python

cuTile Python

Tech Blog

CUDA Tile: A New Era of GPU Programming

CUDA Tile

Discover CUDA Tile, the revolutionary programming model designed by NVIDIA to fundamentally simplify and optimize parallel computing. Learn how CUDA Tile works and the story behind its creation.

Tech Blog

CUDA Tile Programming with cuTile Python

cuTile Python

Ready to combine GPU performance with Python's simplicity? This blog post walks you through getting started with cuTile Python and shows how to define and deploy CUDA Tile kernels in Python.

OSS (GitHub)

cuTile Python GitHub

cuTile Python

Access the official cuTile Python GitHub page to explore the source code, contribute to the project, and report issues. Get the complete API documentation to ensure you have all the resources needed to implement and optimize your tiled kernels.

Documentation

cuTile Python Comprehensive Documentation

cuTile Python

Visit the complete cuTile Python documentation for installation instructions, API usage, code examples, and best practices: everything you need to use the CUDA Tile programming model effectively in your Python projects.

OSS (GitHub)

CUDA Tile IR Dialect

CUDA Tile IR

Dive into the open-source MLIR dialect and bytecode serializer for CUDA Tile IR, essential for building MLIR-based compilers that generate CUDA Tile IR.

Documentation

CUDA Tile IR Documentation

CUDA Tile IR

Building CUDA Tile compilers and libraries? Consult the CUDA Tile IR documentation to gain a deep technical understanding of the specification required for implementing and targeting the CUDA Tile programming model.

OSS (GitHub)

TileGym GitHub Repository

TileGym

Head to the GitHub repository for TileGym, a CUDA Tile kernel library that provides a rich collection of kernel tutorials and examples for tile-based GPU programming. It also demonstrates how to integrate CUDA Tile into real-world large language models such as Llama 3 and DeepSeek V2.

More Resources

Join the NVIDIA Developer Program

Get Training and Certification

Sign up for CUDA Newsletter

Get started with CUDA Tile today.

Explore the Quick Start Guide