nvmath-python
nvmath-python (Beta) is an open source library that gives Python applications high-performance pythonic access to the core mathematical operations implemented in the NVIDIA CUDA-X™ Math Libraries for accelerated library, framework, deep learning compiler, and application development.
Install From Pip:Install From Conda:
Other Links:
Key Features
Truly Pythonic Accelerated Math
nvmath-python provides pythonic host and device APIs for using the highly optimized NVIDIA math libraries in Python applications, without the need for intermediary C or C++ bindings. This allows Python applications across deep learning, data processing, and more to leverage the power of NVIDIA hardware for computations out-of-the-box.
nvmath-python provides freedom of using either stateless (function-style) or stateful (class-style) APIs, depending on how much control of expressiveness a user wants. The library also provides integration with the standard logger library to offer visibility into computational details.
Interoperability With Python Ecosystem
nvmath-python works in conjunction with popular Python packages. This includes GPU-based packages like CuPy, PyTorch, and RAPIDS and CPU-based packages like NumPy, SciPy, and scikit-learn. You can keep using familiar data structures and workflows while benefiting from accelerated math through nvmath-python.
In combination with Python compilers, such as Numba, you can implement device callback functions to customize nvmath-python behavior.
High Performance
nvmath-python delivers performance comparable to the underlying C libraries. The APIs are designed for minimal overhead, allowing near-native performance for Python applications. Callbacks allow device kernels fusion without the extra overheads of a host.
nvmath-python Supported Operations
Linear Algebra
Advanced matrix-matrix multiplication allows performing fused kernel matrix-matrix multiplications with a bias, different scaling factors, and epilog functions.
The operation is available in a variety of precisions, both as a host API and as a device API.
The nvmath-python library offers a specialized matrix multiplication interface to perform scaled matrix-matrix multiplication with predefined epilogues as a single fused kernel. This can potentially lead to significantly better efficiency. On top of that, nvmath-python’s stateful APIs decompose such operations into planning, autotuning, and execution phases, which enables amortization of one-time preparatory costs across multiple executions.

Fast Fourier Transforms
Backed by the NVIDIA cuFFT library, nvmath-python provides a powerful set of APIs to perform N-dimensional discrete Fourier Transformations. These include forward and inverse transformations for complex-to-complex, complex-to-real, and real-to-complex cases. The operations are available in a variety of precisions, both as host and device APIs.
nvmath-python also provides a set of utility functions APIs to facilitate an implementation of user-specific caching for FFT plans.

The user can provide callback functions written in Python to selected nvmath-python operations like FFT, which results in a fused kernel and can lead to significantly better performance. Advanced users may benefit from nvmath-python device APIs that enable fusing core mathematical operations like FFT and matrix multiplication into a single kernel, bringing performance close to the theoretical maximum.
Resources
Get started with nvmath-python