It contains excellent GPU implementations of hundreds of matrix, signal, and image processing routines that enable it to outperform CPU libraries such as IPP, MKL, Eigen, and Armadillo.

It is optimized for any CUDA-enabled GPU. The same code will run on laptops, desktops, or servers.

It includes thousands of lines of highly tuned device code.

It performs run-time analysis of your code to increase arithmetic intensity and memory throughput while avoiding unnecessary temporary allocations.
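The idea of avoiding temporaries can be illustrated with a small CPU-only sketch in plain C++. This is not ArrayFire code and does not reflect how ArrayFire is implemented internally; the `Lazy` class and `fused_example` function are hypothetical names invented for this illustration. Each arithmetic operator records a per-element recipe instead of computing a result, so an expression such as `a * b + c` is evaluated in a single pass with one output allocation rather than one temporary per operator.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical lazy vector (illustration only, not part of ArrayFire).
// It stores a recipe for computing element i rather than the data itself.
class Lazy {
public:
    // Leaf node: takes ownership of existing data and shares it between
    // every expression that refers to it.
    explicit Lazy(std::vector<float> data) : n_(data.size()) {
        auto store = std::make_shared<std::vector<float>>(std::move(data));
        at_ = [store](std::size_t i) { return (*store)[i]; };
    }

    std::size_t size() const { return n_; }

    // Evaluate the whole expression in one loop with a single allocation.
    std::vector<float> eval() const {
        std::vector<float> out(n_);
        for (std::size_t i = 0; i < n_; ++i) out[i] = at_(i);
        return out;
    }

    // Operators only compose recipes; no element is computed here.
    friend Lazy operator+(const Lazy& a, const Lazy& b) {
        assert(a.n_ == b.n_);  // this sketch assumes equal lengths
        auto fa = a.at_;
        auto fb = b.at_;
        return Lazy(a.n_, [fa, fb](std::size_t i) { return fa(i) + fb(i); });
    }

    friend Lazy operator*(const Lazy& a, const Lazy& b) {
        assert(a.n_ == b.n_);  // this sketch assumes equal lengths
        auto fa = a.at_;
        auto fb = b.at_;
        return Lazy(a.n_, [fa, fb](std::size_t i) { return fa(i) * fb(i); });
    }

private:
    // Interior node: built from a composed recipe.
    Lazy(std::size_t n, std::function<float(std::size_t)> at)
        : n_(n), at_(std::move(at)) {}

    std::size_t n_;
    std::function<float(std::size_t)> at_;
};

// Builds a * b + c lazily, then evaluates it in one pass.
std::vector<float> fused_example() {
    Lazy a(std::vector<float>{1.0f, 2.0f, 3.0f});
    Lazy b(std::vector<float>{4.0f, 5.0f, 6.0f});
    Lazy c(std::vector<float>{7.0f, 8.0f, 9.0f});
    Lazy expr = a * b + c;  // nothing has been computed yet
    return expr.eval();     // single loop, single output buffer
}
```

Calling `fused_example()` returns the three values 11, 18, and 27. A GPU library would apply the same principle by generating one fused kernel for the whole expression instead of looping on the CPU; the `std::function` indirection here is chosen for brevity, not speed.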

It combines and enhances all the best CUDA libraries available, including the fastest FFT, BLAS, and LAPACK implementations.

It uses a simple array notation you can learn in minutes.

A few lines of ArrayFire code accomplish what would take 10 to 100 times as many lines of raw CUDA.

It is easier than templated programming and goes farther than simple directive-based approaches (and outperforms those approaches too).

It scales easily to take advantage of multiple GPUs.

It can be used in C/C++ applications by itself or integrated with your existing CUDA code.

It has hundreds of functions you need to make your code faster, including arithmetic, linear algebra, statistics, signal processing, image processing, and related algorithms.

It supports single and double-precision floating point values, complex numbers, and booleans.

It supports manipulating vectors, matrices, and N-dimensional arrays.

It can execute loop iterations in parallel with gfor.