
Inside the Programming Evolution of GPU Computing

In a recent interview, NVIDIA VP of Accelerated Computing Ian Buck talks about the history of using GPUs for more than just game graphics.

Back in 2000, Buck and a small computer graphics team at Stanford University were watching the steady evolution of computer graphics processors for gaming and thinking about how such devices could be extended to fit a wider class of applications.

Ian Buck, VP of Accelerated Computing at NVIDIA

“At the time, a lot of the GPU development was driven by the need for more realism, which meant programs were being written that could run at every pixel to improve the game,” Buck tells The Platform.

“These programs were tiny then—four instructions, maybe eight—but they were running on every pixel on the screen; a million pixels, sixty times per second. This was essentially a massively parallel program to try to make beautiful games, but we started by seeing a fit for matrix multiplies and linear algebra within that paradigm.”
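The paradigm Buck describes, one tiny program run independently at every pixel, is the essence of data parallelism, and it is why a matrix multiply fits so naturally: each output element is an independent dot product. A minimal sketch in plain Python (illustrative only; the era's shader programs were not written this way):

```python
# A tiny "screen": every pixel runs the same short program independently.
width, height = 8, 8
pixels = [[(x + y) / (width + height) for x in range(width)]
          for y in range(height)]

# The per-pixel "program": a few arithmetic instructions, applied
# uniformly to every pixel -- the data-parallel model in miniature.
def shade(p):
    return min(1.0, p * 1.2 + 0.05)

frame = [[shade(p) for p in row] for row in pixels]

# Matrix multiply fits the same mold: every output element is an
# independent dot product, so all of them can run in parallel.
def matmul(a, b):
    rows, cols, inner = len(a), len(b[0]), len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]
```

On a GPU, each pixel (or each output element of the matrix product) would be handled by its own thread rather than by a Python loop.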

Buck’s small research team developed Brook, the original precursor to CUDA, the now ubiquitous parallel programming model that NVIDIA has since developed and championed. Buck himself landed at NVIDIA after the company, eager to explore computational opportunities for GPUs, recruited him away from his Stanford research work.

The idea behind Brook, and of course, later, CUDA, was to create a programming approach that would resonate with any C programmer while offering higher-level parallel programming concepts that could be compiled to the GPU. Brook took off in a few scientific computing circles, where interest continued to build after 2004, when Buck took the work to NVIDIA.

Now, over a decade later, there are “too many to count” developers hard at work on everything from libraries and programming improvements to, of course, the NVIDIA Tesla series GPU accelerators, the most recent of which is the K80, with 4,992 CUDA cores across its two GPUs and close to two teraflops of peak double precision floating point performance at its base clock speed.

Read the entire interview on The Platform >>
