GPUs are efficient parallel processors that can deliver low-latency response times for services responding to arbitrary incoming requests.
The NVIDIA GPU REST Engine (GRE) is a critical component for developers building low-latency web services. GRE includes a multi-threaded HTTP server that presents a RESTful web service and schedules requests efficiently across multiple NVIDIA GPUs. The overall response time depends on how much processing you need to do, but GRE itself adds very little overhead and can process null-requests in as little as 10 microseconds.
The GRE-powered web service takes incoming requests, fetches the required input data, and schedules quanta of work onto a work queue that is serviced asynchronously. A separate thread manages the work for each GPU and ensures that all GPUs in the system are continuously servicing requests from the queue.
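The queue-and-worker arrangement described above can be sketched in Go, with one goroutine standing in for each per-GPU thread. This is a minimal illustration rather than GRE's actual scheduler: the names `Request`, `Result`, `process`, and `serve` are hypothetical, and the "GPU work" is simulated by summing numbers on the CPU.

```go
package main

import (
	"fmt"
	"sync"
)

// Request is a hypothetical unit of work; GRE's real types may differ.
type Request struct {
	ID      int
	Payload []float32
}

// Result records which (simulated) GPU handled a request.
type Result struct {
	RequestID int
	GPU       int
	Sum       float32
}

// process stands in for a GPU kernel launch; here it sums the payload on the CPU.
func process(gpu int, r Request) Result {
	var sum float32
	for _, v := range r.Payload {
		sum += v
	}
	return Result{RequestID: r.ID, GPU: gpu, Sum: sum}
}

// serve starts one worker goroutine per GPU. All workers pull from the same
// queue, so an idle GPU picks up the next request as soon as it is free.
func serve(numGPUs int, reqs []Request) []Result {
	queue := make(chan Request, len(reqs))
	results := make(chan Result, len(reqs))

	var wg sync.WaitGroup
	for gpu := 0; gpu < numGPUs; gpu++ {
		wg.Add(1)
		go func(gpu int) {
			defer wg.Done()
			for r := range queue {
				results <- process(gpu, r)
			}
		}(gpu)
	}

	// Enqueue all work, then signal that no more requests will arrive.
	for _, r := range reqs {
		queue <- r
	}
	close(queue)
	wg.Wait()
	close(results)

	out := make([]Result, 0, len(reqs))
	for res := range results {
		out = append(out, res)
	}
	return out
}

func main() {
	reqs := []Request{
		{ID: 0, Payload: []float32{1, 2, 3}},
		{ID: 1, Payload: []float32{4, 5}},
		{ID: 2, Payload: []float32{6}},
	}
	for _, res := range serve(2, reqs) {
		fmt.Printf("request %d handled by GPU %d: sum=%v\n", res.RequestID, res.GPU, res.Sum)
	}
}
```

Because every worker reads from a single shared queue, the order in which results come back and the GPU that handles each request can vary from run to run.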
GRE uses a technique called “latency hiding” to achieve high throughput. While a GPU processes the data for one request, the data for the next request is being prepared and the results from the previous request are being delivered back to the requester, all in parallel. The entire queue is processed as efficiently as possible and, in many cases, the GPU never has to wait for data to arrive.
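One way to picture this overlap is as a three-stage pipeline in which each stage runs concurrently and hands its output to the next. The Go sketch below is an assumption-laden illustration, not GRE's implementation: `prepare`, `compute`, and `deliver` are hypothetical stage names, and the arithmetic merely stands in for data loading and GPU processing.

```go
package main

import "fmt"

// prepare simulates fetching the input data for each request ID.
func prepare(ids []int) <-chan int {
	out := make(chan int, 1)
	go func() {
		defer close(out)
		for _, id := range ids {
			out <- id * 10 // stand-in for loading input data
		}
	}()
	return out
}

// compute simulates the GPU stage. It can work on one item while
// prepare is already producing the next one.
func compute(in <-chan int) <-chan int {
	out := make(chan int, 1)
	go func() {
		defer close(out)
		for v := range in {
			out <- v + 1 // stand-in for GPU processing
		}
	}()
	return out
}

// deliver drains the final stage and collects the results in arrival order.
func deliver(in <-chan int) []int {
	var results []int
	for v := range in {
		results = append(results, v)
	}
	return results
}

func main() {
	fmt.Println(deliver(compute(prepare([]int{1, 2, 3})))) // prints [11 21 31]
}
```

Each stage has a single goroutine, so results keep the order of the incoming requests while the stages themselves overlap in time.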
The NVIDIA Image Compute Engine (ICE) is an example of a micro-service powered by GRE that provides GPU-accelerated image processing and re-sizing services to web and mobile applications.
You can build your own services and accelerate them in the same way. Starting with GRE lets you focus on implementing the algorithm that powers the service you want to accelerate. You specify how the data your algorithm needs is captured in the form of a URL, and then provide the function that operates on that data. If, for example, you were doing some kind of audio processing, you could write a GPU function, or call an existing GPU-accelerated function from the NVIDIA Performance Primitives (NPP) library, that takes the input audio sample and some parameters and applies the transformation.
We have released GRE as open source, so you can inspect the scheduler component itself and try out your own custom algorithms. The kit includes a simple HTTP server written in the Go language and an image classification service powered by the cuDNN deep learning library.
If you're interested in being notified about updates to GRE, please consider signing up for the interest list below.