ICYMI: Exploring Challenges Posed by Biased Datasets Using RAPIDS cuDF
Read about an innovative GPU solution that uses RAPIDS cuDF to overcome the limitations of small, biased datasets.
Today’s data science problems demand a dramatic increase in the scale of data as well as the computational power required to process it. Unfortunately, the end of Moore’s law means that handling large data sizes in today’s data science ecosystem requires scaling out to many CPU nodes, which brings its own problems of communication bottlenecks, energy, and … Continued
Get the latest best practices about how to accelerate your data science projects with RAPIDS.
RAPIDS is about creating bridges, connections, and clean handoffs between GPU PyData libraries. Interoperability with functionality is our goal. For example, if you’re working with RAPIDS cuDF but need a more linear-algebra oriented function that exists in CuPy, you can leverage the interoperability of the GPU PyData ecosystem to use that function. Just like you … Continued
In this post, I introduce the design and implementation of a framework within RAPIDS cuDF that enables compiling Python user-defined functions (UDFs) and inlining them into native CUDA kernels. This framework uses the Numba Python compiler and Jitify CUDA just-in-time (JIT) compilation library to provide cuDF users the flexibility of Python with the performance of … Continued
Gathering business insights can be a pain, especially when you’re dealing with countless data points. It’s no secret that GPUs can be a time-saver for data scientists. Rather than wait for a single query to run, GPUs help speed up the process and get you the insights you need quickly. In this video, Allan Enemark, … Continued
Dive into the RAPIDS Accelerator for Apache Spark toolset, including the workload qualification tool for estimating speedup on GPU and the profiling tool for tuning jobs.
Data collected on a vast scale has fundamentally changed the way organizations do business, driving demand for teams to provide meaningful data science, machine learning, and deep learning-based business insights quickly. Data science leaders, plus the DevOps and IT teams supporting them, constantly look for ways to make their teams productive while optimizing their costs … Continued
The human body is made up of nearly 40 trillion cells, of many different types. Recent advances in experimental biology have made it possible to explore the genetic material of single cells. With the birth of this new field of single-cell genomics, scientists can now probe the DNA and RNA of individual cells in the … Continued
As the scale of available data continues to grow, so does the need for scalable and intelligent data processing systems to swiftly harness useful knowledge. Especially in high-stakes domains such as life sciences and finance, alongside scalability, transparency of data-driven processes becomes paramount to ensure the utmost trustworthiness. Started by scientists coming from the Knowledge … Continued
Numba is the just-in-time (JIT) compiler used in RAPIDS cuDF to implement high-performance user-defined functions (UDFs) by turning user-supplied Python functions into CUDA kernels. But how does it go from Python code to a CUDA kernel? In this post, we’ll take a look at Numba’s compilation pipeline.
Performing an exhaustive exact k-nearest neighbor (kNN) search, also known as brute-force search, is expensive, and it doesn’t scale particularly well to larger datasets. During vector search, brute-force search requires the distance to be calculated between every query vector and database vector. For the frequently used Euclidean and cosine distances, the computation task becomes equivalent … Continued
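To make the cost of brute-force search concrete, here is a minimal CPU-only sketch in plain Python. It is not the post's GPU implementation; the function name brute_force_knn and the sample vectors are hypothetical and chosen only for illustration. It computes the Euclidean distance between every query vector and every database vector, then keeps the k closest.

```python
import heapq
import math


def brute_force_knn(queries, database, k):
    """Return, for each query, the indices of its k nearest database vectors.

    Hypothetical helper for illustration only (not part of RAPIDS).
    """
    results = []
    for q in queries:
        # Exhaustive step: one Euclidean distance per (query, database) pair.
        distances = [(math.dist(q, v), i) for i, v in enumerate(database)]
        # Keep the k smallest distances; ties break on the lower index.
        nearest = heapq.nsmallest(k, distances)
        results.append([i for _, i in nearest])
    return results


# Made-up sample data: four 2-D database vectors and two queries.
database = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
queries = [(0.9, 0.1), (4.0, 4.0)]
neighbors = brute_force_knn(queries, database, k=2)
print(neighbors)  # [[1, 0], [3, 2]]
```

The nested loop performs one distance computation for every query and database pair, so the work grows with the product of the two set sizes, which is why exhaustive search scales poorly to larger datasets.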
At GTC Europe in Munich, Germany, NVIDIA announced RAPIDS, a suite of open-source software libraries for executing end-to-end data science and analytics pipelines entirely on GPUs. RAPIDS aims to accelerate the entire data science pipeline including data loading, ETL, model training, and inference. This will enable more productive, interactive, and exploratory workflows. The RAPIDS libraries … Continued
Machine learning (ML) data is big and messy. Organizations have increasingly adopted RAPIDS and cuML to help their teams run experiments faster and achieve better model performance on larger datasets. That, in turn, accelerates the training of ML models using GPUs. With RAPIDS, data scientists can now train models 100X faster and more frequently. Like … Continued
Given the parallel nature of many data processing tasks, it’s only natural that the massively parallel architecture of a GPU should be able to parallelize and accelerate Apache Spark data processing queries, in the same way that a GPU accelerates deep learning (DL) in artificial intelligence (AI). NVIDIA has worked with the Apache Spark community … Continued