Explore NVIDIA Data Science Tools and Technologies

How Data Science Works

Data science starts with data processing: data ingestion and decompression collect and unpack the raw data, and data cleaning removes errors and inconsistencies. Data analytics then extracts insights from the cleaned data. Next, machine learning and deep learning train models on the processed data so that they learn its patterns. The trained models then make predictions, or inferences, that support decision-making.
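The stages described above can be sketched end to end with nothing but the Python standard library. This is a CPU-only illustration under stated assumptions, not an NVIDIA pipeline: the sample data, the column names `hours` and `score`, and the helper names `ingest`, `clean`, and `fit_line` are all hypothetical, and a simple least-squares line stands in for the model-training step.

```python
import csv
import gzip
import io
import statistics

# Hypothetical raw data: a CSV with one missing value and one malformed value.
RAW_CSV = "hours,score\n1,52\n2,54\n,60\n3,56\nfour,58\n4,58\n"
compressed = gzip.compress(RAW_CSV.encode("utf-8"))


def ingest(blob):
    """Ingestion and decompression: unpack the gzip blob and parse CSV rows."""
    text = gzip.decompress(blob).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Cleaning: keep only rows whose fields both parse as numbers."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append((float(row["hours"]), float(row["score"])))
        except ValueError:
            continue  # drop rows with missing or non-numeric fields
    return cleaned


def fit_line(points):
    """Training: ordinary least-squares fit of score = slope * hours + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


rows = ingest(compressed)                                  # ingestion + decompression
points = clean(rows)                                       # cleaning
mean_score = statistics.fmean(y for _, y in points)        # analytics
slope, intercept = fit_line(points)                        # training
prediction = slope * 5 + intercept                         # inference for hours = 5
print(len(rows), len(points), mean_score, slope, intercept, prediction)
```

Each stage here is a plain function so that the hand-off between stages is visible. In a GPU-accelerated workflow, the same stages would typically be handled by DataFrame and machine learning libraries rather than hand-written loops.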

Explore Data Science Tools and Technologies

These tools, and more, are built on CUDA-X™ Data Science, a collection of open-source libraries that accelerate the data science and data processing ecosystem.

Learn More About CUDA-X Data Science

NVIDIA cuDF

NVIDIA cuDF is an open-source CUDA-X™ Data Science library that accelerates popular libraries like pandas, Polars, and Apache Spark on NVIDIA GPUs, delivering massive speed-ups for DataFrame operations with no code changes required.
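As a rough illustration of what "no code changes" can look like for pandas, the sketch below enables the `cudf.pandas` accelerator when cuDF is available and otherwise runs the same pandas code on the CPU. The fallback logic, the `accelerated` flag, and the sample data are assumptions added for this example. An existing script can also be run unchanged with `python -m cudf.pandas script.py`.

```python
# Try to enable the cuDF pandas accelerator. This must happen before pandas
# is imported. The broad except is an assumption for this sketch, so that the
# example also runs on machines without cuDF or without a usable GPU.
try:
    import cudf.pandas

    cudf.pandas.install()
    accelerated = True
except Exception:
    accelerated = False

import pandas as pd

# Ordinary pandas code: nothing below refers to cuDF.
df = pd.DataFrame(
    {"store": ["a", "b", "a", "b"], "sales": [10, 20, 30, 40]}
)
totals = df.groupby("store")["sales"].sum()

print("accelerated:", accelerated)
print(totals)
```

The point of the design is that the pandas code itself stays identical; only the setup step at the top decides whether it runs on the GPU.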

Get Started With cuDF

NVIDIA cuML

NVIDIA cuML is an open-source CUDA-X™ Data Science library that accelerates scikit-learn, UMAP, and HDBSCAN on GPUs — supercharging machine learning workflows with no code changes required.

Get Started With cuML

NVIDIA NeMo Curator

NVIDIA NeMo Curator provides pre-built accelerated pipelines to process multimodal data at scale, improving the performance of agentic systems.

Get Started With NeMo Curator

RAPIDS Accelerator for Apache Spark

The RAPIDS Accelerator for Apache Spark combines the power of the RAPIDS cuDF library and the scale of the Spark distributed computing framework, accelerating your existing Apache Spark applications with minimal code changes.

Get Started With RAPIDS for Spark

NVIDIA Morpheus

NVIDIA Morpheus is a GPU-accelerated, end-to-end AI framework enabling developers to create optimized applications for filtering, processing, and classifying large volumes of streaming cybersecurity data.

Get Started With Morpheus

NVIDIA cuOpt

NVIDIA® cuOpt™ is a world-record-setting, GPU-accelerated logistics solver that uses heuristics, metaheuristics, and optimization techniques to solve complex vehicle routing problems with a wide range of constraints.
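To make the term "heuristic" concrete, here is a minimal sketch of the classic nearest-neighbor heuristic for a single-vehicle routing problem. It is a textbook technique written in plain Python, not cuOpt's algorithm or API, and the depot, the stop coordinates, and the function names are hypothetical.

```python
import math

# Hypothetical depot and delivery stops as (x, y) coordinates.
DEPOT = (0.0, 0.0)
STOPS = {"A": (1.0, 0.0), "B": (1.0, 3.0), "C": (4.0, 3.0)}


def nearest_neighbor_route(depot, stops):
    """Greedy heuristic: always drive to the closest unvisited stop."""
    remaining = dict(stops)
    position = depot
    route = []
    while remaining:
        name = min(remaining, key=lambda s: math.dist(position, remaining[s]))
        position = remaining.pop(name)
        route.append(name)
    return route


def route_length(depot, stops, route):
    """Total distance of depot -> stops in route order -> depot."""
    path = [depot] + [stops[name] for name in route] + [depot]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


route = nearest_neighbor_route(DEPOT, STOPS)
length = route_length(DEPOT, STOPS, route)
print(route, length)
```

A greedy heuristic like this is fast but carries no optimality guarantee, and it ignores constraints such as time windows and vehicle capacity. Metaheuristics generally start from a solution of this kind and improve it iteratively.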

Get Started With cuOpt

NVIDIA NeMo Retriever

NVIDIA NeMo™ Retriever is a collection of generative AI microservices enabling enterprises to seamlessly connect models to diverse business data and deliver highly accurate responses.

Get Started With NeMo Retriever

NVIDIA cuVS

NVIDIA cuVS is an open-source library for GPU-accelerated vector search and data clustering. It enables higher throughput, lower latency, and faster index build times, and improves the efficiency of semantic search within pipelines and applications such as information retrieval or RAG.
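For readers new to vector search, the sketch below shows the brute-force baseline that approximate, GPU-accelerated indexes are designed to speed up: score every stored vector against the query and keep the top matches. It is plain Python, not the cuVS API, and the document IDs, embedding values, and function names are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, query, k=2):
    """Exhaustive search: score every vector, return the k best document IDs."""
    scored = [(cosine_similarity(vec, query), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical 3-dimensional embeddings keyed by document ID.
index = {
    "gpu-guide": [0.9, 0.1, 0.0],
    "cooking": [0.0, 0.2, 0.9],
    "cuda-intro": [0.8, 0.3, 0.1],
}

results = search(index, [1.0, 0.0, 0.0], k=2)
print(results)
```

Exhaustive search costs one similarity computation per stored vector per query, which is why index build time, throughput, and latency become the key metrics as collections grow.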

Get Started With cuVS

NVIDIA NeMo Data Designer

NVIDIA NeMo™ Data Designer generates high-quality, domain-specific synthetic data from scratch or seed examples—accelerating model development while eliminating privacy risks and data collection bottlenecks.

Get Started With NeMo Data Designer
