Data Science

Data science is the practice of processing and interpreting data to extract insights and inform decision-making, often using machine learning and advanced statistical techniques.

A workflow diagram showing how data science works


How Data Science Works

Data science starts with data processing: data is ingested and decompressed to collect and unpack it, then cleaned to remove errors and inconsistencies. Next, data analytics is performed to extract insights. Machine learning and deep learning models are then trained on the processed data so that they learn its patterns. The trained models can then make predictions, or inferences, that provide further insight and support decision-making.
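The workflow described above can be sketched end to end in a few lines of Python. This is a minimal illustration using only the standard library, not a production pipeline: the sample data, the column names (`hours`, `energy_kwh`), and the function names are all hypothetical, and a simple least-squares line stands in for the machine learning step.

```python
import csv
import gzip
import io
import statistics

# Hypothetical raw data with two deliberately bad rows (a missing value and a non-numeric value).
RAW_CSV = """hours,energy_kwh
1,2.1
2,3.9
3,
4,8.2
five,10.0
5,9.8
"""


def ingest(compressed: bytes) -> str:
    """Ingestion and decompression: unpack gzip bytes into CSV text."""
    return gzip.decompress(compressed).decode("utf-8")


def clean(csv_text: str) -> list:
    """Cleaning: keep only rows where both fields parse as numbers."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        try:
            rows.append((float(record["hours"]), float(record["energy_kwh"])))
        except (TypeError, ValueError):
            continue  # drop rows with missing or malformed values
    return rows


def analyze(rows: list) -> dict:
    """Analytics: simple descriptive statistics over the cleaned rows."""
    return {
        "mean_hours": statistics.fmean(x for x, _ in rows),
        "mean_energy_kwh": statistics.fmean(y for _, y in rows),
    }


def train(rows: list) -> tuple:
    """Training: fit y = slope * x + intercept by ordinary least squares."""
    mean_x = statistics.fmean(x for x, _ in rows)
    mean_y = statistics.fmean(y for _, y in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model: tuple, x: float) -> float:
    """Inference: apply the fitted line to a new input."""
    slope, intercept = model
    return slope * x + intercept


def run_pipeline() -> float:
    compressed = gzip.compress(RAW_CSV.encode("utf-8"))  # stands in for a downloaded file
    rows = clean(ingest(compressed))
    print("summary:", analyze(rows))
    model = train(rows)
    return predict(model, 6.0)


if __name__ == "__main__":
    print("predicted energy for 6 hours:", run_pipeline())
```

Each function corresponds to one stage of the workflow, so a stage can be swapped out (for example, replacing `train` with a real machine learning library) without touching the others.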

Explore Data Science Tools and Technologies

NVIDIA RAPIDS

NVIDIA RAPIDS is an open-source suite of GPU-accelerated data science and AI libraries with APIs that match the most popular open-source data tools. It can speed up data pipelines by orders of magnitude at scale.

NVIDIA RAPIDS cuDF

NVIDIA RAPIDS cuDF is a GPU DataFrame library for manipulating tabular data. cuDF also provides a pandas accelerator mode that brings GPU acceleration to your pandas workflows with zero code changes.

RAPIDS Accelerator for Apache Spark

The RAPIDS Accelerator for Apache Spark combines the power of the RAPIDS cuDF library and the scale of the Spark distributed computing framework, accelerating your existing Apache Spark applications with minimal code changes.

NVIDIA Morpheus

NVIDIA Morpheus is a GPU-accelerated, end-to-end AI framework enabling developers to create optimized applications for filtering, processing, and classifying large volumes of streaming cybersecurity data.

NVIDIA cuOpt

NVIDIA® cuOpt™ is a world-record-setting, GPU-accelerated logistics solver that uses heuristics, metaheuristics, and optimizations to solve complex vehicle routing problems with a wide range of constraints.
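To make the idea of a routing heuristic concrete, here is a small sketch of a greedy nearest-neighbor heuristic for a vehicle routing problem with a single capacity constraint. It is not cuOpt's API or algorithm; the function name `plan_routes`, the data layout, and the sample stops are hypothetical and chosen only for illustration.

```python
import math


def plan_routes(depot, stops, capacity):
    """Greedy nearest-neighbor routing with a per-vehicle capacity limit.

    depot:    (x, y) start position for every vehicle
    stops:    dict mapping stop name -> ((x, y), demand)
    capacity: maximum total demand one vehicle can carry
    Returns a list of routes, each a list of stop names in visiting order.
    """
    for name, (_, demand) in stops.items():
        if demand > capacity:
            raise ValueError(f"stop {name!r} exceeds vehicle capacity")

    unvisited = dict(stops)
    routes = []
    while unvisited:
        position, load, route = depot, 0, []
        while True:
            # Stops that still fit in the current vehicle.
            feasible = [n for n, (_, d) in unvisited.items() if load + d <= capacity]
            if not feasible:
                break  # vehicle is full (or nothing fits); dispatch a new one
            nearest = min(feasible, key=lambda n: math.dist(position, unvisited[n][0]))
            location, demand = unvisited.pop(nearest)
            route.append(nearest)
            position, load = location, load + demand
        routes.append(route)
    return routes


if __name__ == "__main__":
    sample_stops = {
        "A": ((1, 0), 2),
        "B": ((2, 0), 2),
        "C": ((0, 5), 3),
    }
    print(plan_routes((0, 0), sample_stops, capacity=4))
```

A greedy heuristic like this is fast but can produce routes far from optimal, which is why production solvers layer metaheuristics and further optimization on top of an initial solution.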

NVIDIA NeMo Retriever

NVIDIA NeMo™ Retriever is a collection of generative AI microservices enabling enterprises to seamlessly connect models to diverse business data and deliver highly accurate responses.

Data Science Learning Library