This page is the syllabus for the NVIDIA Deep Learning Institute (DLI) Accelerated Data Science Teaching Kit. It outlines how each module is organized in the downloaded Teaching Kit .zip file, lists the content of every module, and, where applicable, links to the suggested online DLI course for that module. It also links to the lecture videos currently available; more will be added in future releases of the kit.


Module 1: Introduction to Data Science

Lecture Slides

  • 1.1 - Teaching Kit Modules Overview
  • 1.2 - What is Data Science?
  • 1.3 - Why is Data Science Important?
  • 1.4 - Learning Goals and Expectations
  • 1.5 - Analytics Building Blocks
  • 1.6 - Example Data Science Project 1: Apolo Graph Exploration
  • 1.7 - Example Data Science Project 2: NetProbe Auction Fraud Detection
  • 1.8 - Data Science Buzzwords, Hype Cycle, General vs Narrow AI
  • 1.9 - Career Paths and Challenges
  • 1.10 - Diversity Gaps in Science and Engineering
  • 1.11 - Hidden Figures in Data Science From Underrepresented Groups

Lecture Videos

Labs

  • Introduction to RAPIDS and cuDF

Quiz

  • Module 1 Quiz

Module 2: Data Collection

Lecture Slides

  • 2.1 - Collecting Data
  • 2.2 - Scraping Data
  • 2.3 - Popular Scraping Libraries
  • 2.4 - Data Annotation and Data Quality
  • 2.5 - SQLite as Simple, Effective Storage
  • 2.6 - SQL Refresher
  • 2.7 - Beware of Missing Indexes

Lecture Videos

Labs

  • Data Collection via API
  • Data Annotation in Active Learning
  • GPU-accelerated SQL with BlazingSQL

DLI Online Course Section

Quiz

  • Module 2 Quiz

Module 3: Data Pre-processing (ETL)

Lecture Slides

  • 3.1 - Introduction to Data Pre-processing
  • 3.2 - Data Cleaning and Statistical Preprocessing
  • 3.3 - Data Cleaners: OpenRefine and Wrangler
  • 3.4 - Feature Selection: Introduction to Filter Methods
  • 3.5 - Feature Selection: Introduction to Model-based Methods
  • 3.6 - Feature Reduction: PCA

Lecture Videos

Labs

  • Data Wrangling with OpenRefine
  • Outlier Detection with IQR
  • Feature Reduction with PCA

Quiz

  • Module 3 Quiz

Module 4: Data Ethics and Reducing Bias in Data Sets

Lecture Slides

  • 4.1 - Sources of Bias and Fairness Measures
  • 4.2 - Tools for Discovering and Interpreting Bias in Models
  • 4.3 - Challenges Faced by Underrepresented Groups

Lecture Videos

Labs

  • Classifier Audit with FairVis

Quiz

  • Module 4 Quiz

Module 5: Data Integration

Lecture Slides

  • 5.1 - Knowledge Graph
  • 5.2 - Data De-duplication

Lecture Videos

  • Available in a future release of the Teaching Kit

Quiz

  • Module 5 Quiz

Module 6: Data Analytics, Concepts and Tasks

Lecture Slides

  • 6.1 - Break Complex Problems into Simpler Ones: Part 1
  • 6.2 - Break Complex Problems into Simpler Ones: Part 2

Lecture Videos

  • Available in a future release of the Teaching Kit

Quiz

  • Module 6 Quiz

Module 7: Visualization 101

Lecture Slides

  • 7.1 - What is Info Vis and Why it is Important
  • 7.2 - Human Perception
  • 7.3 - Gestalt Psychology
  • 7.4 - Chart Basics
  • 7.5 - Colors
  • 7.6 - Visual Exploratory Data Analytics with cuxfilter

Lecture Videos

  • Available in a future release of the Teaching Kit

Labs

  • Creating Visualizations

Quiz

  • Module 7 Quiz

Module 8: Fixing Common Visualization Issues

Lecture Slides

  • 8.1 - Fixing Bar Charts, Line Charts, Tables and More
  • 8.2 - Applying What You’ve Learned
  • 8.3 - Crown Jewel, Self-contained Figures and More Tips

Lecture Videos

  • Available in a future release of the Teaching Kit

Quiz

  • Module 8 Quiz

Module 9: Data Visualization for Web (D3)

Lecture Slides

  • 9.1 - Why Learn D3?
  • 9.2 - Prerequisites: JavaScript and SVG
  • 9.3 - D3 Overview
  • 9.4 - Enter-Update-Exit
  • 9.5 - Attributes, Styles, Classes and Text
  • 9.6 - Scales and Axes
  • 9.7 - Dynamic Data and Interaction

Lecture Videos

  • Available in a future release of the Teaching Kit

Labs

  • Web-based Visualization (D3)
  • Server and Client-side Visualizations (Datashader, Plotly, Plotly Dash)

Quiz

  • Module 9 Quiz

Module 10: Scalable Computing (Hadoop, Hive)

Lecture Slides

  • 10.1 - Big Data is Common. How to Store It?
  • 10.2 - Why Hadoop?
  • 10.3 - MapReduce Overview
  • 10.4 - Example MapReduce Program
  • 10.5 - How to Try Hadoop
  • 10.6 - Pig and Hive

Lecture Videos

  • Available in a future release of the Teaching Kit

Labs

  • Hadoop

Quiz

  • Module 10 Quiz

Module 11: Scalable Computing (Spark)

Lecture Slides

  • 11.1 - Spark Overview
  • 11.2 - Example Spark Programs
  • 11.3 - Spark SQL and Other Spark Libraries
  • 11.4 - RAPIDS and Spark

Lecture Videos

  • Available in a future release of the Teaching Kit

Labs

  • Accelerated Spark with RAPIDS on AWS

Quiz

  • Module 11 Quiz

Module 12: Scalable Computing (HBase)

Lecture Slides

  • 12.1 - HBase Overview
  • 12.2 - How HBase Scales Up Storage
  • 12.3 - How to Use HBase
  • 12.4 - Learn More About HBase

Lecture Videos

  • Available in a future release of the Teaching Kit

Quiz

  • Module 12 Quiz

Module 13: Scalable Computing (Dask and UCX)

Lecture Slides

  • 13.1 - Using Dask and UCX with RAPIDS and BlazingSQL

Lecture Videos

  • Available in a future release of the Teaching Kit

Quiz

  • Module 13 Quiz

Module 14: Machine Learning (Classification)

Lecture Slides

  • 14.1 - Overview
  • 14.2 - Introduction to Supervised Learning
  • 14.3 - Linear Model
  • 14.4 - RAPIDS Acceleration: Linear Regression
  • 14.5 - Overfitting and Cross Validation
  • 14.6 - Decision Tree
  • 14.7 - Visualizing Classification: ROC, AUC, Confusion Matrix
  • 14.8 - Bagging
  • 14.9 - Random Forest
  • 14.10 - RAPIDS Acceleration: Random Forest
  • 14.11 - Boosting
  • 14.12 - XGBoost with RAPIDS
  • 14.13 - k-NN with RAPIDS

Lecture Videos

Labs

  • Decision Tree Classification Clustering
  • Classification (Random Forest)
  • Image Classification with RAPIDS-based Random Forest

DLI Online Course Section

Quiz

  • Module 14 Quiz

Module 15: Machine Learning (Clustering and Dimensionality Reduction)

Lecture Slides

  • 15.1 - Introduction to Unsupervised Learning
  • 15.2 - KMeans and Hierarchical Clustering
  • 15.3 - RAPIDS Acceleration: KMeans
  • 15.4 - DBSCAN
  • 15.6 - t-SNE
  • 15.7 - UMAP
  • 15.8 - Visualizing Clusters
  • 15.9 - RAPIDS Acceleration: PCA, UMAP, DBSCAN

Lecture Videos

Labs

  • KMeans Clustering
  • Dimensionality Reduction and Visualization

Quiz

  • Module 15 Quiz

Module 16: Neural Networks

Lecture Slides

  • 16.1 - Introduction to Artificial Neural Networks
  • 16.2 - Activation Function and Perceptron
  • 16.3 - Multilayer Perceptron
  • 16.4 - Advanced Neural Networks

Lecture Videos

Labs

  • Binary Classification with Perceptron

DLI Online Courses with Student Certificate Opportunity

Other Shorter DLI Online Courses

Quiz

  • Module 16 Quiz

Module 17: Graph Analytics

Lecture Slides

  • 17.1 - How to Represent and Store Graphs
  • 17.2 - Graph Power Laws
  • 17.3 - Centralities: Degree, Betweenness, Clustering Coefficient
  • 17.4 - PageRank and Personalized PageRank
  • 17.5 - Interactive Graph Exploration

Lecture Videos

  • Available in a future release of the Teaching Kit

Labs

  • Graph Analytics with cuGraph

Quiz

  • Module 17 Quiz

Module 18: Streaming Data

Lecture Slides

  • 18.1 - Machine Learning for Streaming Data Analysis
  • 18.2 - Data Preprocessing
  • 18.3 - Learning Process
  • 18.4 - Reasoning and Data Resource

Lecture Videos

Labs

  • Sales Forecasting via RAPIDS Linear Regression

Quiz

  • Module 18 Quiz

Module 19: Genomics

Lecture Slides

  • 19.1 - Introduction to Genomics
  • 19.2 - Data Preprocessing
  • 19.3 - Clustering and Validation
  • 19.4 - Statistical Analysis

Lecture Videos

Labs

  • Cancer Recognition on Genomics Data via Decision Tree Algorithm

Quiz

  • Module 19 Quiz

Module 20: Text Analytics

Lecture Slides

  • 20.1 - Basics: Preprocessing, Representation, Word Importance
  • 20.2 - Latent Semantic Indexing (Singular Value Decomposition)
  • 20.3 - SVD: Dimensionality Reduction, and Other Uses
  • 20.4 - Text Visualization

Lecture Videos

  • Available in a future release of the Teaching Kit

Labs

  • Latent Semantic Indexing for Text via Singular Value Decomposition (cuML)

Quiz

  • Module 20 Quiz

Module 21: CPU vs. GPU-accelerated Data Science

Lecture Slides

  • 21.1 - RAPIDS Benefits
  • 21.2 - Refactoring Workloads

Lecture Videos

  • Available in a future release of the Teaching Kit

Labs

  • Accelerating Workloads Using RAPIDS

DLI Online Course Selection with Student Certificate Opportunity

Quiz

  • Module 21 Quiz

Module 22: Working in Data Science Teams

Lecture Slides

  • 22.1 - Forming Great Teams
  • 22.2 - Project Idea Checklist: Heilmeier Questions
  • 22.3 - Pay Attention to Software Licenses Early On

Lecture Videos

  • Available in a future release of the Teaching Kit

Quiz

  • Module 22 Quiz

Module 23: Code Backup and Version Control

Lecture Slides

  • 23.1 - Git: Overview and Benefits
  • 23.2 - Warning! Keep Your Repository Private Initially
  • 23.3 - GitHub and Bitbucket

Lecture Videos

  • Available in a future release of the Teaching Kit

Quiz

  • Module 23 Quiz

Module 24: Team Project (Fake News Detection)

Lecture Slides

  • 24.1 - Introduction to Team Project
  • 24.2 - Evaluation of Team Project

Lecture Videos

Team Project

  • Fake News Detection (cuML)