DLI Accelerated Data Science Teaching Kit Syllabus
This page is the syllabus for the NVIDIA Deep Learning Institute (DLI) Accelerated Data Science Teaching Kit. It outlines how each module is organized in the downloaded Teaching Kit .zip file, lists the content of every module, and links to the suggested online DLI course for each module where applicable. Links to stream the lecture videos will be added as they become available in future releases of the kit.
Module 1: Introduction to Data Science
Lecture Slides
- 1.1 - Teaching Kit Modules Overview
- 1.2 - What is Data Science?
- 1.3 - Why is Data Science Important?
- 1.4 - Learning Goals and Expectations
- 1.5 - Analytics Building Blocks
- 1.6 - Example Data Science Project 1: Apolo Graph Exploration
- 1.7 - Example Data Science Project 2: NetProbe Auction Fraud Detection
- 1.8 - Data Science Buzzwords, Hype Cycle, General vs Narrow AI
- 1.9 - Career Paths and Challenges
- 1.10 - Diversity Gaps in Science and Engineering
- 1.12 - Hidden Figures in Data Science From Underrepresented Groups
Lecture Videos
- Available in a future release of the Teaching Kit
Labs
- Introduction to RAPIDS and cuDF
Quiz
- Module 1 Quiz
Module 2: Data Collection
Lecture Slides
- 2.1 - Collecting Data
- 2.2 - Scraping Data
- 2.3 - Popular Scraping Libraries
- 2.4 - Data Annotation and Data Quality
- 2.5 - SQLite as Simple, Effective Storage
- 2.6 - SQL Refresher
- 2.7 - Beware of Missing Indexes
Lecture Videos
- Available in a future release of the Teaching Kit
Labs
- Data Collection via API
- Data Annotation in Active Learning
- GPU-accelerated SQL with BlazingSQL
DLI Online Course Section
Quiz
- Module 2 Quiz
Module 3: Data Pre-processing (ETL)
Lecture Slides
- 3.1 - Introduction to Data Pre-processing
- 3.2 - Data Cleaning and Statistical Preprocessing
- 3.3 - Data Cleaners: OpenRefine and Wrangler
- 3.4 - Feature Selection: Introduction to Filter Methods
- 3.5 - Feature Selection: Introduction to Model-based Methods
- 3.6 - Feature Reduction: PCA
Lecture Videos
- Available in a future release of the Teaching Kit
Labs
- Data Wrangling with OpenRefine
- Outlier Detection with IQR
- Feature Reduction with PCA
Quiz
- Module 3 Quiz
Module 4: Data Ethics and Reducing Bias in Data Sets
Lecture Slides
- 4.1 - Sources of Bias and Fairness Measures
- 4.2 - Tools for Discovering and Interpreting Bias in Models
- 4.3 - Challenges Faced by Underrepresented Groups
Lecture Videos
- Available in a future release of the Teaching Kit
Labs
- Classifier Audit with FairVis
Quiz
- Module 4 Quiz
Module 5: Data Integration
Lecture Slides
- 5.1 - Knowledge Graph
- 5.2 - Data De-duplication
Lecture Videos
- Available in a future release of the Teaching Kit
Quiz
- Module 5 Quiz
Module 6: Data Analytics, Concepts and Tasks
Lecture Slides
- 6.1 - Break Complex Problems into Simpler Ones: Part 1
- 6.2 - Break Complex Problems into Simpler Ones: Part 2
Lecture Videos
- Available in a future release of the Teaching Kit
Quiz
- Module 6 Quiz
Module 7: Visualization 101
Lecture Slides
- 7.1 - What is Info Vis and Why it is Important
- 7.2 - Human Perception
- 7.3 - Gestalt Psychology
- 7.4 - Chart Basics
- 7.5 - Colors
- 7.6 - Visual Exploratory Data Analytics with cuxfilter
Lecture Videos
- Available in a future release of the Teaching Kit
Labs
- Creating Visualizations
Quiz
- Module 7 Quiz
Module 8: Fixing Common Visualization Issues
Lecture Slides
- 8.1 - Fixing Bar Charts, Line Charts, Tables and More
- 8.2 - Applying What You’ve Learned
- 8.3 - Crown Jewel, Self-contained Figures and More Tips
Lecture Videos
- Available in a future release of the Teaching Kit
Quiz
- Module 8 Quiz
Module 9: Data Visualization for Web (D3)
Lecture Slides
- 9.1 - Why Learn D3?
- 9.2 - Prerequisites: JavaScript and SVG
- 9.3 - D3 Overview
- 9.4 - Enter-Update-Exit
- 9.5 - Attributes, Styles, Classes and Text
- 9.6 - Scales and Axes
- 9.7 - Dynamic Data and Interaction
Lecture Videos
- Available in a future release of the Teaching Kit
Labs
- Web-based Visualization (D3)
- Server and Client-side Visualizations (Datashader, Plotly, Plotly Dash)
Quiz
- Module 9 Quiz
Module 10: Scalable Computing (Hadoop, Hive)
Lecture Slides
- 10.1 - Big Data is Common. How to Store It?
- 10.2 - Why Hadoop?
- 10.3 - MapReduce Overview
- 10.4 - Example MapReduce Program
- 10.5 - How to Try Hadoop
- 10.6 - Pig and Hive
Lecture Videos
- Available in a future release of the Teaching Kit
Labs
- Hadoop
Quiz
- Module 10 Quiz
Module 11: Scalable Computing (Spark)
Lecture Slides
- 11.1 - Spark Overview
- 11.2 - Example Spark Programs
- 11.3 - Spark SQL and Other Spark Libraries
- 11.4 - RAPIDS and Spark
Lecture Videos
- Available in a future release of the Teaching Kit
Labs
- Accelerated Spark with RAPIDS on AWS
Quiz
- Module 11 Quiz
Module 12: Scalable Computing (HBase)
Lecture Slides
- 12.1 - HBase Overview
- 12.2 - How HBase Scales Up Storage
- 12.3 - How to Use HBase
- 12.4 - Learn More About HBase
Lecture Videos
- Available in a future release of the Teaching Kit
Quiz
- Module 12 Quiz
Module 13: Scalable Computing (Dask and UCX)
Lecture Slides
- 13.1 - Using Dask and UCX with RAPIDS and BlazingSQL
Lecture Videos
- Available in a future release of the Teaching Kit
Quiz
- Module 13 Quiz
Module 14: Machine Learning (Classification)
Lecture Slides
- 14.1 - Overview
- 14.2 - Introduction to Supervised Learning
- 14.3 - Linear Regression
- 14.4 - RAPIDS Acceleration: Linear Regression
- 14.5 - Overfitting and Cross Validation
- 14.6 - Decision Tree
- 14.7 - Visualizing Classification: ROC, AUC, Confusion Matrix
- 14.8 - Bagging
- 14.9 - Random Forests
- 14.10 - RAPIDS Acceleration: Random Forest
- 14.11 - Boosting
- 14.12 - XGBoost with RAPIDS
- 14.13 - k-NN with RAPIDS
Lecture Videos
- Available in a future release of the Teaching Kit
Labs
- Decision Tree Classification Clustering
- Classification (Random Forest)
- Image Classification with RAPIDS-based Random Forest
DLI Online Course Section
Quiz
- Module 14 Quiz
Module 15: Machine Learning (Clustering and Dimensionality Reduction)
Lecture Slides
- 15.1 - Introduction to Unsupervised Learning
- 15.2 - KMeans and Hierarchical Clustering
- 15.3 - RAPIDS Acceleration: KMeans
- 15.4 - DBSCAN
- 15.6 - t-SNE
- 15.7 - UMAP
- 15.8 - Visualizing Clusters
- 15.9 - RAPIDS Acceleration: PCA, UMAP, DBSCAN
Lecture Videos
- Available in a future release of the Teaching Kit
Labs
- KMeans Clustering
- Dimensionality Reduction and Visualization
Quiz
- Module 15 Quiz
Module 16: Neural Networks
Lecture Slides
- 16.1 - Introduction to Artificial Neural Networks
- 16.2 - Activation Function and Perceptron
- 16.3 - Multilayer Perceptron
- 16.4 - Advanced Deep Neural Networks
Lecture Videos
- Available in a future release of the Teaching Kit
Labs
- Binary Classification with Perceptron
DLI Online Courses with Student Certificate Opportunity
Other Shorter DLI Online Courses
- Deep Learning at Scale with Horovod
- Getting Started with Image Segmentation
- Modeling Time-Series Data with Recurrent Neural Networks in Keras
- Medical Image Classification Using the MedNIST Dataset
- Image Classification with TensorFlow: Radiomics — 1p19q Chromosome Status Classification
Quiz
- Module 16 Quiz
Module 17: Graph Analytics
Lecture Slides
- 17.1 - How to Represent and Store Graphs
- 17.2 - Graph Power Laws
- 17.3 - Centralities: Degree, Betweenness, Clustering Coefficient
- 17.4 - PageRank and Personalized PageRank
- 17.5 - Interactive Graph Exploration
Lecture Videos
- Available in a future release of the Teaching Kit
Labs
- Graph Analytics with cuGraph
Quiz
- Module 17 Quiz
Module 18: Streaming Data
Lecture Slides
- 18.1 - Machine Learning for Streaming Data Analysis
- 18.2 - Data Preprocessing
- 18.3 - Learning Process
- 18.4 - Reasoning and Data Resource
Lecture Videos
- Available in a future release of the Teaching Kit
Labs
- Sales Forecasting via RAPIDS Linear Regression
Quiz
- Module 18 Quiz
Module 19: Genomics
Lecture Slides
- 19.1 - Introduction to Genomics
- 19.2 - Data Preprocessing
- 19.3 - Clustering and Validation
- 19.4 - Statistical Analysis
Lecture Videos
- Available in a future release of the Teaching Kit
Labs
- Cancer Recognition on Genomics Data via Decision Tree Algorithm
Quiz
- Module 19 Quiz
Module 20: Text Analytics
Lecture Slides
- 20.1 - Basics: Preprocessing, Representation, Word Importance
- 20.2 - Latent Semantic Indexing (Singular Value Decomposition)
- 20.3 - SVD: Dimensionality Reduction, and Other Uses
- 20.4 - Text Visualization
Lecture Videos
- Available in a future release of the Teaching Kit
Labs
- Latent Semantic Indexing for Text via Singular Value Decomposition (cuML)
Quiz
- Module 20 Quiz
Module 21: CPU vs. GPU-accelerated Data Science
Lecture Slides
- 21.1 - RAPIDS Benefits
- 21.2 - Refactoring Workloads
Lecture Videos
- Available in a future release of the Teaching Kit
Labs
- Accelerating Workloads Using RAPIDS
DLI Online Course Selection with Student Certificate Opportunity
Quiz
- Module 21 Quiz
Module 22: Working in Data Science Teams
Lecture Slides
- 22.1 - Forming Great Teams
- 22.2 - Project Idea Checklist: Heilmeier Questions
- 22.3 - Pay Attention to Software Licenses Early On
Lecture Videos
- Available in a future release of the Teaching Kit
Quiz
- Module 22 Quiz
Module 23: Code Backup and Version Control
Lecture Slides
- 23.1 - Git: Overview and Benefits
- 23.2 - Warning! Keep Your Repository Private Initially
- 23.3 - GitHub and Bitbucket
Lecture Videos
- Available in a future release of the Teaching Kit
Quiz
- Module 23 Quiz
Module 24: Team Project (Fake News Detection)
Lecture Slides
- 24.1 - Introduction to Project
- 24.2 - Evaluation of Team Project
Lecture Videos
- Available in a future release of the Teaching Kit
Team Project
- Fake News Detection (cuML)