Data science development faces many challenges in the areas of:
- Exploration and model development
- Training and evaluation
- Model scoring and inference
Some estimates suggest that 70%-90% of development time is spent on experimentation, much of which runs fast and efficiently on GPU-enabled mobile and desktop workstations. Running on a Linux mobile workstation, for example, presents another set of challenges: installing and configuring a data science stack, keeping the stack and drivers updated, supporting the Office productivity apps people need, and lacking an easy or intuitive way to access tools and software that accelerate development.
New Data Science Client and WSL2 to the rescue!
In a GTC Live session, Dima Rekesh, Karan Jhavar, and I will discuss a new Data Science Client (DSC) and support for Windows Subsystem for Linux 2 (WSL2), both of which address the challenges described above. Together they make it more practical to run countless experiments locally before training models at scale, and they remove the complexity of maintaining a local data science stack while keeping compatibility with popular Microsoft Office applications.
Data scientists want or need unlimited experimentation for creativity and better models overall. The NVIDIA DSC is designed to make developers productive faster by providing simple access to common tools and frameworks (for example, Jupyter Notebooks and RAPIDS), which makes data science development on workstations easier.
If you’d like to learn more, we encourage you to register for the NVIDIA GTC Conference and attend the LIVE session:
Note: For those not familiar with the NVIDIA Data Science Stack, it provides a complete system for the software you use every day, pre-installed and tuned for NVIDIA GPUs. Included on the pre-installed Ubuntu 20.04 Linux OS are Python 3.8, pandas, NumPy, SciPy, Numba, scikit-learn, TensorFlow, PyTorch, Keras, RAPIDS (cuDF, cuML, cuGraph), CuPy, and many more. The GPU-accelerated Python software speeds up machine learning tasks by 10x-30x. Examples include common ML algorithms such as K-means, logistic and linear regression, KNN, Random Forest Classifier, and XGBoost Classifier using NVIDIA RAPIDS. cuML is fully GPU accelerated and accepts data loaded from CSV or Parquet files.
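As a rough illustration of the K-means workflow mentioned in the note, the sketch below reads a CSV file and clusters its rows. It uses cuDF and cuML when both are installed; the fallback to pandas and scikit-learn is an assumption added here so the same code also runs on a machine without RAPIDS. The function names `pick_backend` and `cluster_csv` are hypothetical, and the CSV file is assumed to contain only numeric columns.

```python
import importlib.util


def pick_backend():
    """Return 'gpu' when both cuDF and cuML are importable, else 'cpu'."""
    has_rapids = all(
        importlib.util.find_spec(name) is not None for name in ("cudf", "cuml")
    )
    return "gpu" if has_rapids else "cpu"


def cluster_csv(path, n_clusters=3):
    """Read a numeric CSV file and return one K-means label per row.

    Hypothetical helper: libraries are imported lazily so the function can
    be defined even on a machine where none of them is installed.
    """
    if pick_backend() == "gpu":
        import cudf as xdf                    # GPU DataFrame library (RAPIDS)
        from cuml.cluster import KMeans       # GPU K-means (RAPIDS)
    else:
        import pandas as xdf                  # CPU fallback (assumption)
        from sklearn.cluster import KMeans    # CPU fallback (assumption)

    frame = xdf.read_csv(path)
    model = KMeans(n_clusters=n_clusters, random_state=0)
    model.fit(frame)
    return model.labels_


if __name__ == "__main__":
    print("Selected backend:", pick_backend())
```

Calling `cluster_csv("measurements.csv")` (a placeholder file name) would return the cluster labels. Because cuDF and cuML mirror much of the pandas and scikit-learn interfaces, the body of the function is the same on either backend.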
More about the Data Science Client (DSC)
NVIDIA Data Science Client (DSC) is currently a Beta release and runs on your desktop as a status bar icon. It is optimized to use few system resources, and it monitors and updates itself, your NVIDIA driver, the CUDA SDK (including cuDNN), and all the Data Science Stack software described above. A general availability (GA) release of the DSC is expected in late 2021.
DSC is a desktop complement of the command line-oriented data science stack. DSC is minimalist and unobtrusive. It is designed to target ease of use and reproducibility. The DSC also provides one-click access to common tools such as VS Code and Spyder, but places emphasis on Jupyter as the main development environment supporting a curated set of dockerized kernels – the majority of which are available as NGC assets.
The DSC also manages the latest set of NVIDIA GPU Cloud (NGC) containers. You can quickly launch NGC containers for RAPIDS, PyTorch, or TensorFlow into a locally running Jupyter notebook server as a tab in your Chrome browser in milliseconds. DSC and the NVIDIA Data Science Stack (DSS) run the same software you run in a VM in the cloud. This gives confidence that the Python source code developed on your NVIDIA GPU desktop or mobile workstation will run everywhere with predictable results.
Learn more details about the Data Science Client (DSC) and how to download it.
Windows Subsystem for Linux 2 (WSL2) support
WSL2 support is available now as part of a Public Preview running on pre-release versions of Windows 10. WSL2 is a technology that allows Windows desktop users to run a Linux OS shell. NVIDIA has enabled CUDA to run at full performance in the WSL2 shell and is testing RAPIDS and the entire suite of Data Science Stack software with WSL2.
WSL2 means that my data science Python software, including Jupyter notebooks, and my Office productivity tools (Excel, Outlook, PowerPoint, etc.) run in a single-booted Windows 10 image. There is no longer a need for dual boot.
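If a notebook or script needs to know whether it is running inside WSL, one common heuristic is to look for "microsoft" in the kernel release string. The sketch below takes that approach and also checks whether `nvidia-smi` is on the PATH. The helper names `is_wsl` and `nvidia_smi_path` are hypothetical, and the check is only a heuristic: it does not distinguish WSL1 from WSL2, and it does not prove that CUDA is usable.

```python
import platform
import shutil


def is_wsl(release=None):
    """Heuristic: WSL kernels report a release string containing 'microsoft'."""
    if release is None:
        release = platform.uname().release
    return "microsoft" in release.lower()


def nvidia_smi_path():
    """Return the full path to nvidia-smi if it is on PATH, else None."""
    return shutil.which("nvidia-smi")


if __name__ == "__main__":
    print("Running under WSL:", is_wsl())
    print("nvidia-smi found at:", nvidia_smi_path())
```

Passing the release string as an argument keeps the check easy to exercise on any machine, for example `is_wsl("5.10.16.3-microsoft-standard-WSL2")`.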
Data science workstations in action
NVIDIA knows of many data science workloads that run exceptionally well on mobile workstations built on the NVIDIA Data Science Stack. Some of these environments and workloads will be demonstrated in the following GTC21 sessions:
- Machine Learning with PyTorch Lightning and Grid.ai from Your GPU Enabled Workstation [S32153]
- From Laptops to SuperPODs: Seamless Scale for Model Development [S32160]
- Eliminating Reproducibility and Portability Issues for Data Science Workflows, from Laptop to Cloud and Back [S32169]
- Collaborative Debugging and Visualizing of Machine Learning Models on NVIDIA Workstations [S32156]
We have also seen many new and innovative deep learning workloads such as Heartex Label Studio that run well on mobile workstations.
We encourage you to attend our live GTC session:
See you at GTC21!