PyCon Pittsburgh. April 15-23, 2020.

Sponsor Workshop: Capital One: Dask and RAPIDS Tutorial

Presented by:

Mike McCarty, TBD

Description

Dask is a flexible tool for parallelizing Python code on a single machine or across a cluster. It builds on familiar tools in the SciPy ecosystem (e.g. NumPy and Pandas) while allowing them to scale across multiple cores or machines. This tutorial will cover both the high-level use of Dask collections and the low-level use of Dask graphs and schedulers.
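
As a rough illustration of those two levels, the sketch below first uses a Dask array (a high-level collection) and then runs a hand-written task graph on the threaded scheduler. It is not taken from the tutorial materials; the helper names (collection_total, graph_total, inc, add) and the sample values are assumptions chosen for the example.

```python
import dask.array as da
from dask.threaded import get


def collection_total():
    # High-level collection: the integers 0..9 split into two chunks of 5.
    x = da.arange(10, chunks=5)
    # Operations only build a lazy task graph; compute() hands it to a scheduler.
    return int((x * 2).sum().compute())


def inc(i):
    return i + 1


def add(a, b):
    return a + b


def graph_total():
    # Low-level graph: a plain dict mapping keys to values or
    # (function, *args) task tuples. String arguments that match a key
    # are replaced by that key's result.
    dsk = {
        "x": 1,
        "y": (inc, "x"),
        "z": (add, "y", 10),
    }
    # Execute the graph on the threaded scheduler and fetch key "z".
    return get(dsk, "z")


if __name__ == "__main__":
    print(collection_total())
    print(graph_total())
```

The collection version never mentions tasks explicitly, while the dict version spells out every dependency by hand; both end up as a graph that a scheduler executes.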

The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science pipelines entirely on GPUs. RAPIDS is incubated by NVIDIA® based on years of accelerated data science experience. RAPIDS relies on NVIDIA CUDA® primitives for low-level compute optimization and exposes GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces. This tutorial will teach you how to use the RAPIDS software stack from Python, including cuDF (a DataFrame library interoperable with Pandas), dask-cudf (for distributing DataFrame work over many GPUs), and cuML (a machine learning library that provides GPU-accelerated versions of many of the algorithms in scikit-learn).