Sponsor Tutorials

Clover Health: Transforming and Analyzing Healthcare Data with Python

To Be Announced
Wednesday 1:30 p.m.–3 p.m. in B118-B119

This workshop will give you an introduction to how we use Python for testing, analysis, and processing at Clover. It includes a walkthrough of our tech stack along with a dive into two use cases. The first use case is from a Data Science perspective and will go over how we test SQL queries in our data pipeline; it will also get into an example of statistical modeling in a particular insurance operations context. The second use case is from an Engineering perspective and will show how we transform nested JSON structures into consumable flat table structures, touching on techniques for processing large amounts of data. Clover uses lots of Python tools and libraries, which we're happy to discuss, and we rely heavily on Postgres as our primary database. This talk, however, will highlight SQLAlchemy, Jupyter Notebook, pytest, generators, partial functions, and LRU caching.

## Outline/Agenda

1. Speaker Bios
2. Description of Clover as a company
3. Brief outline of what we're going to talk about
4. General Tech Stack (how Clover uses Python)
5. Use Case #1: Testing and Insurance Operations [presented by Vincent La]
6. Use Case #2: Transforming JSON in ETL processing [presented by Bijan Vakili]
7. Q/A
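The JSON-flattening use case above can be sketched with three of the techniques the workshop highlights: a recursive generator that walks a nested record, `functools.partial` to pre-bind a separator, and `functools.lru_cache` to memoize a per-column transformation that repeats across records. This is a minimal illustration under stated assumptions, not Clover's pipeline: the `flatten`, `normalize_column`, and `to_rows` names are hypothetical, the input is assumed to be JSON Lines (one record per line), and lists are flattened into index-numbered columns rather than exploded into extra rows.

```python
import json
from functools import lru_cache, partial
from typing import Any, Dict, Iterable, Iterator, Tuple


def flatten(obj: Any, prefix: str = "", sep: str = ".") -> Iterator[Tuple[str, Any]]:
    """Recursively yield (column, value) pairs from a nested JSON value.

    Dict keys and list indexes are joined with `sep` to build column names.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}{sep}{key}" if prefix else str(key), sep)
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from flatten(value, f"{prefix}{sep}{index}" if prefix else str(index), sep)
    else:
        yield prefix, obj


@lru_cache(maxsize=None)
def normalize_column(name: str) -> str:
    """Turn a raw key path into a SQL-friendly column name.

    Records in one feed tend to repeat the same keys, so caching avoids
    redoing the same string transformation for every record.
    """
    return name.strip().lower().replace(" ", "_")


def to_rows(lines: Iterable[str], sep: str = "_") -> Iterator[Dict[str, Any]]:
    """Lazily turn JSON Lines input into flat dict rows, one record at a time."""
    flatten_with_sep = partial(flatten, sep=sep)  # pre-bind the separator
    for line in lines:
        if line.strip():  # skip blank lines
            record = json.loads(line)
            yield {normalize_column(col): value for col, value in flatten_with_sep(record)}


if __name__ == "__main__":
    sample = ['{"member": {"id": 1, "Plan Type": "PPO"}, "claims": [{"amount": 10.5}]}']
    for row in to_rows(sample):
        print(row)
    # {'member_id': 1, 'member_plan_type': 'PPO', 'claims_0_amount': 10.5}
```

Because `to_rows` is a generator, passing it an open file handle streams a large file record by record instead of loading it all into memory; the resulting dicts could then be batched into inserts against a flat table.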

Datadog: Distributed Tracing for Python

To Be Announced
Thursday 11 a.m.–12:30 p.m. in B110-111

Tracing is a specialized form of logging that is designed to work effectively in large, distributed environments. When done right, tracing follows the path of a request across process and service boundaries. This provides a big step up in application observability, and can help a developer understand why certain requests are slow, or why they might have behaved unexpectedly. This tutorial will familiarize users with the benefits of tracing, and describe a general toolkit for emitting traces from Python applications in a minimally intrusive way. We will walk through a simple example app, which receives an HTTP request, and gradually instrument it to be observable via traces. We will discuss language constructs that can generate traces (namely decorators, monkey-patching, and context managers) and give users hints on how they might add tracing to their own applications and libraries. In the process, users will become familiar with the existing standards for modelling traces, and some of the challenges involved in adhering to this model in a distributed, asynchronous environment.
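To illustrate the three constructs named above, here is a minimal in-process tracer. It is a hypothetical sketch, not the `ddtrace` API: `Span`, `start_span`, `traced`, `patch`, `FINISHED`, and the `Database` class are illustrative names. It only tracks parent/child spans within one process via `contextvars`; crossing process or service boundaries would additionally require passing the trace and parent IDs along with the request (for example in HTTP headers), which is not shown.

```python
import contextvars
import functools
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class Span:
    """One timed unit of work; spans sharing a trace_id form one trace."""
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    start: float
    duration: float = 0.0
    error: bool = False


# Finished spans land here; a real tracer would flush them to an agent.
FINISHED: List[Span] = []
# The active span, tracked per thread and per asyncio task.
_current_span = contextvars.ContextVar("current_span", default=None)


@contextmanager
def start_span(name: str) -> Iterator[Span]:
    """Context manager: open a child of the active span, or start a new trace."""
    parent = _current_span.get()
    span = Span(
        name=name,
        trace_id=parent.trace_id if parent is not None else uuid.uuid4().hex,
        span_id=uuid.uuid4().hex[:16],
        parent_id=parent.span_id if parent is not None else None,
        start=time.perf_counter(),
    )
    token = _current_span.set(span)
    try:
        yield span
    except Exception:
        span.error = True  # record the failure, then let it propagate
        raise
    finally:
        span.duration = time.perf_counter() - span.start
        _current_span.reset(token)
        FINISHED.append(span)


def traced(name: Optional[str] = None) -> Callable:
    """Decorator: run every call of the wrapped function inside a span."""
    def decorator(func: Callable) -> Callable:
        span_name = name or func.__qualname__

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with start_span(span_name):
                return func(*args, **kwargs)
        return wrapper
    return decorator


def patch(owner: object, attr: str, span_name: str) -> None:
    """Monkey-patching: replace owner.attr with a traced version of itself."""
    setattr(owner, attr, traced(span_name)(getattr(owner, attr)))


class Database:
    """Stand-in for a third-party client we cannot edit (hypothetical)."""
    def query(self, sql: str) -> list:
        return [("row",)]


patch(Database, "query", "db.query")


@traced("http.request")
def handle_request(path: str) -> list:
    with start_span("render"):
        pass  # template rendering would happen here
    return Database().query("SELECT 1")


if __name__ == "__main__":
    handle_request("/")
    for span in FINISHED:
        print(span.name, span.parent_id, f"{span.duration:.6f}s")
```

The decorator suits code you own, monkey-patching suits libraries you cannot edit, and the context manager covers arbitrary blocks inside a function; all three funnel into the same `start_span`, so they nest into a single trace.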


To Be Announced
Thursday 3:30 p.m.–5 p.m. in B110-111

Intel: Accelerating Python across the range of applications: the right tools for the job

David Liu
Thursday 9 a.m.–10:30 a.m. in B110-111

Python's popularity has led to its use in many areas, from web frameworks all the way to machine learning and scientific computing. However, getting the best performance from Python requires an intimate knowledge of the right tools and techniques available today. In this tutorial, participants will learn how to measure, tune, and accelerate Python workflows across various domains. This tutorial will cover the following topics:

- Performance speedups for scientific computing using Intel® Distribution for Python, multithreading with the Intel® Threading Building Blocks library, Numba, and Intel® VTune Amplifier
- Data analytics and machine learning acceleration with pyDAAL
- Web framework, scripting, and infrastructure acceleration using the PyPy JIT

## Audience

This tutorial is geared towards general users of Python who want better performance from their code and workflows, or who require time-bound results from their Python applications. A general understanding of Python is recommended; familiarity with the frameworks and tools above is not required, as they will be introduced during the tutorial. No equipment is required for this tutorial. Attendees who complete the tutorial should understand how to choose the right frameworks and tools to get the most out of their current code and workflows. They will also gain a general understanding of how to diagnose performance issues and mitigate performance bottlenecks as they arise.
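Diagnosing a bottleneck starts with measuring where time goes. As a generic starting point before reaching for VTune Amplifier (a separate tool not shown here), the standard library's `cProfile` and `pstats` can rank functions by cumulative time. This is a minimal sketch: `slow_sum_of_squares`, `fast_sum_of_squares`, `workload`, and `profile_report` are hypothetical names, and the "fix" shown is an algorithmic one (a closed-form formula) rather than any of the tutorial's acceleration tools.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n: int) -> int:
    """Pure-Python loop: the kind of hotspot a profiler should surface."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum_of_squares(n: int) -> int:
    """Closed form for 0**2 + 1**2 + ... + (n - 1)**2."""
    return (n - 1) * n * (2 * n - 1) // 6


def workload() -> int:
    return slow_sum_of_squares(300_000) + fast_sum_of_squares(300_000)


def profile_report(func, top: int = 5) -> str:
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_report(workload))
```

In the printed report, `slow_sum_of_squares` should account for almost all of `workload`'s cumulative time, which points at the loop as the thing to tune or hand off to an accelerated tool.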
## Outline

- The Intel Distribution for Python (30 min)
  - Intro and overview of tools/frameworks (5 min)
  - Intel Distribution for Python and the speedups possible from TBB, Numba, and pyDAAL (20 min)
  - An intro to performance measuring tools: VTune Amplifier (5 min)
- Machine learning acceleration with PyDAAL (30 min)
  - An introduction to PyDAAL (5 min)
  - Data management: batch and out-of-core (5 min)
  - Machine learning and acceleration techniques (5 min)
  - Algorithm accelerations and examples (Regression, PCA, SVM) (15 min)
- PyPy JIT (30 min)
  - Overview of PyPy and how it works (10 min)
  - Examples of PyPy and Django (10 min)
  - OpenStack acceleration (10 min)
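The Numba speedups mentioned in the outline come from compiling plain Python loops. A minimal sketch, assuming NumPy is installed and Numba is optional: `@njit(parallel=True)` with `prange` asks Numba to compile the function and spread the outer loop across threads (Numba can use TBB as one of its threading layers when TBB is available). `row_norms` is a hypothetical example kernel, and the `ImportError` fallback is only there so the sketch still runs, uncompiled, without Numba.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback so the sketch still runs without Numba: same code, no compilation.
    prange = range

    def njit(*args, **kwargs):
        # Handle both bare @njit and @njit(...) with options.
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit(parallel=True)
def row_norms(matrix):
    """Euclidean norm of each row, written as explicit loops for Numba to compile."""
    n_rows, n_cols = matrix.shape
    out = np.empty(n_rows)
    # prange marks the outer loop as safe to run across threads.
    for i in prange(n_rows):
        acc = 0.0
        for j in range(n_cols):
            acc += matrix[i, j] * matrix[i, j]
        out[i] = np.sqrt(acc)
    return out


if __name__ == "__main__":
    data = np.random.default_rng(0).random((1_000, 64))
    print(np.allclose(row_norms(data), np.linalg.norm(data, axis=1)))
```

The first call pays a one-time compilation cost, so any timing comparison against the pure-Python or NumPy version should be made on later calls.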

Intel: Bring deep learning to the fingertips of data scientists with Python & BigDL on Apache Spark

To Be Announced
Wednesday 1:30 p.m.–3 p.m. in B110-111

We have seen the data science and big data communities begin to engage further with artificial intelligence and deep learning technologies, and efforts to bridge the gap between the deep learning community and the data science and big data communities are beginning to emerge. However, developing deep neural nets is an intricate procedure, and scaling it to big data is even more challenging. Therefore, deep learning tools and frameworks (especially visualization support) that run smoothly on top of big data platforms are essential for scientists to understand, inspect, and manipulate their big models and big data. In this talk, we will share how we bring deep learning to the fingertips of big data users and data scientists by providing visualizations (through widely used frameworks such as Jupyter Notebooks and/or TensorBoard) as well as Python toolkits (e.g., NumPy, SciPy, scikit-learn, NLTK, Keras) on top of BigDL, an open source distributed deep learning library for Apache Spark. We will also share how real-world big data users and data scientists use these tools to build AI-powered big data analytics applications.

Intel: Scalable, distributed deep learning with Python and Pachyderm

To Be Announced
Wednesday 3:30 p.m.–5 p.m. in B110-111

The recent advances in machine learning and artificial intelligence are amazing! Yet, in order to deliver real value within a company, data scientists must be able to get their models off their laptops and deployed within the company's data pipelines and infrastructure. Those models must also scale to production-size data. In this workshop, we will implement a deep learning model locally using Nervana Neon. We will then take that model and deploy both its training and inference in a scalable manner to a production cluster with Pachyderm. We will also learn how to update the production model online, optimize Python for our tasks, track changes in our model and data, and explore our results.

## Speaker

Daniel Whitenack (@dwhitena) is a Ph.D.-trained data scientist working with Pachyderm (@pachydermIO). Daniel develops innovative, distributed data pipelines which include predictive models, data visualizations, statistical analyses, and more. He has spoken at conferences around the world (Datapalooza, DevFest Siberia, GopherCon, and more), teaches data science/engineering with Ardan Labs (@ardanlabs), maintains the Go kernel for Jupyter, and is actively helping to organize contributions to various open source data science projects.

Red Hat: Deploying Python web applications to OpenShift/Kubernetes

Graham Dumpleton
Wednesday 11 a.m.–12:30 p.m. in B110-111

This will be a hands-on workshop where you will get to experience for yourself how easy it is to deploy a Python web application to OpenShift. The latest version of OpenShift is implemented on top of Kubernetes for container orchestration and Docker for the container runtime. On top of these tools, OpenShift adds its own special sauce to simplify the deployment of applications even further.

In the workshop you will learn how to deploy a Python web application directly from a Git repository holding the application source code, with the build process handled by the Source-to-Image (S2I) tool. Next you will deploy a database from a pre-existing Docker-formatted container image and learn how to hook your Python web application up to it. Finally, you will configure a Git repository webhook to automate the deployment process, so that every time you commit and push changes, your application will be automatically rebuilt and deployed.

During the workshop we will also throw in various other tidbits to help explain what OpenShift is, how it works, and how it can help you host not only your Python web site but also more complex applications, be they legacy systems or new microservice architecture applications, in any language.

For the workshop, you will be given access to an online instance of OpenShift Origin with everything you need. The only piece of software you will need to install on your own computer is a single binary for the OpenShift command line client.

## Speaker Bio

Graham is the author of mod_wsgi, a popular module for hosting Python web applications with the Apache HTTPD web server. He has a keen interest in Docker and Platform as a Service (PaaS) technologies. He is currently a developer advocate for OpenShift at Red Hat.
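For the S2I and database steps in this workshop, the application itself mostly needs to expose a WSGI entry point and take its database settings from the environment, so the same built image can be wired to whichever database is deployed alongside it. A minimal sketch, assuming a builder setup that serves a module-level WSGI callable named `application` (for example from a `wsgi.py` file via Gunicorn; check the Python builder image's documentation for the exact convention it expects). The environment variable names `DATABASE_HOST`, `DATABASE_PORT`, `DATABASE_NAME`, `DATABASE_USER`, their defaults, and the `PORT`/8080 local fallback are all hypothetical, and the app only reports its settings rather than connecting, to stay dependency-free.

```python
import json
import os
from wsgiref.simple_server import make_server


def database_settings(environ=os.environ) -> dict:
    """Read connection settings from environment variables (names are hypothetical)."""
    return {
        "host": environ.get("DATABASE_HOST", "localhost"),
        "port": int(environ.get("DATABASE_PORT", "5432")),
        "name": environ.get("DATABASE_NAME", "sampledb"),
        "user": environ.get("DATABASE_USER", "user"),
    }


def application(environ, start_response):
    """Module-level WSGI callable that a WSGI server can import and serve."""
    settings = database_settings()
    body = json.dumps({"status": "ok", "database_host": settings["host"]}).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    # Local development only; in the container the builder's server imports `application`.
    port = int(os.environ.get("PORT", "8080"))
    with make_server("", port, application) as server:
        server.serve_forever()
```

Keeping configuration in environment variables means hooking the app up to the database becomes a matter of setting those variables on the deployment rather than rebuilding the image.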