1:30 p.m.–3 p.m.
Anyone doing numerical computing with Python will have run into performance barriers. Using Anaconda is a great start: it provides a suite of extension packages whose underlying data structures and algorithms are written in C or Fortran. We'll briefly review the state of numerical computing in Python, look at some examples to help you remember why you should use NumPy-based packages whenever possible, and focus on two options for acceleration: faster serial computing and parallelization. Continuum Analytics has developed two popular open source packages to address these issues: Numba, which provides an LLVM-based JIT that can be accessed simply through a decorator; and Dask, which provides a distributed computing framework and high-quality data structures similar to a Pandas DataFrame or a NumPy ndarray. Participants should have the latest release of Anaconda installed and some familiarity with Python in order to follow along interactively as we learn how to efficiently leverage Dask and Numba.
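As a taste of what the tutorial covers, here is a minimal sketch of Numba's decorator-based JIT. The function and data are illustrative, not from the tutorial itself, and a plain-Python fallback is included in case Numba isn't installed:

```python
import numpy as np

try:
    from numba import njit  # LLVM-based JIT, enabled by a decorator
except ImportError:
    def njit(func):  # fallback: run as plain Python if Numba is absent
        return func

@njit
def array_sum(a):
    # An explicit loop that Numba compiles to fast machine code;
    # without the decorator this is slow pure-Python iteration.
    total = 0.0
    for i in range(a.shape[0]):
        total += a[i]
    return total

x = np.arange(1000, dtype=np.float64)
print(array_sum(x))  # → 499500.0, matching x.sum()
```

The first call triggers compilation; subsequent calls run the cached machine code, which is where the speedup comes from.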
11 a.m.–12:30 p.m.
Anaconda provides a rich foundation of Python and R packages for data science. This tutorial will demonstrate how Anaconda can be used to turn simple models, scripts, or Jupyter notebooks into deployable applications. Participants should have Anaconda installed and have basic Python programming experience. We'll make use of data science and machine learning libraries such as Pandas, Scikit-learn, TensorFlow, and Keras. The tutorial will also demonstrate the app deployment capabilities of Anaconda Cloud.
Bijan Vakili, Joseph Leingang, Nicole Zuckerman, Vincent La
1:30 p.m.–3 p.m.
This workshop will give you an introduction to how we use Python for testing, analysis, and processing at Clover. It includes a walkthrough of our tech stack along with a dive into two use cases.
The first use case, from a data science perspective, covers how we test SQL queries in our data pipeline, with an example of statistical modeling in a particular insurance operations context. The second use case, from an engineering perspective, shows how we transform nested JSON structures into consumable flat table structures, and touches on techniques for processing large amounts of data.
Clover uses many Python tools and libraries, which we're happy to discuss. We rely heavily on Postgres as our primary database solution. This talk will highlight SQLAlchemy, Jupyter Notebook, pytest, generators, partial functions, and LRU caching.
1. Speaker Bios
2. Description of Clover as a company
3. Brief Outline of what we're going to talk about
4. General Tech Stack (how Clover uses Python)
5. Use Case #1: Testing and Insurance Operations [presented by Vincent La]
6. Use Case #2: Transforming JSON in ETL processing [presented by Bijan Vakili]
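The JSON-flattening use case above can be illustrated with a minimal, stdlib-only sketch. This is not Clover's actual code; the dotted-path key naming is an assumption:

```python
import json

def flatten(obj, prefix=""):
    """Recursively flatten nested dicts/lists into a single-level
    dict whose keys are dotted paths, suitable for a flat table row."""
    row = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            row.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            row.update(flatten(value, f"{prefix}{i}."))
    else:
        row[prefix.rstrip(".")] = obj
    return row

record = json.loads('{"claim": {"id": 7, "codes": ["A1", "B2"]}}')
print(flatten(record))
# → {'claim.id': 7, 'claim.codes.0': 'A1', 'claim.codes.1': 'B2'}
```

Each flattened row can then be inserted directly into a wide table, with the dotted paths serving as column names.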
Bijan Vakili is a Senior Software Engineer at Clover Health, where he builds applications, improves infrastructure, and mentors developers. Prior to joining Clover, Bijan worked in currency and derivative trading, gaming, network applications, and disaster recovery. He has worked in multiple roles, including software developer, team lead, and project manager. Bijan holds a Bachelor’s degree in Software Engineering & Human Biology from the University of Toronto and an MBA from the University of Toronto’s Rotman School of Management.
Joey Leingang is an Engineering Manager at Clover Health, where he focuses on engineering team leadership, scalable development, and systems management. Joey has 14 years of development experience, including 5 years of engineering management, and has held technical roles at companies including Sirono, Arizona Public Media, and the University of Arizona.
Nicole Zuckerman is a software engineer at Clover Health, where she writes the endpoints and data pipelines to help surface better health care for members. She's also deeply invested in effectively on-boarding entry-level engineers, and improving diversity and inclusion in tech. In her free time, Nicole is an avid dancer and teacher, sci-fi book fanatic, soul and jazz aficionado, and cheese lover. She holds an MA in English Literature and Women's Studies from the University of Liverpool.
Vincent La is a Data Scientist at Clover Health, where he uses analytics to empower business decisions throughout the company. His focus areas include pricing of insurance products, measuring compliance of claims operations, estimating the clinical impact of post-discharge programs, and provider outlier detection. Prior to Clover, he held positions at the Federal Reserve Board, the Dartmouth College Department of Economics, and New York Life Insurance Company. Vincent earned his B.A. in Economics and Mathematics, magna cum laude, from Dartmouth College, and a Data Mining and Applications Graduate Certificate from Stanford University. He is currently completing his Master’s degree in Computer Science at the Georgia Institute of Technology.
11 a.m.–12:30 p.m.
Tracing is a specialized form of logging that is designed to work effectively in large, distributed environments. When done right, tracing follows the path of a request across process and service boundaries. This provides a big step-up in application observability, and can help inform a developer why certain requests are slow, or why they might have behaved unexpectedly. This tutorial will familiarize users with the benefits of tracing, and describe a general toolkit for emitting traces from Python applications in a minimally intrusive way. We will walk through a simple example app, which receives an HTTP request, and gradually instrument it to be observable via traces. We will discuss language constructs that can generate traces - namely decorators, monkey-patching and context managers - and give users hints on how they might add tracing to their own applications and libraries. In the process users will become familiar with the existing standards for modelling traces, and some of the challenges involved in adhering to this model in a distributed, asynchronous environment.
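To preview the language constructs mentioned above, here is a minimal, stdlib-only sketch of a tracer exposed both as a context manager and as a decorator. Real tracing clients report spans to a collector and propagate context across services; the names here are illustrative:

```python
import functools
import time
from contextlib import contextmanager

SPANS = []  # a real tracer would report spans to a collector

@contextmanager
def span(name):
    """Context manager form: trace an arbitrary block of code."""
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append((name, time.perf_counter() - start))

def traced(func):
    """Decorator form: trace every call to a function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with span(func.__name__):
            return func(*args, **kwargs)
    return wrapper

@traced
def handle_request():
    with span("db_query"):   # nested span inside the request span
        time.sleep(0.01)
    return "ok"

print(handle_request(), [name for name, _ in SPANS])
# → ok ['db_query', 'handle_request']
```

Note the inner span closes first, which is why `db_query` is recorded before `handle_request`; a real tracer uses this nesting to reconstruct the request tree.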
Honza Král, Glen Smith
1:30 p.m.–3 p.m.
Elasticsearch, a distributed, RESTful search and analytics engine, has a wide variety of capabilities that can be used from Python. In this workshop we will explore several different use cases and showcase how the associated Python libraries can help you. It is intended for intermediate users who have basic familiarity with Elasticsearch and want to further their understanding.
Some of the topics that will be covered are:
* bulk loading data into Elasticsearch
* how to build queries and aggregations efficiently
* using Elasticsearch for persistence in your application
* syncing data in Elasticsearch and other data stores
Attendees will leave the workshop with the skills to integrate Elasticsearch into their apps and additional tools for analyzing data with Python.
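Since Elasticsearch request bodies mirror the JSON query DSL one-to-one, queries and aggregations can be built as plain Python dicts before being sent with a client such as elasticsearch-py (whose `helpers.bulk` also handles the bulk loading mentioned above). A minimal sketch, with hypothetical field names and no cluster required:

```python
import json

def build_search_body(text, field="title", agg_field="category"):
    """Build an Elasticsearch request body combining a full-text
    match query with a terms aggregation. The dict structure maps
    directly onto the JSON query DSL."""
    return {
        "query": {"match": {field: text}},
        "aggs": {
            "by_category": {"terms": {"field": agg_field, "size": 10}}
        },
        "size": 5,
    }

body = build_search_body("python tutorial")
print(json.dumps(body, indent=2))
```

A client call would then be roughly `es.search(index="talks", body=body)`; the workshop covers the details.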
Jasmine Hsu, David Bieber, Wesley Chun
3:30 p.m.–5 p.m.
Fire, bullets, and productivity with Python? YES! Continuing with tradition, our sponsor workshop session features three half-hour tech talks as they relate to Python & Google. This year, hear about three of our open source libraries that can take your apps and productivity to the next level! Whether it’s simplifying the creation of command-line interfaces, using physics simulation for machine learning and robotics, or developing automation tools to make you even more productive, this sponsor session is just for you!
* Talk #1. Python Fire: Automatically generating command-line interfaces in Python
Speaker: David Bieber
Abstract: Python Fire is a new open source library that will take any Python component—an object, a class, a module, a function, etc. (anything at all!)—and generate a command-line interface from it automatically. It is both a simple way to create a CLI in Python, as well as a helpful tool for developing, debugging, and exploring Python code. We will show how you can use Python Fire for your own benefit, and discuss how this powerful tool saves time and enables new workflows.
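A minimal sketch of the pattern the talk describes: hand any Python component to `fire.Fire` and its methods become subcommands. The `Calculator` class is illustrative, and `main()` is assumed to be the script's entry point:

```python
class Calculator:
    """Any Python object works; Fire turns its methods into subcommands."""

    def double(self, number):
        return 2 * number

    def add(self, a, b):
        return a + b

def main():
    import fire              # pip install fire
    fire.Fire(Calculator)    # exposes double/add as CLI subcommands

# With main() as the entry point, running
#   python calc.py double 21
# would print 42, with no argument-parsing code written by hand.
```

Fire also gives you `--help` output and an interactive mode for free, which is what makes it useful for debugging and exploration as well as CLIs.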
* Talk #2. PyBullet: Physics Simulation for Robotics and Machine Learning
Speaker: Jasmine Hsu
Abstract: The Brain Robotics team at Google aims to improve robotics via machine learning. We foster close collaborations between machine learning researchers, engineers, and roboticists to enable learning at scale on real and simulated robotic systems. The Bullet Physics engine is a heavyweight open-source physics SDK that provides real-time collision detection and multi-physics simulation for VR, games, visual effects, robotics, and machine learning. In order to make our physics simulation easier to use, we created Python bindings (PyBullet) to allow our team to quickly and efficiently scale research initiatives. Come hear our talk to learn how you can now easily use Python to apply deep learning approaches to robotics!
* Talk #3. Exploring Google APIs with Python
Speaker: Wesley Chun
Abstract: Ever wanted to integrate Google technologies into the web+mobile apps that you build? Did you know Google has various open source libraries that help you do exactly that (regardless of what your favorite development language is)? Users who may have tried & failed, run into roadblocks, been confused about using our APIs, or have had auth issues are welcome to come and make these non-issues moving forward. Finally, if the wireless is working, we’ll attempt live demos using multiple Google APIs (mainly from G Suite, i.e., Google Drive, Sheets, Calendar, Slides, and more) to give you an idea of what it's all about. These sample scripts should get you kickstarted into building your own automation apps in Python!
David Liu, Peter Wang
9 a.m.–10:30 a.m.
Python's popularity has led to its use in many areas, from web frameworks all the way to machine learning and scientific computing. However, getting the best performance from Python requires intimate knowledge of the right tools and techniques available today. In this tutorial, participants will learn how to measure, tune, and accelerate Python workflows across various domains.
This tutorial will cover the following topics:
-Performance speedups for scientific computing using Intel® Distribution for Python, multithreading with Intel® Threading Building Blocks library, Numba, and Intel® VTune Amplifier
-Data Analytics and machine learning acceleration with pyDAAL
-Web framework, scripting, and infrastructure acceleration using the PyPy JIT
This tutorial is geared towards general users of Python who want better performance from their code and workflows, or who require time-bound results from their Python applications. A general understanding of Python is recommended; understanding of the above frameworks and tools is not required and will be introduced during the tutorial. No equipment is required for this tutorial.
Attendees who complete the tutorial should understand how to choose the correct frameworks and tools to get the most out of their current code and workflows. Attendees will also gain a general understanding of how to diagnose performance issues and mitigate performance bottlenecks as they arise.
-The Intel Distribution for Python (30 min)
-Intro and overview of tools/frameworks (5 min)
-Intel Distribution for Python and the speedups possible from TBB, Numba, pyDAAL (20 min)
-An intro to performance measuring tools: VTune Amplifier (5 min)
-Machine Learning acceleration with PyDAAL (30 min)
-An introduction to PyDAAL (5 min)
-Data management-batch and out of core (5 min)
-Machine learning and acceleration techniques (5 min)
-Algorithm accelerations and examples (Regression, PCA, SVM) (15 min)
-PyPy JIT (30 min)
-Overview of PyPy and how it works (10 min)
-Examples of PyPy and Django (10 min)
-OpenStack acceleration (10 min)
Jiao Wang, DingDing
1:30 p.m.–3 p.m.
The data science and big data communities have begun to engage further with artificial intelligence and deep learning technologies, and efforts to bridge the gap between the deep learning and data science/big data communities are emerging. However, developing deep neural nets is an intricate procedure, and scaling that to big data is an even more challenging process. Therefore, deep learning tools and frameworks, especially visualization support, that run smoothly on top of big data platforms are essential for scientists to understand, inspect, and manipulate their big models and big data. In this talk, we will share how we bring deep learning to the fingertips of big data users and data scientists by providing visualizations (through widely used frameworks such as Jupyter notebooks and/or TensorBoard) as well as Python toolkits (e.g., NumPy, SciPy, Scikit-learn, NLTK, Keras, etc.) on top of BigDL, an open source distributed deep learning library for Apache Spark. We will also share how real-world big data users and data scientists use these tools to build AI-powered big data analytics applications.
3:30 p.m.–5 p.m.
The recent advances in machine learning and artificial intelligence are amazing! Yet, in order to have real value within a company, data scientists must be able to get their models off of their laptops and deployed within a company’s data pipelines and infrastructure. Those models must also scale to production-size data. In this workshop, we will implement a deep learning model locally using Nervana Neon. We will then take that model and deploy both its training and inference in a scalable manner to a production cluster with Pachyderm. We will also learn how to update the production model online, optimize Python for our tasks, track changes in our model and data, and explore our results.
Daniel Whitenack (@dwhitena) is a Ph.D. trained data scientist working with Pachyderm (@pachydermIO). Daniel develops innovative, distributed data pipelines which include predictive models, data visualizations, statistical analyses, and more. He has spoken at conferences around the world (Datapalooza, DevFest Siberia, GopherCon, and more), teaches data science/engineering with Ardan Labs (@ardanlabs), maintains the Go kernel for Jupyter, and is actively helping to organize contributions to various open source data science projects.
3:30 p.m.–5 p.m.
This session will cover NLP and text mining using Python and offer several examples of real world applications. Participants will be introduced to various text processing techniques and learn about text classification, clustering, and topic modeling. By the end of the workshop, participants will be able to use Python to explore and build their own models on text data.
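The bag-of-words representation underlying the classification, clustering, and topic-modeling techniques above can be sketched with the stdlib alone. This toy word-overlap classifier is illustrative only; the workshop's actual tooling is not specified, and real workflows use TF-IDF features with a trained model:

```python
from collections import Counter

def bag_of_words(text):
    """Represent a document as word counts - the representation
    behind most classic text classification and clustering."""
    return Counter(text.lower().split())

def classify(doc, labeled_docs):
    """Assign the label whose example document shares the most
    words with doc (a toy nearest-neighbour rule)."""
    doc_bow = bag_of_words(doc)

    def overlap(example):
        # Counter & Counter keeps the minimum count per shared word.
        return sum((doc_bow & bag_of_words(example)).values())

    label, _ = max(labeled_docs.items(), key=lambda kv: overlap(kv[1]))
    return label

examples = {
    "sports": "the team won the game last night",
    "tech": "the new python release improves performance",
}
print(classify("python performance tips", examples))  # → tech
```

From here it is a short step to weighting words by rarity (TF-IDF) and swapping the overlap rule for a proper classifier, which is the kind of progression the workshop walks through.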
Michael Galvin is the Executive Director of Data Science at Metis. He came to Metis from General Electric where he worked to establish their data science strategy and capabilities for field services and to build solutions supporting Global operations, risk, engineering, sales, and marketing. Prior to GE, Michael spent several years as a data scientist working on problems in credit modeling at Kabbage and corporate travel and procurement at TRX. Michael holds a Bachelor's degree in Mathematics and a Master's degree in Computational Science and Engineering from the Georgia Institute of Technology where he also spent 3 years working on machine learning research problems related to computational biology and bioinformatics. Additionally, Michael spent 12 years in the United States Marine Corps where he held various leadership roles within aviation, logistics, and training units.
11 a.m.–12:30 p.m.
This will be a hands-on workshop where you will experience for yourself how easy it is to deploy a Python web application to OpenShift.
The latest version of OpenShift is implemented on top of Kubernetes for container orchestration and Docker for the container runtime. On top of these tools OpenShift adds its own special magic sauce to even further simplify the deployment of applications.
In the workshop you will learn how to deploy a Python web application directly from a Git repository holding the application source code, with the build process being handled by the Source-to-Image (S2I) tool. Next you will deploy a database from a pre-existing Docker-formatted container image and learn how to hook your Python web application up to it. Finally you will configure a Git repository webhook to automate the deployment process, so that every time you commit and push changes your application will be automatically rebuilt and deployed.
During the workshop we will be throwing in various other tidbits to help explain what OpenShift is, how it works and how it can help you to host not only your Python web site, but also more complex applications, be they legacy systems, or new micro service architecture applications, in any language.
For the workshop, you will be provided access to an online instance of OpenShift Origin with everything you need. The only piece of software you will need to install locally on your own computer will be a single program binary for our command line client for OpenShift.
Graham is the author of mod_wsgi, a popular module for hosting Python web applications with the Apache HTTPD web server. He has a keen interest in Docker and Platform as a Service (PaaS) technologies. He is currently a developer advocate for OpenShift at Red Hat.
Kieran Hervold, Dominique Toppani, Emily Leproust, Mesut Arik
11 a.m.–12:30 p.m.
A quick introduction to how Twist Bioscience is enabling world-changing innovation, like DNA computing and turning sugar into vaccines.
Rust Bindings with Python
We will demonstrate how we combine the best of Rust and Python by creating a Rust library with Python bindings.
Hardware Scripting with Embedded IronPython
We’ll demonstrate how we use the flexibility of the IronPython runtime to make our hardware scriptable and enable debugging within the same process. This will require a Windows machine or VM image with Visual Studio installed (the free Community edition).
Polymorphic Marshmallowed JSON with SQLAlchemy
This exercise will guide you through creating polymorphic objects that are transparently serialized into and out of Postgres JSON fields. We will also work on creating graph objects with mutability tracking.
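The exercise itself uses SQLAlchemy and Marshmallow, but the core idea can be illustrated with the stdlib alone: tag each serialized object with its type, then dispatch on that tag when deserializing. The `Shape` classes here are hypothetical stand-ins for whatever lands in the JSON column:

```python
import json

class Shape:
    registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Shape.registry[cls.__name__] = cls  # register subclass for dispatch

    def to_json(self):
        # Embed a type tag so the right subclass can be rebuilt later.
        return json.dumps({"_type": type(self).__name__, **vars(self)})

    @staticmethod
    def from_json(payload):
        data = json.loads(payload)
        cls = Shape.registry[data.pop("_type")]
        obj = cls.__new__(cls)          # skip __init__; restore state directly
        obj.__dict__.update(data)
        return obj

class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

class Square(Shape):
    def __init__(self, side):
        self.side = side

blob = Circle(2.5).to_json()        # what would land in a JSON column
shape = Shape.from_json(blob)       # comes back as the right subclass
print(type(shape).__name__, shape.radius)  # → Circle 2.5
```

In the workshop version, a Marshmallow schema handles the tagging and validation, and SQLAlchemy's JSON column type handles persistence and mutation tracking.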
About Twist - Twist Bioscience is a disruptive synthetic biology startup that has already collected big-name customers. We are building the most powerful online platform, backed by our unique high-throughput DNA synthesis technology. We want to make ordering DNA as easy as ordering laundry detergent online, with the smoothest end-to-end user experience.