Tutorials


Analyzing Census Data with Pandas

Sergio Sánchez
Wednesday 1:20 p.m.–4:40 p.m. in Room 9

### Census 2020 is coming!

Did you know the government budgeted __12.5 billion__ dollars to count __EVERY SINGLE PERSON IN THE COUNTRY__ in 2020? Imagine how much data you could acquire with 12.5 billion dollars.

### Not so excited about Census data? How about cool `pandas` tricks?

![pandas_gif](https://media1.tenor.com/images/60379b3ecd5b8d9d886d90018dba63ab/tenor.gif?itemid=5274556)

In this tutorial you will go from a simple data exploration and analysis workflow to learning more advanced techniques social scientists apply when dealing with Census data. If you've been interested in honing your `pandas` skills or you'd just ___love___ to learn how to calculate the demographically-adjusted employment rate gap for your county using `python`, well, you've come to the right place.

This tutorial is perfect for novice data analysts, pythonistas, social scientists, and journalists who want to learn about the powerful `pandas` library and how to use it to analyze public use micro-data, and for those who've been using it but could learn a trick or two to make their workflow even more effective.

Do the acronyms ACS, CPS, PUMA, or IPUMS mean anything to you? If not, all the more reason to join! Come learn something new!

Applied Deep Learning for NLP Using PyTorch

Elvis Saravia
Thursday 1:20 p.m.–4:40 p.m. in Room 22

Natural language processing (NLP) has experienced rapid growth over the last few years and has become an important skill for building applications that range from social features to clinical and health solutions. In this tutorial, we will introduce PyTorch as a tool to build and experiment with various modern NLP techniques by building deep learning architectures based on convolutional neural networks (CNNs), recurrent neural networks (RNNs), and bidirectional long short-term memory networks (biLSTMs). We will cover a wide range of topics with the purpose of providing participants with enough fundamental knowledge and skills to be able to apply modern NLP to real-world problems using PyTorch. Concepts and topics include, but are not limited to, data loaders, vectorization, computation graphs, sentiment analysis, fine-grained emotion classification, and neural machine translation. Nowadays, it is just not enough to arbitrarily train a model and deploy it for production use without properly debugging it. This tutorial also aims to provide hands-on examples and well-organized exercises that teach students how to properly test, train and evaluate NLP models using best practices. Once models are properly trained and evaluated, they will be efficiently transformed, stored, and then restored to obtain inferences from real-world, natural language data.

Bayesian Data Science by Simulation

Eric Ma, Hugo Bowne-Anderson
Wednesday 1:20 p.m.–4:40 p.m. in Room 22

This tutorial is an introduction to Bayesian data science through the lens of simulation or hacker statistics. We will become familiar with many common probability distributions through i) matching them to real-world stories & ii) simulating them. We will work with joint/conditional probabilities, Bayes' Theorem, prior/posterior distributions and likelihoods, while seeing their applications in real-world data analyses. We’ll see the utility of Bayesian inference in parameter estimation and comparing groups and we’ll wrap up with a dive into the wonderful world of probabilistic programming.
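
To give a flavor of what "Bayesian reasoning by simulation" can look like, here is a minimal sketch that is not taken from the tutorial materials. It estimates a conditional probability by simulating many people taking a diagnostic test and compares the result with Bayes' Theorem. The scenario, the function names, and the numbers (1% prior, 95% sensitivity, 5% false-positive rate) are all assumptions chosen for illustration.

```python
import random


def simulate_posterior(prior, sensitivity, false_positive_rate, trials, seed=0):
    """Estimate P(condition | positive test) by simulating `trials` people."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    positives = 0
    true_positives = 0
    for _ in range(trials):
        has_condition = rng.random() < prior
        # The chance of a positive result depends on the person's true state.
        p_positive = sensitivity if has_condition else false_positive_rate
        if rng.random() < p_positive:
            positives += 1
            true_positives += has_condition  # True counts as 1, False as 0
    return true_positives / positives


def exact_posterior(prior, sensitivity, false_positive_rate):
    """Bayes' Theorem for the same question, used as a cross-check."""
    evidence = prior * sensitivity + (1 - prior) * false_positive_rate
    return prior * sensitivity / evidence


# Hypothetical numbers, chosen only for illustration.
simulated = simulate_posterior(0.01, 0.95, 0.05, trials=200_000)
exact = exact_posterior(0.01, 0.95, 0.05)
print(f"simulated={simulated:.3f} exact={exact:.3f}")
```

The simulated value should land close to the exact one, and the gap shrinks as `trials` grows.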

Beginning Python for Human People with Feelings

Melanie Crutchfield
Wednesday 9 a.m.–12:20 p.m. in Room 10

This tutorial is for people who are **brand new to Python**. It's for people with curiosity to feed, anxiety to overcome, and worlds to change. It's for people named Edna. (And others not named Edna.) During this tutorial you'll be encouraged to **bring your whole self to learning**. We'll start with the very basics of Python, keeping your fingers on the keyboard to gain as much practice as possible. Between strings, functions, and other fun Python-y things, we'll discuss learning deeply, nourishing our brains, and boosting happiness with science. No prior experience required; come just as you are. This is about being a whole person. It's about learning Python, because Python is really cool. It's also about staying afloat. Being productive. Focusing. It's about finding joy in the error codes. Come play. It'll be awesome.

Building data pipelines in Python: Airflow vs scripts soup

Tania Allard
Wednesday 9 a.m.–12:20 p.m. in Room 21

In data science (in all its variants) a significant part of an individual’s time is spent preparing data into a digestible format. In general, a data science pipeline starts with the acquisition of raw data which is then manipulated through ETL processes and leads to a series of analytics. Good data pipelines can be used to automate and schedule these steps, help with monitoring tasks, and even to dynamically train models. On top of that, they make the analyses easier to reproduce and productise. In this workshop, you will learn how to migrate from ‘scripts soups’ (a set of scripts that should be run in a particular order) to robust, reproducible and easy-to-schedule data pipelines in Airflow. First, we will learn how to write simple recurrent ETL pipelines. We will then integrate logging and monitoring capabilities. We will end by using Airflow along with Jupyter Notebooks and papermill to produce reproducible analytics reports.

Building Evolutionary API with GraphQL and Python

Dave Anderson
Thursday 9 a.m.–12:20 p.m. in Room 22

You are a developer. Maybe you're building a rich web experience, like a single page app using JavaScript and a framework like React, Angular or Vue. Maybe you have multiple clients besides web on mobile platforms like iOS or Android. Maybe you have an external facing public API for use by clients with many diverse needs. One thing is for sure: you need a robust API. That API should be able to evolve over time to meet the growing and changing demands of the business and your clients. The frameworks and paradigms we choose as we develop any software can help or hinder that change. A well-designed GraphQL API enables flexibility and stability across changes, as well as easy service discovery and thinner clients with fewer responsibilities, ensuring that your application grows successfully over time.

The tutorial will focus on building a GraphQL API using the __Python__ library __Graphene__ with a __Django__ backend as a vehicle for teaching the principles of evolutionary API design that can be applied across any tech stack, including REST, as well as the more practical concerns of working with __Graphene__ and designing your API for GraphQL. A frontend, built using __JavaScript__ with __React__ and the __Apollo__ GraphQL client library, will be made available so users can understand the full-stack considerations of building this API and reacting to evolving concerns over time. Writing JavaScript will not be required, but being comfortable reading it and setting up a local environment will help you get more out of this tutorial.

We'll attempt to answer questions such as:

- When is using GraphQL for an API most effective?
- How do I get started with GraphQL in Python?
- What does it mean for an API to be Relay-compliant? What benefits are there? Drawbacks if we don't comply?
- How can we make use of field arguments for sorting, filtering and other concerns?
- What kinds of changes are safe to make to my API as clients begin consuming it?
- How can I ensure my GraphQL API performs well and we avoid the dreaded _N+1 As A Service_ problem?
- How should I design mutation responses for my GraphQL API to serve client needs?
- How can multiple stakeholders decide how to evolve the API together?

Build Your Own 2D Platformer Game

Paul Vincent Craven
Wednesday 1:20 p.m.–4:40 p.m. in Room 10

Use Python and the [Arcade](http://arcade.academy) library to create your own 2D platformer. Learn to work with Sprites and the [Tiled Map Editor](https://www.mapeditor.org/) to create your own games. Add coins, ramps, moving platforms, enemies, and more.

Data Science Best Practices with pandas

Kevin Markham
Thursday 1:20 p.m.–4:40 p.m. in Room 19

The pandas library is a powerful tool for multiple phases of the data science workflow, including data cleaning, visualization, and exploratory data analysis. However, the size and complexity of the pandas library make it challenging to discover the best way to accomplish any given task. In this tutorial, you'll use pandas to answer questions about multiple real-world datasets. Through each exercise, you'll learn important data science skills as well as "best practices" for using pandas. By the end of the tutorial, you'll be more fluent at using pandas to correctly and efficiently answer your own data science questions. Participants should have an intermediate knowledge of pandas and an interest in data science, but are not required to have any experience with the data science workflow. Datasets will be provided by the instructor.

Dealing with Datetimes

Paul Ganssle
Wednesday 1:20 p.m.–4:40 p.m. in Room 16

Dealing with dates and times is famously complicated. In this tutorial, you'll work through a few common datetime-handling tasks and handle some edge cases you are likely to encounter at some point in your career. This tutorial will cover:

- Working with time zones
- Serializing and deserializing datetimes
- Datetime arithmetic
- Scheduling recurring events

The format will be a mix of short lectures and hands-on exercises.
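
As a rough illustration of the topics listed above (not material from the tutorial itself), the sketch below uses only the standard library. It attaches a time zone, serializes to ISO 8601 and back, and schedules a weekly recurrence. The fixed UTC-5 offset named `EST` and the helper names are assumptions made to keep the example self-contained. Real time zones also carry daylight-saving rules, which a fixed offset ignores.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-5 offset stands in for a real time zone.
# Fixed offsets have no daylight-saving transitions, unlike real zones.
EST = timezone(timedelta(hours=-5), "EST")


def to_utc_string(dt):
    """Serialize an aware datetime as an ISO 8601 string in UTC."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a time zone first")
    return dt.astimezone(timezone.utc).isoformat()


def from_string(text):
    """Deserialize a string produced by to_utc_string (Python 3.7+)."""
    return datetime.fromisoformat(text)


def weekly_occurrences(first, count):
    """Return `count` datetimes spaced exactly one week apart."""
    return [first + timedelta(weeks=i) for i in range(count)]


meeting = datetime(2019, 5, 1, 13, 20, tzinfo=EST)
wire = to_utc_string(meeting)      # what you would store or transmit
restored = from_string(wire)       # same instant, now expressed in UTC
occurrences = weekly_occurrences(meeting, 3)
print(wire, restored == meeting, occurrences[-1].isoformat())
```

Comparing two aware datetimes compares the instants they represent, which is why `restored == meeting` holds even though the two objects carry different offsets.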

Design Patterns in Python for the Untrained Eye

Ariel Ortiz
Wednesday 1:20 p.m.–4:40 p.m. in Room 21

Design patterns are prepackaged solutions to common software design problems. We get two important benefits when using them. Firstly, we get a way to solve typical software development issues by using a proven solution. Secondly, we get a shared vocabulary that allows us to communicate more effectively with other software designers. Getting acquainted with design patterns is the next step toward becoming a better object-oriented programmer. In this tutorial we'll answer some of the most important questions surrounding design patterns: What are they? How can we use them in our programs? When should we avoid them? We'll also have the opportunity to explore and play with the Python implementations of some of the classical design patterns made famous by the Gang of Four (Gamma, Helm, Johnson & Vlissides) while learning relevant design principles at the same time. Don't forget to bring your own laptop with your preferred Python 3 development environment.

Escape from auto-manual testing with Hypothesis!

Zac Hatfield-Dodds
Thursday 1:20 p.m.–4:40 p.m. in Room 9

If you’ve ever written some tests, or discovered that tested code can still have bugs, this tutorial is for you. [Hypothesis](https://hypothesis.readthedocs.io/) lets you write tests that should pass for every case… then finds bugs by generating inputs you wouldn’t have looked for. Even better, you get to save time by writing fewer but more powerful tests, so this process improves your productivity as well as your code!

- Learn what property-based testing is, and how it relates to [other kinds of tests](https://www.hillelwayne.com/post/a-bunch-of-tests/)!
- Write your first property-based test, with example code and an overview of common tactics!
- Describe inputs - use and compose strategies, then define your own or infer them from other code!
- Use `hypothesis.stateful` to generate and test whole programs
- Get the low-down on Hypothesis: performance tips, debugging tools, and more!

You’ll be ready to find real bugs by halfway through the tutorial; and by the end you’ll be ready to use Hypothesis in ways we never imagined. There will be dedicated time for Q&A about applying Hypothesis (or PBT ideas) in your domain, testing anything from web apps to big data pipelines to other languages, before you leave to drag the world kicking and screaming into a new and terrifying age of high quality software.

Faster Python Programs - Measure, don't Guess

Mike Müller
Thursday 1:20 p.m.–4:40 p.m. in Room 20

Optimization can often help to make Python programs faster or use less memory. Developing a strategy, establishing solid measuring and visualization techniques, and knowing the basics of algorithms and data structures are the foundation for successful optimization. The tutorial will cover these topics. Examples will give you hands-on experience in approaching optimization efficiently.

Python is a great language. But it can be slow compared to other languages for certain types of tasks. If applied appropriately, optimization may reduce program runtime or memory consumption considerably. But this often comes at a price. Optimization can be time consuming and the optimized program may be more complicated. This, in turn, means more maintenance effort. How do you find out if it is worthwhile to optimize your program? Where should you start? This tutorial will help you to answer these questions.

You will learn how to find an optimization strategy based on quantitative and objective criteria. You will experience that one's gut feeling about what to optimize is often wrong. The solution to this problem is: "Measure, Measure, and Measure!" You will learn how to measure program run times as well as profile CPU and memory. There are great tools available. You will learn how to use some of them. Measuring is not easy because, by definition, as soon as you start to measure, you influence your system. Keeping this impact as small as possible is important. Therefore, we will cover different measuring techniques.

Furthermore, we will look at algorithmic improvements. You will see that the right data structure for the job can make a big difference. Finally, you will learn about different caching techniques.

## Software Requirements

You will need Python 3.7 installed on your laptop. Python 2.7 or 3.5/3.6 should also work. Python 3.x is strongly preferred. You may use Python 3.8 if it is released at the time of the tutorial and all dependencies can be installed.

### JupyterLab

I will use JupyterLab for the tutorial because it makes a very good teaching tool. You are welcome to use the setup you prefer, i.e. editor, IDE, or REPL. If you would also like to use JupyterLab, I recommend `conda` for easy installation. Like `virtualenv`, `conda` creates isolated environments, but it also allows binary installs on all platforms.

There are two ways to install `Jupyter` via `conda`:

1. Use [Miniconda][1]. This is a small install and (after you have installed it) you can use the command `conda` to create an environment: `conda create -n pycon2019 python=3.7`. Now you can change into this environment: `conda activate pycon2019`. The prompt should change to `(pycon2019)`. Now you can install JupyterLab: `conda install jupyterlab`.
2. Install [Anaconda][2] and you are ready to go if you don't mind installing lots of packages from the scientific field.

Personally, I prefer the Miniconda approach.

### Working with ``conda`` environments

After creating a new environment, the system might still work with some stale settings. Even when the command ``which`` tells you that you are using an executable from your environment, this might actually not be the case. If you see strange behavior using a command line tool in your environment, use ``hash -r`` and try again.

[1]: https://conda.io/miniconda.html
[2]: https://www.anaconda.com/download/

### Tools

You can install these with ``conda`` or ``pip`` (in the active ``conda`` environment):

* [SnakeViz][3]
* [line_profiler][4]
* [Pympler][5]
* [memory_profiler][6]

#### Linux

Using the package manager of your OS is an alternative if you prefer this approach.

[3]: http://jiffyclub.github.io/snakeviz/
[4]: https://pypi.python.org/pypi/line_profiler/
[5]: https://pypi.python.org/pypi/Pympler
[6]: https://pypi.python.org/pypi/memory_profiler
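
To make the "measure, don't guess" idea concrete, here is a small standard-library sketch. It is not part of the tutorial materials, and the helper names and container sizes are assumptions chosen for illustration. It times membership tests on a list and a set with `timeit`, and shows one caching technique using `functools.lru_cache`.

```python
import timeit
from functools import lru_cache


def time_membership(container, needle, number=1000):
    """Return the seconds taken by `number` membership tests."""
    return timeit.timeit(lambda: needle in container, number=number)


@lru_cache(maxsize=None)
def fib(n):
    """Naive recursive Fibonacci; the cache avoids recomputing subproblems."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


items = list(range(10_000))
# Searching for the last element is the worst case for a list scan.
list_seconds = time_membership(items, 9_999)
set_seconds = time_membership(set(items), 9_999)
print(f"list: {list_seconds:.6f}s  set: {set_seconds:.6f}s")

print(fib(30), fib.cache_info())
```

The absolute timings depend on your machine, which is the point: run the measurement where the code will actually run before deciding what to optimize.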

First Steps in Web Development With Python

Miguel Grinberg
Thursday 9 a.m.–12:20 p.m. in Room 21

Are you a Python beginner interested in learning Web Development? If you find the number of different technologies that you need to learn to build even a simple web site overwhelming, this might be the class for you. I will assume that you have basic Python knowledge and no web development experience, and through a series of lectures and hands-on exercises, I will help you make sense of it all. By the end of the class you will have a high-level understanding of the web development ecosystem, plus a complete starter web application running on your laptop.

Getting started with Kubernetes and container orchestration

Jérôme Petazzoni, AJ Bowen
Wednesday 9 a.m.–12:20 p.m. in Room 20

You've started to "containerize" your applications by writing a Dockerfile or two, and now you want to run your containers in a cluster. But Kubernetes is quite different from Docker: the smallest unit of deployment is not a container, but a *pod*; pods are accessed through specialized load balancers called *services*; there are *labels* and *selectors* everywhere; and everything is created by expressing desired state with YAML, lots of YAML. In this hands-on tutorial, we will learn about Kubernetes and its key concepts, both in theory (we will become familiar with all the things evoked in the previous paragraph) and in practice (we will know how to use them to deploy and scale our applications). Kubernetes has the reputation of being a complex system with a steep learning curve. We will see that it is, indeed, a complex system, but that it is possible to tame its most essential features in just a few hours.

Hands-on Intro to aiohttp

Mariatta, Andrew Svetlov
Thursday 1:20 p.m.–4:40 p.m. in Room 16

Asyncio is a relatively new feature in Python; `async` and `await` only became proper keywords in Python 3.7. Asyncio allows you to write asynchronous programs in Python. In this tutorial, we’ll introduce you to an asyncio web library called `aiohttp`. `aiohttp` is a library for building web clients and servers using Python and asyncio. We’ll introduce you to several key features of `aiohttp`, including routing, session handling, templating, using middlewares, connecting to a database, and making HTTP GET/POST requests. We’ll provide best practices for building your `aiohttp` application, as well as for writing tests for it. We’ll use all the new Python 3.7 features to build web services with asyncio and aiohttp.

Hands-On Web Application Security with Django

Jacinda Shelly
Thursday 1:20 p.m.–4:40 p.m. in Room 21

XSS, SQL Injections and Improper Authorization, oh my! Between the OWASP Top 10, CSRF, stealing sessions, and DDoS attacks, have you ever felt that the world of web security was too complex to understand? Do you find yourself wishing that you understood what those acronyms *really* translate to in a live web application? If so, then this is the tutorial you've been waiting for.

In this tutorial, we'll cover essential topics in web security, including the majority of the OWASP Top 10, but we *won't* be doing it in a theoretical manner. We'll take a live, deliberately insecure web application, identify the vulnerabilities, exploit them, and finally fix them. Sound cool? It is! Topics include the following:

* Cross-site scripting (XSS)
* Cross-site request forgery (CSRF)
* Cookies and how they can be abused
* Why default passwords are dangerous
* Improper authorization checking
* Incorrect Session Management
* SQL Injection
* How to abuse Pickle
* And more!

You'll also learn next steps and we'll provide suggested resources for continuing your security education.

While previous experience with Django is not required, it is recommended. You should have an understanding of how web applications work in general and have completed the official [Django Tutorial](https://docs.djangoproject.com/en/2.1/intro/tutorial01/) or something substantially similar.

Hello World of Machine Learning Using Scikit Learn

Deepak K Gupta
Thursday 9 a.m.–12:20 p.m. in Room 19

_**Welcome to the Machine Learning tutorial for absolute beginners.**_

> This tutorial will not make you an expert in Machine Learning but will cover enough to acquaint, enable and empower you to understand, explore and exploit the concepts and ideas behind it.

----------

**We'll be learning by generating our own data with a bare minimum of data points (5 - 10) so that we can manually verify our machine learning algorithms and understand them better.**

_This will also help us to see how changes in data can impact our Machine Learning algorithms. At the end of this tutorial, we'll also use one real-world dataset and play with it._

----------

_In this tutorial, I'll be covering at least 3 well-known ML algorithms (KNN, Linear and Logistic Regression) along with all the maths behind them. We'll end the tutorial with a real-world mapping application._

_There is no major prerequisite for attending this; you just need to know the basics of the Python language and I'll cover the rest. We'll be using Scikit Learn for simplicity. Again, you don't need any prior experience with Scikit Learn or, for that matter, with Machine Learning._

Introduction to Data Science with Python

Grishma Jena
Thursday 9 a.m.–12:20 p.m. in Room 15

Wish to perform Data Science but don’t know how to? Have a dataset that you really want to analyze but aren't sure how to start? This hands-on session teaches how to explore datasets, use Machine Learning algorithms and derive insights from models using popular Python tools like Jupyter, pandas, sklearn and numpy. Aimed at budding data scientists with prior programming experience in any language.

IPython and Jupyter in Depth: High productivity, interactive Python

Matthias Bussonnier, Denis Akhiyarov
Thursday 9 a.m.–12:20 p.m. in Room 20

IPython and Jupyter provide tools for interactive computing that are widely used in scientific computing, education, and data science, but can benefit any Python developer.

In this tutorial we will introduce you to the latest developments in IPython and Jupyter, get you up to speed on how to install Jupyter on your machine and where to seek help for larger deployments. Then we will dive into the intermediate features that make IPython and Jupyter powerful. We will dive into how to make the best use of features like:

* Async REPL (new in IPython 7), and how to tie that into the visualisation capabilities of Jupyter and the new JupyterLab interface
* Widgets (building simple interactive dashboards based on ipywidgets)
* Magics
* Multilanguage integration: the notebooks also allow code in multiple languages, so you can mix Python with Cython, C, R and other programming languages to access features that are hard to obtain from Python

### More info

For full details about IPython including documentation, previous presentations and videos of talks, please see [the project website](http://ipython.org). The materials for this tutorial will be [available on a github repository](https://github.com/ipython/ipython-in-depth).

Lambda Calculus from the Ground Up

David Beazley
Wednesday 9 a.m.–12:20 p.m. in Room 9

These days, programming style guides are all the rage. However, what if your style guide was so restrictive that it only gave you single-argument functions and nothing else? No modules, no classes, no control flow, no data structures, and not even any primitives like integers or regular expressions. Just functions. Could you actually program anything at all? Surprisingly, the answer is yes. In this tutorial, you'll learn how as you work through a ground-up derivation of the lambda calculus in Python. You will learn nothing practically useful in this tutorial. No packaging. No tools. No libraries. No deployment. No magic Python programming techniques. And you will certainly learn nothing you would ever want to apply to a real project. You will, on the other hand, have a lot of fun, be completely amazed, and learn some foundational computer science that is a jumping off point for further explorations of functional programming, type theory, programming languages, and more.
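
For a taste of what programming with nothing but single-argument functions can look like, here is a short sketch of the standard Church encodings of booleans and numerals. It is an illustration and not the tutorial's own derivation. The helpers `to_int` and `to_bool` are hypothetical conveniences that "cheat" by using Python's built-in types, purely so the results can be inspected.

```python
# Church booleans: a boolean is a function that picks one of two arguments.
TRUE = lambda x: lambda y: x
FALSE = lambda x: lambda y: y
NOT = lambda b: b(FALSE)(TRUE)
AND = lambda a: lambda b: a(b)(a)

# Church numerals: the number n means "apply f to x, n times".
ZERO = lambda f: lambda x: x
SUCC = lambda n: lambda f: lambda x: f(n(f)(x))
ADD = lambda a: lambda b: lambda f: lambda x: a(f)(b(f)(x))
MUL = lambda a: lambda b: lambda f: a(b(f))

ONE = SUCC(ZERO)
TWO = SUCC(ONE)
THREE = ADD(ONE)(TWO)


# Inspection helpers only: these step outside the "functions only" rule.
def to_int(n):
    """Convert a Church numeral to a Python int by counting applications."""
    return n(lambda k: k + 1)(0)


def to_bool(b):
    """Convert a Church boolean to a Python bool."""
    return b(True)(False)


print(to_int(MUL(TWO)(THREE)), to_bool(AND(TRUE)(NOT(FALSE))))
```

Every definition above the helpers is a function of exactly one argument; multi-argument behavior comes from returning another function (currying).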

Lazy Looping in Python: Making and Using Generators and Iterators

Trey Hunner
Wednesday 1:20 p.m.–4:40 p.m. in Room 19

When processing large amounts of data in Python, we often reach for lists. Unfortunately, processing data using large lists makes for ugly code that can be memory inefficient and slow. Python's solution to this problem is lazy looping using generators and iterators. During this tutorial we'll learn a number of lazy looping techniques which will help you write more efficient and more readable Python code. We'll get practice creating generators, playing with iterators, and using generators and iterators to drastically restructure our code in a more descriptive data-centric way. You'll walk out of this tutorial with hands-on experience with Python's various lazy looping constructs and a greater appreciation for how looping works under the hood in Python.
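
The difference between list-based and lazy processing can be sketched in a few lines. The example below is illustrative only and not taken from the tutorial. The function names are hypothetical. It chains a generator function, a generator expression, and `itertools.islice` so that values are produced one at a time, even from an infinite source.

```python
from itertools import islice


def read_numbers(lines):
    """Lazily yield an int for each non-blank line."""
    for line in lines:
        line = line.strip()
        if line:
            yield int(line)


def squares(numbers):
    """Return a generator expression; nothing is computed until iterated."""
    return (n * n for n in numbers)


def count_up(start=0):
    """An infinite generator: only safe because consumers pull lazily."""
    n = start
    while True:
        yield n
        n += 1


# A pipeline over a small input: no intermediate list is ever built.
total = sum(squares(read_numbers(["1\n", "\n", "2\n", "3"])))

# Take the first three even squares from an infinite stream.
even_squares = (s for s in squares(count_up()) if s % 2 == 0)
first_even_squares = list(islice(even_squares, 3))
print(total, first_even_squares)
```

Building a list of all squares first would never finish for `count_up()`; the lazy version stops as soon as `islice` has its three values.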

Network Analysis Made Simple

Mridul Seth, Eric Ma
Wednesday 9 a.m.–12:20 p.m. in Room 16

Have you ever wondered about how those data scientists at Facebook and LinkedIn make friend recommendations? Or how epidemiologists track down patient zero in an outbreak? If so, then this tutorial is for you. In this tutorial we will use a variety of datasets to help you understand the fundamentals of network thinking, with a particular focus on constructing, summarizing, and visualizing complex networks.

Open the Black Box: an Introduction to Model Interpretability in Python

Kevin Lemagnen
Thursday 1:20 p.m.–4:40 p.m. in Room 15

What's the use of sophisticated machine learning models if you can't interpret them? In fact, many industries including finance and healthcare require clear explanations of why a decision is made. This tutorial covers recent model interpretability techniques that are essential in your data science toolbox: ELI5, LIME (Local Interpretable Model-Agnostic Explanations) and SHAP (SHapley Additive exPlanations). You will learn how to apply these techniques in Python on real-world data science problems in order to debug your models and explain their decisions. You will also learn the conceptual background behind these techniques so you can better understand when they are appropriate.

Pandas is for Everyone

Daniel Chen
Wednesday 9 a.m.–12:20 p.m. in Room 19

Data Science and Machine Learning have been synonymous with languages like Python. Libraries like NumPy and Pandas have become the de facto standard when working with data. The DataFrame object provided by Pandas gives us the ability to work with the heterogeneous, unstructured data that is common in the "real world". New learners are often drawn to Python and Pandas because of all the different and exciting types of models and insights the language can provide, but are awestruck when faced with the initial learning curve. This tutorial aims to guide the learner from using spreadsheets to using the Pandas DataFrame. Not only does moving to a programming language allow the user to have a more reproducible workflow, but as datasets get larger, some cannot even be opened in a spreadsheet program. The goal is to make an absolute beginner proficient enough with Pandas that they can start working with data in Python. We will cover how to load and view our data and introduce what Dr. Hadley Wickham has coined "tidy data". Tidy data is an important concept because the process of tidying data will fix a host of data problems that must be solved before performing analytics. We then cover functions and applying methods to our data with a focus on data cleaning, and how we can use the concept of split-apply-combine (groupby) to summarize or reduce our data. Finally, we cover the role of Pandas in analysis packages such as scikit-learn. The tutorial will end with a fitted model. The goal is to get people familiar with Python and Pandas so they can learn and explore many other parts of the Python ecosystem (e.g., scikit-learn, dask, seaborn, etc).

Practical API Security

Adam Englander
Thursday 9 a.m.–12:20 p.m. in Room 16

With the dominance of Mobile Apps, Single Page Apps for the Web, and Micro-Services, we are all building more APIs than ever before. Like many other developers, I had struggled with finding the right mix of security and simplicity for securing APIs. Some standards from the IETF have made it possible to accomplish both. Let me show you how to utilize existing libraries to lock down your API without writing a ton of code. In this tutorial, you will learn how to write a secure API with future-proof security utilizing JOSE. JOSE is a collection of complementary standards: JWT, JWE, JWS, JWA, and JWK. JOSE is used by OAuth, OpenID, and others to secure communications between APIs and consumers. Now you can use it to secure your API.

Python by Immersion

Stuart Williams
Thursday 9 a.m.–12:20 p.m. in Room 10

A very fast introduction to Python for software developers with experience in other languages. Instead of a traditional top-down presentation of Python's features, syntax, and semantics, students are immersed in the language bottom-up with hundreds of small examples using the interactive interpreter to quickly gain familiarity with most of the core language features. Special attention is given to concepts in Python that often trip up those new to the language.

Pythonic Objects: idiomatic OOP in Python

Luciano Ramalho
Wednesday 9 a.m.–12:20 p.m. in Room 22

Objects and classes have been part of Python since version 1 -- not an afterthought. But all languages implement and support OOP in different ways. "Classic" patterns that make sense elsewhere may not be as useful in Python, and Python provides unique solutions to some familiar problems. This tutorial is about modern, idiomatic OOP in Python 3.7. Most of the discussion will be relevant to previous versions all the way to Python 2.7, but newer features will be highlighted.

Scikit-learn, wrapping your head around machine learning

Chalmer Lowe
Wednesday 1:20 p.m.–4:40 p.m. in Room 20

A gentle introduction to machine learning through scikit-learn. This tutorial will enable attendees to understand the capabilities and limitations of machine learning through hands-on code examples and fun and interesting datasets. Learn when to turn to machine learning and which tools apply to your problem. Also learn about gotchas and problems that are likely to show up when attempting to use machine learning.

To trust or to test?: Automated testing of scientific projects with pytest

Dorota Jarecka, Anna Jaruga
Thursday 9 a.m.–12:20 p.m. in Room 9

Many researchers rely strongly on numerical computations. Unfortunately, testing scientific code is a hard task. Often there is no ground truth available for comparison and the end result of the simulation is unknown even to the code developer herself/himself. Often the user-base of the scientific code is small and the work environment does not provide incentives for testing. However, there are always parts of the code that are relatively easy to cover by Unit Tests. Scientific pipelines could and should have Regression Tests, which ensure that previously developed software still performs after changes in the code, or in external libraries and computational environment. An automatic test suite should not be a burden and can become a game-changer even for a small programming project.

This tutorial is meant to be an introduction to testing in general and to the pytest library. Pytest is a full-featured tool for testing Python code; it offers a simple way to get started and scales from simple unit testing to complex functional testing. We will begin with simple assert statements and finish with `pytest.fixture` and `pytest.mark.parametrize`. The tutorial will also cover a simple integration of the test suite with Continuous Integration platforms using GitHub and Travis/CircleCI.

**Audience:** The tutorial is designed for scientists and data scientists who would like to incorporate testing into their everyday work. We expect that people know basic Python and NumPy, and are familiar with simple shell commands. Basic knowledge of Git/GitHub will be useful in the second half of the tutorial.

What To Expect When You’re Expecting: A Hands-On Guide to Regression Testing

Emily Morehouse
Thursday 1:20 p.m.–4:40 p.m. in Room 10

We all know we should be testing our applications, but testing is hard and great testing is even harder. Take a deep dive into what to test in your Django apps and how to test it, plus learn how to leverage modern headless browser libraries and automated visual diff-ing to get (and keep) pixel-perfect apps.

Writing about Python (Even When You Hate Writing)

Thursday Bram
Wednesday 1:20 p.m.–4:40 p.m. in Room 15

This tutorial is an introduction to writing about Python: we'll cover potential pitfalls in documentation and other technical writing, practice writing non-technical content (from blog posts to job listings), and test our writing for readability and accessibility. We'll even cover where writing can fit into your workflow and a few tricks for reducing your writing workload. After this tutorial, you'll be equipped to write about Python for both technical and nontechnical audiences. You might even enjoy writing by the time we're done!

Writing Command Line Applications that Click

Dave Forgac
Wednesday 9 a.m.–12:20 p.m. in Room 15

Click is a Python package that helps you create well-behaved command line interfaces with minimal code. In this tutorial you will:

- Learn what makes a command line application "well-behaved"
- Build an application that exercises the most commonly-used features of Click
- Get an overview of the more advanced functionality available
- Package and install the application

You will leave with an example application that you can use as a basis for your own command line development.