Tutorials


Introduction to Decorators: Power Up Your Python Code


Wednesday 3 p.m.–5:30 p.m. in tutorials - Tutorial 1

Python supports functions as first-class objects. This means that functions can be assigned to variables, and passed to and from other functions, just like any other object in Python.

One powerful application of this is the decorator syntax, which makes it easy to apply one function to another at function definition time. Decorators offer a simple and readable way of adding capabilities to your code. This tutorial will teach you how decorators work and how to create your own decorators.

Being comfortable with using and creating decorators will make you a more efficient Python programmer.
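
As a taste of what you'll build, here is a minimal sketch of a timing decorator (illustrative only, not the tutorial's actual material):

import functools
import time

def timed(func):
    """Print how long each call to the decorated function takes."""
    @functools.wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.6f}s")
        return result
    return wrapper

@timed
def slow_sum(n):
    return sum(range(n))

slow_sum(1_000_000)  # prints something like: slow_sum took 0.012345s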


Functional Python


Wednesday 3 p.m.–5:30 p.m. in tutorials - Tutorial 2

Python supports multiple programming paradigms. In addition to the procedural and object-oriented approaches, it also provides some features that are typical of functional programming.

While these features are optional, they can be useful for creating better Python programs. This tutorial introduces Python features that help to implement parts of a program in a functional style. The objective is not to write purely functional programs, but to improve program design by using functional features where suitable.

The tutorial points out advantages and disadvantages of functional programming in general and in Python in particular. Participants will learn alternative ways to solve problems. This will broaden their programming toolbox.
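
As a small taste of the style (an illustrative sketch, not the tutorial's material), built-in functional tools can replace an explicit loop and its mutable state:

from functools import reduce
from operator import add

words = ["Functional", "Python", "is", "optional"]

# imperative style: accumulate in a loop with a mutable variable
total = 0
for w in words:
    total += len(w)

# functional style: compose map and reduce, no mutable state
total_fp = reduce(add, map(len, words), 0)

assert total == total_fp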

Software Requirements

You will need Python 3.9 installed on your laptop. Python 3.7/3.8 should also work. You may use Python 3.10 if it is released at the time of the tutorial and all dependencies can be installed.

JupyterLab

I will use JupyterLab for the tutorial because it makes a very good teaching tool. You are welcome to use the setup you prefer, e.g. an editor, IDE, or REPL. If you would also like to use JupyterLab, I recommend conda for easy installation. Like virtualenv, conda allows you to create isolated environments, but it also supports binary installs for all platforms.

There are two ways to install JupyterLab via conda:

  1. Use Miniconda. This is a small install and (after you have installed it)
    you can use the command conda to create an environment:
    conda create -n pycon2021py39 python=3.9
    Now you can change into this environment:
    conda activate pycon2021py39. The prompt should change to (pycon2021py39).
    Now you can install the dependencies:

    * JupyterLab 2: conda install jupyterlab
    * more-itertools: conda install more-itertools
    * toolz: conda install toolz

    Hint: You can do all of this in one command:
    conda create -n pycon2021py39 python=3.9 jupyterlab more-itertools toolz

  2. Install Anaconda and you are ready to go, if you don't mind installing
    lots of packages from the scientific field.

You can create a comparable setup with virtual environments and pip, if you prefer.

Working with conda environments

After creating a new environment, the system might still work with some stale settings. Even when the which command tells you that you are using an executable from your environment, this might actually not be the case. If you see strange behavior from a command line tool in your environment, run hash -r and try again.


Magical NumPy with JAX


Wednesday 3 p.m.–5:30 p.m. in tutorials - Tutorial 3

The greatest contribution of the decade in which deep learning exploded was not the big models themselves, but a generalized toolkit for training any model by gradient descent. We're now in an era where differential computing can give you the toolkit to train models of any kind. Does a Pythonista well-versed in the PyData stack have to learn an entirely new toolkit, a new array library, to have access to this power?

This tutorial's answer is as follows: if you can write NumPy code, then with JAX, differential computing is at your fingertips with no need to learn a new array library! In this tutorial, you will learn how to use the NumPy-compatible JAX API to write performant numerical models of the world and train them using gradient-based optimization. Along the way, you will write loopy numerical code without loops, think in data cubes, get your functional programming muscles trained up, generate random numbers completely deterministically (no, this is not an oxymoron!), and preview how to mix neural networks and probabilistic models together... leveraging everything you know about NumPy plus some newly learned JAX magic sprinkled in!
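
To give a flavor of the approach (an illustrative sketch, not the tutorial's material): with jax.grad, gradient-based optimization of plain NumPy-style code takes only a few lines:

import jax
import jax.numpy as jnp

# fit w in y = w * x by gradient descent, using only NumPy-style code
x = jnp.array([1.0, 2.0, 3.0])
y = jnp.array([2.0, 4.0, 6.0])

def loss(w):
    return jnp.mean((w * x - y) ** 2)

grad_loss = jax.grad(loss)  # a new function that returns dloss/dw

w = 0.0
for _ in range(200):
    w -= 0.1 * grad_loss(w)

print(w)  # converges toward 2.0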


Hands-On Regular Expressions in Python


Wednesday 3 p.m.–5:30 p.m. in tutorials - Tutorial 4

What are regular expressions, what are they useful for, and why are they so hard to read?

In this tutorial we will break down the regular expression syntax to better understand how regular expressions work. We will learn how to dissect regular expressions, how to use regular expressions in Python, and how to make your regular expressions more readable (yes, it's possible... sort of).

We will learn how to use regular expressions for data validation, data parsing, and data normalization. We'll also discuss when not to use regular expressions.
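
For example, Python's re.VERBOSE flag lets you annotate a pattern, which goes a long way toward readability (an illustrative sketch, not the tutorial's material):

import re

# validate a simple date string such as "2021-05-12"
DATE_RE = re.compile(r"""
    ^(?P<year>\d{4})      # four-digit year
    -(?P<month>\d{2})     # two-digit month
    -(?P<day>\d{2})$      # two-digit day
""", re.VERBOSE)

match = DATE_RE.match("2021-05-12")
if match:
    print(match.group("year"), match.group("month"), match.group("day"))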


A Complete Beginner's Guide to Python by Making Simple Games


Wednesday 7 p.m.–9:30 p.m. in tutorials - Tutorial 1

Excited about programming? Have you heard good things about Python? Now is the time to dive in and start learning how to program. This three-hour tutorial covers the basics of the basics of Python. Programming is a wide and deep field, but you only need a taste. You'll learn about variables, expressions, loops, functions, and most importantly: what those words even mean to begin with. This is a tutorial for complete beginners (or those who want to start over again from the beginning). This tutorial does not include computer science, machine learning, or brain surgery. By the end, we'll have created a few simple games (Guess the Number, Magic 8 Ball, and a Dice Rolling Simulator), and you'll know how to guide yourself through the next steps on your programming journey.
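
To give a sense of what "simple games" means here, a Guess the Number game fits in about a dozen lines (an illustrative sketch, not necessarily the tutorial's version):

import random

secret = random.randint(1, 20)
print("I'm thinking of a number between 1 and 20.")

while True:
    guess = int(input("Your guess: "))
    if guess < secret:
        print("Too low!")
    elif guess > secret:
        print("Too high!")
    else:
        print("You got it!")
        break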


Python Unit Testing with Pytest and Mock


Wednesday 7 p.m.–9:30 p.m. in tutorials - Tutorial 2

Writing unit tests for your code is widely accepted as a best practice.
Learn how to use Pytest, the de facto standard testing tool,
and mock, the standard library's module for creating mock objects,
to write high-quality tests.
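
A small taste of the two together (an illustrative sketch; get_status and the client object are hypothetical, not from the tutorial):

from unittest import mock

def get_status(client):
    """Hypothetical code under test: fetch a status code via an HTTP client."""
    return client.get("/status").status_code

def test_get_status():
    fake_client = mock.Mock()
    # the mock records calls and returns configured values instead of doing I/O
    fake_client.get.return_value.status_code = 200
    assert get_status(fake_client) == 200
    fake_client.get.assert_called_once_with("/status")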


Dashboards for All


Wednesday 7 p.m.–9:30 p.m. in tutorials - Tutorial 3

Dashboards are useful tools for data professionals at all levels and across different industries. From analysts who want to showcase the insights they have uncovered, to researchers wanting to explain the results of their experiments, to developers wanting to outline the most important metrics stakeholders should pay attention to in their applications, dashboards can help tell a story or, with a bit of interactivity, let the audience pick the story they'd like to see. With this in mind, the goal of this tutorial is to help data professionals from diverse fields and at diverse levels tell stories through dashboards using data and Python.

The tutorial will emphasize both methodology and frameworks through a top-down approach. The open source libraries covered include bokeh, holoviews, and panel. In addition, the tutorial covers important concepts regarding data types, data structures, and data visualization and analysis. Lastly, participants will also learn concepts from the fields the datasets come from, and build a foundation for reverse engineering data visualizations they find in the wild.
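
As a flavor of what these libraries look like in practice, here is a minimal sketch of an interactive dashboard with panel (illustrative only; the widget and function names are placeholders, not the tutorial's material):

import panel as pn

pn.extension()

def fib(n: int) -> int:
    # naive Fibonacci, just something for the dashboard to display
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

slider = pn.widgets.IntSlider(name="n", start=0, end=20, value=10)
dashboard = pn.Column(
    "## Fibonacci explorer",
    slider,
    pn.bind(lambda n: f"fib({n}) = {fib(n)}", slider),  # re-renders on slider moves
)
dashboard.servable()  # run with: panel serve this_file.py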


From Spreadsheets to DataFrames: Escaping Spreadsheet Hell With Python


Wednesday 7 p.m.–9:30 p.m. in tutorials - Tutorial 4

A spreadsheet is a wonderful invention and an excellent tool for certain jobs. All too often, however, spreadsheets are called upon to perform tasks that are beyond their capabilities. It’s like the old saying, 'If the only tool you have is a hammer, every problem looks like a nail.' However, some problems are better addressed with a screwdriver, with glue, or with a Swiss Army Knife.

Python is described by some in the programming world as the Swiss Army Knife of programming languages because of its unrivaled versatility and flexibility. This allows its users to solve complex problems relatively easily compared with other programming languages, and it is one of the reasons why Python has become increasingly popular over time.

In this tutorial, we’ll briefly discuss spreadsheets, signs that you might be living in “Excel Hell”, and then we’ll spend the rest of the time learning how to escape it using Python.

In the first section, we'll build on what spreadsheet users already know about cells, rows, columns, and formulas, and map them to their Python equivalents, such as variables, lists, dictionaries, and functions. At the end of this section, we'll do an interactive exercise and learn how to perform a simple calculation, similar to one you might do in Excel, but using Python instead.
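
For instance, a spreadsheet formula like =SUM(B2:B4) maps naturally onto a Python list and function (an illustrative sketch, not the tutorial's actual exercise):

# each row of a small expense sheet as a dictionary (a "record")
expenses = [
    {"item": "rent", "amount": 1200.00},
    {"item": "food", "amount": 450.50},
    {"item": "utilities", "amount": 130.25},
]

# the Python equivalent of =SUM(B2:B4)
total = sum(row["amount"] for row in expenses)
print(f"Total: {total:.2f}")  # Total: 1780.75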

In the second section, we'll discuss (and attempt) more complex tasks, including web scraping, data processing, analysis, and visualization, by utilizing a few popular third-party libraries, including Requests, Pandas, Flask, Matplotlib, and others.

In the last section, we'll round out our discussion with a few important concepts in data management, including the concept of tidy data, building a data pipeline, and a few strategies (and packages) to use when approaching various data problems, including a demo using Apache Airflow.

https://github.com/ryansmccoy/spreadsheets-to-dataframes


Python Packaging Demystified


Thursday 3 p.m.–5:30 p.m. in tutorials - Tutorial 1

For most developers, Python packaging feels like a magical (and cryptic) black box. Apps and libraries use a variety of tools and have different packaging challenges. Once you start reading up on this topic, you come across many seemingly random components: setuptools, pip, poetry, wheels, pyproject.toml, MANIFEST.in, virtual environments, zipapp, shiv, pex, and so on. The sheer number of concepts to master can be overwhelming, leading many programmers to conclude that packaging in Python is a mess. Before you despair, join me in this tutorial session where you'll have a chance to learn how to package and publish/deploy your library and/or application through hands-on exercises.

Topics include:

  • How and why library packaging differs from application packaging
  • Differences between a source tree/source distribution/wheel
  • Differences between a build back-end and a build front-end (and why we even have this separation)
  • Tools used for packaging your library
  • Tools and techniques used to package your application
  • Testing your package for correctness

(Serious) Time for Time Series


Thursday 3 p.m.–5:30 p.m. in tutorials - Tutorial 2

Time to take Time Series seriously!

From inventory to website visitors, resource planning to financial data, time-series data is all around us. Knowing what comes next is key to success in this dynamically changing world, and for that we need reliable forecasting models. While complex, deep models may be good at forecasting, they typically give us little insight into the underlying patterns in our data.

In this tutorial, we'll cover relatively simple yet powerful approaches for time series analysis and seasonality modelling with Pandas.

At the end of this session, you will be familiar with the fundamentals of time series analysis: how to decompose a time series into trend, seasonality, and error components, and how to use these insights to create simple but powerful forecasting models.
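
A manual decomposition along those lines can be sketched in a few lines of pandas (illustrative only, with synthetic data; not the tutorial's material):

import numpy as np
import pandas as pd

# synthetic daily series: a slow upward trend plus a weekly cycle
idx = pd.date_range("2021-01-01", periods=120, freq="D")
t = np.arange(120)
y = pd.Series(0.1 * t + np.sin(2 * np.pi * t / 7), index=idx)

trend = y.rolling(window=7, center=True).mean()  # smooth out the weekly cycle
detrended = y - trend
# average the detrended values per weekday to estimate seasonality
seasonality = detrended.groupby(detrended.index.dayofweek).transform("mean")
residual = detrended - seasonality  # what's left over: the error component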


A Hands-On Introduction to Multi-Objective Optimisation


Thursday 3 p.m.–5:30 p.m. in tutorials - Tutorial 3

Optimising for multiple objectives is a non-trivial task, especially when they are in conflict. For example, how can one best overcome the classic trade-off between quality and cost of production, when the monetary value of quality is not defined? In this hands-on Python tutorial you will learn about Pareto Fronts and use them to optimise for multiple objectives simultaneously.

Multi-Objective Optimisation, also known as Pareto Optimisation, is a method to optimise for multiple parameters simultaneously. When applicable, this method provides better results than the common practice of combining multiple parameters into a single-parameter heuristic. The reason for this is quite simple: the single-heuristic approach is like horse blinders, limiting the view of the solution space, whereas Pareto Optimisation enables a bird's-eye view.

Real-world applications span from supply chain management, manufacturing, and aircraft design to land use planning. For example, when developing therapeutics, Pareto optimisation may help a biologist maximise protein properties like effectiveness and manufacturability while simultaneously minimising toxicity.

I will provide a git repository with Jupyter notebooks in which you will apply the lessons and tools learned to the simple Knapsack problem. Here you will write a program to fill a bag with packages, with the objective of minimising the bag's weight while maximising the value of its contents.
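
The core idea is easy to sketch (illustrative code, not from the tutorial repository): a solution is on the Pareto front if no other solution dominates it, i.e. is at least as good in every objective and strictly better in at least one:

def dominates(q, p):
    """True if q is at least as good as p everywhere and strictly better somewhere."""
    return all(qi >= pi for qi, pi in zip(q, p)) and any(
        qi > pi for qi, pi in zip(q, p)
    )

def pareto_front(points):
    """Keep only the non-dominated points (maximising every objective)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# knapsack-style candidates as (content value, -bag weight), both to be maximised
candidates = [(10, -5), (8, -3), (6, -2), (9, -6), (4, -1)]
print(pareto_front(candidates))  # [(10, -5), (8, -3), (6, -2), (4, -1)]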

My objective is for you to gain a basic intuition for the technique and to understand its advantages and shortcomings, so you can assess its applicability to your own projects.


Hacking Dask: Diving Into Dask’s Internals


Thursday 3 p.m.–5:30 p.m. in tutorials - Tutorial 4

Dask is a popular Python library for scaling and parallelizing Python code on a single machine or across a cluster. It provides familiar, high-level interfaces to extend the PyData ecosystem (e.g. NumPy, Pandas, Scikit-Learn) to larger-than-memory or distributed environments, as well as lower-level interfaces for parallelizing custom algorithms and workflows. In this tutorial we’ll cover more advanced features of Dask like task graph optimization, the worker and scheduler plugin system, how to inspect the internal state of a cluster, and more. Attendees should walk away with a deeper understanding of Dask’s internals, an introduction to more advanced features, and ideas of how they can apply these features effectively to their own data intensive workloads.
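
As one example of the plugin system (an illustrative sketch built on distributed's WorkerPlugin API; not the tutorial's material), a worker plugin can watch task state transitions:

from dask.distributed import Client, WorkerPlugin

class TaskCounter(WorkerPlugin):
    """Count how many tasks finish on the worker this plugin is attached to."""

    def setup(self, worker):
        self.worker = worker
        self.completed = 0

    def transition(self, key, start, finish, **kwargs):
        if finish == "memory":  # the task's result just landed in worker memory
            self.completed += 1

if __name__ == "__main__":
    client = Client()  # spin up a local cluster
    client.register_worker_plugin(TaskCounter())
    print(client.submit(sum, range(100)).result())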


Introduction to Property-Based Testing


Thursday 7 p.m.–9:30 p.m. in tutorials - Tutorial 1

Has testing got you down? Ever spent a day writing tests, only to discover that you missed a bug because of some edge case you didn’t know about? Does it ever feel like writing tests is just a formality - that you already know your test cases will pass?

Property-based testing might be just what you need!

After this introduction to property-based testing, you'll be comfortable with Hypothesis, a friendly but powerful property-based testing library. You'll also know how to check and enforce robust properties in your code, and will have hands-on experience finding real bugs.

Where traditional example-based tests require you to write out each exact scenario to check (for example, assert divide(3, 4) == 0.75), property-based tests are generalised and assisted. You describe what kinds of inputs are allowed, write a test that should pass for any of them, and Hypothesis does the rest!

from hypothesis import given, strategies as st

@given(a=st.integers(), b=st.integers())
def test_divide(a, b):
    result = a / b
    assert a == b * result

There’s the obvious ZeroDivisionError, fixable with b = st.integers().filter(lambda b: b != 0), but there’s another bug lurking. Can you see it? Hypothesis can!


Writing Documentation with Sphinx and reStructuredText


Thursday 7 p.m.–9:30 p.m. in tutorials - Tutorial 2

The success of Python and its open source libraries is inseparable from the availability of good documentation. Reading the documentation is one of the first things a user of an open source library has to do.

In the Python open source community, documentation is often written in the reStructuredText markup language and built with Sphinx. The official Python documentation and Python Enhancement Proposals (PEPs) are all written using reStructuredText. Being able to write documentation in reStructuredText is therefore a necessary skill for any aspiring Python open source contributor or maintainer.

Yet reStructuredText itself can be seen as a barrier to contributing to open source, since it is not as straightforward as Markdown. Compared to Markdown, reStructuredText is not as widely adopted outside of the Python community. Don't let this discourage you! Let's break down this barrier! reStructuredText is not as complicated as you might think. You can learn it!

In this tutorial, we'll go through various useful features of reStructuredText. You will learn how to create and build a documentation project using Sphinx. Not only will you learn a new skill, you can also confidently start contributing to open source projects by helping to improve their documentation.
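
To give you a taste of the Sphinx side, a documentation project is configured by conf.py, an ordinary Python file (a minimal sketch with placeholder values, not from the tutorial):

# conf.py -- minimal Sphinx configuration (placeholder values)
project = "example-project"
author = "Jane Doe"

extensions = [
    "sphinx.ext.autodoc",  # pull API documentation out of docstrings
]

html_theme = "alabaster"  # Sphinx's default theme

# build the HTML docs with: sphinx-build -b html . _build/html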


Practical Deep Learning for Data Scientists


Thursday 7 p.m.–9:30 p.m. in tutorials - Tutorial 3

This tutorial is a chance to get hands-on with PyTorch and GPU Deep Learning (DL). It is specifically targeted toward attendees who may be familiar with the concepts of DL, but want practical experience. Familiarity with Python and typical ML packages (e.g. pandas, numpy, sklearn) is expected.

At the end of this session, you will understand how to:

  • Build some common DL architectures in PyTorch
  • Evaluate and improve the performance
  • Take advantage of more compute (and when you should do so)

This will set you up to take advantage of interesting developments in the field and maybe even contribute your own!
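
For orientation, a single training step for a small PyTorch network looks like this (an illustrative sketch with fake data, not the tutorial's code):

import torch
import torch.nn as nn

# a minimal multilayer perceptron for 10-class classification
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 784)         # a fake batch of 32 flattened images
y = torch.randint(0, 10, (32,))  # fake class labels

optimizer.zero_grad()
loss = loss_fn(model(x), y)      # forward pass
loss.backward()                  # backward pass: compute gradients
optimizer.step()                 # update the weights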


Effective Data Visualization


Thursday 7 p.m.–9:30 p.m. in tutorials - Tutorial 4

From picking the right plot for a particular type of data, statistic, or result, to pre-processing sophisticated datasets, to making important decisions about the aesthetics of a figure, visualization is both a science and an art that requires knowledge and practice to master.

This tutorial is for Python users who are familiar with Python and basic plotting, and who want to build strong visualization skills that will let them effectively communicate any data, statistic, or result.

We will use Python libraries such as seaborn, matplotlib, plotly, and sklearn, and discuss topics such as density estimation, dimensionality reduction, interactive plotting, and making suitable choices for communication, drawing examples from datasets in the scientific, financial, and geospatial (mapping) fields, and more.
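
As a preview of the level we'll work at (an illustrative sketch using seaborn's built-in example data, not the tutorial's datasets):

import matplotlib.pyplot as plt
import seaborn as sns

# density estimate of flipper length, split by species
penguins = sns.load_dataset("penguins")  # downloads a small example dataset
sns.kdeplot(data=penguins, x="flipper_length_mm", hue="species", fill=True)
plt.title("Flipper length density by species")
plt.show()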