Tutorials


Introduction to Natural Language Processing


Wednesday 9 a.m.–12:30 p.m. in tutorials - 250AB

With Data Scientist consistently named one of the trendiest jobs of the 21st Century, it’s no surprise that many are flocking to learn skills like Python, mathematics, and machine learning. In this tutorial we’ll introduce attendees to an important subfield of data science: natural language processing (NLP).

Using popular data science libraries such as pandas, spaCy, and scikit-learn, we’ll cover common NLP terminology used in the industry as well as text preprocessing techniques. In addition, we’ll identify real world objects like people and businesses using named entity recognition and summarize data using term frequency. We’ll also learn to analyze the structure of our text data using dependency parsing and part-of-speech tagging. We'll end with an introduction to text similarity and determine key topics using topic modeling.

Attendees will gain hands-on experience by analyzing 500 Amazon Home and Kitchen product reviews.
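To preview the term-frequency idea mentioned above, here is a minimal standard-library sketch (the tutorial itself uses pandas, spaCy, and scikit-learn; the review text here is invented):

```python
from collections import Counter

def term_frequencies(text):
    """Return each lowercased token's share of the document's tokens."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = sum(counts.values())
    # Term frequency: raw count normalized by document length.
    return {term: count / total for term, count in counts.items()}

review = "great blender great price would buy again"
tf = term_frequencies(review)
print(tf["great"])  # 2 of 7 tokens
```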


Distributed Python with Ray: Hands-on with the Ray Core APIs


Wednesday 9 a.m.–12:30 p.m. in tutorials - 250C

This is an introductory and hands-on guided tutorial of Ray Core. Ray provides powerful yet easy-to-use design patterns for implementing distributed systems in Python. This tutorial includes a brief talk to provide an overview of concepts, why one might use Ray for distributing Python and Machine Learning workloads, and a brief discussion on Ray’s Ecosystem.

Primarily, the tutorial will focus on Ray Core APIs to write remote functions, actors, and understand Ray’s basic design patterns for writing distributed Python applications.


Getting started with Object-Oriented Programming through Signal Processing


Wednesday 9 a.m.–12:30 p.m. in tutorials - 250F

In this tutorial, we will explore the foundations of object-oriented programming (OOP) by fitting signals and waves into objects.

We will follow a top-down methodology: modelling signals from scratch, creating fat objects, and then tweaking their representation by introducing inheritance and delegation. We will talk about Python magic methods to implement processing operations. Finally, we will see how to implement the Iterator design pattern.
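As a taste of the magic methods and iterator ideas, here is a minimal sketch (the class and names are illustrative, not the tutorial's actual code):

```python
class Signal:
    """A discrete signal wrapping a sequence of samples."""

    def __init__(self, samples):
        self.samples = list(samples)

    def __add__(self, other):
        # Mix two signals by summing their samples pairwise.
        return Signal(a + b for a, b in zip(self.samples, other.samples))

    def __iter__(self):
        # Iterator protocol: lets us loop over (or list()) a signal.
        return iter(self.samples)

    def __len__(self):
        return len(self.samples)

mixed = Signal([1, 2, 3]) + Signal([10, 20, 30])
print(list(mixed))  # [11, 22, 33]
```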

Throughout the session, we will keep a close eye on code explicitness and simplicity, highlighting the pros and cons of each implementation.

A laptop with Python installed is the sole requirement.
Nevertheless, it could be handy to have a Jupyter notebook instance running to visualize and listen to signals easily; in that case, numpy and matplotlib should also be installed.


All about decorators


Wednesday 9 a.m.–12:30 p.m. in tutorials - 251AB

Decorators are one of Python's most powerful features. But for many developers, they remain somewhat mysterious and intimidating. In this tutorial, you'll learn what decorators are, how they work, how to write them, and when you should use them. Along the way, you'll write decorators that demonstrate their power, as well as some typical use cases — including for caching, filtering inputs, filtering outputs, timing, logging, and security. Along with the numerous hands-on exercises, there will be ample opportunities for questions and interactions. The Jupyter notebook that I'll use for teaching will be shared in real time with participants, and will also be available after the tutorial is over.

If you've always found decorators intimidating, or just wanted to know what they are, then this tutorial will answer your questions, as well as give you the confidence you need to use them in your own code.
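To give a taste of the timing use case mentioned above, here is a minimal sketch of a decorator (illustrative only, not the tutorial's own material):

```python
import functools
import time

def timed(func):
    """Report how long each call to the wrapped function takes."""
    @functools.wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.6f}s")
        return result
    return wrapper

@timed
def add(a, b):
    return a + b

print(add(2, 3))  # prints a timing line, then 5
```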


Python Metaprogramming: decorators, descriptors, metaclasses, and more


Wednesday 1:30 p.m.–5 p.m. in tutorials - 250AB

Ever wondered how Python frameworks that seem kind of magical (like Django) actually work? That’s the subject of this tutorial. If you can imagine a Python feature that doesn’t exist, you might be able to invent it yourself using one of Python’s metaprogramming features.

In this tutorial we’ll learn about a few of Python’s powerful metaprogramming features: decorators, descriptors, and metaclasses. These 3 features power many of Python’s interesting internals (property, methods, and abstract base classes for example).

By the end, you'll understand how function and class decorators work, how the property decorator works under the hood, and what controls the creation of a class.
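As a small preview of the descriptor protocol that powers property, here is a hedged sketch (names invented for illustration):

```python
class Positive:
    """A descriptor that rejects non-positive values, property-style."""

    def __set_name__(self, owner, name):
        # Called at class creation time; remember the attribute's name.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError(f"{self.name} must be positive")
        instance.__dict__[self.name] = value

class Rectangle:
    width = Positive()
    height = Positive()

    def __init__(self, width, height):
        self.width = width    # routed through Positive.__set__
        self.height = height

r = Rectangle(3, 4)
print(r.width * r.height)  # 12
```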


Python Types for Fun and Profit


Wednesday 1:30 p.m.–5 p.m. in tutorials - 250C

Many Python developers now use type annotations to catch and fix bugs early in the coding process. This tutorial will introduce you to type annotations in Python. We’ll cover basic ideas about how types work in a dynamic language like Python, and where explicit annotations can provide value. We’ll then explore features of the type system in more depth, and demonstrate how they can be used to precisely yet flexibly express a huge range of programming patterns.

Throughout the tutorial, you will have the chance to get your hands dirty by learning how to add types to small code snippets as well as to an example GitHub project, and run a type checker to see errors as you code. You’ll get to practice and play around with each concept as we discuss it, and walk away with concrete experience adding types to and catching bugs in real code.
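For a flavor of what adding types looks like, here is a minimal sketch (the function and data are invented; a type checker such as mypy would flag the commented-out call):

```python
from typing import Optional

def find_user(users: dict[str, int], name: str) -> Optional[int]:
    """Return the user's id, or None if the name is unknown."""
    return users.get(name)

ids: dict[str, int] = {"alice": 1, "bob": 2}
user_id = find_user(ids, "alice")
# A type checker catches this before you ever run the code:
# find_user(ids, 42)   # error: argument 2 has incompatible type "int"
print(user_id)  # 1
```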

A laptop with Python installed is required along with internet access.


Knowledge graph data modelling with TerminusDB


Wednesday 1:30 p.m.–5 p.m. in tutorials - 250DE

Who This Workshop is For

Data scientists, engineers, and researchers who have no prior experience in knowledge graph data modelling. In this workshop, we will start from the fundamentals - learning how to think in terms of triples to describe relations between different data objects. If your work involves data analysis, data management, data collaboration, or anything data-related, this workshop will give you a brand new insight into how data can be represented and stored.
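To preview the "thinking in triples" idea: each fact is a (subject, predicate, object) statement. A toy sketch in plain Python (TerminusDB itself uses WOQL and its Python client, not this code; the data is invented):

```python
# Each fact is a (subject, predicate, object) triple.
triples = [
    ("alice", "works_for", "acme"),
    ("acme", "located_in", "dublin"),
    ("alice", "knows", "bob"),
]

def objects_of(subject, predicate):
    """All objects linked to a subject by a given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects_of("alice", "works_for"))  # ['acme']
```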

Workshop Format

Overview - 10 mins, Lecture - 60 mins, Breaks - 20 mins, Hands-on training - 80 mins, Closing - 10 mins

What Attendees will Learn

By the end of the workshop, you will be able to think like a knowledge graph expert and construct a proper schema to store your data in a knowledge graph format. You will acquire the skills that you need to build knowledge graphs in TerminusDB - an open-source graph database that enables revisional control and collaborations.

Course Benefits

You will learn a new skill set that can assist you in your data science or research projects. You will have a new tool with which you can better model your data and collaborate with others. You will also gain all the prerequisites to use WOQL - a query language for knowledge graphs - and the TerminusDB Python client to manage, manipulate, and visualize data in your knowledge graph.


Building your first Dashboard using Dash


Wednesday 1:30 p.m.–5 p.m. in tutorials - 250F

Tutorial breakdown

Setting up

We will first activate a virtual environment, and install all required dependencies prior to the start. The facilitator will ensure that all participants have completed this step before moving to the next topic.

Data exploration and visualization using Jupyter notebooks

In this section we will introduce the dataset and the problem statement. We will use the Pandas Python package to assess the presence of missing information, and get familiar with the content of the data via box plots, scatter plots and histograms using Plotly.

Turning code into functions, and scripting

Once we are familiar with the data and generated a few sample visualizations, we will refactor our code into reusable functions that can be incorporated into a script. We will cover the anatomy of a Python script and interact with the script via the command line.

Introduction to Dash and layouts

In this section, we will learn about the main components in a Dash app. We will introduce various dashboard designs (layouts).

Implement dashboard and test locally

In this section we will implement code needed to generate and deploy a dashboard exploring the selected data locally. We will explore the pitfalls (potential sources of bugs, interpreting and fixing errors as they appear in the dashboard) and implement various layouts.

Deploying online

We will learn about files needed to deploy a dashboard online: Procfiles, requirements.txt, .gitignore and their role in deployment. We will then deploy a test dashboard online using Heroku.


Goodbye, "Hello, World." Hello, Functional FastAPI Web App!


Wednesday 1:30 p.m.–5 p.m. in tutorials - 251AB

Building a web application with Python is super easy. With just a few lines of code, you can get a simple, working app running directly on your computer's browser.

Awesome! But then what?

This tutorial focuses on that awkward transition from beginner to intermediate—when you want a project to be less of a sketchpad and more of an actual, useful tool.

We will learn tactics for finding and using resources when devising a plan for your web application, and get hands-on experience tackling common (and necessary) aspects of building your app, such as configuration, app structure, and database modeling.

For the training, you will be following along as we build the foundation of a fully-functional web application, and will leave with the ability to further refine it for real-world scenarios.


Learning from errors: understanding and debugging Python errors


Thursday 9 a.m.–12:30 p.m. in tutorials - 250AB

Python is a very user-friendly language that is easy to learn. New users can quickly write relatively complicated code that uses various libraries. Writing code can be an exciting experience, especially if it works and we can quickly see the result of our work. However, programming is not only about writing code that works, but also about debugging issues when the code doesn’t work. Unfortunately, many new users are either discouraged when getting errors or use inefficient ways of debugging them, e.g. using only print statements. The aim of this tutorial is to change this and to convince people that debugging tools are very useful even for beginner and intermediate programmers. I will also discuss how to handle exceptions when writing your own software.

During this tutorial participants will learn:
- about various types of errors,
- how to understand Python traceback output,
- how to use the Python debugger,
- how to write software that returns meaningful errors,
- how to report errors to open source projects.
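As a preview of the "meaningful errors" point above, compare a bare KeyError with an exception that tells the user exactly what went wrong (a minimal sketch with invented names, not the tutorial's material):

```python
class ConfigError(Exception):
    """Raised when a configuration value is missing or invalid."""

def read_port(config):
    try:
        return int(config["port"])
    except KeyError:
        # Say exactly which setting is missing, not just "KeyError: 'port'".
        raise ConfigError("missing required setting 'port'") from None
    except ValueError:
        raise ConfigError(f"'port' must be an integer, got {config['port']!r}") from None

print(read_port({"port": "8080"}))  # 8080
```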


Network Analysis Made Simple


Thursday 9 a.m.–12:30 p.m. in tutorials - 250C

Have you ever wondered about how data scientists at Facebook and LinkedIn make friend recommendations? Or how epidemiologists track down patient zero in an outbreak? If so, then this tutorial is for you. In this tutorial, we will use a variety of datasets to help you understand the fundamentals of network thinking, with a particular focus on constructing, summarizing, visualizing, and using complex networks to solve problems.
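To hint at how friend recommendation relates to network structure, here is a deliberately tiny sketch using an adjacency dict (the graph and names are invented; the tutorial works with real datasets and proper network tooling):

```python
# An undirected friendship graph as an adjacency dict.
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat"},
}

def suggest(person):
    """Recommend non-friends, ranked by number of mutual friends."""
    candidates = {}
    for friend in friends[person]:
        for friend_of_friend in friends[friend]:
            if friend_of_friend != person and friend_of_friend not in friends[person]:
                candidates[friend_of_friend] = candidates.get(friend_of_friend, 0) + 1
    return sorted(candidates, key=candidates.get, reverse=True)

print(suggest("ann"))  # ['dan'] -- two mutual friends (bob and cat)
```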


A Pythonista’s Introductory Guide to WebAssembly


Thursday 9 a.m.–12:30 p.m. in tutorials - 250DE

Wasm is a binary code format specification first released in 2017. This technology can be implemented in web browsers or standalone applications in a secure, open, portable, and efficient fashion. More precisely, Wasm is an intermediate language for a stack-based virtual machine that uses a just-in-time (JIT) compiler to produce native machine code. Although Wasm was primarily designed as a compilation target for languages such as C/C++ or Rust, it can be integrated with Python in interesting ways. And that’s what we’ll be focusing on during this tutorial. Some experience with JavaScript and web development might come in handy but is not strictly required. At the end, we’ll show how to develop a tiny compiler that has Wasm as its compilation target.
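To give a feel for the stack-based virtual machine model that Wasm uses, here is a deliberately tiny stack-machine interpreter in plain Python (this only illustrates the execution model; it is not actual Wasm):

```python
def run(program):
    """Execute a list of stack-machine instructions; return the top of the stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack[-1]

# (2 + 3) * 4 expressed as stack operations, much like Wasm's i32.add / i32.mul
print(run([("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]))  # 20
```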


Introduction to Property-Based Testing


Thursday 9 a.m.–12:30 p.m. in tutorials - 250F

Description

Has testing got you down? Ever spent a day writing tests, only to discover that you missed a bug because of some edge case you didn’t know about? Does it ever feel like writing tests is just a formality - that you already know your test cases will pass?

Property-based testing might be just what you need!

After this introduction to property-based testing, you’ll be comfortable with Hypothesis, a friendly but powerful property-based testing library. You’ll also know how to check and enforce robust properties in your code, and will have hands-on experience finding real bugs.

Where traditional example-based tests require you to write out each exact scenario to check - for example, assert divide(3, 4) == 0.75 - property-based tests are generalised and assisted. You describe what kinds of inputs are allowed, write a test that should pass for any of them, and Hypothesis does the rest!

```python
from hypothesis import given, strategies as st

@given(a=st.integers(), b=st.integers())
def test_divide(a, b):
    result = a / b
    assert a == b * result
```

There’s the obvious ZeroDivisionError, fixable with b = st.integers().filter(lambda b: b != 0), but there’s another bug lurking. Can you see it? Hypothesis can!

Audience

This tutorial is for anybody who regularly writes tests in Python, and would like an easier and more effective way to do so. We assume that you are comfortable with traditional unit tests - reading, running, and writing - as well as familiar with ideas like assertions. Most attendees will have heard "given, when, then" and "arrange, act, assert". You may or may not have heard of pre- and post-conditions - we will explain what "property-based" means without reference to Haskell or anything algebraic.


JupyterLab for Everybody - Harness the Full Power of Interactive Python Development


Thursday 9 a.m.–12:30 p.m. in tutorials - 251AB

JupyterLab is a widely used tool in the scientific and data science community.
It allows for very fast interactive work.
It is far more powerful than the standard Python REPL or other terminal-based
improved REPLs.

This tutorial introduces basic and more advanced JupyterLab features.
In addition, it highlights potential problems that result from the principles
JupyterLab is based upon; being aware of these problems is important to
avoid them.
Example workflows show how JupyterLab can be used by everybody for various
tasks.
You will learn when to use, and when not to use, JupyterLab.
If you haven't used JupyterLab before, you will learn about a new tool that can
be a good addition to your programming toolbox.
If you have already been using JupyterLab but found it not ideal for your
purposes, you will learn how to avoid potential pitfalls, apply appropriate
workflows, and decide when to switch to or combine JupyterLab with a
different tool.


Introduction to Data Analysis Using Pandas


Thursday 1:30 p.m.–5 p.m. in tutorials - 250AB

Section 1: Getting Started With Pandas

We will begin by introducing the Series, DataFrame, and Index classes, which are the basic building blocks of the pandas library, and showing how to work with them. By the end of this section, you will be able to create DataFrames and perform operations on them to inspect and filter the data.

Section 2: Data Wrangling

To prepare our data for analysis, we need to perform data wrangling. In this section, we will learn how to clean and reformat data (e.g. renaming columns, fixing data type mismatches), restructure/reshape it, and enrich it (e.g. discretizing columns, calculating aggregations, combining data sources).

Section 3: Data Visualization

The human brain excels at finding patterns in visual representations of the data; so in this section, we will learn how to visualize data using pandas along with the Matplotlib and Seaborn libraries for additional features. We will create a variety of visualizations that will help us better understand our data.

Section 4: Hands-On Data Analysis Lab

We will practice all that you’ve learned in a hands-on lab. This section features a set of analysis tasks that provide opportunities to apply the material from the previous sections.
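The four sections above can be previewed in miniature with a hedged sketch (tiny invented data; this assumes pandas is installed and stands in for the tutorial's real dataset):

```python
import pandas as pd

# A tiny DataFrame standing in for the tutorial's dataset (values invented).
df = pd.DataFrame({
    "city": ["NYC", "NYC", "Boston", "Boston"],
    "temp_c": [21.0, 23.5, 18.0, None],
})

# Inspect and filter (Section 1)...
warm = df[df["temp_c"] > 20]

# ...then clean missing values and aggregate (Section 2).
df["temp_c"] = df["temp_c"].fillna(df["temp_c"].mean())
print(df.groupby("city")["temp_c"].mean())
```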


Assessing and mitigating unfairness in AI systems


Thursday 1:30 p.m.–5 p.m. in tutorials - 250C

Fairness in AI systems is an interdisciplinary field of research and practice that aims to understand and address some of the negative impacts of AI systems on society, with an emphasis on improving the impacts of such systems on historically underserved and marginalized communities.

In this tutorial, we will walk through the process of assessing and mitigating fairness-related harms in the context of the U.S. health care system. Specifically, we will consider a scenario involving patient health risk modeling that has demonstrated racial disparities (Obermeyer et al., 2019). This tutorial will consist of a mix of instructional content and hands-on demonstrations using Jupyter notebooks. Participants will use the Fairlearn library to assess an ML model for performance disparities across different racial groups and mitigate those disparities using a variety of algorithmic techniques. Participants will also learn how to explore, document, and communicate fairness issues, drawing on resources such as datasheets for datasets and model cards.

Participants are expected to have intermediate Python skills and familiarity with Scikit-Learn. For maximal benefit, participants should have some experience training and evaluating supervised models in Python.


Documenting your code: from docstrings to automated builds


Thursday 1:30 p.m.–5 p.m. in tutorials - 250DE

If it isn't documented, it doesn't exist.

Documentation can make or break a project. Getting it right takes effort, but that effort doesn't have to be painful. In this tutorial, we will take a multi-stage approach to documentation, starting with the fundamentals, adding complexity and style, then finishing with automated publishing to the web. We will practice a maintainer-friendly workflow that smooths out some of the rough edges of creating documentation.

It is never too early or too late to pick up good documentation techniques and tools. As such, this tutorial will have elements that are relevant to brand new Pythonistas (What does a good docstring look like? What is a type hint?) as well as long-time practitioners (How can I make my docs easier to maintain? Where can I host docs? How can I test examples in my docstrings?).

We will cover code comments, docstrings, and type annotations as ways to add documentation within your code. Next, we will add a user interface and documentation prose layer with JupyterBook, Jupyter Notebooks, and Markdown. After that, we will use Sphinx to build API documentation. Finally, we will automate the build and publish steps with GitHub Actions and GitHub Pages.
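As a small example of the "testing examples in your docstrings" idea: doctest runs the examples embedded in a docstring and fails if the output differs (a minimal sketch with an invented function; the full publishing pipeline in the tutorial uses JupyterBook, Sphinx, and GitHub Actions):

```python
def slugify(title):
    """Turn a title into a URL-friendly slug.

    >>> slugify("Hello, World!")
    'hello-world'
    """
    cleaned = "".join(c for c in title if c.isalnum() or c == " ")
    return "-".join(cleaned.lower().split())

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # fails loudly if the docstring example goes stale
```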


Awesome Modern Web Testing with Playwright


Thursday 1:30 p.m.–5 p.m. in tutorials - 250F

Everybody gets frustrated when web apps are broken, but testing them thoroughly doesn't need to be a chore. Playwright, a new open-source browser automation tool from Microsoft, makes testing web apps fun! Playwright outperforms other tools like Selenium WebDriver with a slew of nifty features like automatic waiting, mobile emulation, and network interception. Plus, with isolated browser contexts, Playwright tests can set up much faster than traditional Web UI tests.

In this tutorial, we will build a Python test automation project from the ground up. We will automate web search engine tests together step-by-step using Playwright for interactions and pytest for execution.

Specifically, we will cover:

  1. How to install and configure Playwright
  2. How to integrate Playwright with pytest, Python’s leading test framework
  3. How to perform interactions through page objects
  4. How to conveniently run different browsers, capture videos, and run tests in parallel

By the end of this tutorial, you'll be empowered to test modern web apps with modern web test tools. You'll also have an example project to be the foundation for your future tests. You can use Playwright to test Django apps, Flask apps, or any other kinds of apps!


Python under the hood: What's so special about Python objects?


Thursday 1:30 p.m.–5 p.m. in tutorials - 251AB

Become a stronger and more confident Python programmer by learning the fundamentals of Python objects.

If you've ever asked:

  • Why do I have to pass a string to the function len(string) as an argument, when to convert the string to upper case I use string.upper()?
  • With a pandas dataframe, why do I sometimes need parentheses, for example df.describe(), as opposed to df.shape?
  • What do people really mean when they say "everything in Python is an object?"

Then this tutorial is for you!

By the end of this tutorial, you will be able to build Python objects from scratch, leverage the magic of Python dunder methods (double underscore, like __len__), and extend existing classes to add functionality. These skills will expand your understanding of Python objects (and after all, everything in Python is an object) so that you become more confident in writing Python programs.
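To hint at the len() question above: len(obj) simply calls obj.__len__(), which is why it works on any object that defines that dunder method. A minimal sketch (class and data invented for illustration):

```python
class Playlist:
    """A toy class showing the dunder methods behind built-in syntax."""

    def __init__(self, songs):
        self.songs = list(songs)

    def __len__(self):
        # len(playlist) dispatches here -- this is why len() is a function:
        # it delegates to each object's own __len__.
        return len(self.songs)

    def __getitem__(self, index):
        # playlist[0] dispatches here.
        return self.songs[index]

    def __repr__(self):
        return f"Playlist({self.songs!r})"

p = Playlist(["Intro", "Outro"])
print(len(p), p[0])  # 2 Intro
```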

Please note: this tutorial will not cover:
- Advanced object oriented principles
- Object oriented design patterns


Norvig's lispy: beautiful and illuminating Python code

Peter Norvig of Stanford University wrote lis.py: an interpreter for a subset of the Scheme dialect of Lisp in 132 lines of readable Python. I took Norvig's code, updated it to modern Python coding style, and integrated it into a Jupyter notebook that provides explanations as well as interactive experiments and exercises checked automatically.

Why should you study lis.py? This is what I got out of it:

  • Learning how an interpreter works gave me a deeper understanding of Python and programming languages in general—interpreted or compiled.
  • The simplicity of Scheme is a master class of language design.
  • lis.py is a beautiful example of idiomatic Python code.
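For a drastically simplified taste of what lis.py does (this sketch handles only arithmetic; the real lis.py adds variables, conditionals, lambdas, and more):

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def parse(tokens):
    """Turn a flat token list into nested Python lists (an AST)."""
    token = tokens.pop(0)
    if token == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(parse(tokens))
        tokens.pop(0)  # discard ")"
        return expr
    return OPS.get(token) or int(token)

def evaluate(expr):
    """Recursively evaluate a parsed expression."""
    if not isinstance(expr, list):
        return expr
    fn, *args = [evaluate(e) for e in expr]
    return fn(*args)

program = "(* (+ 1 2) 4)"
tokens = program.replace("(", " ( ").replace(")", " ) ").split()
print(evaluate(parse(tokens)))  # 12
```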

Building an interactive data visualization web app with Streamlit

A visualization says a thousand words, and this is especially true in data science. In this tutorial, we are going to be using a Python-based framework called Streamlit to build an interactive web app that visualizes New York City real estate data. This is a hands-on tutorial that involves live-coding and building an app in Python from scratch.


Easy peasy Async I/O: let's write Python code that runs fast without all the complicated names

Async what? This tutorial is about something called async I/O. In a nutshell, it lets you write code that can do stuff while another part of your code is waiting for a download or something from the database, for example. In other words, when you have code that has to download a bunch of files, you don't have to wait for one download to finish to start the other or even to begin to process one of the downloaded files.

We will write a simple scraper that downloads data from the internet, cleans it, and saves it to a database. Next, we will refactor this code using Python's async I/O tools to learn three things: the scenarios where this module can help you (and those where it cannot), how to use the module (yes, we'll import from asyncio), and how much faster your code can run compared to non-async I/O code.
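The "don't wait for one download to finish before starting the next" idea can be sketched with asyncio.gather (here asyncio.sleep stands in for real network I/O; names invented):

```python
import asyncio

async def fetch(name, delay):
    """Stand-in for a download: sleep instead of real network I/O."""
    await asyncio.sleep(delay)
    return f"{name}: done"

async def main():
    # All three "downloads" run concurrently, so the total time is
    # roughly the slowest one (~0.3s), not the sum (~0.6s).
    return await asyncio.gather(
        fetch("a", 0.1),
        fetch("b", 0.2),
        fetch("c", 0.3),
    )

results = asyncio.run(main())
print(results)  # ['a: done', 'b: done', 'c: done']
```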


An Introduction to Privacy-Preserving Computation and AI

The importance of data and AI is widely recognized in every field, ranging from health care and finance to personal shopping recommendations. As the collection and use of data becomes more common, information secrecy and privacy become even more important. Private and privacy-preserving computation and AI is an active and exciting field of research with immense practical use cases and implications.

In this tutorial we’ll have an interactive hands-on look at different privacy-preserving and private computation techniques, such as Differential Privacy, Federated Learning, and Multiparty Computation. We’ll explore libraries and frameworks such as Diffprivlib, Opacus, PySyft, Flower, and CrypTen. We’ll also briefly discuss other mechanisms such as Homomorphic Encryption, Private Set Intersection, hardware enclaves, and trusted execution environments.