Tutorials


Intro to Hugging Face: Fine-tuning BERT for NLP tasks

Wednesday 9 a.m.–12:30 p.m. in 250D

You’ve heard about ChatGPT’s conversational ability and how DALL-E can create images from a simple phrase. Now, you want to get your hands dirty training some state-of-the-art (SOTA) deep learning models. We will use Jupyter notebooks to fine-tune an NLP model based on BERT to do sentiment analysis.

In this hands-on tutorial, we will learn how to use Hugging Face models from pre-trained open-source checkpoints and adapt these models to our own specific tasks. We will see that using SOTA NLP and computer vision models has been made easier with the combination of Hugging Face and PyTorch.

At the end of this session, you will know how to fine-tune a large public pre-trained model to a particular task and have more confidence navigating the deep learning open source landscape.


Intro to Python for Brand New Programmers

Wednesday 9 a.m.–12:30 p.m. in 250AB

Brand new to programming and want to get some hands-on Python experience? Let's learn some Python together!

During this tutorial we will work through a number of programming exercises together. We'll be doing a lot of asking questions, taking guesses, trying things out, and seeking out help from others.

In this tutorial we'll cover:

  • Types of things in Python: strings, numbers, lists
  • Conditionally executing code
  • Repeating code with loops
  • Getting user input

This tutorial is intended to ease you into Python. Each exercise section is careful not to assume prior programming knowledge.

I expect you to have experience typing on computers and to have rudimentary math skills (just arithmetic). I am not expecting you to have experience with programming. We will define new terms as we use them.

You'll leave this tutorial having written a couple of small Python programs yourself. Hopefully you'll also leave with a bit of excitement about what Python can do and curiosity to keep diving deeper.
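The topics above fit together in just a few lines; here is a small illustrative program using strings, numbers, a list, a loop, and a conditional:

```python
names = ["Ada", "Grace", "Guido"]   # a list of strings
total = 0                           # a number

# Repeating code with a loop, and conditionally executing code
for name in names:
    if name.startswith("G"):
        print(name, "starts with G")
    total = total + len(name)

print("Total letters:", total)  # prints: Total letters: 13

# Getting user input would look like: answer = input("What's your name? ")
```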


Web Development With A Python-backed Frontend: Featuring HTMX and Tailwind

Wednesday 9 a.m.–12:30 p.m. in 250E

Want to bring hypermedia into your web design workflow, ditching the complexity of JSON-over-HTTP for a more RESTful approach?

Create and design your web application with htmx and spark joy in your design process. Splash in a little Tailwind CSS, too. (Ssshh. You're a Full-Stack Developer now.)


Data of an Unusual Size: A practical guide to analysis and interactive visualization of massive datasets

Wednesday 9 a.m.–12:30 p.m. in 250F

While most folks aren't at the scale of cloud giants or black hole research teams that analyze Petabytes of data every day, you can easily fall into a situation where your laptop doesn't have quite enough power to do the analytics you need.

"Big data" refers to any data that is too large to handle comfortably with your current tools and infrastructure. As the leading language for data science, Python has many mature options that allow you to work with datasets that are orders of magnitude larger than what can fit into a typical laptop's memory.

In this hands-on tutorial, you will learn the fundamentals of analyzing massive datasets with real-world examples on actual powerful machines on a public cloud – starting from how the data is stored and read, to how it is processed and visualized.

You will understand how large-scale analysis differs from local workflows, the unique challenges associated with scale, and some best practices to work productively with your data. By the end, you will be able to answer:

  • What makes some data formats more efficient at scale?
  • Why, how, and when (and when not) to leverage parallel and distributed computation (primarily with Dask) for your work?
  • How to manage cloud storage, resources, and costs effectively?
  • How interactive visualization can make large and complex data more understandable (primarily with hvPlot)?
  • How to comfortably collaborate on data science projects with your entire team?

The tutorial focuses on the reasoning, intuition, and best practices around big data workflows, while covering the practical details of Python libraries like Dask and hvPlot that are great at handling large data. It includes plenty of exercises to help you build a foundational understanding within three hours.
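As a small taste of the parallel style the tutorial covers, here is a minimal, illustrative `dask.delayed` sketch that builds a task graph lazily and then executes it in parallel (assuming Dask is installed):

```python
from dask import delayed

@delayed
def double(x):
    # each call becomes a lazy task in the graph, not an immediate computation
    return 2 * x

# build the task graph lazily, then execute it in parallel with .compute()
total = delayed(sum)([double(i) for i in range(5)])
print(total.compute())  # 20
```

The same deferred-execution idea underlies Dask's larger collections (arrays, dataframes), which is how they scale past a laptop's memory.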


Going beyond with Jupyter Notebooks: Write your first package using literate programming

Wednesday 9 a.m.–12:30 p.m. in 251AB

Literate programming is a programming paradigm that embeds explanations in natural language (such as Spanish) alongside traditional code. Literate programming allows developers to tell a story with their code, improving understanding of the project, focusing on documentation, and making it easier to onboard developers. Although it is a well-regarded concept discussed by respected researchers like Donald Knuth, literate programming tools like Jupyter notebooks are often considered inefficient for serious software development. This perception has limited Jupyter notebooks to simple Python scripts and educational materials.

The nbdev library has proven that literate programming is useful for developing large, serious projects like fastai. This tutorial will show attendees how to get the benefits of literate programming while also following software development best practices. We'll get hands-on experience writing and publishing a Python package using Jupyter Notebooks. In addition to publishing the package, we'll also learn how to deploy the docs, write simple tests, and run those tests on CI/CD, making sure that our package only gets published if the tests pass.

Even though this tutorial uses Jupyter Notebooks and nbdev, no prior knowledge of these tools is needed. A computer with Python and pip installed is all we'll use. Students should have some minimal Python knowledge and Git experience (simple commands like push, pull, add, and commit). A GitHub account will also be necessary.


Build a production ready GraphQL API using Python

Wednesday 9 a.m.–12:30 p.m. in 250C

This workshop will teach you how to create a production-ready GraphQL API using Python and Strawberry. We will be using Django as our framework of choice, but most of the concepts will be applicable to other frameworks too.

We'll learn how GraphQL works under the hood, and how we can leverage type hints to create end to end type safe GraphQL queries.

We'll also learn how to authenticate users when using GraphQL and how to make sure our APIs are performant.

If we have enough time we'll take a look at doing realtime APIs using GraphQL subscriptions and how to use GraphQL with frontend frameworks such as React.


Lunch

Wednesday 12:30 p.m.–1:30 p.m. in 155ABC

Meat option

Greek Chicken Power Bowl

Grilled Chicken, Mixed Greens, Cucumber, Tomato, Red Onion, Chickpeas, Feta, Quinoa, Greek Vinaigrette

Vegetarian Option

Mediterranean Spinach & Quinoa Salad

Baby Spinach, Quinoa, White Beans, Raisins, Oranges, Feta Cheese, Orange Vinaigrette

Vegan Option

Vegan Power Bowl

Butternut Squash, Roasted Beet Medley, Walnuts, Craisins, Winter Greens, Balsamic Vinaigrette

Gluten Free Option

Greek Chicken Power Bowl

Chicken, Cucumber, Tomato, Red Onion, Chickpeas, Feta, Quinoa, Greek Vinaigrette

Kosher Option

*Kosher meals are provided by a separate vendor and ingredients are subject to change but guaranteed kosher


Data analysis with SQLite and Python

Wednesday 1:30 p.m.–5 p.m. in 250C

SQLite is the world's most widely used database and has been a part of the Python standard library since 2006. It continues to evolve and offer more capabilities every year.

This tutorial will transform you into a SQLite power-user. You'll learn to use SQLite with Python, level up your SQL skills and take advantage of libraries such as sqlite-utils and tools such as Datasette to explore and analyze data in all kinds of shapes and sizes.

This hands-on tutorial will cover:

  • The sqlite3 module in the Python standard library
  • A review of SQL basics, plus advanced SQL features available in SQLite
  • Using sqlite-utils for advanced manipulation of SQLite databases
  • Datasette as a tool for exploring, analyzing and publishing data
  • Applying the Baked Data architectural pattern to build a data application using Datasette and deploy it to the cloud

This tutorial is aimed at beginner-to-intermediate Python users with some previous exposure to basic SQL.

Attendees will leave this workshop with a powerful new set of tools for productively exploring, analyzing and publishing data.
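For a flavor of the standard-library side of this material, here is a small `sqlite3` session (the table and data are illustrative):

```python
import sqlite3

# an in-memory database; pass a file path instead to persist it
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE talks (title TEXT, room TEXT)")
conn.executemany(
    "INSERT INTO talks VALUES (?, ?)",
    [("Data analysis with SQLite", "250C"), ("Intro to Python", "250AB")],
)

# parameter substitution with ? avoids SQL injection
rows = conn.execute(
    "SELECT title FROM talks WHERE room = ? ORDER BY title", ("250C",)
).fetchall()
print(rows)  # [('Data analysis with SQLite',)]
conn.close()
```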


Writing Serverless Python Web Apps with PyScript

Wednesday 1:30 p.m.–5 p.m. in 250D

Python web applications running in the browser or on mobile without a server? What seemed to be a dream just a few months ago is now possible thanks to PyScript. You can now write websites, apps, and games that run entirely in the browser.

This tutorial is an introduction to PyScript. It will walk you through the basic concepts, how to set up your project, and how to create amazing applications. More specifically, you will:

  • Create your project configuration
  • Define a Python environment with all the dependencies to run your code
  • Load and manipulate user data
  • Write your Python code
  • Access the DOM and other browser features
  • Use JavaScript libraries from your Python application
  • Optimize your application
  • Look at what’s different between “standard” Python and Python in the browser
  • Have a lot of fun and hack together on your ideas!


Comprehending comprehensions

Wednesday 1:30 p.m.–5 p.m. in 250AB

Comprehensions are one of the most important — and misunderstood — parts of Python. In this tutorial, I'll walk you through comprehensions, including how to write them, and why you would want to do so. By the time you finish this tutorial, you'll fully understand list, set and dict comprehensions, as well as nested comprehensions and generator expressions. You'll understand the differences between regular "for" loops and comprehensions, and where to use them.
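For example, each comprehension form replaces a small "for" loop that builds up a container:

```python
words = ["red", "green", "blue"]

# list comprehension: transform every item
lengths = [len(w) for w in words]        # [3, 5, 4]

# dict comprehension: map each word to its length
by_word = {w: len(w) for w in words}     # {'red': 3, 'green': 5, 'blue': 4}

# set comprehension with a condition
short = {w for w in words if len(w) < 5} # {'red', 'blue'}

# generator expression: lazy, consumed on demand (here by sum)
total = sum(len(w) for w in words)       # 12
print(lengths, total)
```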


Streamlit for Python - How to create beautiful interactive GUIs and web apps

Wednesday 1:30 p.m.–5 p.m. in 250F

In this 3.5 hour tutorial, attendees will learn how to use the streamlit library in Python to create interactive graphical user interfaces (GUIs) for their data science projects. Through a series of hands-on exercises, attendees will gain practical experience using streamlit to build and customize their own interactive GUIs.

The tutorial will begin by introducing attendees to the basics of streamlit, including how to install and set up the library, as well as the key concepts and components of streamlit applications. Attendees will then learn how to use streamlit to create simple, yet effective, GUIs for their data science projects, including how to display and interact with data, add text and images, and create custom layouts and widgets.

As the tutorial progresses, attendees will have the opportunity to work on more advanced topics, such as using streamlit to create custom interactive plots and charts, and integrating streamlit with other popular libraries such as Pandas and Altair. By the end of the tutorial, attendees will have a solid understanding of how to use streamlit to create effective and engaging interactive GUIs for their own data science projects.

The tutorial will be led by an experienced data scientist with a strong background in Python and streamlit, and will include plenty of hands-on exercises to help attendees apply what they learn in a practical setting. Attendees will also have access to detailed tutorial materials and code samples, as well as support from the instructor and other attendees throughout the tutorial.


Beyond the Basics: Data Visualization in Python

Wednesday 1:30 p.m.–5 p.m. in 251AB

The human brain excels at finding patterns in visual representations, which is why data visualizations are essential to any analysis. Done right, they bridge the gap between those analyzing the data and those consuming the analysis. However, learning to create impactful, aesthetically-pleasing visualizations can often be challenging. This session will equip you with the skills to make customized visualizations using Python.

Section 1: Getting Started With Matplotlib

While there are many plotting libraries to choose from, the prolific Matplotlib library is always a great place to start. Since various Python data science libraries utilize Matplotlib under the hood, familiarity with Matplotlib itself gives you the flexibility to fine tune the resulting visualizations (e.g., add annotations, animate, etc.). Moving beyond the default options, we will explore how to customize various aspects of our visualizations. Afterward, you will be able to generate plots using the Matplotlib API directly, as well as customize the plots that other libraries create for you.

Section 2: Moving Beyond Static Visualizations

While static visualizations are limited in how much information they can show, animations make it possible for our visualizations to tell a story through movement of the plot components (e.g., bars, points, lines), which can encode another dimension of the data. In this section, we will focus on creating animated visualizations before moving on to create interactive visualizations in the next section.

Section 3: Building Interactive Visualizations for Data Exploration

When exploring our data, interactive visualizations can provide the most value. Without having to create multiple iterations of the same plot, we can use mouse actions (e.g., click, hover, zoom, etc.) to explore different aspects and subsets of the data. In this section, we will learn how to use HoloViz to create interactive visualizations for exploring our data utilizing the Bokeh backend.


Feature Engineering is for Everyone!

Wednesday 1:30 p.m.–5 p.m. in 250E

In Machine Learning, features are the inputs for a machine learning model. Feature Engineering is the process of creating features from raw data and selecting the best features for a model. However, it is not just a tool for Data Scientists - Data Analysts and Developers can use it too.

In this tutorial, we will create features that can be used for creating Data Visualizations, Rules Based Automations, and Machine Learning Models. Attendees will learn how to explore, create and select features from various data types such as “discrete/categorical” and “continuous” data. Attendees will learn techniques such as One-hot encodings for categories, text vectorization, date manipulation and more.

By the end of this tutorial, attendees will understand how to create features for their projects.


Exploring Eco topics with Python

Thursday 9 a.m.–12:30 p.m. in 250D

From Deforestation to Wildlife Trade to Carbon Polluters, learn how to use Python to explore current Eco topics!

As Earth's sustainability edges ever closer to a tipping point, it has never been more important for us, its inhabitants, to be aware of the impact we have on the environment and the deteriorating state of our planet. This tutorial will democratize access to practical Python skills for the relevant sciences, apply those skills to pressing Eco issues, and ultimately empower non-subject experts with a working proficiency in relevant open-source tools for discovering more facts about our natural world, at a time when disinformation is rife.

  • Key Python takeaways: intro to and/or application of numpy, pandas, matplotlib, networkx, geopandas, xarray, and rioxarray.
  • Format: interactive computer lab, with attendees working hands-on through pre-prepared Jupyter Notebook content at a group pace led by the instructor.
  • Audience: no prior Eco/scientific domain knowledge or experience with the Python packages being taught is required, but attendees must have basic Python programming proficiency and the ability to set up access to JupyterLab with the required mainstream dependencies (as per instructions provided in advance).

The How and Why of Object-oriented Programming in Python

Thursday 9 a.m.–12:30 p.m. in 250AB

Python supports multiple programming paradigms. You can write procedural programs, use functional programming features, or use the full power of object-oriented programming.

In this tutorial you will learn how to get the most out of object-oriented programming. After this tutorial you will be able to:

  • design your own objects
  • take advantage of inheritance for code re-use
  • implement special methods for pythonic objects
  • convert programs from a procedural to an object-oriented approach

This tutorial is based on a small but comprehensive example. Starting from a procedural solution with lists and dictionaries, the tutorial gradually introduces how to create your own objects to solve the same problem. The next steps introduce the concepts of inheritance and special methods to take full advantage of object-oriented programming.
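As a flavor of that progression, here is a small illustrative sketch combining custom objects, special methods, and inheritance (the class names are hypothetical, not the tutorial's actual example):

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):            # special method: readable printing
        return f"Point({self.x}, {self.y})"

    def __add__(self, other):      # special method: makes p1 + p2 work
        return Point(self.x + other.x, self.y + other.y)

class Point3D(Point):              # inheritance for code re-use
    def __init__(self, x, y, z):
        super().__init__(x, y)     # reuse the parent's initializer
        self.z = z

p = Point(1, 2) + Point(3, 4)
print(p)  # Point(4, 6)
```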


Power up your work with compiling and profiling

Thursday 9 a.m.–12:30 p.m. in 250C

Have you been troubled by Python code that took too long to run? Do you want to know why and how to improve?

In this workshop, we will introduce Numba, a JIT compiler designed to speed up numerical calculations. To many, it is a mystery: it sounds like magic, but how does it work? Under what conditions does it work? Because of this, new users find it hard to get started and face a steep learning curve. This workshop will provide all the knowledge you need to make Numba work for you.

This workshop is for data scientists and developers with math-heavy code who would like to speed it up with the benefit of NumPy and Numba.


Building human-first and machine-friendly CLI applications

Thursday 9 a.m.–12:30 p.m. in 250E

Command line tools have two audiences:

  • humans using them directly
  • other tools and scripts working together with them

In this tutorial, you'll learn how to build CLI applications for both of these user groups. We'll get to know the Command Line Interface Guidelines (https://clig.dev/), an open source, language-agnostic collection of principles and guidelines. We'll build an application following those principles in Python, using typer and Rich.

Our short-term goal for this workshop is to build a CLI catalogue of past PyCon talks. The long-term goal is to provide tools (incl. code snippets and checklists) that you can use for your own CLI applications.
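A minimal sketch of that style with typer and Rich (the command and option names are illustrative, not the workshop's actual catalogue tool):

```python
import typer
from rich import print as rprint

app = typer.Typer()

@app.command()
def talks(year: int = 2023, speaker: str = ""):
    """List PyCon talks (illustrative stub)."""
    # the type hints drive parsing and validation of --year and --speaker
    rprint(f"[bold]Talks from {year}[/bold] {speaker}".strip())

if __name__ == "__main__":
    app()
```

Run as `python talks.py --year 2022`; typer also generates `--help` output for free from the signature and docstring.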


Getting Started with Polars

Thursday 9 a.m.–12:30 p.m. in 250F

Have you heard of this Polars thing? How is it different from Pandas? Do you want to check it out?

In this workshop you will get exposed to Polars with a real-world dataset.

You will see:

  • Common operations
  • Differences with Pandas
  • How to speed up your data pipeline
  • Feature gaps that you might miss coming from Pandas

You will be provided with a notebook and labs to explore Polars.


Publishing your Python project, the conda way

Thursday 9 a.m.–12:30 p.m. in 251AB

Conda is an open source, language-agnostic package and environment management system for Linux, macOS, and Windows. The conda ecosystem, including the conda-forge package repository, is widely used to install, run and update packages and their dependencies.

In this tutorial you will learn how to create a full-fledged and easy to install Python software package using conda. We will start with an introduction to software packaging and its concepts (package dependencies, open-source licensing, ...), followed by an introduction to the conda ecosystem and how conda implements software packaging.

Most of our time will be spent doing a hands-on walk-through of how to prepare a Python software package for conda, and then how to submit that package to conda-forge, a widely used community-driven package repository.

This is a hands-on workshop in which participants use their own laptops to prepare a full-fledged Python software package that is submission-ready for the conda-forge package repository. Participants need to bring a WiFi-enabled laptop with a web browser, a command line interface, a text editor program, and git and/or a GitHub client already installed.

Workshop participants will gain a basic understanding of software packaging, and how to prepare and publish their packages in the conda ecosystem.


Lunch

Thursday 12:30 p.m.–1:30 p.m. in 155ABC

Meat Option

Beef Stir Fry

Beef, Stir Fried Vegetables, Rice w/Peas & Carrots, (GF) Teriyaki Sauce

Vegetarian Option

Roasted Vegetable Stir Fry

Stir Fried Vegetables, Rice w/Peas & Carrots, (GF) Teriyaki Sauce

Vegan Option

Marinated Tofu Stir Fry

Marinated Tofu, Stir Fried Vegetables, Rice, (GF) Teriyaki Sauce

Gluten Free Option

Ginger Steak & Quinoa Salad

Beef, Romaine, Endive, Red Leaf Lettuce, Quinoa, Carrots, Green Onions, Asparagus, Green Olives, Pickled Red Onion, Lemon Oregano Vinaigrette, Parsley, Garlic, Thyme, Rosemary, Olive Oil, Rice Vinegar, Chili Flakes

Kosher Option

*Kosher meals are provided by a separate vendor and ingredients are subject to change but guaranteed kosher


How To Troubleshoot and Monitor Production Applications using OpenTelemetry

Thursday 1:30 p.m.–5 p.m. in 250D

OpenTelemetry is a free, open-source observability framework. OpenTelemetry sits at the application layer and exports traces, metrics, and logs to a backend for observation. It is extremely helpful to developers in reducing mean time-to-detection and time-to-resolution of bugs and issues that occur at the application layer; this ranges from detecting and alerting on raised errors (such as TypeError), to finding that a specific microservice (such as an AWS Lambda function) ran for twice as long as usual, all the way to comparing a service's output against its expected output to find a bug in the service's logic.

This tutorial is geared towards beginner/intermediate Python developers who have some experience with Python and its syntax; only minimal experience with Requests and Flask is needed (extremely popular libraries, with 50k and 60k stars on GitHub, respectively). No OpenTelemetry experience is needed at all. This is a complete introduction to OpenTelemetry, consisting of instrumenting your first application, viewing your first traces and metrics, and, if time allows, deploying your first Jaeger instance locally (no experience needed, only Docker Desktop), allowing students of this workshop to build their own in-house observability platform, be it for themselves or their employers.

It is important that every developer has at least a solid understanding of traces, metrics, and logs, which we know today as the three pillars of observability. These are the foundational building blocks for monitoring production environments at the application layer. The extended base workshop and the base slides are available online. Thank you.


Building a Model Prediction Server

Thursday 1:30 p.m.–5 p.m. in 250E

In predictive modeling, training a model is only half the battle; predictions typically need to be “served” to other systems in production via an API or similar interface.

In this tutorial we’ll start with a trained scikit-learn model and build a working FastAPI application to deliver its predictions in real time. No prior experience with API development is expected.


Eroding Coastlines: A Geospatial & Computer Vision Analysis

Thursday 1:30 p.m.–5 p.m. in 250F

Attendees will gain hands-on experience exploring satellite imagery and using Python tools for geospatial data analysis. They will apply what they’ve learned to identify & analyze instances of coastal erosion, one of the most pressing environmental & humanitarian challenges facing our planet today.


Introduction to Property-Based Testing

Thursday 1:30 p.m.–5 p.m. in 251AB

Has testing got you down? Ever spent a day writing tests, only to discover that you missed a bug because of some edge case you didn’t know about? Does it ever feel like writing tests is just a formality - that you already know your test cases will pass?

Property-based testing might be just what you need!

After this introduction to property-based testing, you’ll be comfortable with Hypothesis, a friendly but powerful property-based testing library. You’ll also know how to check and enforce robust properties in your code, and will have hands-on experience finding real bugs.

Where traditional example-based tests require you to write out each exact scenario to check - for example, assert divide(3, 4) == 0.75 - property-based tests are generalised and assisted. You describe what kinds of inputs are allowed, write a test that should pass for any of them, and Hypothesis does the rest!

from hypothesis import given, strategies as st

@given(a=st.integers(), b=st.integers())
def test_divide(a, b):
    result = a / b
    assert a == b * result

There’s the obvious ZeroDivisionError, fixable with b = st.integers().filter(lambda b: b != 0), but there’s another bug lurking. Can you see it? Hypothesis can!

Audience: This tutorial is for anybody who regularly writes tests in Python, and would like an easier and more effective way to do so. We assume that you are comfortable with traditional unit tests - reading, running, and writing - as well as familiar with ideas like assertions. Most attendees will have heard "given, when, then" and "arrange, act, assert". You may or may not have heard of pre- and post-conditions - we will explain what "property-based" means without reference to Haskell or anything algebraic.


Introduction to Decorators: Power Up Your Python Code

Thursday 1:30 p.m.–5 p.m. in 250AB

You can use decorators in your Python code to change the behavior of one or several functions. Many popular libraries are based on decorators. For example, you can use decorators to register functions as web endpoints, mark functions for JIT compilation, or profile your functions.

Using decorators makes your code simpler and more readable. However, to unlock the full capability of decorators, you should also be comfortable writing your own. In this tutorial, you'll learn how decorators work under the hood, and you'll get plenty of practice writing your own decorators.

You'll be introduced to necessary background information about how functions are first-class objects in Python and how you can define inner functions. You'll learn how to unwrap the @decorator syntactic sugar and how to write solid decorators that you can use in your code.

Being comfortable with using and creating decorators will make you a more efficient Python programmer.
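For instance, the classic wrapper pattern this material builds toward, shown here with an illustrative timing decorator:

```python
import functools
import time

def timed(func):
    """Report how long each call to *func* takes."""
    @functools.wraps(func)           # preserve func's name and docstring
    def wrapper(*args, **kwargs):    # inner function closes over func
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.4f}s")
        return result
    return wrapper

@timed                               # sugar for: slow_add = timed(slow_add)
def slow_add(a, b):
    time.sleep(0.01)
    return a + b

print(slow_add(2, 3))  # 5, after the timing line is printed
```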


You CAN teach an old doc new tricks: Automate your project documentation using Sphinx & GitHub Actions

Thursday 1:30 p.m.–5 p.m. in 250C

You've built an awesome API; time to give it some exposure! But, how do you keep a documentation website up-to-date as your code evolves? This tutorial will teach you how to write, generate, host, automate and version your documentation easily so it becomes part of your software development life cycle.


Building scalable web applications in Python

This talk is for developers who've written a web application and are ready for it to hit the big time. In this tutorial, we'll take an existing web application and redesign it to be scalable for thousands of users. We'll cover how to design scalable web apps, how to test user load, how to refactor applications safely and how to optimise web applications with caching.


Writing Your First Interpreter With Python

To appreciate how programming languages - and more specifically interpreters like Python - work, this tutorial is a quick step-by-step guide to the basics of implementing an interpreter in Python, itself a high-level language.

We will use the RPython framework from the PyPy project to easily prototype an interpreter in a few lines of code. The interpreter will automatically get a just-in-time (JIT) compiler and garbage collector for free from the framework.

The tutorial will cover introductory theory on programming language design and implementation, just-in-time compilation, and garbage collection. This theory will in turn be used in a hands-on session implementing a simple interpreter in Python using the RPython framework.

Participants are required to have a laptop with Python or PyPy installed. This tutorial is intended for Python programmers who are curious about the internals of Python and interested in understanding these internals through a practical session by building a simple interpreter.

Participants should be familiar with Python syntax at an intermediate level. Knowledge of compilers and programming languages is not required; we will give a quick introduction to programming language theory.

Participants will leave the tutorial with an understanding of the basics of implementing a programming language, including garbage collection and JIT compilation.
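As a taste of the implementation session, here is a tiny expression interpreter in plain Python (not RPython; the mini-language of nested tuples is purely illustrative of the tree-walking pattern):

```python
# Evaluate a tiny language of nested tuples: ("+", a, b), ("*", a, b), numbers.
def evaluate(node):
    if isinstance(node, (int, float)):     # a literal evaluates to itself
        return node
    op, left, right = node                 # an interior node: operator + operands
    lhs, rhs = evaluate(left), evaluate(right)
    if op == "+":
        return lhs + rhs
    if op == "*":
        return lhs * rhs
    raise ValueError(f"unknown operator: {op}")

# (2 + 3) * 4
program = ("*", ("+", 2, 3), 4)
print(evaluate(program))  # 20
```

A real interpreter adds parsing, variables, and control flow on top of this recursive-evaluation core; RPython then supplies the JIT and garbage collector.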