PyCon Pittsburgh. April 15-23, 2020.

Sponsor Workshop: Microsoft Azure: Easy Data Processing on Azure with Serverless Functions

Presented by:

Tania Allard, PhD

Description

Serverless computing (also known as function as a service, FaaS) is a design pattern where applications are hosted by a third-party service (e.g. Azure), eliminating the need for the developer to manage server software and hardware.
Serverless can be an excellent alternative for Pythonistas interested in data processing, as it allows them to focus on their code rather than the cloud infrastructure. This workshop will introduce attendees to Azure Functions for data processing scenarios, including data acquisition, cleaning, transformation, and storage for subsequent use.
After this tutorial, attendees will have practical experience with Azure Functions for data processing scenarios. They will also leave the workshop with a basic data processing function that they can modify or extend to suit their needs.

Outline

1. Introduction
   a. Setup and install troubleshooting
   b. Introduction to serverless and Azure Functions - why serverless can be an excellent alternative for data processing scenarios
2. Creating your first Azure function
   a. Creating a simple scheduled function - Azure Functions 101 for Python developers
   b. Using the VS Code extension - simplify your workflow and start from a solid base
   c. Understanding triggers and bindings - how to schedule tasks with the “timer” trigger
   d. Deploying to Azure - using the VS Code extension and getting familiar with the Azure portal
3. Data processing with serverless
   a. Updating your scheduled function to collect data from third-party APIs
   b. Data cleaning, aggregation and storage - going from raw data to usable, clean data that can be readily accessed by your team members
4. Improving your Azure Functions experience (optional)
   a. CI/CD for Azure Functions - using GitHub Actions to deploy your functions automatically
   b. Azure Functions for reporting - integrate with Slack or Teams
   c. Monitoring and troubleshooting functions
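
As a rough illustration of the cleaning and aggregation step in part 3 of the outline, the sketch below uses only the Python standard library. The record fields (`city`, `temp_c`) and all function names are hypothetical and are not taken from the workshop materials. In a timer-triggered function, something like `process` would presumably be called from the function's entry point after the raw data has been fetched.

```python
from collections import defaultdict
from statistics import mean


def clean_records(raw_records):
    """Drop unusable records and normalise the remaining ones.

    Assumes each record is a dict with hypothetical "city" and "temp_c" keys.
    """
    cleaned = []
    for record in raw_records:
        # Normalise the city name; treat missing or empty names as invalid.
        city = (record.get("city") or "").strip().title()
        if not city:
            continue
        # Convert the temperature to a float; skip values that cannot be parsed.
        try:
            temperature = float(record.get("temp_c"))
        except (TypeError, ValueError):
            continue
        cleaned.append({"city": city, "temp_c": temperature})
    return cleaned


def aggregate_by_city(cleaned_records):
    """Return the mean temperature per city, rounded to two decimals."""
    grouped = defaultdict(list)
    for record in cleaned_records:
        grouped[record["city"]].append(record["temp_c"])
    return {city: round(mean(values), 2) for city, values in grouped.items()}


def process(raw_records):
    """Clean the raw records, then aggregate them."""
    return aggregate_by_city(clean_records(raw_records))
```

Keeping the cleaning and aggregation logic in plain functions like these means it can be tested locally without any cloud dependency, and the scheduled function only has to fetch the data, call `process`, and store the result.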

Prerequisites

This workshop is aimed at folks interested in data processing, data engineering or data science. The goal is to provide a practical introduction to serverless for data processing scenarios.

We assume that attendees:
- Have intermediate Python knowledge
- Have some experience with data wrangling and/or data processing (extensive experience is not required, but attendees should have used, for example, libraries like pandas and requests for data wrangling and API access)
- Are comfortable using the command line/terminal (there is no need to be an expert, but attendees should be able to navigate file systems and perform basic Git tasks)

Software requirements:

  • Python 3.7 installed
  • VS Code installed
  • Azure Functions VS Code extension installed
  • Git installed
  • GitHub account

A detailed setup and installation guide will be sent before the workshop.

Video

Watch on YouTube