Thursday 9 a.m.–12:20 p.m.

Introduction to Spark with Python

Orlando Karam

Audience level:
Intermediate
Category:
Other

Description

In this tutorial we will cover the basics of writing Spark programs in Python, initially from the pyspark shell and later as standalone applications. We will also discuss some of the theory behind Spark and some performance considerations when running Spark on a cluster.

Abstract

Spark is a distributed computing (big data) framework that many consider the successor to Hadoop. You can write Spark programs in Java, Scala, or Python. Spark takes a functional approach, similar to Hadoop’s MapReduce. In this tutorial we will cover the basics of writing Spark programs in Python, initially from the pyspark shell and later as standalone applications. We will also discuss some of the theory behind Spark and some performance considerations when running Spark on a cluster.
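
To give a feel for the functional, MapReduce-like style mentioned above, here is a minimal word-count sketch in plain Python. It is only an analogy and not material from the tutorial: it runs on a single machine using the standard library, and the sample lines and the merge helper are made up for illustration. In PySpark, the same shape is usually expressed with the RDD methods flatMap, map, and reduceByKey.

```python
from functools import reduce
from itertools import chain

# Hypothetical sample input; in Spark this would be a distributed dataset.
lines = ["spark is fast", "spark is distributed"]

# "flatMap" step: split every line into words and flatten into one stream.
words = chain.from_iterable(line.split() for line in lines)

# "map" step: turn each word into a (word, 1) pair.
pairs = map(lambda word: (word, 1), words)


def merge(counts, pair):
    """"reduce" step: fold one (word, n) pair into the running totals."""
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts


word_counts = reduce(merge, pairs, {})
print(word_counts)
```

The point of the sketch is that each step is a small function applied to a collection, with no explicit loops over shared state, which is what lets a framework like Spark spread the work across a cluster.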

Student Handout

No handouts have been provided yet for this tutorial.