PyCon Pittsburgh. April 15-23, 2020.

Talk: Polyglot data with python: Introducing Pandas and Apache Arrow

Presented by:

Robson Luis Monteiro Junior

Description

Nowadays Python is synonymous with data, but it is not necessarily the best choice for every data task. Exchanging data between different ecosystems, for example, is one of Python's challenges. Pandas and NumPy are efficient, de facto tools for handling a reasonable amount of data with good performance, but they are limited outside the Python ecosystem. Acquiring and exchanging data can be painful: it often means writing slow conversion code or generating unnecessarily large files, such as large CSV files, just to talk to other ecosystems. Apache Arrow, used together with Pandas, is a great option for handling these problems with excellent performance while working natively with Python. This talk shows how to work in a heterogeneous environment where data coming from another ecosystem is handled inside the Python ecosystem and then sent back to another ecosystem transparently.

Video

Watch on YouTube