
Wednesday 5:15 p.m.–6:45 p.m.

Scalability is for Suckers, Performance is King: How to Speed Up A Python Program 114,000X (Disney)

David Schachter

Abstract

In 2011, David was asked to scale a big data reporting tool to handle growing data volume. Instead of clusterizing the algorithm and buying more servers, he employed basic, mechanical techniques to achieve a total speedup of 114,000X on a single machine. The resulting system handles the data volumes predicted for several years out, avoiding both the need to run on a cluster and the additional failure modes a cluster brings. It is maintainable, extensible, and reliable, having run for more than a year with no unscheduled downtime. Making the program more efficient saved the company time, money, and 3 a.m. phone calls from the Operation Monitoring Center.

In this talk, David reminds experienced engineers of techniques for improving code efficiency, teaches them how to identify performance bottlenecks in their programs, and explains how to think about performance in the context of modern computer architecture. Along the way, he relates important lessons from his 39 years of industry experience: that efficient code is a competitive advantage, that clusters are harder than we think, and that if the business works in real time, the data should too.
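The talk's material is not reproduced here. As a minimal, hypothetical sketch of the first step it describes, finding where a program actually spends its time, Python's standard-library `cProfile` and `pstats` modules can rank functions by cumulative time. The workload functions `slow_sum_of_squares` and `report` are invented stand-ins, not code from the Disney system.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    # Hypothetical hot spot: builds a full intermediate list before summing.
    return sum([i * i for i in range(n)])


def report(n):
    # Hypothetical stand-in for a reporting job that repeats the hot spot.
    return [slow_sum_of_squares(n) for _ in range(5)]


def profile_top(func, *args, limit=5):
    """Run func under cProfile; return (result, text of the top `limit` entries by cumulative time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    # Write the ranked statistics into a string instead of stdout.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, stream.getvalue()


if __name__ == "__main__":
    result, text = profile_top(report, 100_000)
    print(text)
```

Sorting by cumulative time surfaces the call paths that dominate runtime, so optimization effort goes to the measured bottleneck rather than a guessed one.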