Friday 3:15 p.m.–3:45 p.m. in Global Center Ballroom AB

Analyzing Data: What pandas and SQL Taught Me About Taking an Average

Alex Petralia


“So tell me,” my manager said, “what is an average?” There’s probably nothing worse than that sinking feeling when you finish an analysis, email it to your manager or client to review, and they point out a mistake so basic you can’t even fathom how you missed it. This talk is about mine: how to take an average. Averages are something we use everywhere - it’s a simple np.mean() in pandas or AVG() in SQL. But recently I’ve come to appreciate just how easy it is to calculate this statistic incorrectly. We learn once - in middle school no less - how to take an average, and never revisit it. Then, when we are faced with multidimensional datasets (ie. pretty much every dataset out there), we never reconsider whether we should be taking an average the same way. In this talk, we follow my arduous and humbling journey of learning how to properly take an average with multidimensional data. We will cover how improperly calculating it can produce grossly incorrect figures, which can slip into publications, research analyses and management reports.