Talks: It might look normal but this distribution will ruin your stats

Saturday - April 22nd, 2023 4:15 p.m.-4:45 p.m. in 255DEF

Presented by:


Experience Level:

Some experience

Description

Abstract

Some refer to the normal distribution as "God's curve" because of its supposed presence in nature when enough observations are collected. But what if I told you that there is a non-normal distribution that looks so normal that even experts can't see the difference? And beyond looks, it's a curve that is both prevalent in nature and likely to cause false negatives when testing hypotheses.

If you use Python for data analysis (e.g., summaries, explanations, predictions), this talk will (1) introduce you to surprising results and (2) provide you with the tools to overcome the limitations of traditional hypothesis testing approaches.

Outline

  1. (5 min) This talk will begin with a short background on the normal curve and how it compares visually to a contaminated normal curve. This sets the stage for a live and interactive demonstration.

  2. (10 min) During the live demo, I'll use simple terms and easy-to-understand code to illustrate the effect of contamination on common statistics (e.g., mean, traditional hypothesis tests). Participants will be able to interact with the code by clicking a link.

  3. (10 min) I will conclude by introducing Hypothesize: a peer-reviewed, open-source Python library for robust statistics based on Wilcox's package in R.
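To give a feel for the effect the outline describes, here is a minimal sketch using only the Python standard library. It is not the talk's demo code and does not use Hypothesize. It assumes one common form of contaminated normal: each value comes from N(0, 1) with probability 0.9 and from N(0, 10²) with probability 0.1, so the population variance is 0.9·1 + 0.1·100 = 10.9 even though the curve looks nearly normal. The function names, the sample size of 30, and the 20% trimming amount are illustrative choices.

```python
import random
import statistics


def contaminated_normal(n, rng, eps=0.1, wide_sd=10.0):
    """Draw n values: N(0, 1) with probability 1 - eps, N(0, wide_sd**2) otherwise."""
    return [rng.gauss(0.0, wide_sd if rng.random() < eps else 1.0) for _ in range(n)]


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the lowest and highest `proportion` of the sorted values."""
    ordered = sorted(values)
    k = int(proportion * len(ordered))
    return statistics.fmean(ordered[k:len(ordered) - k])


def sampling_sd(estimator, draw, n=30, reps=2000):
    """Standard deviation of an estimator across `reps` repeated samples of size n."""
    return statistics.stdev([estimator(draw(n)) for _ in range(reps)])


rng = random.Random(2023)  # fixed seed so the run is repeatable


def draw_normal(n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def draw_contaminated(n):
    return contaminated_normal(n, rng)


# How much does each estimator bounce around from sample to sample?
sd_mean_normal = sampling_sd(statistics.fmean, draw_normal)
sd_mean_contam = sampling_sd(statistics.fmean, draw_contaminated)
sd_trim_contam = sampling_sd(trimmed_mean, draw_contaminated)

print(f"mean, normal data:               {sd_mean_normal:.3f}")
print(f"mean, contaminated data:         {sd_mean_contam:.3f}")
print(f"trimmed mean, contaminated data: {sd_trim_contam:.3f}")
```

A larger spread for the mean under contamination means wider standard errors, which is how tests built on the mean lose power and produce false negatives. The trimmed mean is far less affected because the rare extreme values are discarded before averaging.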

Hypothesize is the only Python library dedicated solely to robust statistics, and it is based on decades of curated research on statistics. Using modern resampling techniques and robust measures of central tendency, Hypothesize helps researchers minimize the effects of contamination and skew in their populations. These methods do not assume normality and are important tools for data scientists to have in their repertoire: they can substantially improve power and accuracy when making predictions and explaining effects.
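As an illustration of what combining a resampling technique with a robust measure of central tendency can look like, the sketch below computes a percentile bootstrap confidence interval for a 20% trimmed mean using only the standard library. It is a generic example under stated assumptions, not Hypothesize's API or implementation: the function names, the simulated sample, and the choice of 2000 bootstrap resamples are all hypothetical.

```python
import random
import statistics


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the lowest and highest `proportion` of the sorted values."""
    ordered = sorted(values)
    k = int(proportion * len(ordered))
    return statistics.fmean(ordered[k:len(ordered) - k])


def percentile_bootstrap_ci(values, estimator, rng, n_boot=2000, alpha=0.05):
    """Percentile bootstrap confidence interval for `estimator`.

    Resamples the data with replacement, applies the estimator to each
    resample, and reads the interval off the sorted bootstrap estimates.
    """
    n = len(values)
    estimates = sorted(estimator(rng.choices(values, k=n)) for _ in range(n_boot))
    lower = int(round(alpha / 2 * n_boot))  # number of estimates cut from each tail
    upper = n_boot - lower
    return estimates[lower], estimates[upper - 1]


rng = random.Random(42)  # fixed seed so the run is repeatable

# Simulated sample (an assumption for illustration): mostly N(0, 1),
# with roughly 10% of values drawn from a much wider N(0, 10**2).
sample = [rng.gauss(0.0, 10.0 if rng.random() < 0.1 else 1.0) for _ in range(60)]

ci_low, ci_high = percentile_bootstrap_ci(sample, trimmed_mean, rng)
print(f"20% trimmed mean: {trimmed_mean(sample):.3f}")
print(f"95% percentile bootstrap CI: ({ci_low:.3f}, {ci_high:.3f})")
```

Nothing here relies on a normality assumption: the interval comes from the empirical distribution of the resampled estimates rather than from a theoretical standard error.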