When you're working with real data, determining causality, that is, why something happened or how things could have happened differently, is more challenging than simple association and prediction. It requires making assumptions, having a suitable dataset, and choosing methods appropriate to the situation. In this tutorial, we'll introduce and define what causality means, discuss the delicate assumptions you'll need to make about your data, and use Python to start answering causal questions.
We'll start by defining the goals of causal inference and treatment effect estimation, discussing them in the context of randomized experiments (the gold standard, where you have treatment and control groups), before moving to the more challenging setting of observational data. We'll discuss when you can directly compute treatment effect estimates, and when you need statistical models, which can be as simple as linear regression (finding a best-fit line for the data). We'll then move to other popular methods for causal inference, such as propensity score weighting. Along the way, we'll pay close attention to the statistical properties that make some methods better than others, and to pitfalls to watch out for in real-world datasets and experiments. We'll also talk about scientific writing and how to communicate results carefully. This tutorial will feature a mix of instruction, covering the mathematical background of causal inference, and hands-on exploration of datasets and real-world questions, while introducing popular data science packages such as pandas, NumPy, statsmodels, and scikit-learn in Jupyter notebooks. Only minimal Python and math/stats background is needed to join this tutorial!
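As a taste of the hands-on portion, here is a minimal sketch of the simplest case described above: in a randomized experiment, the difference in mean outcomes between treatment and control groups directly estimates the average treatment effect. The data here is synthetic, with a hypothetical true effect of 2.0, just to illustrate the computation:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical randomized experiment: treatment assigned by coin flip,
# so treated and control groups are comparable on average.
treated = rng.integers(0, 2, size=n)

# Synthetic outcome: noise plus an assumed true treatment effect of 2.0.
outcome = rng.normal(loc=0.0, scale=1.0, size=n) + 2.0 * treated

df = pd.DataFrame({"treated": treated, "outcome": outcome})

# Under randomization, the difference in group means is an unbiased
# estimate of the average treatment effect (ATE).
ate = (
    df.loc[df["treated"] == 1, "outcome"].mean()
    - df.loc[df["treated"] == 0, "outcome"].mean()
)
print(f"Estimated ATE: {ate:.2f}")  # close to the true effect of 2.0
```

With observational data, where treatment is not randomly assigned, this simple comparison can be badly biased, which is exactly why the tutorial moves on to regression adjustment and propensity score weighting.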