We all want to do great work with real data, whether that means determining causality or generating predictions, but choosing the appropriate method for a situation and estimating the relationships between variables is hard. In this tutorial, we’ll take a deep dive into regression analysis, a set of statistical methods for estimating relationships between a dependent (or outcome) variable and one or more independent variables (also called covariates, regressors, or predictors).
We’ll start with linear regression (finding a best-fit line for a dataset), then move to regressions more appropriate for settings where the dependent variable takes values from a restricted set, such as logistic regression for binary outcomes. Along the way, we’ll pay close attention to the statistical definitions that tell us what we can and can’t infer from our data about real-world causality. We’ll also talk about scientific writing, and how to accurately document and communicate our results. The tutorial will mix instruction on the mathematical background of each regression model with hands-on exploration of real datasets and real-world data questions using Jupyter notebooks.
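As a small taste of the first topic, here is a minimal sketch of fitting a best-fit line by ordinary least squares. It is not taken from the tutorial materials: the synthetic data, the variable names, and the choice of NumPy's `lstsq` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy data: y depends on x roughly as y = 2x + 1, plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Design matrix with a column of ones so the model includes an intercept.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: choose coefficients minimizing the sum of squared residuals.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

The fitted slope and intercept should land close to the values used to generate the data. Note that a good fit describes an association in this sample; on its own it says nothing about whether `x` causes `y`.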