Wangari Digest

Propensity-Score Matching Is the Bedrock of Causal Statistics

And how to get started with it using Python

Ari Joury
Dec 20, 2024
TL;DR: Data can tell us a lot about what’s happening in the world. It tells us much less about why it happened, unless one uses dedicated frameworks to get at causality. Propensity-Score Matching (PSM) is a well-established approach to estimating causal effects from data. It is not without its limitations, but overall it is very powerful. We will cover its theory, how to implement it in Python, and some recent advancements in causal statistics that complement PSM.

Can training programs cause more economic prosperity? Image generated with Leonardo AI

Much of contemporary data science answers the question “What’s going on?” At my firm, for example, we often try to assess how well a company is performing, and how one performance indicator is tied to another through correlations.

A more powerful question worth answering would be “Why is this happening?” For example, if we detect a significant correlation between the presence of women in management and a company’s revenues, what is cause and what is effect here? Or, if people undergo a training program, will this cause their performance to improve? Or would better-performing people want to undergo a training program, and hence we only see an effect due to selection bias?

Several approaches exist for pinpointing causal relationships in data science. Propensity-Score Matching (PSM) is one of the older ones, having emerged around 40 years ago. Other methods like Structural Equation Modeling emerged around the same time; approaches like Instrumental Variables date back several decades earlier. Causal statistics is still a very active field, with many new methods being developed.

A key advantage of PSM is that it allows researchers to work with real-world data, and in particular with non-randomized data. This refers to data where a certain treatment has not been applied randomly. For example, high performers might seek out professional training more often than low performers, so not all of the performance gains can be attributed to the training alone. PSM makes it possible to take such effects into account.

This is extremely valuable in real-world settings where one can’t cheaply devise an experiment to apply a certain treatment. Recruiting thousands of people to complete a professional training program and gauging their success against a control group that did not take the training simply takes too long and costs too much.

Data professionals who work with real-world data but do not have a grip on causal statistics are missing out on a lot of potential insight. Such insights can be transformative for many companies. Learning causal statistics is not so different from learning other branches of data science, and it is well worth pursuing.

We’ll look into the theoretical underpinnings of PSM and apply the thinking to a key dataset (you guessed correctly that it’s about professional training programs). We will then discuss how to implement this practically with Python before finishing off with some perspectives for the future of PSM.

Propensity-Score Matching, or How to Estimate Treatment Effects

Propensity-Score Matching is not as elementary and easy to grasp as linear regression. It is therefore important to first understand what propensity scores are, and how units are matched on them. We will also discuss some advantages and limitations of PSM at this point.

Propensity Scores

A propensity score is quite simply the probability of receiving some treatment T given a set of observed covariates X (i.e. variables that influence this probability). Mathematically expressed, one can write:

\(e(X) = P(T=1|X),\)

where e(X) is the propensity score for receiving treatment (T=1 means the treatment was received, T=0 that it was not).

If one conditions on the propensity score, then one can make the treatment and the non-treatment group comparable. This relies on a good model of how the covariates X influence T, though, and on all relevant confounders being among the observed covariates.

One can estimate such scores with a simple logistic regression model or, when a lot of data is available, with machine learning algorithms like decision trees or gradient boosting.
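To make this concrete, here is a minimal sketch of the estimation step using scikit-learn’s LogisticRegression. The data frame and its columns (age, education, prior_income, treated) are hypothetical placeholders invented for illustration, not a real dataset:

```python
# A minimal sketch of propensity-score estimation with scikit-learn.
# The data and column names are hypothetical, not from any real study.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 1_000
df = pd.DataFrame({
    "age": rng.normal(35, 10, n),
    "education": rng.integers(8, 18, n),
    "prior_income": rng.normal(30_000, 8_000, n),
})

# Treatment assignment depends on the covariates: this is the
# non-randomized setting that PSM is designed for.
logits = -4 + 0.05 * df["age"] + 0.15 * df["education"]
df["treated"] = rng.binomial(1, 1 / (1 + np.exp(-logits)))

# e(X) = P(T=1|X), estimated here with a logistic regression
X = df[["age", "education", "prior_income"]]
model = LogisticRegression(max_iter=1_000).fit(X, df["treated"])
df["propensity"] = model.predict_proba(X)[:, 1]
```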

Matching

With propensity scores in hand, one can control for the influence of X by pairing treated and untreated units with similar scores. This is called matching. Several matching techniques are available:

  • Nearest Neighbor Matching: Each treated unit is matched with the untreated unit with the closest propensity score.

  • Caliper Matching: A treated unit is matched with an untreated unit only if the difference between their propensity scores falls within a fixed range (also called the caliper).

  • Stratification or Subclassification: The space of possible propensity scores is divided into bins, and the treatment effects within each bin (also called a stratum) are estimated separately.

The goal of all these techniques is to tease out the treatment effect by keeping the influence of the covariates X as similar as possible between treated and control units.
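As a rough illustration of the first two techniques, the sketch below continues the hypothetical data frame from the previous snippet: it performs nearest-neighbor matching with a caliper via scikit-learn’s NearestNeighbors and ends with a two-line stratification variant. It shows the mechanics only; a real analysis would also verify covariate balance between the matched groups afterwards:

```python
# A sketch of nearest-neighbor matching with a caliper, continuing the
# hypothetical df from the previous snippet.
import pandas as pd
from sklearn.neighbors import NearestNeighbors

treated = df[df["treated"] == 1]
control = df[df["treated"] == 0]

# Nearest Neighbor Matching: for each treated unit, find the control
# unit with the closest propensity score.
nn = NearestNeighbors(n_neighbors=1).fit(control[["propensity"]])
distances, indices = nn.kneighbors(treated[["propensity"]])

# Caliper Matching: keep only pairs whose scores differ by at most 0.05.
caliper = 0.05
keep = distances.ravel() <= caliper
matched_treated = treated[keep]
matched_control = control.iloc[indices.ravel()[keep]]

# With an outcome column (e.g. post-training earnings), the average
# treatment effect on the treated would be the mean outcome difference:
# att = matched_treated["outcome"].mean() - matched_control["outcome"].mean()

# Stratification: alternatively, bin the scores and estimate the
# treatment effect within each stratum separately.
df["stratum"] = pd.cut(df["propensity"], bins=5)
```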

Advantages and Limitations of PSM

PSM is powerful because it reduces selection bias; after all, all kinds of observed covariates X can be taken into account. This makes it applicable in all kinds of domains, ranging from economics and public policy to healthcare.
