Understanding The "Why": 10 Techniques for Causal Inference
With the right tools you can get some pretty deep insights
TL;DR: Understanding causation is crucial for making actionable business decisions, especially in complex fields like finance or sustainability, where correlation alone falls short. This article showcases 10 powerful causal inference techniques, from Granger Causality and Directed Acyclic Graphs to Propensity Score Matching and Counterfactual Analysis. Each technique is tailored for different types of data and questions. By leveraging these methods, data scientists can uncover the "why" behind observed relationships. The result is smarter strategies and more impactful business decisions.
In analyzing data at Wangari, one question has kept coming up over the past few months, both internally and from our clients: why?
While studying the correlations between companies' sustainability efforts and their financials, we found that in male-dominated industries, a higher percentage of women in management is associated with higher profitability.
The question is why. Do women managers somehow work harder or differently, thus causing higher profits? Or do higher profits motivate a company to hire more women in management? Or perhaps both are influenced by an underlying factor, like a progressive company culture?
Understanding causation, not just correlation, is essential for making informed business decisions. In fields like finance and sustainability, where data is abundant but complex, causal inference provides the tools to dig into such questions.
It is already valuable for us to be able to go to several different companies (and their bankers) and tell them that, from a statistical point of view, they would be increasing their chances for future success by hiring more women managers. However, being able to tell them about an actual causal relationship would make this argument even more compelling.
We figured that we are not the only ones who have had to go beyond Data Science 101 and dig deeper into causal inference. In this article, we introduce ten powerful and popular techniques that help data scientists move from the "what" to the "why." We also cover general best practices for causal inference, and when to use which technique.
As we start getting our hands dirty ourselves, we will be sharing our experience with specific techniques (and combinations thereof, and many of our research findings!) in subsequent articles.
Causal Inference Is Challenging
Causal inference is fundamentally harder than correlation analysis because it requires untangling the intricate web of relationships between variables. In real-world datasets, multiple factors often interact, making it difficult to determine what’s driving what.
For instance, if higher profits and more women in management roles are correlated, the relationship could involve reverse causality, confounding factors, or even pure coincidence (the latter is unlikely though because this correlation is statistically significant). Identifying the true causal mechanism requires a deeper understanding of the context and structure of the underlying data.
One of the biggest hurdles is confounding variables—unobserved factors that influence both the cause and effect. For example, company culture might simultaneously drive diversity in leadership and better financial performance.
However, no company that we know of has found a good quantitative measure of company culture, so it is hard to include it explicitly in our analysis. We have experimented with proxies such as R&D spending, turnover rates, employee productivity metrics, CEO compensation, strategic partnerships, health and safety metrics, and philanthropy metrics to quantify it, at least rudimentarily.
Another challenge is the limitation of observational data. Unlike randomized controlled trials, which allow for clear causal interpretations, real-world data often lacks the experimental control needed to isolate cause and effect. Time-series data, common in finance and sustainability, adds another layer of complexity because trends and seasonality can create spurious correlations. We corrected for this by de-trending our data (more on that in a separate post).
Finally, many causal inference techniques rely on strong assumptions. Whether it's assuming stationarity in time-series analysis or the validity of instrumental variables, these methods depend heavily on how well the data aligns with their underlying principles. Navigating these challenges requires not only technical expertise but also a nuanced understanding of the domain to ensure that the insights are robust and actionable.
How to Approach Causal Inference
At its core, causal inference can be categorized into two broad approaches: experimental and observational. Experimental methods, like Randomized Controlled Trials (RCTs), manipulate the environment to isolate causal effects, but these are rare outside controlled settings. Observational methods, on the other hand, analyze existing data to identify causal relationships, often using statistical tools to adjust for confounders and biases. (We use the latter in our research.)
Modern causal inference draws on a diverse toolbox to handle various data structures and research questions. These include time-series methods like Granger Causality for temporal precedence, graphical models like Directed Acyclic Graphs (DAGs) to visualize causal pathways, and counterfactual approaches that simulate alternative outcomes. Machine learning has also expanded the horizons of causal inference, enabling techniques like Causal Forests to estimate heterogeneous effects across complex datasets.
Each method comes with its own assumptions and limitations. The key to success lies in choosing the right approach, or the right combination of approaches, for your data and research questions.
The 10 Techniques for Causal Inference
1/10: Granger Causality
Granger Causality is a statistical technique used to determine whether one time series can predict another. While it doesn’t establish true causality in the philosophical sense, it provides evidence of temporal precedence—whether changes in one variable consistently occur before changes in another.
The method involves comparing two models:
A model where the dependent variable (e.g., profits) is explained solely by its own past values (lagged terms).
A model where the dependent variable is explained by its own past values and the past values of another variable (e.g., percentage of women in management).
If including the second variable significantly improves the model's predictive accuracy, then that variable is said to "Granger-cause" the dependent variable. Mathematically, it’s tested using an F-statistic to compare the goodness-of-fit between the two models.
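The two-model comparison above can be sketched directly with ordinary least squares. This is an illustrative, hand-rolled version (variable names and the synthetic data are our own assumptions, not from any particular dataset); in practice a library routine such as `grangercausalitytests` from statsmodels does the same job with more diagnostics:

```python
import numpy as np
from scipy import stats

def granger_f_test(y, x, lags=2):
    """Test whether lags of x improve the prediction of y beyond y's own lags.

    Fits two OLS models -- restricted (y on its own lags) and unrestricted
    (y on its own lags plus lags of x) -- and compares their fit with an
    F-test. Returns (F-statistic, p-value)."""
    n = len(y)
    target = y[lags:]
    own_lags = [y[lags - i:n - i] for i in range(1, lags + 1)]
    x_lags = [x[lags - i:n - i] for i in range(1, lags + 1)]
    X_restricted = np.column_stack([np.ones(n - lags)] + own_lags)
    X_unrestricted = np.column_stack([X_restricted] + x_lags)

    def rss(X):
        # Residual sum of squares of an OLS fit of target on X
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        return np.sum((target - X @ beta) ** 2)

    df_num = lags                                   # number of extra parameters tested
    df_den = len(target) - X_unrestricted.shape[1]  # residual degrees of freedom
    f_stat = ((rss(X_restricted) - rss(X_unrestricted)) / df_num) / \
             (rss(X_unrestricted) / df_den)
    p_value = stats.f.sf(f_stat, df_num, df_den)
    return f_stat, p_value

# Synthetic example: x drives y with a one-period lag, so x should
# "Granger-cause" y (small p-value), by construction.
rng = np.random.default_rng(42)
x = rng.normal(size=300)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()
f_stat, p = granger_f_test(y, x, lags=2)
```

The key design choice is that both models see exactly the same target observations, so the F-test isolates the contribution of the lagged x terms alone.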
Granger Causality is ideal for time-series data with consistent intervals (e.g., annual or quarterly corporate reports). It is good at identifying predictive relationships between variables over time, and at exploring hypotheses where temporal precedence is critical, such as "Does hiring more women in management lead to higher profits?"
Before implementing this technique, one must be mindful of some key assumptions:
Stationarity: The time series should have a constant mean and variance over time. If not, you may need to de-trend or difference the data.
Lagged Relationships: The causation operates through time-lagged effects. You must specify the number of lags to test, often determined via statistical criteria like AIC or BIC.
No Confounding: It doesn’t account for unobserved confounders that could drive both variables.
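Regarding the stationarity assumption above, a common first step is to difference the series. A minimal sketch with made-up numbers:

```python
import numpy as np

# A trending (non-stationary) series: the mean drifts upward over time.
profits = np.array([100.0, 104.0, 110.0, 118.0, 128.0])

# First differences remove a linear trend; what remains are the
# period-over-period changes, which are often closer to stationary.
changes = np.diff(profits)  # [4., 6., 8., 10.]
```

For financial series, differencing the logarithm (i.e., working with growth rates) is another common choice.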
Compare this to a simple time-lagged correlation, where one might correlate the percentage of women in management with profits a few years later. Granger Causality offers a more robust and rigorous framework for analyzing predictive relationships: it accounts for autocorrelation, tests multiple lags, and evaluates statistical significance. This makes it superior for uncovering causal (predictive) dynamics in time-series data.
As with all good things, this technique does come with some limitations. Granger Causality only implies predictive causation, and hence does not prove causal mechanisms. If an unobserved third variable (a so-called confounder) drives both series, the results can be misleading. Finally, the results can vary based on the number of lags selected (for example, two or three years). Careful testing is thus needed to make sure that any result is sound.
2/10: Difference-in-Differences (DiD)
Difference-in-Differences (DiD) is a causal inference method used to estimate the effect of a treatment or intervention by comparing changes in outcomes over time between a treated group and a control group. Widely used in economics and social sciences, DiD is particularly valuable for evaluating policy changes or corporate interventions when randomized experiments are infeasible.
In essence, DiD compares a before-state and an after-state to pinpoint the causal effect of some treatment. The treated group experiences the intervention (e.g., a company implementing a gender diversity program), while the control group does not experience the intervention but is otherwise similar to the treated group.
If the change from before to after differs significantly between the two groups, that difference can be attributed to the intervention. Its significance can be tested with a p-value. The basic formula for the treatment effect is:

DiD = (Y_treated, post − Y_treated, pre) − (Y_control, post − Y_control, pre)

The variables in the formula are explained the following way:
Y: Outcome of interest (e.g., profits).
treated: Group that received the treatment/intervention.
control: Group that did not receive the treatment/intervention.
post: Outcome after the intervention.
pre: Outcome before the intervention.
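The formula above can be computed directly from group means. A minimal sketch with hypothetical numbers (the figures are invented for illustration only):

```python
import numpy as np

def did_estimate(y_treated_pre, y_treated_post, y_control_pre, y_control_post):
    """Difference-in-Differences: the treated group's average change
    minus the control group's average change."""
    treated_change = np.mean(y_treated_post) - np.mean(y_treated_pre)
    control_change = np.mean(y_control_post) - np.mean(y_control_pre)
    return treated_change - control_change

# Hypothetical profits (in $M) for firms that did / did not adopt a program.
# Treated group rises by 8 on average, control by 3, so the estimated
# treatment effect is 5.
effect = did_estimate([10, 12], [18, 20], [10, 12], [13, 15])
```

In practice the same estimate usually comes from a regression with a treated-by-post interaction term, which also yields standard errors and the p-value mentioned above.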
DiD is useful for evaluating the causal impact of interventions like policy adoption or regulatory changes. It shines in scenarios where treatment is not randomized but there is a clear division between treated and untreated groups. It is best suited to time-series or panel data with observations before and after the intervention.
This technique has some limitations: the parallel-trends assumption, the non-accounting for spillover effects, and the need for consistent composition. The treated and control groups must have followed the same trend in the absence of treatment, or else the DiD estimate is not valid (parallel trends). If one group influences the other, DiD cannot work either (no spillover effects). And the two groups must remain comparable over time (consistent composition).