Measuring Platform Sustainability – Quantifiably
How to build data pipelines that capture the true impact of digital ecosystems

If you have ever tried to measure the environmental or social impact of a digital platform, you know the headache. It is one thing to calculate the direct emissions of your own servers, your office buildings, or your employee travel. Those are bounded problems. You gather the utility bills, you apply standard emission factors, and you get a number. It is an entirely different beast to quantify the impact of the thousands, or millions, of users interacting within your ecosystem.
At Wangari, we frequently encounter this challenge when modeling ESG data for financial institutions. A platform might look incredibly “green” on paper because its direct footprint—its Scope 1 and 2 emissions—is vanishingly small. But if its core business logic incentivizes unsustainable behavior among its users, that platform is carrying hidden systemic risks. Think of an e-commerce marketplace that optimizes its algorithms purely for rapid consumption and next-day delivery, regardless of the carbon cost, or a social network whose engagement model inadvertently rewards polarization.
The problem is that traditional sustainability metrics were designed for linear supply chains, not multi-sided digital ecosystems. In a linear model, a widget moves from factory to warehouse to consumer, and you can track the carbon at each step. In a platform model, value is created through interactions between users, and the platform’s primary role is orchestration. To truly understand a platform’s impact, we need data pipelines that can capture indirect effects, network behaviors, and complex causal relationships.
In this post, I will walk through how we approach this problem technically. We will move from the foundational challenge of data extraction to more sophisticated ecosystem modeling, looking at three specific techniques: network-based attribution, causal inference for platform interventions, and natural language processing for qualitative assessment.
The Challenge of Ecosystem Data
The first hurdle is simply getting the data. Platform ecosystems are notoriously messy. You are dealing with unstructured data from user reviews, inconsistent reporting from third-party vendors, fragmented API endpoints, and data silos that refuse to talk to one another.
Before we can even begin to model impact, we need to wrangle this data into a usable format. This often involves extracting data from PDFs (like vendor sustainability reports) or scraping web data. As I have written about before, automating this extraction is crucial for building scalable pipelines. You cannot rely on manual data entry when you are dealing with thousands of ecosystem participants.
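To make that extraction step concrete, here is a minimal sketch of pulling reported emissions figures out of free-form report text with regular expressions. The vendor name, field labels, and regex pattern are illustrative assumptions; in practice, the raw text would come from a PDF parsing library rather than a hard-coded string, and real reports are far messier than this.

```python
import re

# Simulated text as it might be extracted from a vendor sustainability PDF.
# In practice, this string would come from a PDF parser, not a literal.
report_text = """
Acme Logistics GmbH - Sustainability Report 2023
Scope 1 emissions: 1,240 tCO2e
Scope 2 emissions: 380 tCO2e
Renewable electricity share: 62%
"""

def extract_emissions(text):
    """Pull Scope 1/2 emissions figures (in tCO2e) out of free-form report text."""
    pattern = r"Scope\s+(1|2)\s+emissions:\s+([\d,]+)\s*tCO2e"
    results = {}
    for scope, value in re.findall(pattern, text):
        # Strip thousands separators before converting to a number
        results[f"scope_{scope}_tco2e"] = float(value.replace(",", ""))
    return results

emissions = extract_emissions(report_text)
print(emissions)  # {'scope_1_tco2e': 1240.0, 'scope_2_tco2e': 380.0}
```

Even a toy extractor like this highlights the core engineering decision: you are trading manual effort for brittle patterns, which is why extraction pipelines need validation checks on every field they parse.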
Once we have the raw data, the real work begins: attributing impact. If a user buys a product on an e-commerce platform, how much of the carbon footprint of that transaction belongs to the platform, and how much belongs to the seller or the buyer? If a platform’s algorithm recommends a high-carbon product over a low-carbon alternative, how do we quantify that algorithmic influence? This is where we need to move beyond simple accounting and start thinking about causal inference and network dynamics.
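As a toy illustration of what an attribution rule can look like, consider splitting each transaction's carbon by fixed responsibility shares, and crediting the platform's algorithm with the avoidable excess over the greenest alternative it could have recommended. To be clear: the 10/60/30 split and the counterfactual logic below are assumptions invented for this sketch, not an accounting standard.

```python
# Hypothetical attribution rule: fixed responsibility shares per transaction.
# The 10/60/30 split is an illustrative assumption, not an accounting standard.
SHARES = {"platform": 0.10, "seller": 0.60, "buyer": 0.30}

def attribute_transaction(carbon_kg):
    """Split a transaction's carbon footprint across the three parties."""
    return {party: round(carbon_kg * share, 2) for party, share in SHARES.items()}

def algorithmic_influence(recommended_kg, greenest_alternative_kg):
    """Carbon attributable to the recommendation itself: the avoidable excess
    over the lowest-carbon alternative the algorithm could have surfaced."""
    return max(recommended_kg - greenest_alternative_kg, 0.0)

print(attribute_transaction(20.0))        # {'platform': 2.0, 'seller': 12.0, 'buyer': 6.0}
print(algorithmic_influence(20.0, 12.5))  # 7.5
```

Real attribution rules would be transaction-type-specific and negotiated with stakeholders, but even a crude rule like this forces the question of who owns which slice of the footprint.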
Technique 1: Network-Based Attribution
One approach to the attribution problem is to use network analysis to model the flow of impact through the platform. By representing the platform as a graph, where nodes are users or vendors and edges are transactions, we can start to quantify how the platform’s design influences overall ecosystem behavior.
This is particularly useful for identifying “super-spreaders” of impact—nodes in the network that have a disproportionate influence on the ecosystem’s overall footprint. In a financial context, this might be a specific asset manager whose portfolio choices ripple through the market. In an e-commerce context, it might be a high-volume vendor with inefficient logistics.
Here is a simplified example of how you might structure this using Python and the NetworkX library. We will build a directed graph from transaction data and calculate node centrality to find our high-impact participants.
import networkx as nx
import pandas as pd
import numpy as np

# 1. Load transaction data (simplified for demonstration)
# In a real scenario, this would be pulled from a data warehouse
# Columns: buyer_id, seller_id, transaction_value, estimated_carbon
data = {
    'seller_id': ['S1', 'S1', 'S2', 'S3', 'S1', 'S2'],
    'buyer_id': ['B1', 'B2', 'B1', 'B3', 'B4', 'B4'],
    'transaction_value': [100, 150, 200, 50, 300, 120],
    'estimated_carbon': [10, 15, 25, 5, 35, 12]  # kg CO2e
}
transactions = pd.DataFrame(data)

# 2. Create a directed graph
# We use a DiGraph because transactions have a clear direction (seller to buyer).
# Note: if the same seller-buyer pair can transact repeatedly, use a MultiDiGraph
# instead, so duplicate edges are not silently overwritten.
G = nx.from_pandas_edgelist(
    transactions,
    source='seller_id',
    target='buyer_id',
    edge_attr=['transaction_value', 'estimated_carbon'],
    create_using=nx.DiGraph()
)

# 3. Calculate node centrality
# Degree centrality measures a node's number of connections, normalized by the
# maximum possible. A high out-degree for a seller means they supply many buyers.
centrality = nx.degree_centrality(G)

# 4. Calculate carbon-weighted influence
# Simple centrality isn't enough; we need to weight it by the actual impact.
carbon_influence = {}
for node in G.nodes():
    if G.out_degree(node) > 0:  # Focus on sellers
        # Sum the carbon of all outgoing edges
        total_carbon = sum(G[u][v]['estimated_carbon'] for u, v in G.out_edges(node))
        carbon_influence[node] = total_carbon * centrality[node]

# 5. Identify high-impact nodes
threshold = np.percentile(list(carbon_influence.values()), 75)  # Top 25%
high_impact_nodes = {k: v for k, v in carbon_influence.items() if v >= threshold}

print(f"Identified {len(high_impact_nodes)} high-impact nodes in the ecosystem.")
for node, score in high_impact_nodes.items():
    print(f"Node {node}: Influence Score {score:.2f}")

Caveats: This approach assumes you have reliable transaction-level data, which is often not the case. It also simplifies the attribution problem by treating all edges equally, whereas in reality, the platform’s influence might vary significantly depending on the type of transaction. Furthermore, network analysis can become computationally expensive as the graph scales to millions of nodes, requiring distributed computing frameworks like Apache Spark or specialized graph databases.
Research Shoutout — Scaling Sustainable Digital Platforms
We are conducting academic research on how sustainable digital platforms grow and scale responsibly. If your company embeds environmental or social goals into its core business model, we’d love to speak with you.
The study involves 2–3 short interviews with key employees. Participation is anonymous, confidential, and low time commitment — and you’ll receive early access to our findings.
Interested? Reach out to us directly:
Ari Joury, Cofounder & CEO, Wangari Global — ari.joury@wangari.global
Melanie Gertschen, PhD Candidate, University of Bern — melanie.gertschen@unibe.ch
Technique 2: Difference-in-Differences for Platform Interventions
If a platform introduces a new feature designed to promote sustainability—say, a “green shipping” option at checkout, or a dashboard that shows users their carbon footprint—how do we know if it actually worked? Did it change behavior, or did users just ignore it? This is a classic causal inference problem.
We cannot simply look at the total carbon footprint before and after the feature launch. Other factors might have changed simultaneously—a seasonal dip in sales, a broader economic downturn, or a change in the underlying energy grid. To isolate the causal effect of the platform’s intervention, we need a more rigorous statistical approach.
We can use a Difference-in-Differences (DiD) model. This technique compares the behavior of users who were exposed to the new feature (the treatment group) with those who were not (the control group), both before and after the intervention. By comparing the difference in their trajectories, we can strip away external confounding factors.
import statsmodels.formula.api as smf
import pandas as pd
import numpy as np
# 1. Simulate user behavior data
# In reality, this requires careful experimental design (A/B testing) or quasi-experimental setup
np.random.seed(42)
n_users = 1000
data = pd.DataFrame({
    'user_id': np.repeat(np.arange(n_users), 2),
    'time_period': np.tile([0, 1], n_users),  # 0=pre-intervention, 1=post-intervention
    'treated': np.repeat(np.random.binomial(1, 0.5, n_users), 2)  # 50% in treatment group
})
# Simulate the outcome variable (e.g., carbon footprint per user)
# Base footprint + time trend + treatment effect + noise
base_footprint = 50
time_trend = -5 * data['time_period'] # General downward trend for everyone
treatment_effect = -10 * (data['time_period'] * data['treated']) # The actual impact of our feature
noise = np.random.normal(0, 5, len(data))
data['carbon_footprint'] = base_footprint + time_trend + treatment_effect + noise
# 2. Create the interaction term
# This term isolates the effect of being in the treatment group AFTER the intervention
data['did'] = data['time_period'] * data['treated']
# 3. Run the DiD regression
# We control for time period and treatment group assignment
model = smf.ols('carbon_footprint ~ time_period + treated + did', data=data).fit()
print(model.summary())
# The coefficient for 'did' represents the causal impact of the intervention.
# In our simulation, we expect it to be close to -10.

Caveats: DiD relies heavily on the “parallel trends” assumption—that the treatment and control groups would have followed parallel trajectories if the intervention had not happened. In dynamic platform environments, this assumption is often violated by network effects. If the treatment group changes its behavior, it might influence the control group (spillover effects), muddying the results. Validating the parallel trends assumption against historical pre-intervention data is a critical prerequisite before trusting a DiD model.
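One simple sanity check, sketched below with simulated data, is a placebo DiD run on two pre-intervention periods: since no intervention happened yet, the estimated "effect" should be close to zero, and a clearly non-zero estimate signals diverging pre-trends. The group means and sample sizes here are arbitrary simulation choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated carbon footprints for two PRE-intervention periods.
# Both groups share the same -5 trend here, so parallel trends holds by construction.
control_t0 = rng.normal(55, 5, n)
control_t1 = rng.normal(50, 5, n)  # common downward trend of -5
treated_t0 = rng.normal(58, 5, n)
treated_t1 = rng.normal(53, 5, n)  # same -5 trend, different level

def placebo_did(c0, c1, t0, t1):
    """Difference-in-differences across two pre-intervention periods.
    Under parallel trends this placebo estimate should be close to zero."""
    return (t1.mean() - t0.mean()) - (c1.mean() - c0.mean())

estimate = placebo_did(control_t0, control_t1, treated_t0, treated_t1)
print(f"Placebo DiD estimate: {estimate:.2f}")  # near 0: no red flag

# A diverging pre-trend, by contrast, produces a clearly non-zero placebo estimate.
treated_t1_flat = rng.normal(58, 5, n)  # treated group stays flat instead of -5
diverging = placebo_did(control_t0, control_t1, treated_t0, treated_t1_flat)
print(f"Placebo DiD with diverging trends: {diverging:.2f}")  # roughly +5
```

With more than two pre-periods, the same idea generalizes to an event-study regression with leads of the treatment indicator, which is the more rigorous way to probe the assumption.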
Technique 3: NLP for Qualitative Impact Assessment
Not all impact can be quantified in tons of carbon or dollars. Much of a platform’s social impact is qualitative, found in the messy, unstructured text of user reviews, community forums, support tickets, or social media mentions. Does the platform foster a sense of community, or does it drive isolation? Are vendors feeling squeezed by algorithmic changes?
We can use Natural Language Processing (NLP) to extract sentiment and thematic trends from this unstructured text, providing a proxy for social impact that quantitative metrics often miss.
from transformers import pipeline
import pandas as pd
# 1. Load user reviews (simulated)
reviews_data = {
    'review_id': [1, 2, 3, 4],
    'text': [
        "The new sustainable packaging is great, but the shipping took forever.",
        "I love the community features, I've met so many great local sellers.",
        "The algorithm keeps pushing cheap, disposable items. It's frustrating.",
        "Customer support was incredibly helpful when I had an issue with my return."
    ]
}
reviews = pd.DataFrame(reviews_data)
# 2. Initialize sentiment analysis pipeline
# We use a pre-trained model from Hugging Face for demonstration
# In production, you would likely fine-tune a model on your specific domain
sentiment_analyzer = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
# 3. Apply to the reviews
sample_reviews = reviews['text'].tolist()
results = sentiment_analyzer(sample_reviews)
# 4. Aggregate and analyze results
reviews['sentiment_label'] = [r['label'] for r in results]
reviews['sentiment_score'] = [r['score'] for r in results]
print(reviews[['text', 'sentiment_label']])
positive_count = sum(1 for r in results if r['label'] == 'POSITIVE')
print(f"\nOverall Positive sentiment ratio: {positive_count / len(results):.2f}")

Caveats: Out-of-the-box sentiment models often struggle with the nuanced language of specific domains. As seen in the first simulated review (“The new sustainable packaging is great, but the shipping took forever”), a single piece of text can contain mixed sentiments about different aspects of the platform. A simple positive/negative classification is insufficient here. To truly capture qualitative impact, you need aspect-based sentiment analysis, which identifies what the user is talking about (packaging vs. shipping) before assigning a sentiment score.
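To make the aspect-based idea concrete, here is a deliberately simplified, lexicon-based sketch: it splits a review into clauses, tags each clause with an aspect, and scores each aspect independently. The aspect keywords and sentiment cues are invented for illustration; a production system would use a trained aspect-based sentiment model rather than hand-written lexicons.

```python
import re

# Illustrative lexicons; a real system would learn these from labeled data.
ASPECTS = {
    "packaging": ["packaging", "box", "wrap"],
    "shipping": ["shipping", "delivery"],
    "support": ["support", "return"],
}
POSITIVE_CUES = ["great", "love", "helpful"]
NEGATIVE_CUES = ["forever", "frustrating", "disposable"]

def aspect_sentiments(review):
    """Split a review into clauses and score each detected aspect separately."""
    results = {}
    # Split on punctuation, treating ", but" as a clause boundary as well
    for clause in re.split(r"[,.;]\s*but\s*|[,.;]", review.lower()):
        for aspect, keywords in ASPECTS.items():
            if any(kw in clause for kw in keywords):
                score = sum(cue in clause for cue in POSITIVE_CUES)
                score -= sum(cue in clause for cue in NEGATIVE_CUES)
                results[aspect] = ("POSITIVE" if score > 0
                                   else "NEGATIVE" if score < 0 else "NEUTRAL")
    return results

review = "The new sustainable packaging is great, but the shipping took forever."
print(aspect_sentiments(review))  # {'packaging': 'POSITIVE', 'shipping': 'NEGATIVE'}
```

Crude as it is, this sketch shows why clause-level aspect attribution matters: the single review above yields two opposite signals that a document-level classifier would collapse into one.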
Comparing the Approaches
To build a comprehensive measurement strategy, it is essential to understand where each technique excels and where it falls short.
| Technique | Best For | Key Limitation | Data Requirements |
| ------------------------- | --------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------- |
| Network Analysis | Understanding ecosystem structure, identifying key influencers, and mapping systemic risk. | Computationally expensive at scale; assumes equal weight of connections unless carefully tuned. | Granular, transaction-level data mapping relationships between entities. |
| Difference-in-Differences | Evaluating the causal impact of specific platform features, policy changes, or interventions. | Relies on strict parallel trends assumptions; vulnerable to spillover effects in highly connected networks. | Longitudinal data with clear pre/post intervention periods and distinct treatment/control groups. |
| NLP / Sentiment Analysis | Capturing qualitative social impact, user perception, and emerging thematic issues. | Struggles with domain-specific nuance, sarcasm, and aspect-level attribution without significant fine-tuning. | Large volumes of unstructured text data (reviews, forums, support tickets). |

Combining Approaches for Robust Measurement
The reality is that no single technique is sufficient for measuring platform sustainability. The most robust data pipelines combine these approaches into a cohesive architecture.
For example, you might start by using NLP to identify a recurring issue in user reviews—perhaps a sudden spike in complaints about excessive packaging waste from third-party vendors. You could then use network analysis to trace which specific clusters of vendors are driving this issue, identifying the structural bottlenecks in the ecosystem. Finally, if the platform implements a new packaging policy targeted at those specific vendors, you would use a Difference-in-Differences model to measure the causal effectiveness of that policy over time.
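A rough skeleton of that wiring might look like the following. Every stage function here is a stand-in for the corresponding technique shown earlier, and the flagged issue, vendor IDs, and effect size are invented placeholders; the point is the data flow between stages, not the internals.

```python
# Skeleton of the combined pipeline: qualitative signal -> structural
# diagnosis -> causal evaluation. Stage bodies are illustrative stand-ins.

def flag_emerging_issues(reviews):
    """NLP stage: surface aspects with a spike in negative sentiment."""
    # Stand-in result: pretend review mining flagged a packaging-waste spike.
    return ["packaging_waste"]

def locate_driving_vendors(issue, transactions):
    """Network stage: trace an issue to the vendor clusters driving it."""
    # Stand-in result: pretend centrality analysis isolated two vendors.
    return ["S1", "S2"]

def evaluate_policy(vendors, panel):
    """Causal stage: DiD estimate of a policy targeted at those vendors."""
    # Stand-in result: pretend the regression returned -10 kg CO2e per user.
    return -10.0

for issue in flag_emerging_issues(reviews=[]):
    vendors = locate_driving_vendors(issue, transactions=[])
    effect = evaluate_policy(vendors, panel=[])
    print(f"{issue}: vendors {vendors}, estimated policy effect {effect} kg CO2e")
```

Keeping the stages as separate, composable steps also makes it easier to swap in heavier implementations (a fine-tuned ABSA model, a Spark-backed graph job) without restructuring the pipeline.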
This multi-layered approach is what we strive for at Wangari. It is computationally intensive, requires careful data engineering, and demands a deep understanding of both statistical assumptions and platform dynamics. But it is the only way to move beyond superficial ESG metrics.
The Bottom Line
Measuring platform sustainability is fundamentally a data engineering and causal inference challenge. It requires moving beyond the boundaries of the firm to analyze the entire ecosystem.
By leveraging tools like network analysis to map structure, causal models to isolate impact, and NLP to capture qualitative nuance, we can start to quantify the unquantifiable. This provides the rigorous, defensible evidence needed to build truly resilient digital platforms—platforms that don’t just optimize for engagement, but optimize for enduring value.


