Gather Tons of Financial Data With Python APIs (Safely)
Pull huge datasets with Python APIs without getting blocked

Say you want to find out how the creative industry is doing post-AI. Maybe you even want to know how it fared in other disruptive times. You put together a list of companies in the creative sector—Netflix, Warner Bros, Spotify, Adobe, and so on.
You have established a long list of companies in this sector, and now you just need their financial data to figure out how well (or unwell) they’re doing these days. The quickest way to get this data is through APIs like yfinance and others. You start building your data pipeline. But before you realize it, you’ve hit the limit and are asked to pay—or are blocked entirely.
Ew. Your project is dead in the water.
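For the record, the pipeline that gets you blocked tends to look like the sketch below, where `fetch_history` is a stand-in for a real call such as `yf.Ticker(symbol).history(period="5y")` (here it only counts the requests that would be fired):

```python
def fetch_history(symbol):
    """Stand-in for a real API call, e.g. yf.Ticker(symbol).history(period="5y").
    This fake version just records that a request was made."""
    fetch_history.calls += 1
    return {"symbol": symbol, "rows": []}

fetch_history.calls = 0

# One uncached, unthrottled request per ticker, fired as fast as the
# loop can run: exactly the pattern that trips a provider's rate limiter.
tickers = ["NFLX", "WBD", "SPOT", "ADBE"]  # imagine hundreds of these
histories = {symbol: fetch_history(symbol) for symbol in tickers}
```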
Financial APIs are powerful, but they come with strict limitations on how much data you can pull at once. If you hit those limits too often, you’ll find yourself getting throttled, banned, or forced onto an expensive premium plan.
It’s fine that these data providers charge for their services—they need to keep the lights on too. Often, however, you’re not getting half of what you could be getting on the free tier. Making the most of the free tier can be complex and requires plenty of knowledge and research.
I work at a small startup, so we’ve invested the time and energy to find out exactly how to get around all these limitations. I’m here to break it down for you. From batching requests and caching data to smart rate-limiting techniques, API key rotation, and proxy use, I’ve got you covered.
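Most of those ideas fit in a few lines of plain Python. Here is a minimal, standard-library-only sketch of batching, caching, rate limiting, and key rotation; the `do_request` and `do_batch_request` callables and the key names are placeholders for whatever provider client you actually use, and proxy rotation is left out because it is provider-specific:

```python
import itertools
import time

API_KEYS = ["key-a", "key-b", "key-c"]  # placeholder keys, rotated round-robin
_key_cycle = itertools.cycle(API_KEYS)

MIN_INTERVAL = 1.0   # seconds to wait between outgoing requests
_last_request = 0.0
_cache = {}

def throttled_fetch(symbol, do_request):
    """Rate limiting + key rotation: wait until MIN_INTERVAL has passed
    since the last request, then call do_request(symbol, api_key) with
    the next key in the rotation."""
    global _last_request
    wait = _last_request + MIN_INTERVAL - time.monotonic()
    if wait > 0:
        time.sleep(wait)
    _last_request = time.monotonic()
    return do_request(symbol, next(_key_cycle))

def cached_fetch(symbol, do_request):
    """Caching: never spend a second API call on a symbol you already have."""
    if symbol not in _cache:
        _cache[symbol] = throttled_fetch(symbol, do_request)
    return _cache[symbol]

def fetch_in_batches(symbols, do_batch_request, batch_size=50):
    """Batching: one request per chunk of symbols instead of one per
    symbol, for providers whose endpoints accept several symbols per call."""
    results = {}
    for i in range(0, len(symbols), batch_size):
        results.update(do_batch_request(symbols[i:i + batch_size]))
    return results
```

In a real pipeline you would swap `do_request` for your provider client and persist `_cache` to disk, so a restarted job doesn’t re-spend yesterday’s quota.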
Why Data Providers Mess With You
There’s no such thing as a free lunch, they say. I don’t think that’s true, but if you want a big lunch for free, you’d better be smart about getting it.
Without the right smarts, you’ll hit limits fast when gathering large datasets. To figure out how to get around these limits, it’s helpful to understand why data providers put them up in the first place:
Protecting their infrastructure: Many API providers process millions of requests per day. Without limits, servers would overload. Single users might monopolize resources and slow access for everyone else. So really it’s a question of fairness to everyone.
Encouraging paid subscriptions: Data providers need to eat, too. Many will therefore give you just enough data for you to realize that you need more of it. And then they’ll make you pay. Luckily, there are some legitimate and safe ways around this, at least if you don’t have a large budget to spend.
Compliance and legal restrictions: Finance is a heavily regulated industry, and data is a competitive advantage. Financial data providers have to comply with exchange agreements and regulatory rules (e.g., SEC, FINRA). That’s why exchanges, and entities that depend on exchange data, often charge for their data—it gives them a better idea of who’s using it, and helps them make sure they’re not giving an unfair advantage to one user over another.
All this being said, financial data providers are not all the same. Different types of data exist, and some providers are more generous with it than others. You’ll find a full overview in the table below.
Among the data providers above, yfinance is easily the most generous one, allowing 2,000 API calls per day for just about any data it provides. It’s great for stock and historical data. IEX Cloud and EOD Historical Data (EODHD) are generous, too; they are good for real-time market data and high-speed requests, respectively.
Alpha Vantage, Polygon.io, and Quandl are somewhat generous, but do not offer quite as much data on the free tier as yfinance, IEX Cloud, and EODHD. Because of their tighter limits, they are better suited for casual use than for serious business (unless you’re smart and get around their limits—more on that below).
Alpaca and Codat are the least generous data providers. Alpaca is great for trading bots and algo traders. It offers a lot of calls on its free plan, but you need to pay to get hold of real-time data. Codat’s access scales with the number of Active Connected Companies, i.e., the business entities that you have linked to its system. (It’s in Codat’s interest to get its hands on as much data as possible through its users!)
Using a combination of APIs can be a good way to get around some of these limits. The downside is that you need more code and maintenance to keep access to each one working. Plus, data structures are not identical across different APIs, so you have more work on your plate. Below are some more sophisticated strategies to avoid API rate limits.
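One way to tame the combination approach is a thin wrapper that tries each provider in order and normalizes whatever comes back into one shape. Here is a sketch; the provider names and per-provider field names are hypothetical, not the real API schemas:

```python
def normalize(provider, raw):
    """Map each provider's field names onto one common shape.
    The field names below are illustrative placeholders."""
    if provider == "provider_a":
        return {"symbol": raw["ticker"], "close": raw["closePrice"]}
    if provider == "provider_b":
        return {"symbol": raw["sym"], "close": raw["c"]}
    raise ValueError(f"unknown provider: {provider}")

def fetch_with_fallback(symbol, fetchers):
    """Try each (provider_name, fetch_fn) pair in order; the first one
    that answers wins. fetch_fn should raise on rate limits or errors."""
    last_error = None
    for provider, fetch_fn in fetchers:
        try:
            return normalize(provider, fetch_fn(symbol))
        except Exception as exc:  # rate-limited, down, or missing data
            last_error = exc
    raise RuntimeError(f"all providers failed for {symbol}") from last_error
```

The rest of your pipeline only ever sees the normalized shape, so swapping a provider in or out touches one function instead of every downstream script.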