Last week at work, I rewrote an API to use FastAPI and as much async Python as possible. Then I benchmarked the old and new versions of the API to see which was truly fastest. The summary? FastAPI and async Python made my API much faster.
Check the rest of this article for the longer story, including how the rewrite went, bugs I ran into, and benchmarks.
About FastAPI
FastAPI is an async Python web framework.
It was the #3 web framework in the 2020 Python Developers Survey by JetBrains.
The inventor of FastAPI is Sebastián Ramírez. He’s a man with a fabulous mustache and an intense — some might say feverish — need to express himself with emojis.
Falcon vs. FastAPI
I’m embarrassed that FastAPI is so fast. I’ve preferred the Falcon web framework when I wanted speed and didn’t need Django’s ORM. “Async Python is a crock,” I mumbled to myself as I fell asleep each night.
Nothing can beat a Cythonized Falcon deployment, right?
It turns out I was wrong.
NOTE: All comparisons between Falcon and FastAPI in this post are between Falcon 2 running as a synchronous WSGI application and FastAPI running as an async ASGI application. As of version 3.0, Falcon now supports async Python, which I’m sure would do better in a head-to-head with FastAPI.
The Impetus
Why rewrite a production API, you might ask? Not just for fun.
My impetus for the rewrite was that I started maintaining aioredis-py, working alongside some great engineers who were already contributing to the project. The maintainers of aioredis-py were testing a new version, 2.0, which was a massive port of redis-py to support asyncio. We wanted more people to try version 2.0 in running systems before we graduated the release from alpha to beta.
My team at work wasn’t using aioredis, so I found an API I could make async with low risk: a full-text search API that indexes Redis Labs web sites and exposes an API for searching them with Redis. Using Falcon as the web framework and Redis with RediSearch for the data layer, this API was already fast and low-latency, usually returning results in under 50ms.
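To give a sense of the starting point, here’s a rough sketch of the shape of the old handler: a synchronous Falcon 2 resource querying RediSearch through redisearch-py. The resource name and index name are illustrative assumptions, not the production code.

import falcon
from redisearch import Client

client = Client("pages")  # illustrative index name

class SearchResource:
    def on_get(self, req, resp):
        query = req.get_param("q", required=True)
        # Synchronous call: the worker blocks while Redis responds.
        results = client.search(query)
        resp.media = {
            "total": results.total,
            "docs": [doc.__dict__ for doc in results.docs],
        }

app = falcon.API()  # Falcon 2; Falcon 3 renamed this falcon.App()
app.add_route("/search", SearchResource())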
But could it be faster if I used FastAPI and async Python?
The Rewrite
Read through my PR if you want to see the nitty-gritty details.
The PR includes many changes, but the most relevant is the /search API.
NOTE: If you’re an experienced async Python person who has the time to read that PR, let me know any mistakes I’m making – I’ve only done a little async Python!
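Before the gory details, here’s a minimal sketch of the async shape the /search endpoint moved to. The index name, Redis URL, and response shape are illustrative assumptions – the real code is in the PR.

import aioredis
from fastapi import FastAPI

app = FastAPI()
redis = aioredis.from_url("redis://localhost:6379")

@app.get("/search")
async def search(q: str):
    # redisearch-py can't drive an async client (more on that below),
    # so the query goes through the generic command interface.
    # FT.SEARCH returns [total, key1, fields1, key2, fields2, ...].
    raw = await redis.execute_command("FT.SEARCH", "pages", q)
    return {"total": raw[0]}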
I ran into three unexpected issues during this project, two of which appeared to be FastAPI bugs:
- Using Depends() for dependency injection the way the docs recommend reduced the speed of the API by 75%, so I had to skip doing that for now. Why on earth did this happen? I don’t know yet! (See the sketch after this list.)
- After serializing a Pydantic BaseSettings object to Redis in the way the docs recommend, I could not deserialize it due to what looked like a bug – so I had to abandon using BaseSettings. (Also sketched below.)
- FastAPI token authentication was poorly documented, and what I ended up with is more confusing than I’d like.
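Here’s the Depends() sketch promised above: the dependency-injection pattern from the docs, reduced to a toy example. The dependency and handler are illustrative stand-ins, not the code from the PR.

from fastapi import Depends, FastAPI

app = FastAPI()

async def get_result_limit() -> int:
    # Stand-in dependency; imagine it building a Redis client or settings.
    return 20

@app.get("/search")
async def search(q: str, limit: int = Depends(get_result_limit)):
    # FastAPI resolves get_result_limit on every request. My workaround
    # was to drop Depends() and use a module-level object directly.
    return {"query": q, "limit": limit}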
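And the BaseSettings round trip, roughly as I attempted it – this uses the Pydantic v1-era API, and the field names are made up:

import redis
from pydantic import BaseSettings

class Settings(BaseSettings):
    index_name: str = "pages"
    result_limit: int = 20

r = redis.Redis()
# Serialize with Pydantic's .json() and stash the blob in Redis...
r.set("app-settings", Settings().json())

# ...then read the blob back and rebuild the object. This round trip
# is where deserialization failed for me.
restored = Settings.parse_raw(r.get("app-settings"))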
Aside from those issues, I ran into the things I expected to go wrong:
- The redisearch-py client library does not support aioredis-py, despite the new release of aioredis-py matching the redis-py APIs. This is because aioredis-py’s async version of the Redis client returns coroutines. (See the sketch after this list.)
- The same was true for other libraries that used redis-py, like redis-queue. So my project ended up with one set of components that used redis-py and then the web APIs, which all used aioredis-py. This didn’t bother me. Async is good at concurrent IO, but when you don’t need that, why bother?
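To make the coroutine point concrete, here’s a minimal sketch using aioredis-py 2.0’s redis-py-style API (a local Redis URL is assumed):

import asyncio
import aioredis

async def main():
    redis = aioredis.from_url("redis://localhost:6379")
    await redis.set("greeting", "hello")
    # Every command returns a coroutine, so a sync library that calls
    # redis.get(...) expecting bytes back breaks immediately.
    value = await redis.get("greeting")
    print(value)

asyncio.run(main())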
Benchmarks
Let’s look at the final numbers from benchmarking the two versions of the API. Both benchmarks ran against the API on a single GCP e2-medium node close to my home (the us-west region):
First, the Falcon version:
$ wrk -c 60 -t 20 "<staging URL>/search?q=python"
Running 10s test @ <staging URL>/search?q=python
  20 threads and 60 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   214.01ms   71.38ms 493.73ms   70.17%
    Req/Sec    14.21      6.56    40.00     88.77%
  2768 requests in 10.09s, 15.93MB read
Requests/sec:    274.22
Transfer/sec:      1.58MB
While this API was fast in real-world use, you can see that it slowed down as concurrent requests increased (60 simultaneous connections and 20 threads).
Here’s the FastAPI version – again, on the same GCP node as the Falcon test and using the same Redis instance. This was the moment I realized that FastAPI was not a fad:
$ wrk -c 60 -t 20 "<staging URL>/search?q=python"
Running 10s test @ <staging URL>/search?q=python
  20 threads and 60 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    70.33ms   52.08ms 596.22ms   93.01%
    Req/Sec    46.60     13.90    90.00     74.94%
  9180 requests in 10.10s, 51.01MB read
Requests/sec:    908.78
Transfer/sec:      5.05MB
Average latency dropped from 214.01ms to 70.33ms, and we achieved 900 requests/second. Not bad!
The usual caveat: Benchmarks are usually biased and completely wrong. As soon as I publish this, someone will email me to say I should have used a different gunicorn worker or something. Buyer beware!
Takeaways
Ok, let’s wrap this up. Here are my takeaways from this project:
- For IO-bound concurrency, async Python has real benefits – and web programming is mostly IO-bound!
- FastAPI is, in fact, fast.
- Migrating to FastAPI was not painless.
As discussed above, I ran into bugs and other challenges that stemmed from libraries failing to support a Redis client that behaved like redis-py but was async. However, bugs and oddities aside, going from ~270 to ~910 requests/second is excellent, and latency in real-world use of the API is lower — not by a lot, but still lower.
I also liked using FastAPI, and it has supplanted Falcon as my “I need speed” Python framework.