Rewriting an API to Use FastAPI: Benchmarks and Lessons Learned

A question has bothered me for the past year or more: is FastAPI a fad, or should I learn it?

Now I finally know the answer. Last week at work, I rewrote an API to use FastAPI and as much async Python as possible. Then I benchmarked the old and new versions of the API to see which was truly faster.

The summary? FastAPI and async Python were much faster than my old API.

Check the rest of this article for the longer story, including how the rewrite went, bugs I ran into, and benchmarks.

About FastAPI

FastAPI is an async Python web framework.

It was the #3 web framework in the 2020 Python Developers Survey by JetBrains.

The inventor of FastAPI is Sebastián Ramírez. He’s a man with a fabulous mustache and an intense — some might say feverish — need to express himself with emojis.

Falcon vs. FastAPI

I’m embarrassed that FastAPI is so fast. I’ve preferred the Falcon web framework when I wanted speed and didn’t need Django’s ORM. “Async Python is a crock of shit,” I mumbled to myself as I fell asleep each night.

Nothing can beat a Cythonized Falcon deployment, right?

It turns out I was wrong.

NOTE: All comparisons between Falcon and FastAPI in this post are between Falcon 2 running as a synchronous WSGI application and FastAPI running as an async ASGI application.

The Impetus

Why rewrite a production API, you might ask? Not just for fun.

My impetus for the rewrite was that I started maintaining aioredis-py, working alongside some great engineers who were already contributing to the project. The maintainers of aioredis-py were testing a new version, 2.0, which was a massive port of redis-py to support asyncio. We wanted more people to try version 2.0 in running systems before we graduated the release from alpha to beta.

My team at work wasn’t using aioredis, so I found an API I could make async with low risk: a full-text search API that indexes Redis Labs web sites and exposes an API for searching them with Redis. Using Falcon as the web framework and Redis with RediSearch for the data layer, this API was already fast and low-latency, usually returning results in under 50ms.

But could it be faster if I used FastAPI and async Python?

The Rewrite

Read through my PR if you want to see the nitty-gritty details.

The PR includes many changes, but the most relevant is the /search API.

NOTE: If you’re an experienced async Python person who has the time to read that PR, let me know about any mistakes I’m making – I’ve only done a little async Python!

I ran into three unexpected issues during this project, two of which appeared to be FastAPI bugs:

  1. Using Depends() for dependency injection the way the docs recommend reduced the speed of the API by 75%, so I had to skip doing that for now. Why on earth did this happen? I don’t know yet!

  2. After serializing a Pydantic BaseSettings object to Redis in the way the docs recommend, I could not deserialize it due to what looked like a bug – so I had to abandon using BaseSettings.

  3. FastAPI token authentication was poorly documented, and what I ended up with is more confusing than I’d like.

Aside from those issues, I ran into the things I expected to go wrong:

  1. The redisearch-py client library does not support aioredis-py, even though the new release of aioredis-py matches the redis-py APIs. This is because the commands on aioredis-py’s async Redis client return coroutines that have to be awaited, and code written for redis-py expects each call to return its result directly.

  2. The same was true for other libraries that used redis-py, like redis-queue. So my project ended up with two sets of components: those that used redis-py, and the web APIs, which all used aioredis-py. This didn’t bother me. Async is good at concurrent IO, but when you don’t need that, why bother?
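Here is a minimal sketch of why a library written for a synchronous client trips over an async one. SyncClient, AsyncClient, and library_lookup are hypothetical stand-ins I made up for illustration; they are not the real redis-py, aioredis-py, or redisearch-py code.

```python
import asyncio
import inspect


class SyncClient:
    """Hypothetical stand-in for a blocking client in the style of redis-py."""

    def get(self, key: str) -> str:
        return f"value-for-{key}"


class AsyncClient:
    """Hypothetical stand-in for an asyncio client in the style of aioredis-py."""

    async def get(self, key: str) -> str:
        await asyncio.sleep(0)  # pretend to wait on the network
        return f"value-for-{key}"


def library_lookup(client, key: str):
    """Hypothetical sync library code: it calls the client and never awaits."""
    return client.get(key)


sync_result = library_lookup(SyncClient(), "doc:1")
async_result = library_lookup(AsyncClient(), "doc:1")

# The sync client hands back data; the async client hands back a coroutine.
sync_is_str = isinstance(sync_result, str)
async_is_coroutine = inspect.iscoroutine(async_result)

# Only code that awaits (or runs) the coroutine gets the actual value.
awaited_result = asyncio.run(async_result)
```

The method names line up, but the sync library would treat the coroutine object as if it were the reply, which is why matching APIs alone are not enough.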

Benchmarks

Let’s look at the final numbers in a benchmark between the two versions of the API. Both benchmarks were taken against the API running on a single GCP e2-medium node close to my home (the us-west region).

First, the Falcon version:

$ wrk -c 60 -t 20 "<staging URL>/search?q=python"
Running 10s test @ <staging URL>/search?q=python
  20 threads and 60 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   214.01ms   71.38ms 493.73ms   70.17%
    Req/Sec    14.21      6.56    40.00     88.77%
  2768 requests in 10.09s, 15.93MB read
Requests/sec:    274.22
Transfer/sec:      1.58MB

While this API was fast in real-world use, you can see that it slowed down under concurrent load (here, 60 simultaneous connections driven by 20 threads).

Here’s the FastAPI version – again, on the same GCP node as the Falcon test and using the same Redis instance. This was the moment I realized that FastAPI was not a fad:

$ wrk -c 60 -t 20 "<staging URL>/search?q=python"
Running 10s test @ <staging URL>/search?q=python
  20 threads and 60 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    70.33ms   52.08ms 596.22ms   93.01%
    Req/Sec    46.60     13.90    90.00     74.94%
  9180 requests in 10.10s, 51.01MB read
Requests/sec:    908.78
Transfer/sec:      5.05MB

Average latency dropped from 214.01ms to 70.33ms, and throughput rose from about 274 to about 909 requests/second. Not bad!
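If you want the ratios rather than the raw figures, this small calculation works them out from the two wrk outputs above. The variable names are mine; the numbers are copied from the runs.

```python
# Figures copied from the two wrk runs above.
falcon_rps = 274.22
fastapi_rps = 908.78
falcon_latency_ms = 214.01
fastapi_latency_ms = 70.33

# How many times more requests/second the FastAPI version served.
throughput_ratio = fastapi_rps / falcon_rps

# How many times lower the average latency was.
latency_ratio = falcon_latency_ms / fastapi_latency_ms

# The same latency change expressed as a percentage drop.
latency_drop_pct = (1 - fastapi_latency_ms / falcon_latency_ms) * 100

print(f"throughput: {throughput_ratio:.2f}x")
print(f"latency:    {latency_ratio:.2f}x lower ({latency_drop_pct:.1f}% drop)")
```

Both ratios come out a little above 3x for this one test on this one node.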

The usual caveat: Benchmarks are usually biased and completely wrong. As soon as I publish this, someone will email that I should have used a different gunicorn worker or something. Buyer beware!

Takeaways

Ok, let’s wrap this up. Here are my takeaways from this project:

  • For IO-bound concurrency, async Python has real benefits – and web programming is mostly IO-bound!
  • FastAPI is, in fact, fast.
  • Migrating to FastAPI was not painless.
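To illustrate the first takeaway, here is a toy sketch of IO-bound concurrency with asyncio. It is not a model of the Falcon or FastAPI deployments: fake_io_call, IO_DELAY, and REQUESTS are made-up stand-ins, and asyncio.sleep plays the role of waiting on something like a Redis round trip.

```python
import asyncio
import time

IO_DELAY = 0.05  # seconds; hypothetical stand-in for one network round trip
REQUESTS = 20    # hypothetical number of in-flight requests


async def fake_io_call() -> None:
    # While this coroutine waits, the event loop is free to run others.
    await asyncio.sleep(IO_DELAY)


async def one_at_a_time() -> float:
    """Await each call before starting the next, so the waits add up."""
    start = time.perf_counter()
    for _ in range(REQUESTS):
        await fake_io_call()
    return time.perf_counter() - start


async def all_at_once() -> float:
    """Start every call together, so the waits overlap."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_io_call() for _ in range(REQUESTS)))
    return time.perf_counter() - start


sequential_seconds = asyncio.run(one_at_a_time())
concurrent_seconds = asyncio.run(all_at_once())

print(f"one at a time: {sequential_seconds:.2f}s")
print(f"all at once:   {concurrent_seconds:.2f}s")
```

The sequential version takes roughly REQUESTS times IO_DELAY, while the concurrent version takes roughly one IO_DELAY, because a single thread can overlap many waits. A real server also spends CPU time per request, so the gain there will be smaller than in this toy.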

As this article already discussed, I ran into bugs and other challenges that stemmed from libraries failing to support a Redis client that behaved like redis-py but was async. However, bugs and oddities aside, going from ~270 to ~910 requests/second is excellent, and latency in real-world use of the API is lower: not by a lot, but still lower.

I also liked using FastAPI, and it has supplanted Falcon as my “I need speed” Python framework.