Learn TLA+ > Concurrency > Example: Rate Limiting

Example: Rate Limiting

The Problem

As part of your project, you need to call a third party API. There are two types of calls you need to make:

A GET for a collection of objects. Each call will return some results and, potentially, a pagination token.
A GET for a single object, followed by, if necessary, a PUT to update it.

There is a known rate limit of N calls per unit of time on this API, and you do not want to exceed this. The first type may make an arbitrary number of calls, but it’s less critical and you are comfortable waiting between calls. For the second, it will make either 1 or 2 calls; however, being higher priority, you want to immediately update the object if necessary. With these constraints in mind, how do you guarantee you never exceed the rate limit?

Initial Design

We start with the minimum possible case.

---- MODULE api ---- 
EXTENDS Integers, TLC
(* --algorithm api
variables made_calls = 0, max_calls \in 5..10;

macro make_calls(n)
begin
  made_calls := made_calls + n;
  assert made_calls <= max_calls;
end macro;

process get_collection = 0
begin 
  Request:
    make_calls(1);
    either goto Request 
    or skip 
    end either;
end process;

process get_put \in 1..3
begin
  Call:
    with c \in {1, 2} do
      make_calls(c)
    end with;
end process;

end algorithm; *)
====

Since most of the business logic is irrelevant to our use case, we simply leave it out. get_collection makes an arbitrary number of calls, represented by the either goto. Every time TLC reaches that branch point it will explore both branches, so we know this can be an arbitrary number of calls. Similarly, the get_put makes either one or two calls in a transaction. Finally, every time the made_calls changes we check that we haven’t gone over the maximum.

If we run this, we’ll get an error: the assert fails pretty trivially. We can fix this by adding a check in the macro, so that we only make a call if we have enough rate left. One fix is this:

macro make_calls(n)
begin
  if made_calls <= max_calls - n then
    made_calls := made_calls + n;
    assert made_calls <= max_calls;
  end if;
end macro;

Which makes the call fail silently if it would take us past the limit.

Deadlocks

Our solution works, but misses the business case: we don’t want to not make the calls, we want to wait on them until the rate limit refreshes. We can simulate that with an await:

macro make_calls(n)
begin
  await made_calls <= max_calls - n;
  made_calls := made_calls + n;
  assert made_calls <= max_calls;
end macro;

If we run this, though, we get a deadlock. That’s because all of our processes are waiting for the limit to refresh, which we never actually coded. Here’s one way of doing this:

process reset_limit = -1
begin
  Reset:
    while TRUE do
      made_calls := 0;
    end while
end process

If we add that, the spec passes as normal.

Asynchronous Workers

Let’s add a complication: all of the processes are running on different workers, so they don’t automatically know the number of calls made. In order to avoid going past the limit, we need to share that information, which takes time. One way we can do this is by having each process query a central cache that only move forward if there’s enough calls remaining. One representation would be

EXTENDS Integers, TLC
(* --algorithm api
variables made_calls = 0, max_calls \in 5..10;

macro make_calls(n)
begin
    made_calls := made_calls + n;
    assert made_calls <= max_calls;
end macro;

process reset_limit = -1
begin
  Reset:
    while TRUE do
      made_calls := 0;
    end while
end process


process get_collection = 0
begin 
  GCGetCalls:
    await made_calls <= max_calls - 1;
  Request:
    make_calls(1);
    either goto GCGetCalls 
    or skip 
    end either;
end process;

process get_put \in 1..3
begin
  GPGetCalls:
    await made_calls <= max_calls - 2;
  Call:
    with c \in {1, 2} do
      make_calls(c)
    end with;
end process;

end algorithm; *)

=========

When we run this, we see it fails, as between the get and the request another process can make a call. If we want to resolve this we need some form of locking or priority. This will work, but it also explodes the diameter. Every process needs to wait on every other process, even when there’s plenty of calls to go around. We should instead look for a more optimized way of handling this issue.

Semaphores

One solution we can use is to reserve calls: when a process checks that there are calls remaining, it reserves N calls that are considered made in our cache but not in the API endpoint. Then, once we make the appropriate calls, we return the necessary reserves.

---- MODULE api ----
EXTENDS Integers, TLC
(* --algorithm api
variables made_calls = 0, max_calls \in 5..10, reserved_calls = 0;


macro make_calls(n) begin
  made_calls := made_calls + n;
  assert made_calls <= max_calls;
end macro;


macro reserve(n) begin
  await made_calls + reserved_calls + n <= max_calls;
  reserved_calls := reserved_calls + n;
end macro

process reset_limit = -1
begin
  Reset:
    while TRUE do
      made_calls := 0;
    end while
end process

process get_collection = 0
begin
  GCGetCalls:
    reserve(1);
  Request:
    make_calls(1);
    reserved_calls := reserved_calls - 1;
    either goto GCGetCalls 
    or skip 
    end either;
end process;

process get_put \in 1..3
begin
  GPGetCalls:
    reserve(2);  
  Call:
    with c \in {1, 2} do
      make_calls(c)
    end with;
    reserved_calls := reserved_calls - 2;
end process;

end algorithm; *)