Tiered API Rate Limits in Rails 8.2 with Dynamic rate_limit

Note: This feature is coming in Rails 8.2. It’s merged to main but not yet released. You can view the source on GitHub or try it by pointing your Gemfile at the main branch.

Since Rails 8.0, you’ve had a rate_limit macro in your controllers. The catch was that to: and within: had to be static values, which made tiered pricing awkward. Rails 8.2 lets both options take procs or method names, so you can finally do “Free 100/min, Pro 1k/hour, Enterprise unlimited” without bringing in Rack::Attack.

Why rate_limit was static before Rails 8.2

Before 8.2, rate_limit was static-only:

class Api::BaseController < ApplicationController
  rate_limit to: 100, within: 1.minute
end

Every user got the same limit. If you wanted per-plan tiers before, your options were:

  1. Skip rate_limit and reach for Rack::Attack, which runs at the Rack layer before current_user exists, so the per-user lookup had to be reimplemented there.
  2. Hand-roll a before_action that counted requests in the cache and rendered 429 yourself.
  3. Define one controller per tier and route subclasses based on the plan.

All three duplicated logic Rails already had. The piece that was missing was the ability to compute the limit from request context.
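For reference, option 2 typically looked something like this (a sketch; the key scheme and the hardcoded limit are illustrative):

```ruby
class Api::BaseController < ApplicationController
  before_action :throttle!

  private
    # Hand-rolled fixed window: one counter per user per minute boundary.
    def throttle!
      key = "throttle:#{current_user.id}:#{Time.current.to_i / 60}"
      count = Rails.cache.increment(key, 1, expires_in: 1.minute)
      head :too_many_requests if count > 100
    end
end
```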

How dynamic rate_limit works in Rails 8.2

In Rails 8.2 (PR #56128), to: and within: accept procs and method names alongside static values, just like by: already did. The rules are simple:

  • A symbol is dispatched as a no-argument method on the controller (send(:max_requests)).
  • A proc or lambda runs as if it were a method on the controller, so current_user, params, request, and session are all in scope. Procs take no arguments.
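Both forms side by side, as a minimal sketch (the method name, numbers, and name: labels are illustrative; name: is needed here because there are two declarations):

```ruby
class Api::BaseController < ApplicationController
  # Symbol form: Rails calls send(:max_requests) on each request.
  rate_limit to: :max_requests, within: 1.minute, name: "by-method"

  # Proc form: evaluated on the controller, so current_user is in scope.
  rate_limit to: -> { current_user&.admin? ? 1_000 : 100 },
             within: 1.minute, name: "by-proc"

  private
    def max_requests = current_user&.admin? ? 1_000 : 100
end
```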

That’s it. The rest of the post is what you can build with it.

Pattern 1: Free / Pro / Enterprise Tiers

If you’re building multi-tenant SaaS, this is the pattern you’ve been waiting for. Here’s how it looks:

class Api::BaseController < ApplicationController
  before_action :authenticate_api_user

  rate_limit to: :max_requests,
             within: :rate_window,
             by: -> { current_user.id }

  private
    def max_requests
      case current_user.plan
      when "enterprise" then 10_000
      when "pro"        then 1_000
      else                   100
      end
    end

    def rate_window
      current_user.plan == "free" ? 1.minute : 1.hour
    end
end

On every request, Rails calls those two methods on the controller, so the limit reflects whatever plan the user is on right now. The by: proc keys the counter on the user’s ID, giving each user their own bucket.

Testing tiered limits

Testing rate-limit configurations comes down to two things: clearing the cache between runs, and time-traveling across the window.

require "test_helper"

class Api::BaseControllerTest < ActionDispatch::IntegrationTest
  include ActiveSupport::Testing::TimeHelpers

  setup { Rails.cache.clear }

  test "free users hit 100 requests per minute" do
    sign_in_as users(:free_user)

    100.times { get api_widgets_url }
    assert_response :success

    get api_widgets_url
    assert_response :too_many_requests

    travel 1.minute do
      get api_widgets_url
      assert_response :success
    end
  end
end

For parallel test runs, give each worker its own cache, or swap to :null_store and assert the macro is configured rather than asserting on the count.
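The per-worker option can be sketched with parallelize_setup (assuming the default process-based parallelization):

```ruby
# test_helper.rb — give each parallel worker its own in-memory cache
# so rate-limit counters never bleed across workers.
class ActiveSupport::TestCase
  parallelize_setup do |_worker|
    Rails.cache = ActiveSupport::Cache::MemoryStore.new
  end
end
```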

Pattern 2: Burst + Sustained Limits

1,000 requests an hour sounds generous until a user fires all of them in three seconds. Layer a burst limit on top of the sustained one:

class Api::BaseController < ApplicationController
  before_action :authenticate_api_user

  rate_limit to: :sustained_limit,
             within: 1.hour,
             name: "sustained",
             by: -> { current_user.id }

  rate_limit to: :burst_limit,
             within: 10.seconds,
             name: "burst",
             by: -> { current_user.id }

  private
    def sustained_limit = current_user.plan == "pro" ? 1_000 : 100
    def burst_limit     = current_user.plan == "pro" ? 50 : 10
end

Once you have more than one rate_limit in a controller, name: becomes required. That’s what tells Rails which counter is which in the cache.

Pattern 3: Anonymous vs Authenticated

If your endpoint is public, there’s no current_user to key on. Drop down to IP and a lower ceiling. The by: proc handles both cases in one shot, and the same trick works for API keys:

class Api::PublicController < ApplicationController
  rate_limit to: -> { current_user ? 1_000 : 20 },
             within: 1.minute,
             by: -> { current_user&.id || request.remote_ip }
end

class Api::KeyAuthedController < ApplicationController
  rate_limit to: 5_000,
             within: 1.hour,
             by: -> { request.headers["X-Api-Key"] }
end

A gotcha to be aware of: if you’re behind Cloudflare or another proxy, request.remote_ip will be the proxy’s IP unless config.action_dispatch.trusted_proxies is set. Skip that and every anonymous request hashes to the same bucket, which makes your rate limit fairly useless.
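A sketch of the fix (the CIDR is a placeholder from the documentation range; substitute your proxy provider’s published egress ranges):

```ruby
# config/environments/production.rb
require "ipaddr"

Rails.application.configure do
  # Keep Rails' defaults and append your proxy's range (placeholder shown),
  # so request.remote_ip resolves to the real client address.
  config.action_dispatch.trusted_proxies =
    ActionDispatch::RemoteIp::TRUSTED_PROXIES + [IPAddr.new("203.0.113.0/24")]
end
```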

Pattern 4: Friendly 429 Responses (with Instrumentation)

By default, hitting the limit raises ActionController::TooManyRequests, which Action Dispatch returns as a plain 429. That’s fine for HTML, but API clients usually want JSON and a Retry-After header. While you’re customizing the response, the same with: block is a good place to fire an ActiveSupport::Notifications event for your APM, since the macro doesn’t emit one on its own.

class Api::BaseController < ApplicationController
  rate_limit to: :max_requests,
             within: :rate_window,
             by: -> { current_user.id },
             with: -> {
               ActiveSupport::Notifications.instrument(
                 "rate_limit.exceeded",
                 user_id: current_user.id,
                 plan: current_user.plan,
                 controller: self.class.name
               )

               response.headers["Retry-After"] = rate_window.to_i.to_s
               render json: {
                 error: "rate_limit_exceeded",
                 plan: current_user.plan,
                 retry_after: rate_window.to_i
               }, status: :too_many_requests
             }
end

Subscribe to rate_limit.exceeded in an initializer and you’ve got a feed for Datadog, Honeycomb, or wherever your traffic dashboards live. Clients (or your background jobs using smart retry strategies) can read Retry-After and back off accordingly.
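A minimal subscriber, assuming the custom rate_limit.exceeded event fired in the with: block above:

```ruby
# config/initializers/rate_limit_metrics.rb
ActiveSupport::Notifications.subscribe("rate_limit.exceeded") do |event|
  payload = event.payload
  # Swap the logger call for your APM client of choice.
  Rails.logger.warn(
    "[rate_limit] user=#{payload[:user_id]} plan=#{payload[:plan]} " \
    "controller=#{payload[:controller]}"
  )
end
```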

Rails rate_limit caveats and edge cases

A few things the macro doesn’t tell you up front:

  • Cache store choice. rate_limit uses Rails.cache. :memory_store is per-process, so two Puma workers won’t share counts; :file_store won’t work across servers. In production, use :solid_cache_store, :redis_cache_store, or :mem_cache_store. With any of those, cache_store.increment is atomic, so concurrent requests across multiple app servers stay correct without extra coordination.

  • Fixed-window, not sliding. Counters reset on window boundaries. A user can fire a full limit’s worth of requests at 12:59:59 and another full allotment at 13:00:00. If that bothers you, layer a tighter burst limit (Pattern 2).

  • Plan upgrades take effect on the next window. The limit is recomputed each request, but the in-flight window’s count carries over. A free user at 99/100 who upgrades to Pro will still get 429’d on their next request that minute. Including current_user.plan in the by: key sidesteps this if you really need an instant upgrade.

  • Procs run in the controller’s binding. current_user, params, and request work; local variables defined where you wrote rate_limit do not. Stick to controller methods and instance state.

  • Symbol methods take no arguments. They’re dispatched via send(name), so just return the value. Don’t try to accept the request as a parameter.

  • Cross-controller shared limits. Pass scope: to make multiple controllers share a bucket. Handy for “all writes count against one limit” setups.

  • Rack::Attack still has a place. Keep it for IP allow/blocklists, fail2ban-style abuse rules, and anything that needs to run before Rails has booted the request. rate_limit is the right tool for per-controller, per-user, plan-aware policy where you already have controller context.

  • Performance. A method call or proc invocation per request is in the microsecond range. Don’t waste time worrying about it.
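For the plan-upgrade caveat above, folding the plan into the by: key gives upgraded users a fresh bucket immediately, at the cost of resetting the count on any plan change (a sketch, reusing the Pattern 1 helpers):

```ruby
class Api::BaseController < ApplicationController
  rate_limit to: :max_requests,
             within: :rate_window,
             # The key changes when the plan changes, so an upgrade
             # starts a brand-new counter mid-window.
             by: -> { "#{current_user.id}:#{current_user.plan}" }
end
```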

FAQ

How do I rate-limit Rails API requests by user plan?

Pass a method name or proc to the to: and within: options of rate_limit and have it return values based on current_user.plan. Rails 8.2 evaluates these on every request, so the limit reflects the user’s current plan.

Can I have multiple rate_limit declarations in one controller?

Yes. Add a unique name: to each one (for example name: "burst" and name: "sustained"). Without a name, Rails raises when you declare more than one in the same controller.

Does Rails rate_limit work across multiple application servers?

Yes, as long as the cache store is shared. Solid Cache, Redis, and Memcached all support atomic increments, so concurrent requests across servers stay correctly counted. The default :memory_store is per-process and won’t work across servers or even multiple Puma workers.

What’s the difference between Rails rate_limit and Rack::Attack?

rate_limit runs inside the Rails request cycle, so it has access to current_user, controller methods, and routing context. Rack::Attack runs at the Rack layer before Rails boots the request, which makes it the right tool for IP allow/blocklists and abuse rules. Use both: Rack::Attack for low-level IP filtering, rate_limit for per-user, plan-aware policy.

How do I send a Retry-After header from rate_limit?

Use the with: option to provide a custom callback that sets response.headers["Retry-After"] and renders your 429 response. The default behavior raises ActionController::TooManyRequests, which produces a plain 429 with no header.

Tiered rate limits in production

This change makes rate_limit a lot more useful. What used to be a middleware concern (or a custom before_action) now fits into a few lines on your Api::BaseController. Tiered pricing, burst plus sustained limits, anonymous fallbacks, API-key buckets, observability hooks: all of it now lives in the macro.

The implementation lives in PR #56128 if you want to read the source. If you’re shipping a tiered API for real, pair this with smart retry strategies so your clients respect the Retry-After headers you’re now returning. And if you’re keying off API tokens instead of session users, bearer token authentication works well with this.