Tiered API Rate Limiting with Dynamic rate_limit in Rails 8.2

Note: This feature is coming in Rails 8.2. It has been merged into main but is not yet released. You can read the source on GitHub, or point your Gemfile at the main branch to try it out.

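Your controllers have had a rate_limit macro since Rails 7.2. The catch is that to: and within: had to be hard-coded values, which makes tiered pricing awkward. Rails 8.2 lets both options accept a proc or a method name, so you can finally do "Free 100/minute, Pro 1k/hour, Enterprise unlimited" without reaching for Rack::Attack.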

Why rate_limit was static before Rails 8.2

Before 8.2, rate_limit could only take static values:

class Api::BaseController < ApplicationController
  rate_limit to: 100, within: 1.minute
end

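Every user gets the same limit. If you wanted per-plan tiers, your options were:

Drop rate_limit in favor of Rack::Attack, where the per-user lookup has to happen before Rails has loaded current_user.
Hand-write a before_action that counts requests in the cache and renders the 429 itself.
Define one controller per tier and route requests to the right subclass based on the plan.

All three duplicate logic Rails already has. The missing piece was the ability to compute the limit from the request context.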

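How dynamic rate_limit works in Rails 8.2

In Rails 8.2 (PR #56128), to: and within: accept procs and method names in addition to static values, just as by: already does. The rules are simple:

A symbol is dispatched as a zero-argument method on the controller (send(:max_requests)).
A proc or lambda runs as if it were a method on the controller, so current_user, params, request, and session are all in scope. The proc takes no arguments.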
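
The two rules can be modeled in a few lines of plain Ruby. This is a simplified sketch, not the Rails source: SketchController and resolve_option are hypothetical names, and the only things assumed are the behaviors described above (send for symbols, controller-scoped evaluation for procs).

```ruby
# Simplified model of how a rate_limit option could be resolved.
# SketchController and resolve_option are hypothetical, not Rails APIs.
class SketchController
  def initialize(plan)
    @plan = plan
  end

  # Returns a concrete value for a static value, a method name, or a proc.
  def resolve_option(option)
    case option
    when Symbol then send(option)            # zero-argument method on the controller
    when Proc   then instance_exec(&option)  # runs with the controller as self
    else option                              # static value, used as-is
    end
  end

  private
    def max_requests
      @plan == "pro" ? 1_000 : 100
    end
end

controller = SketchController.new("pro")
controller.resolve_option(100)                             # static value
controller.resolve_option(:max_requests)                   # method name
controller.resolve_option(-> { @plan == "pro" ? 60 : 1 })  # proc sees controller state
```

Because the proc is evaluated with the controller as self, it can read instance state and call private methods without receiving anything as an argument.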

That's all there is to it. The rest of this article covers what you can build with it.

Pattern 1: Free / Pro / Enterprise tiers

If you run a multi-tenant SaaS, this is the pattern you have been waiting for. It looks like this:

class Api::BaseController < ApplicationController
  before_action :authenticate_api_user

  rate_limit to: :max_requests,
             within: :rate_window,
             by: -> { current_user.id }

  private
    def max_requests
      case current_user.plan
      when "enterprise" then 10_000
      when "pro"        then 1_000
      else                   100
      end
    end

    def rate_window
      current_user.plan == "free" ? 1.minute : 1.hour
    end
end

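On every request, Rails calls both methods on the controller, so the limit reflects the plan the user is on right now. The by: proc keys the counter by user ID, giving each user their own bucket.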
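
If the number of plans grows, the two case/ternary methods can drift apart. One possible variation, sketched here in plain Ruby, keeps both values in a single lookup table. PlanLimit, PLAN_LIMITS, and limit_for are hypothetical names; the numbers mirror the tiers in the controller above.

```ruby
# Hypothetical plan table: one place that holds both the cap and the window.
PlanLimit = Struct.new(:max_requests, :window_seconds)

PLAN_LIMITS = {
  "enterprise" => PlanLimit.new(10_000, 3600),
  "pro"        => PlanLimit.new(1_000, 3600),
  "free"       => PlanLimit.new(100, 60)
}.freeze

# Unknown or missing plans fall back to the most restrictive tier.
def limit_for(plan)
  PLAN_LIMITS.fetch(plan.to_s, PLAN_LIMITS["free"])
end

limit_for("pro").max_requests     # cap for the Pro tier
limit_for(nil).window_seconds     # falls back to the Free window
```

In the controller, max_requests and rate_window would then delegate to limit_for(current_user.plan), converting window_seconds to a duration with .seconds.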

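Testing tiered limits

Here is how to test a rate-limit configuration. There are two things to handle: clearing the cache between runs, and traveling in time past the window.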

require "test_helper"

class Api::BaseControllerTest < ActionDispatch::IntegrationTest
  include ActiveSupport::Testing::TimeHelpers

  setup { Rails.cache.clear }

  test "free users hit 100 requests per minute" do
    sign_in_as users(:free_user)

    100.times { get api_widgets_url }
    assert_response :success

    get api_widgets_url
    assert_response :too_many_requests

    travel 1.minute do
      get api_widgets_url
      assert_response :success
    end
  end
end

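For parallel test runs, give each worker its own cache, or switch to :null_store and assert that the macro is configured correctly instead of asserting counts.

Pattern 2: Burst + sustained limits

1,000 requests per hour sounds generous until one user fires all of them in three seconds. Layer a burst limit on top of the sustained one: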

class Api::BaseController < ApplicationController
  before_action :authenticate_api_user

  rate_limit to: :sustained_limit,
             within: 1.hour,
             name: "sustained",
             by: -> { current_user.id }

  rate_limit to: :burst_limit,
             within: 10.seconds,
             name: "burst",
             by: -> { current_user.id }

  private
    def sustained_limit = current_user.plan == "pro" ? 1_000 : 100
    def burst_limit     = current_user.plan == "pro" ? 50 : 10
end

Once a controller has more than one rate_limit, name: becomes necessary. It is what tells Rails which counter in the cache belongs to which declaration.
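
To see why the name matters, it helps to picture each declaration as owning one counter identified by a string key. The sketch below is a simplified model in plain Ruby; counter_key is a hypothetical helper and the key format is an assumption for illustration, not necessarily what Rails stores.

```ruby
# Simplified model of per-declaration counters. The key format here is an
# assumption for illustration only.
def counter_key(controller_path, name, by)
  ["rate-limit", controller_path, name, by].compact.join(":")
end

unnamed_a = counter_key("api/base", nil, 42)
unnamed_b = counter_key("api/base", nil, 42)
sustained = counter_key("api/base", "sustained", 42)
burst     = counter_key("api/base", "burst", 42)

unnamed_a == unnamed_b  # true: two unnamed declarations would bump one counter
sustained != burst      # true: named declarations count independently
```

With distinct names, the one-hour counter and the ten-second counter for user 42 never touch each other.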

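Pattern 3: Anonymous vs. authenticated

If your endpoint is public, there is no current_user to key on. Fall back to the IP address with a lower cap. The by: proc handles both cases at once, and the same trick works for API keys: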

class Api::PublicController < ApplicationController
  rate_limit to: -> { current_user ? 1_000 : 20 },
             within: 1.minute,
             by: -> { current_user&.id || request.remote_ip }
end

class Api::KeyAuthedController < ApplicationController
  rate_limit to: 5_000,
             within: 1.hour,
             by: -> { request.headers["X-Api-Key"] }
end

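One pitfall to watch for: if you sit behind Cloudflare or another proxy, request.remote_ip will be the proxy's IP unless you have set config.action_dispatch.trusted_proxies. Ignore this and every anonymous request hashes into the same bucket, which makes your rate limit all but useless.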
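
The effect of a trusted-proxy list can be illustrated without Rails. The following plain-Ruby sketch is a simplified model of the idea, not the Action Dispatch implementation: TRUSTED_PROXIES, trusted?, and client_ip are hypothetical names, and the address ranges are placeholders (a private range plus a documentation range standing in for a CDN).

```ruby
require "ipaddr"

# Hypothetical trusted ranges; substitute the ranges your proxy publishes.
TRUSTED_PROXIES = [
  IPAddr.new("10.0.0.0/8"),
  IPAddr.new("203.0.113.0/24") # documentation range standing in for a CDN
].freeze

def trusted?(ip)
  TRUSTED_PROXIES.any? { |range| range.include?(IPAddr.new(ip)) }
end

# Walk the hop chain from the nearest hop backwards and return the first
# address that is not a trusted proxy.
def client_ip(remote_addr, forwarded_for)
  chain = forwarded_for.to_s.split(",").map(&:strip) + [remote_addr]
  chain.reverse.find { |ip| !trusted?(ip) } || remote_addr
end

client_ip("203.0.113.7", "198.51.100.23")  # proxy is trusted: yields the client address
client_ip("192.0.2.50", "198.51.100.23")   # proxy not trusted: yields the proxy address
```

The second call is the pitfall in miniature: when the proxy is not on the trusted list, every visitor appears to come from the proxy and lands in one bucket.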

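Pattern 4: Friendly 429 responses (with instrumentation)

By default, hitting the limit raises ActionController::TooManyRequests, which Action Dispatch returns as a plain 429. That is fine for HTML, but API clients usually want JSON and a Retry-After header. While you are customizing the response, the same with: block is also a good place to fire an ActiveSupport::Notifications event for your APM, one that carries app-specific fields such as the user's plan.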

class Api::BaseController < ApplicationController
  rate_limit to: :max_requests,
             within: :rate_window,
             by: -> { current_user.id },
             with: -> {
               ActiveSupport::Notifications.instrument(
                 "rate_limit.exceeded",
                 user_id: current_user.id,
                 plan: current_user.plan,
                 controller: self.class.name
               )

               response.headers["Retry-After"] = rate_window.to_i.to_s
               render json: {
                 error: "rate_limit_exceeded",
                 plan: current_user.plan,
                 retry_after: rate_window.to_i
               }, status: :too_many_requests
             }
end

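Subscribe to rate_limit.exceeded in an initializer and you have a feed you can send to Datadog, Honeycomb, or wherever your traffic dashboards live. Clients (or your background jobs with a smart retry strategy) can read Retry-After and back off accordingly.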
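
On the client side, honoring the header mostly comes down to turning its value into a wait time. Here is a minimal plain-Ruby sketch of that step; retry_delay and its default and max parameters are hypothetical choices, not part of Rails or any HTTP library. It accepts both forms the header may take, a number of seconds or an HTTP date.

```ruby
require "time"

# Returns how many seconds to wait before retrying, given a Retry-After value.
# Accepts delta-seconds ("60") or an HTTP-date; anything else uses the default.
def retry_delay(header_value, now: Time.now, default: 1.0, max: 3600.0)
  value = header_value.to_s.strip
  return default if value.empty?

  seconds =
    if value.match?(/\A\d+\z/)
      value.to_f
    else
      Time.httpdate(value) - now
    end

  seconds.clamp(0.0, max)
rescue ArgumentError
  default # unparseable header: fall back instead of failing the retry loop
end

retry_delay("60")   # wait derived from a delta-seconds value
retry_delay(nil)    # no header present: use the default
```

Capping the delay with max keeps a misconfigured server from parking a background job for days.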

Caveats and edge cases for Rails rate_limit

A few things the macro won't tell you up front:

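Cache store choice. rate_limit uses Rails.cache. :memory_store is per-process, so two Puma workers do not share counts; :file_store does not work across servers. In production, use :solid_cache_store, :redis_cache_store, or :mem_cache_store. With any of these, cache_store.increment is atomic, so concurrent requests across multiple app servers stay correct without extra coordination.
Fixed window, not sliding window. The counter resets at the window boundary. A user can send limit requests at 12:59:59 and another limit requests at 13:00:00. If that bothers you, layer on a tighter burst limit (Pattern 2).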
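Plan changes and in-progress windows. The limit is recomputed on every request, but the count from the in-progress window carries over. A user who changes plans mid-window keeps the count accumulated so far, now measured against the new limit: an upgrade gains headroom right away, while a downgrade can produce 429s until the old counter expires. If you need a fresh counter the moment the plan changes, include current_user.plan in the by: key.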
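Procs run with the controller instance as self. current_user, params, and request are available, but self is no longer the class where you wrote rate_limit, so class-level methods are not directly in reach. Stick to controller instance methods and instance state.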
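Symbol methods take no arguments. They are dispatched via send(name), so just return a value. Don't expect to receive the request as a parameter.
Shared limits across controllers. Pass scope: to have multiple controllers share one bucket. Handy for setups like "all writes count against the same limit".
Rack::Attack still has its place. Keep it for IP allow/block lists, fail2ban-style abuse rules, and anything that needs to run before Rails starts handling the request. rate_limit is the right tool for per-controller, per-user, plan-aware policy, because by then you have the controller context.
Performance. One method or proc call per request is on the order of microseconds. Don't waste time worrying about it.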
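
The fixed-window and plan-change caveats are easier to reason about with a toy counter. The class below is a minimal in-memory model with an injectable clock, written in plain Ruby; FixedWindowCounter is a hypothetical name, and it models the behavior described in this list rather than any Rails cache store.

```ruby
# Minimal in-memory fixed-window counter with an injectable clock.
# A model for reasoning about the caveats, not a production rate limiter.
class FixedWindowCounter
  def initialize
    @entries = {} # key => [count, expires_at]
  end

  # Increments the counter for key, starting a fresh window when the
  # previous one has expired. Returns the new count.
  def increment(key, window:, now:)
    count, expires_at = @entries[key]
    if expires_at.nil? || now >= expires_at
      count = 0
      expires_at = now + window
    end
    @entries[key] = [count + 1, expires_at]
    count + 1
  end
end

counter = FixedWindowCounter.new
limit = 100
allowed = ->(key, now) { counter.increment(key, window: 60, now: now) <= limit }

# Boundary burst: the window opens at t=0, the rest of the quota is spent
# at t=59, and a full new quota is available one second later.
allowed.call("user:1", 0)
late  = 99.times.count { allowed.call("user:1", 59) }
fresh = 100.times.count { allowed.call("user:1", 60) }

# Plan in the key: a new plan means a new key, hence a fresh counter.
100.times { counter.increment("user:2:free", window: 60, now: 0) }
counter.increment("user:2:pro", window: 60, now: 10) # starts again from 1
```

Here late and fresh together account for 199 allowed requests within about one second, which is the burst a tighter second limit is meant to catch.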

FAQ

How do I rate-limit Rails API requests by user plan?

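Pass a method name or a proc to rate_limit's to: and within: options and have it return a value based on current_user.plan. Rails 8.2 evaluates these options on every request, so the limit reflects the user's current plan.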

Can I have multiple rate_limit declarations in one controller?

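Yes. Give each declaration a unique name: (for example name: "burst" and name: "sustained"). Without distinct names, multiple declarations in the same controller end up sharing a counter and interfere with each other.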

Does Rails rate_limit work across multiple application servers?

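Yes, as long as the cache store is shared. Solid Cache, Redis, and Memcached all support atomic increment, so concurrent requests across servers are counted correctly. :memory_store is per-process and does not work across servers, or even across multiple Puma workers.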

What’s the difference between Rails rate_limit and Rack::Attack?

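rate_limit runs inside the Rails request cycle, so it has access to current_user, controller methods, and routing context. Rack::Attack runs at the Rack layer before Rails starts handling the request, which makes it the right tool for IP allow/block lists and abuse rules. Use both: Rack::Attack for low-level IP filtering, rate_limit for per-user, plan-aware policy.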

How do I send a Retry-After header from rate_limit?

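Use the with: option to supply a custom callback that sets response.headers["Retry-After"] and renders your 429 response. The default behavior raises ActionController::TooManyRequests, which produces a plain 429 with no Retry-After header.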

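Tiered rate limiting in production

This change makes rate_limit far more useful. What used to be a middleware concern (or a custom before_action) now fits in a few lines on your Api::BaseController. Tiered pricing, burst plus sustained limits, anonymous fallbacks, API key buckets, and observability hooks all live in this one macro now.

If you want to read the source, the implementation is in PR #56128. If you are actually shipping a tiered API, pair it with a smart retry strategy so your clients respect the Retry-After header you now return. And if you key by API token rather than session user, bearer token authentication works well with this.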