
Rate Limiting

API rate limits per endpoint category, response headers, and best practices for handling rate-limited requests

The API enforces rate limits to prevent abuse and ensure fair usage. Limits are applied per client IP address by default, or per authenticated user ID for certain endpoints.

In-Memory Store

Rate limit state is stored in-memory on the server process. In a multi-server deployment, each server maintains its own rate limit counters independently. This means effective limits may be higher than documented when requests are distributed across multiple servers.
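The caveat above can be made concrete with a minimal fixed-window counter. This is a sketch of the general technique only, not the actual store implementation; the class name and shape are illustrative:

```typescript
type Entry = { count: number; resetAt: number }

// Minimal fixed-window, in-memory rate limiter (illustrative sketch).
class InMemoryRateLimiter {
  private buckets = new Map<string, Entry>()

  constructor(private maxRequests: number, private windowSeconds: number) {}

  // Returns true if the request is allowed, false once the limit is exceeded.
  check(key: string, now: number = Date.now()): boolean {
    const entry = this.buckets.get(key)
    if (!entry || now >= entry.resetAt) {
      // Start a fresh window for this client.
      this.buckets.set(key, { count: 1, resetAt: now + this.windowSeconds * 1000 })
      return true
    }
    entry.count += 1
    return entry.count <= this.maxRequests
  }
}
```

Because each server process holds its own `Map`, two servers behind a load balancer each allow the full quota independently, which is exactly why effective limits can exceed the documented numbers.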

Rate Limit Tiers

| Tier | Max Requests | Window | Identifier | Endpoints |
| --- | --- | --- | --- | --- |
| General API | 100 | 60 seconds | IP address | Most API endpoints |
| AI Chat | 20 | 60 seconds | IP address | `/api/chat`, AI-powered endpoints |
| AI Chat (Authenticated) | 50 | 60 seconds | User ID | `/api/chat` (logged-in users) |
| Auth Login | 5 | 15 minutes | IP address | `/api/auth/callback/credentials` |
| Password Reset | 3 | 60 minutes | IP address | `/api/auth/forgot-password`, `/api/auth/reset-password` |

How Tiers Work

  • General API (100/min): The default tier applied to most endpoints using the withRateLimit middleware or commonRateLimits.standard preset.
  • AI Chat (20/min guest, 50/min user): AI endpoints use a stricter limit for unauthenticated users. When useUserId is enabled, authenticated users are tracked by their user ID instead of IP, giving them a higher allowance.
  • Auth Login (5/15min): Protects against brute-force password attacks. Applies per IP address.
  • Password Reset (3/hr): Prevents abuse of password reset emails. Applies per IP address.
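The tier presets presumably look something like the sketch below. Only `RATE_LIMITS.API_GENERAL` appears in the examples on this page; the other property names are hypothetical, with values mirroring the tier table:

```typescript
// Hypothetical sketch of the tier presets (the real definitions live in
// '@/lib/rate-limiter'; property names other than API_GENERAL are guesses).
export const RATE_LIMITS = {
  API_GENERAL:    { maxRequests: 100, windowSeconds: 60,      useUserId: false },
  AI_CHAT:        { maxRequests: 20,  windowSeconds: 60,      useUserId: false },
  AI_CHAT_AUTH:   { maxRequests: 50,  windowSeconds: 60,      useUserId: true  },
  AUTH_LOGIN:     { maxRequests: 5,   windowSeconds: 15 * 60, useUserId: false },
  PASSWORD_RESET: { maxRequests: 3,   windowSeconds: 60 * 60, useUserId: false },
} as const
```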

Response Headers

Every rate-limited response includes these headers:

| Header | Description | Example |
| --- | --- | --- |
| `X-RateLimit-Limit` | Maximum requests allowed in the current window | 100 |
| `X-RateLimit-Remaining` | Requests remaining before throttling | 85 |
| `X-RateLimit-Reset` | Seconds until the current window resets | 42 |

These headers are included on both successful and rejected responses, so clients can proactively track their usage.

Rate Limit Exceeded Response

When a client exceeds the rate limit, the API returns:

`429 Too Many Requests`

```json
{
  "error": "Rate limit exceeded",
  "retryAfter": 42
}
```

| Field | Type | Description |
| --- | --- | --- |
| `error` | string | Always `"Rate limit exceeded"` |
| `retryAfter` | number | Seconds until the rate limit window resets |

The response also includes the standard rate limit headers with X-RateLimit-Remaining: 0.

Client Identification

The rate limiter identifies clients using the following priority:

X-Forwarded-For Header

If present (typical behind a proxy or load balancer), the first IP in the comma-separated list is used.

X-Real-IP Header

If X-Forwarded-For is not set, the X-Real-IP header is used.

Fallback

If neither header is present, a generic unknown-ip identifier is used. This means all clients without identifiable IPs share a single rate limit bucket.

For endpoints configured with useUserId: true, authenticated users are tracked by user-{userId} instead of their IP address, giving each user an independent rate limit.
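The priority order above can be sketched as a small resolver. The function name and signature are illustrative, not the middleware's actual internals:

```typescript
// Resolves a rate limit bucket key: user ID first (when enabled),
// then X-Forwarded-For, then X-Real-IP, then the shared fallback.
function clientIdentifier(
  headers: Headers,
  userId?: string,
  useUserId = false,
): string {
  if (useUserId && userId) return `user-${userId}`
  const forwarded = headers.get('x-forwarded-for')
  // Behind a proxy chain, the first entry is the original client IP.
  if (forwarded) return forwarded.split(',')[0].trim()
  return headers.get('x-real-ip') ?? 'unknown-ip'
}
```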

Mobile App Bypass

Mobile App Traffic

Requests with a User-Agent header containing JoseMadridSalsaMobileApp bypass rate limiting entirely. This allows the proprietary mobile app to make unlimited API requests without being throttled.

Applying Rate Limits

Rate limits are applied to API route handlers using the withRateLimit middleware wrapper.

Direct Usage

Custom rate limit

```typescript
import { NextResponse } from 'next/server'
import { withRateLimit } from '@/lib/middleware/api-helpers'

export const POST = withRateLimit(async (request) => {
  return NextResponse.json({ success: true })
}, {
  maxRequests: 10,   // allow 10 requests
  windowSeconds: 60, // per 60-second window
  useUserId: false,  // track by IP, not user ID
})
```

Using Presets

| Preset | Limit | Description |
| --- | --- | --- |
| `commonRateLimits.standard` | 100/min | Standard API endpoints |
| `commonRateLimits.aiChat` | 20/min (guest) or 50/min (user) | AI chat endpoints, tracks by user ID |
| `commonRateLimits.auth` | 5/15min | Authentication endpoints |
| `commonRateLimits.passwordReset` | 3/hr | Password reset endpoints |
Using a preset

```typescript
import { NextResponse } from 'next/server'
import { commonRateLimits } from '@/lib/middleware/api-helpers'

export const POST = commonRateLimits.auth(async (request) => {
  // This handler is limited to 5 requests per 15 minutes per IP
  return NextResponse.json({ success: true })
})
```

Composing with Authentication

Rate limiting can be combined with authentication using the compose helper:

Auth + rate limiting

```typescript
import { NextResponse } from 'next/server'
import { compose, withAuth, withRateLimit } from '@/lib/middleware/api-helpers'
import { RATE_LIMITS } from '@/lib/rate-limiter'
import { UserRole } from '@prisma/client'

export const POST = compose(
  (h) => withAuth(h, { roles: [UserRole.ADMIN] }),
  (h) => withRateLimit(h, RATE_LIMITS.API_GENERAL)
)(async (request) => {
  return NextResponse.json({ success: true })
})
```

Middleware is applied right to left: the request hits rate limiting first, then authentication.

Best Practices

Track Headers Proactively

Monitor X-RateLimit-Remaining on every response. Begin throttling client-side before hitting zero to avoid rejected requests.
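A minimal client-side check, assuming only the documented header names; the function name and threshold are arbitrary choices for illustration:

```typescript
// Returns true when the client should start slowing down, i.e. the
// remaining quota in the current window has dropped to the threshold.
function shouldThrottle(res: Response, threshold = 5): boolean {
  const raw = res.headers.get('X-RateLimit-Remaining')
  if (raw === null) return false // header absent: not a rate-limited endpoint
  const remaining = Number(raw)
  return Number.isFinite(remaining) && remaining <= threshold
}
```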

Implement Exponential Backoff

When you receive a 429, wait for retryAfter seconds before retrying. If the retry also fails, increase the delay exponentially.
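A sketch of that retry loop, with `fetchFn` and `sleep` injectable so the behavior can be exercised without real network calls or timers; the helper name is illustrative:

```typescript
// Retries on 429, waiting retryAfter seconds (from the error body) the
// first time and doubling the delay on each subsequent failure.
async function fetchWithBackoff(
  url: string,
  maxRetries = 3,
  fetchFn: typeof fetch = fetch,
  sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms)),
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchFn(url)
    if (res.status !== 429 || attempt >= maxRetries) return res
    const { retryAfter } = await res.json().catch(() => ({ retryAfter: 1 }))
    // Server-suggested base wait, doubled per attempt: t, 2t, 4t, ...
    await sleep((retryAfter ?? 1) * 1000 * 2 ** attempt)
  }
}
```

Injecting `fetchFn` and `sleep` keeps the loop unit-testable and lets callers substitute jittered delays if many clients tend to retry in lockstep.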

Authenticate for Higher Limits

AI chat endpoints grant authenticated users 2.5x as many requests (50/min vs 20/min). Sign in to receive the higher limit.

Use User-Scoped Limits

For endpoints behind authentication, enable useUserId: true so that each user gets an independent rate limit bucket rather than sharing one with all users behind the same IP.
