Rate Limiting
API rate limits per endpoint category, response headers, and best practices for handling rate-limited requests
The API enforces rate limits to prevent abuse and ensure fair usage. Limits are applied per client IP address by default, or per authenticated user ID for certain endpoints.
In-Memory Store
Rate limit state is stored in-memory on the server process. In a multi-server deployment, each server maintains its own rate limit counters independently. This means effective limits may be higher than documented when requests are distributed across multiple servers.
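The in-memory, per-process design described above is commonly implemented as a fixed-window counter. A minimal sketch, assuming the server follows this pattern (names and shapes here are illustrative, not the actual implementation):

```typescript
// Fixed-window in-memory rate limiter sketch. Each identifier gets a bucket
// that counts requests until its window expires, then resets.
type Bucket = { count: number; resetAt: number }

const buckets = new Map<string, Bucket>()

function checkRateLimit(
  identifier: string,
  maxRequests: number,
  windowSeconds: number
): { allowed: boolean; remaining: number; resetSeconds: number } {
  const now = Date.now()
  let bucket = buckets.get(identifier)

  // Start a fresh window if none exists or the previous one expired.
  if (!bucket || now >= bucket.resetAt) {
    bucket = { count: 0, resetAt: now + windowSeconds * 1000 }
    buckets.set(identifier, bucket)
  }

  bucket.count++
  return {
    allowed: bucket.count <= maxRequests,
    remaining: Math.max(0, maxRequests - bucket.count),
    resetSeconds: Math.ceil((bucket.resetAt - now) / 1000),
  }
}
```

Because each server process keeps its own `Map`, N servers behind a load balancer effectively multiply the documented limit by up to N, which is exactly the caveat noted above.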
Rate Limit Tiers
| Tier | Max Requests | Window | Identifier | Endpoints |
|---|---|---|---|---|
| General API | 100 | 60 seconds | IP address | Most API endpoints |
| AI Chat | 20 | 60 seconds | IP address | /api/chat, AI-powered endpoints |
| AI Chat (Authenticated) | 50 | 60 seconds | User ID | /api/chat (logged-in users) |
| Auth Login | 5 | 15 minutes | IP address | /api/auth/callback/credentials |
| Password Reset | 3 | 60 minutes | IP address | /api/auth/forgot-password, /api/auth/reset-password |
How Tiers Work
- General API (100/min): The default tier applied to most endpoints using the `withRateLimit` middleware or `commonRateLimits.standard` preset.
- AI Chat (20/min guest, 50/min user): AI endpoints use a stricter limit for unauthenticated users. When `useUserId` is enabled, authenticated users are tracked by their user ID instead of IP, giving them a higher allowance.
- Auth Login (5/15min): Protects against brute-force password attacks. Applies per IP address.
- Password Reset (3/hr): Prevents abuse of password reset emails. Applies per IP address.
Response Headers
Every rate-limited response includes these headers:
| Header | Description | Example |
|---|---|---|
| `X-RateLimit-Limit` | Maximum requests allowed in the current window | 100 |
| `X-RateLimit-Remaining` | Requests remaining before throttling | 85 |
| `X-RateLimit-Reset` | Seconds until the current window resets | 42 |
These headers are included on both successful and rejected responses, so clients can proactively track their usage.
Rate Limit Exceeded Response
When a client exceeds the rate limit, the API returns:
```json
{
  "error": "Rate limit exceeded",
  "retryAfter": 42
}
```

| Field | Type | Description |
|---|---|---|
| `error` | string | Always `"Rate limit exceeded"` |
| `retryAfter` | number | Seconds until the rate limit window resets |
The response also includes the standard rate limit headers with X-RateLimit-Remaining: 0.
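A client can recognize this payload with a type guard built from the shape documented above (the guard itself is a sketch, not part of the API):

```typescript
// Shape of the 429 body documented above.
interface RateLimitError {
  error: string
  retryAfter: number
}

// Narrow an unknown response body to RateLimitError, checking both the
// HTTP status and the documented field values.
function isRateLimitError(status: number, body: unknown): body is RateLimitError {
  if (status !== 429 || typeof body !== 'object' || body === null) return false
  const b = body as Partial<RateLimitError>
  return b.error === 'Rate limit exceeded' && typeof b.retryAfter === 'number'
}
```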
Client Identification
The rate limiter identifies clients using the following priority:
X-Forwarded-For Header
If present (typical behind a proxy or load balancer), the first IP in the comma-separated list is used.
X-Real-IP Header
If X-Forwarded-For is not set, the X-Real-IP header is used.
Fallback
If neither header is present, a generic unknown-ip identifier is used. This means all clients without identifiable IPs share a single rate limit bucket.
For endpoints configured with useUserId: true, authenticated users are tracked by user-{userId} instead of their IP address, giving each user an independent rate limit.
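The priority order above can be sketched as a single lookup function (the helper name is hypothetical; the real middleware may structure this differently):

```typescript
// Resolve the rate limit identifier using the documented priority:
// user ID (when enabled) > X-Forwarded-For > X-Real-IP > shared fallback.
function getClientIdentifier(
  headers: Headers,
  userId?: string,
  useUserId = false
): string {
  // Authenticated users get an independent bucket when useUserId is enabled.
  if (useUserId && userId) return `user-${userId}`

  // First IP in the comma-separated X-Forwarded-For list wins.
  const forwarded = headers.get('x-forwarded-for')
  if (forwarded) return forwarded.split(',')[0].trim()

  // Then X-Real-IP, then the shared fallback bucket.
  return headers.get('x-real-ip') ?? 'unknown-ip'
}
```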
Mobile App Bypass
Mobile App Traffic
Requests with a User-Agent header containing JoseMadridSalsaMobileApp bypass rate limiting entirely. This allows the proprietary mobile app to make unlimited API requests without being throttled.
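The bypass check reduces to a substring match on the User-Agent header; a sketch of that condition (the exact server-side implementation is not shown here):

```typescript
// True when the request comes from the proprietary mobile app, which is
// exempt from rate limiting per the documentation above.
function isMobileAppRequest(userAgent: string | null): boolean {
  return userAgent?.includes('JoseMadridSalsaMobileApp') ?? false
}
```

Note that a User-Agent string is trivially spoofable, so this exemption is a convenience for the first-party app rather than a security boundary.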
Applying Rate Limits
Rate limits are applied to API route handlers using the withRateLimit middleware wrapper.
Direct Usage
```typescript
import { withRateLimit } from '@/lib/middleware/api-helpers'
import { NextResponse } from 'next/server'

export const POST = withRateLimit(async (request) => {
  return NextResponse.json({ success: true })
}, {
  maxRequests: 10,
  windowSeconds: 60,
  useUserId: false,
})
```
Using Presets
| Preset | Limit | Description |
|---|---|---|
| `commonRateLimits.standard` | 100/min | Standard API endpoints |
| `commonRateLimits.aiChat` | 20/min (guest) or 50/min (user) | AI chat endpoints, tracks by user ID |
| `commonRateLimits.auth` | 5/15min | Authentication endpoints |
| `commonRateLimits.passwordReset` | 3/hr | Password reset endpoints |
```typescript
import { commonRateLimits } from '@/lib/middleware/api-helpers'
import { NextResponse } from 'next/server'

export const POST = commonRateLimits.auth(async (request) => {
  // This handler is limited to 5 requests per 15 minutes per IP
  return NextResponse.json({ success: true })
})
```
Composing with Authentication
Rate limiting can be combined with authentication using the compose helper:
```typescript
import { compose, withAuth, withRateLimit } from '@/lib/middleware/api-helpers'
import { RATE_LIMITS } from '@/lib/rate-limiter'
import { UserRole } from '@prisma/client'
import { NextResponse } from 'next/server'

export const POST = compose(
  (h) => withAuth(h, { roles: [UserRole.ADMIN] }),
  (h) => withRateLimit(h, RATE_LIMITS.API_GENERAL)
)(async (request) => {
  return NextResponse.json({ success: true })
})
```
Middleware is applied right to left: the request hits rate limiting first, then authentication.
Best Practices
Track Headers Proactively
Monitor X-RateLimit-Remaining on every response. Begin throttling client-side before hitting zero to avoid rejected requests.
Implement Exponential Backoff
When you receive a 429, wait for retryAfter seconds before retrying. If the retry also fails, increase the delay exponentially.
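One way to implement this schedule: honor the server's `retryAfter` for the first wait, then double the delay on each subsequent failure. A sketch (the cap and helper name are my choices, not prescribed by the API):

```typescript
// Compute the backoff schedule for up to maxRetries attempts: the first
// delay is exactly retryAfter, each later delay doubles, capped at 5 minutes.
function backoffDelaysMs(retryAfterSeconds: number, maxRetries: number): number[] {
  const capMs = 5 * 60 * 1000
  const delays: number[] = []
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(Math.min(retryAfterSeconds * 1000 * 2 ** attempt, capMs))
  }
  return delays
}
```

A retry loop would `await` each delay in turn and give up when the schedule is exhausted.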
Authenticate for Higher Limits
AI chat endpoints allow authenticated users 2.5× the requests of guests (50/min vs 20/min). Sign in to get the higher rate limit.
Use User-Scoped Limits
For endpoints behind authentication, enable useUserId: true so that each user gets an independent rate limit bucket rather than sharing one with all users behind the same IP.