Why am I getting 429 on Serverless, and how long does it last?
Last updated: May 11, 2026
If you see HTTP 429, your request or token rate is temporarily above the effective limit for that model on your account at that moment.
This is usually not a permanent account issue.
What 429 means
- 429 = rate limiting (too many requests/tokens in the current window)
- It is different from:
  - 503 = service/model temporarily unavailable
  - 500 = internal server error
  - 402 = billing/account state issue
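As a rough illustration, client code can branch on these status codes so that only 429 is handled as rate limiting. The function name and the category labels below are hypothetical, chosen for this sketch and not part of any SDK.

```python
def classify_status(status: int) -> str:
    """Map an HTTP status code to the categories described in this article.

    The labels are illustrative names for this sketch, not API values.
    """
    if status == 429:
        return "rate_limited"        # too many requests/tokens in the window
    if status == 503:
        return "unavailable"         # service/model temporarily unavailable
    if status == 500:
        return "server_error"        # internal server error
    if status == 402:
        return "billing"             # billing/account state issue
    if 200 <= status < 300:
        return "ok"
    return "other"


def is_rate_limited(status: int) -> bool:
    """True only for 429, the one code this article treats as rate limiting."""
    return classify_status(status) == "rate_limited"
```

Keeping 402 separate from 429 matters in practice: spacing out retries helps with rate limiting but does nothing for a billing or account state issue.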
How long does 429 last?
There is no single fixed cooldown for all serverless 429s.
Recovery time depends on:
- how bursty traffic is
- how far traffic is above the current limit
- model demand/capacity at that time
- whether retries use proper backoff
In many cases, 429s clear quickly once traffic is smoothed and retries are spaced out.
How to check your current effective limits
Use response headers as the source of truth for each request:
- x-ratelimit-*
- x-tokenlimit-*
- dynamic variants when present
These values can change over time based on usage and capacity.
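A minimal sketch of collecting these headers from a response is shown below. It filters by prefix only, on the assumption that any dynamic variants share the same x-ratelimit- and x-tokenlimit- prefixes. The function name is hypothetical, and the header names and values in the usage example are made up for illustration.

```python
from typing import Dict, Mapping

# Prefixes named in this article; matching is done case-insensitively.
LIMIT_PREFIXES = ("x-ratelimit-", "x-tokenlimit-")


def extract_limit_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return only the rate/token limit headers, with lowercased names."""
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower().startswith(LIMIT_PREFIXES)
    }


# Usage with made-up header names and values (assumptions, not real output):
example_headers = {
    "Content-Type": "application/json",
    "X-RateLimit-Example": "10",
    "x-tokenlimit-example": "5000",
}
print(extract_limit_headers(example_headers))
```

Logging this dictionary on every response, not only on failures, makes it easier to see limits shifting before 429s start.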
How to reduce 429s
1. Smooth request bursts (avoid sudden concurrency spikes).
2. Use exponential backoff + jitter on retries.
3. Respect the Retry-After header when present.
4. Monitor x-ratelimit-* and x-tokenlimit-* on both success and 429 responses.
5. For steady, guaranteed throughput, consider Dedicated Endpoints.
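Steps 2 and 3 above can be sketched as follows. This is one possible approach, not a recommended or official implementation: it uses "full jitter" exponential backoff and never waits less than a numeric Retry-After value. The send callable, the function names, and the base, cap, and max_attempts defaults are all assumptions made for this example. Header names are assumed to be lowercased already, and the HTTP-date form of Retry-After is not handled.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple


def parse_retry_after(value: Optional[str]) -> Optional[float]:
    """Return Retry-After as seconds, or None if absent or not a plain number."""
    if value is None:
        return None
    try:
        seconds = float(value.strip())
    except ValueError:
        return None  # HTTP-date form is not handled in this sketch
    return max(0.0, seconds)


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter exponential backoff, never shorter than Retry-After."""
    ceiling = min(cap, base * (2 ** attempt))   # 1s, 2s, 4s, ... up to cap
    delay = random.uniform(0.0, ceiling)        # jitter spreads out retries
    hinted = parse_retry_after(retry_after)
    if hinted is not None:
        delay = max(delay, hinted)
    return delay


def call_with_retries(
    send: Callable[[], Tuple[int, Mapping[str, str], str]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-429 status or attempts run out.

    send is a hypothetical callable returning (status, headers, body),
    where headers uses lowercased names.
    """
    status, body = 0, ""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            break
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, headers.get("retry-after")))
    return status, body
```

The jitter is the important part when many workers hit the limit together: without it, they all retry at the same instants and recreate the burst that caused the 429.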
References
- [Rate limits](https://docs.together.ai/docs/rate-limits)
- [Usage limits and account access](https://docs.together.ai/docs/billing-usage-limits)