Why is my fine-tuned model returning a 503 error?
Last updated: November 14, 2025
While the 503 error code typically means that our servers are overloaded or not ready, this response is also used as a catch-all in some cases.
together.ai currently supports only a limited selection of base models for Serverless LoRA Inference. The model strings for these base models are:
meta-llama/Llama-4-Maverick-17B-128E-Instruct
meta-llama/Meta-Llama-3.1-8B-Instruct-Reference
meta-llama/Meta-Llama-3.1-70B-Instruct-Reference
Qwen/Qwen2.5-14B-Instruct
Qwen/Qwen2.5-72B-Instruct
Note: Only the Qwen2.5 14B and 72B Instruct models are supported for serverless LoRA inference. Earlier Qwen2 models (such as Qwen2-72B-Instruct) are not supported, and attempting serverless inference with them will return a 503 error.
If your fine-tune was not based on one of these models and you call it through our API, you will receive a 503 response.
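As a quick sanity check before calling the API, you can verify that your base model string is on the supported list. The helper below is an illustrative sketch, not part of any Together SDK, and the set reflects the models listed above (check the docs for the latest list):

```python
# Base models currently supported for Serverless LoRA Inference,
# per the list above. This set may change; consult the docs for updates.
SUPPORTED_LORA_BASE_MODELS = {
    "meta-llama/Llama-4-Maverick-17B-128E-Instruct",
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",
    "meta-llama/Meta-Llama-3.1-70B-Instruct-Reference",
    "Qwen/Qwen2.5-14B-Instruct",
    "Qwen/Qwen2.5-72B-Instruct",
}

def supports_serverless_lora(base_model: str) -> bool:
    """Return True if the base model string is eligible for Serverless LoRA Inference."""
    return base_model in SUPPORTED_LORA_BASE_MODELS
```

For example, `supports_serverless_lora("Qwen/Qwen2-72B-Instruct")` returns `False`, matching the note above: a fine-tune on that base model would need a dedicated endpoint instead.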
To make requests to a fine-tuned model built on any other base model, you will need to deploy it on a dedicated endpoint.
You can do this from the Models page on your dashboard.