Why is my fine-tuned model returning a 503 error?

Last updated: November 14, 2025

While error code 503 typically means that our servers are overloaded or not ready, this response is also used as a catch-all in some cases.

together.ai currently only supports a limited selection of base models for Serverless LoRA Inference. The strings for these base models are:

  • meta-llama/Llama-4-Maverick-17B-128E-Instruct

  • meta-llama/Meta-Llama-3.1-8B-Instruct-Reference

  • meta-llama/Meta-Llama-3.1-70B-Instruct-Reference

  • Qwen/Qwen2.5-14B-Instruct

  • Qwen/Qwen2.5-72B-Instruct

Note: Among Qwen models, only the Qwen2.5 14B and 72B Instruct models are supported for serverless LoRA inference. Earlier Qwen2 models (such as Qwen2-72B-Instruct) are not supported and will return a 503 error when you attempt serverless inference.

If your fine-tune was not based on one of these models and you call it through our API, you will receive a 503 response.
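As a quick pre-flight check, you can verify your base model string against the list above before fine-tuning or making an inference request. This is a minimal sketch: the helper name is hypothetical, and the list reflects the models named in this article (check the docs for the latest set).

```python
# Base models currently supported for Serverless LoRA Inference,
# as listed in this article (the set may change; consult the docs).
SUPPORTED_LORA_BASE_MODELS = {
    "meta-llama/Llama-4-Maverick-17B-128E-Instruct",
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",
    "meta-llama/Meta-Llama-3.1-70B-Instruct-Reference",
    "Qwen/Qwen2.5-14B-Instruct",
    "Qwen/Qwen2.5-72B-Instruct",
}

def supports_serverless_lora(base_model: str) -> bool:
    """Return True if the base model string is eligible for serverless LoRA inference."""
    return base_model in SUPPORTED_LORA_BASE_MODELS
```

For example, `supports_serverless_lora("Qwen/Qwen2-72B-Instruct")` returns `False`, matching the 503 behavior described above, while the Qwen2.5 variants return `True`.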

To make requests to a fine-tuned model built on any other base model, you will need to deploy it on a dedicated endpoint.

You can do this from the Models page in your dashboard.