Fine-Tuning Pricing
Last updated: November 14, 2025
Fine-tuning pricing at Together AI is based on the total number of tokens processed during your job. This includes tokens from both your training and validation datasets, and the rate varies with model size, fine-tuning type (Supervised Fine-tuning or DPO), and implementation method (LoRA or Full Fine-tuning).
How Pricing Works
The total cost of a fine-tuning job is calculated using:
Model size tier (e.g., up to 16B, 16.1B-69B)
Fine-tuning type (Supervised Fine-tuning or Direct Preference Optimization (DPO))
Implementation method (LoRA or Full Fine-tuning)
Total tokens processed = (n_epochs × n_tokens_per_training_dataset) + (n_evals × n_tokens_per_validation_dataset)
Each combination of fine-tuning type and implementation method has its own pricing. For current rates, refer to our fine-tuning pricing page.
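The formula above can be sketched as a small cost estimator. This is a minimal illustration, not an official calculator: the per-token rate below is a placeholder, and you should substitute the actual rate for your model size, fine-tuning type, and method from the fine-tuning pricing page.

```python
# Estimate fine-tuning cost from the total-tokens formula.
# The rate used at the bottom is a hypothetical placeholder.

def total_tokens_processed(
    n_epochs: int,
    n_tokens_per_training_dataset: int,
    n_evals: int = 0,
    n_tokens_per_validation_dataset: int = 0,
) -> int:
    """(n_epochs x training tokens) + (n_evals x validation tokens)."""
    return (n_epochs * n_tokens_per_training_dataset
            + n_evals * n_tokens_per_validation_dataset)

def estimated_cost_usd(total_tokens: int, price_per_million_tokens: float) -> float:
    """Convert a token count to dollars at a given per-million-token rate."""
    return total_tokens / 1_000_000 * price_per_million_tokens

# Example: 3 epochs over a 2M-token training set, evaluated 3 times
# against a 100K-token validation set.
tokens = total_tokens_processed(
    n_epochs=3,
    n_tokens_per_training_dataset=2_000_000,
    n_evals=3,
    n_tokens_per_validation_dataset=100_000,
)
print(tokens)  # 6300000
print(f"${estimated_cost_usd(tokens, price_per_million_tokens=0.48):.2f}")  # $3.02
```

Note that the validation term only contributes when you configure evaluations; with no validation dataset the second term is zero.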
Token Calculation
The tokenization step is part of the fine-tuning process on our API. The exact token count and final price of your job become available once tokenization completes.
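Because the authoritative token count is only produced server-side during tokenization, a rough client-side estimate can still help you budget before submitting a job. The sketch below uses the common ~4-characters-per-token rule of thumb; both that heuristic and the assumed file format (JSONL with a "text" field) are illustrative assumptions, and the real count may differ.

```python
# Rough, client-side estimate of dataset token counts.
# The server-side tokenization step produces the authoritative count;
# this heuristic is only a ballpark for budgeting.
import json

def rough_token_count(text: str) -> int:
    # Rule of thumb for English text: roughly 4 characters per token.
    return max(1, len(text) // 4)

def estimate_dataset_tokens(path: str) -> int:
    """Sum rough token counts over a JSONL file with a 'text' field per line."""
    total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            record = json.loads(line)
            total += rough_token_count(record["text"])
    return total
```

Running this against both your training and validation files, then plugging the results into the total-tokens formula above, gives a pre-submission cost estimate.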
Dedicated Endpoint charges for Fine-Tuned Models
After your fine-tuning job completes, additional hosting charges apply if you create a dedicated endpoint for your fine-tuned model. These charges accrue per minute, even when the model is not actively serving requests.
Important: Hosting charges are separate from the initial fine-tuning job cost and will continue to accrue until you stop the endpoint.
To avoid unexpected charges:
Monitor your active endpoints in the models dashboard
Stop endpoints when not in use to prevent ongoing charges
Consider using the auto-shutdown feature to automatically stop inactive endpoints
Review hourly hosting rates on our pricing page
Note: Frequently starting and stopping models may cause deployment delays, so plan your usage accordingly.
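To make the per-minute billing concrete, here is a small arithmetic sketch. The hourly rate is a hypothetical placeholder; check the pricing page for the rate of your specific hardware and model configuration.

```python
# Sketch of dedicated-endpoint hosting cost, billed per minute.
# The hourly rate below is a hypothetical example, not an actual rate.

def hosting_cost_usd(minutes_running: int, hourly_rate_usd: float) -> float:
    # Charges accrue per minute whether or not the endpoint serves traffic.
    return minutes_running * (hourly_rate_usd / 60)

# An endpoint left running for a full day at a hypothetical $2.40/hour:
print(f"${hosting_cost_usd(24 * 60, 2.40):.2f}")  # $57.60
```

This is why stopping idle endpoints (or enabling auto-shutdown) matters: an endpoint forgotten over a weekend accrues cost for every minute it stays up.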