Fine-Tuning Pricing

Last updated: November 14, 2025



Fine-tuning pricing at Together AI is based on the total number of tokens processed during your job. This includes tokens processed during both training and validation, and varies by model size, fine-tuning type (Supervised Fine-tuning or DPO), and implementation method (LoRA or Full Fine-tuning).

How Pricing Works

The total cost of a fine-tuning job is calculated using:

  • Model size (e.g., Up to 16B, 16.1-69B, etc.)

  • Fine-tuning type (Supervised Fine-tuning or Direct Preference Optimization (DPO))

  • Implementation method (LoRA or Full Fine-tuning)

  • Total tokens processed = (n_epochs × n_tokens_per_training_dataset) + (n_evals × n_tokens_per_validation_dataset)

Each combination of fine-tuning type and implementation method has its own pricing. For current rates, refer to our fine-tuning pricing page.
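The formula above can be sketched in a few lines of Python. The function names, example token counts, and the $0.50-per-million rate below are illustrative assumptions, not actual Together AI prices; always check the pricing page for current rates.

```python
def estimate_fine_tuning_cost(
    n_epochs: int,
    n_tokens_per_training_dataset: int,
    n_evals: int,
    n_tokens_per_validation_dataset: int,
    price_per_million_tokens: float,  # hypothetical rate, not an actual price
) -> float:
    """Estimate job cost from the total-tokens formula:
    (n_epochs * training tokens) + (n_evals * validation tokens)."""
    total_tokens = (
        n_epochs * n_tokens_per_training_dataset
        + n_evals * n_tokens_per_validation_dataset
    )
    return total_tokens / 1_000_000 * price_per_million_tokens

# Example: 3 epochs over a 2M-token training set, 3 evals over a
# 100k-token validation set, at a hypothetical $0.50 per million tokens.
cost = estimate_fine_tuning_cost(3, 2_000_000, 3, 100_000, 0.50)
print(f"${cost:.2f}")  # → $3.15
```

Note that the estimate is only as good as your token counts; the authoritative figure comes from the tokenization step described below.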

Token Calculation

The tokenization step is part of the fine-tuning process on our API. The exact token count and final price of your job will be available after tokenization completes. You can find this information in:

  • Your jobs dashboard

  • The CLI, by running together fine-tuning retrieve $JOB_ID

Dedicated Endpoint Charges for Fine-Tuned Models

After your fine-tuning job completes, additional hosting charges apply if you create a dedicated endpoint for your fine-tuned model. These charges are incurred per minute, even when the model is not actively being used.

Important: Hosting charges are separate from the initial fine-tuning job cost and will continue to accrue until you stop the endpoint.

To avoid unexpected charges:

  • Monitor your active endpoints in the models dashboard

  • Stop endpoints when not in use to prevent ongoing charges

  • Consider using the auto-shutdown feature to automatically stop inactive endpoints

  • Review hourly hosting rates on our pricing page
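Because charges accrue per minute whether or not the endpoint serves traffic, uptime is the number to watch. A minimal sketch of the accrual, assuming a hypothetical $1.20/hour rate (not an actual Together AI price):

```python
def endpoint_hosting_cost(minutes_deployed: int, hourly_rate_usd: float) -> float:
    """Hosting accrues per minute at the endpoint's hourly rate,
    even while the model is idle. The rate here is hypothetical."""
    return minutes_deployed * (hourly_rate_usd / 60)

# Example: an endpoint left running for 8 hours at a hypothetical $1.20/hour.
print(f"${endpoint_hosting_cost(8 * 60, 1.20):.2f}")  # → $9.60
```

This is why stopping or auto-shutting-down idle endpoints matters: an endpoint forgotten over a weekend accrues charges for every minute it stays up.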

Note: Frequently starting and stopping models may cause deployment delays, so plan your usage accordingly.