Supervised Fine-Tuning (SFT) & Preference Fine-Tuning

Last updated: August 29, 2025

Supervised fine-tuning (SFT) is the default method on our platform. The recommended approach is to first perform SFT followed up by preference tuning as follows:

First perform supervised fine-tuning (SFT) on your data.
Then refine with preference fine-tuning using continued fine-tuning on your SFT checkpoint.

Performing SFT on your dataset prior to DPO can significantly increase the resulting model quality, especially if your training data differs significantly from the data the base model observed during pretraining. To perform SFT, you can concatenate the context with the preferred output and use one of our SFT data formats .