Why was there an error while running my job?
Updated over 9 months ago

If your job fails after downloading the training file but before training starts, the most likely source of the error is the training data. For example, your event log might show the job failing at this point.

You can verify the formatting of your input file with the Together CLI, using the following command:

$ together files check ~/Downloads/unified_joke_explanations.jsonl
{
    "is_check_passed": true,
    "model_special_tokens": "we are not yet checking end of sentence tokens for this model",
    "file_present": "File found",
    "file_size": "File size 0.0 GB",
    "num_samples": 356
}

Despite our best efforts, the file checker does not catch all errors. Please contact support if your training data file passes the checks, but you are still seeing the above error conditions.
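
Before contacting support, a quick local pass over the file can sometimes surface problems the checker missed. The sketch below is not part of the Together CLI and does not reproduce its checks. It is a minimal standard-library Python example that assumes each line of the training file should be a single JSON object. The function name find_jsonl_problems and the optional required_keys parameter are hypothetical, introduced only for illustration; the fields your job actually needs depend on your data format.

```python
import json
from pathlib import Path


def find_jsonl_problems(path, required_keys=()):
    """Return (line_number, message) pairs for lines that look malformed.

    Assumption: every line should hold exactly one JSON object.
    `required_keys` is a hypothetical convenience for checking that
    each record contains the fields you expect.
    """
    problems = []
    with Path(path).expanduser().open(encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            # Blank lines are reported rather than silently skipped.
            if not line.strip():
                problems.append((line_number, "blank line"))
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_number, f"invalid JSON: {exc.msg}"))
                continue
            # Valid JSON that is not an object (e.g. a list or a string).
            if not isinstance(record, dict):
                problems.append((line_number, "record is not a JSON object"))
                continue
            missing = [key for key in required_keys if key not in record]
            if missing:
                problems.append((line_number, f"missing keys: {missing}"))
    return problems
```

Calling find_jsonl_problems("~/Downloads/unified_joke_explanations.jsonl") returns an empty list when every line parses as a JSON object. Any reported line numbers are a reasonable place to start looking, and are useful details to include if you do contact support.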

If you see an error during other steps in your training job, this may be due to internal errors in our training stack (e.g. hardware failure or bugs). We actively monitor job failures, and work as quickly as we can to resolve these issues. Once the issue has been resolved by our engineers, your job will be automatically or manually restarted. Charges for the restarted job will be refunded.
