Together AI

What happens if the training data (JSONL) has examples with token counts shorter (or longer) than the context length?

We use dataset packing: sequences shorter than the maximum sequence length are concatenated, with a separator token between them. An example longer than the maximum sequence length is split into non-overlapping chunks, so the entire example is still used.
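The packing behavior described in this answer can be illustrated with a minimal sketch. This is not Together AI's actual implementation: the function name `pack_examples`, the parameter names, and the choice to append a separator after every example are assumptions made for illustration. It treats each example as a list of token IDs, joins them into one stream with a separator token, and cuts that stream into non-overlapping chunks no longer than the maximum sequence length.

```python
from typing import List


def pack_examples(
    examples: List[List[int]],
    max_seq_len: int,
    sep_token_id: int,
) -> List[List[int]]:
    """Pack tokenized examples into chunks of at most max_seq_len tokens.

    Hypothetical helper for illustration only; the real packing logic
    may differ (for example in how the final partial chunk is handled).
    """
    if max_seq_len <= 0:
        raise ValueError("max_seq_len must be positive")

    # Join all examples into one token stream, with a separator
    # token after each example so boundaries stay visible.
    stream: List[int] = []
    for tokens in examples:
        stream.extend(tokens)
        stream.append(sep_token_id)

    # Cut the stream into non-overlapping chunks. Short examples end up
    # sharing a chunk; a long example spills across several chunks, so
    # none of its tokens are dropped.
    return [
        stream[start:start + max_seq_len]
        for start in range(0, len(stream), max_seq_len)
    ]


if __name__ == "__main__":
    # Three examples of 2, 7, and 1 tokens; 0 stands in for the separator.
    packed = pack_examples([[1, 2], [3, 4, 5, 6, 7, 8, 9], [10]], 4, 0)
    print(packed)  # [[1, 2, 0, 3], [4, 5, 6, 7], [8, 9, 0, 10], [0]]
```

In this sketch the 7-token example is longer than the 4-token limit, so it is spread across three chunks, while the two short examples share chunks with their neighbors.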
