GPU Cluster FAQ

Last updated: February 20, 2026

This guide covers frequently asked questions about Together AI's Instant Clusters based on common customer inquiries and issues.

Getting Started

What is an Instant Cluster?

Instant Clusters are on-demand GPU clusters that can be quickly provisioned for training and inference workloads. They offer flexible, pay-as-you-go access to GPU resources without long-term commitments.

How do I access my Instant Cluster?

After creating your cluster, you can download the kubeconfig from the Together GPU cluster page. If your cluster is not visible on the page, ensure you're logged in with the correct account and that the cluster has finished provisioning.

Can I SSH into my Instant Cluster nodes?

Yes, SSH access is available for Instant Cluster nodes. See the direct SSH access documentation: https://docs.together.ai/docs/gpu-clusters-management#direct-ssh-access

Cluster Configuration

Can I create multi-node Instant Clusters?

Yes, Instant Clusters support multi-node configurations. The number of nodes available depends on capacity and your account settings. If you're only seeing single-node options, contact support to check capacity or account limitations.

How much ephemeral storage is available on Instant Clusters?

Each Instant Cluster node typically has at least 800 GB of ephemeral storage, usually mounted under /scratch. If you need more storage for model weights or datasets, contact support to discuss options for increased storage provisioning.
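Before copying large model weights or datasets onto a node, it can help to confirm they fit in the ephemeral volume. The following is a minimal Python sketch built on the standard library's shutil.disk_usage. The /scratch default path and the 10% safety margin are assumptions; adjust both for your node layout.

```python
import shutil


def scratch_has_room(required_bytes: int, path: str = "/scratch",
                     headroom: float = 0.1) -> bool:
    """Return True if `path` has room for `required_bytes` plus a safety margin.

    Assumptions: ephemeral storage is mounted at /scratch, and a 10% margin
    is enough slack for temporary files written during extraction or loading.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    needed = int(required_bytes * (1 + headroom))
    return usage.free >= needed
```

For example, scratch_has_room(140 * 10**9) checks for roughly 140 GB of weights (a hypothetical size). The call raises FileNotFoundError if the path does not exist on the node, which is itself a useful signal that the volume is mounted elsewhere.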

How do I scale nodes up or down in my Instant Cluster?

Node scaling can be managed through the Together UI or the CLI (tcloud). For automated scaling, authenticate with API keys instead of SSO login.

Billing & Pricing

How does billing work for Instant Clusters?

Instant Clusters are billed hourly based on the number and type of GPUs in use. You're charged for the time the cluster is active, regardless of whether workloads are running.

What happens if I delete my cluster before the reservation duration ends?

If you create a cluster with a reservation duration (e.g., --reservation-duration 1 for 1 day) but delete it early, you are typically charged only for the actual time used, not the full reservation period. However, confirm this with billing support, as policies may vary.

On-Demand vs Reserved Clusters: Which should I choose?

On-Demand (Instant Clusters): Best for short-term training jobs, experimentation, or workloads with variable duration. Pay hourly with no long-term commitment.

Reserved Clusters: Better for long-running training jobs or production workloads where you need guaranteed capacity. Typically offers better pricing for sustained usage.

Why does my bill seem higher than expected?

Common reasons include:

  • Cluster was running longer than anticipated

  • Billing is per GPU-hour, not per cluster-hour, so 8 GPUs running for 2 hours is 16 GPU-hours

  • Cluster auto-renewal if not explicitly deleted

Check your usage dashboard or contact billing support to review specific charges.
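The GPU-hour arithmetic above can be sketched in Python to sanity-check a bill. This is an illustrative estimate only: the per-GPU-hour rate is a placeholder you supply, and the sketch does not model rounding, minimum billing increments, or reservation terms, none of which are specified in this FAQ.

```python
from datetime import datetime


def gpu_hours(num_gpus: int, hours: float) -> float:
    """GPU-hours consumed: every allocated GPU is counted for every hour."""
    return num_gpus * hours


def gpu_hours_between(num_gpus: int, start: datetime, end: datetime) -> float:
    """GPU-hours for a cluster that was active from `start` to `end`.

    Charges accrue for the whole active window, whether or not workloads
    were running. Use timezone-aware datetimes (or both naive) consistently.
    """
    elapsed_hours = (end - start).total_seconds() / 3600
    return gpu_hours(num_gpus, elapsed_hours)


def estimated_cost(gpu_hours_used: float, rate_per_gpu_hour: float) -> float:
    """Rough cost estimate; `rate_per_gpu_hour` is a placeholder, not a published price."""
    return gpu_hours_used * rate_per_gpu_hour
```

For example, gpu_hours(8, 2) gives the 16 GPU-hours from the list above, and an 8-GPU cluster left active from 09:00 to 11:30 accrues 20 GPU-hours. Compare your own figures against the usage dashboard rather than relying on the estimate.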

Connectivity & Access Issues

Why can't I connect to my cluster with kubectl?

Common causes:

  • DNS resolution issues: The cluster API endpoint may not be resolving. Verify the endpoint in your kubeconfig.

  • Expired kubeconfig: Download a fresh kubeconfig from the dashboard.

  • Network restrictions: Check firewall rules or VPN settings that might block access.

  • Cluster not fully provisioned: Wait for cluster setup to complete before connecting.

If issues persist, contact support with the error message and cluster ID.
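To tell a DNS problem apart from a firewall or VPN block, you can read the API server URL out of your kubeconfig and test it in two steps: name resolution, then a TCP connection. The sketch below uses only the Python standard library; it scans for "server:" lines with a regular expression instead of fully parsing YAML, which is a simplifying assumption that holds for typical kubeconfig files.

```python
import re
import socket
from urllib.parse import urlparse

# Matches "server: <url>" lines in a kubeconfig. A naive text scan, not a
# YAML parser, so the sketch stays dependency-free.
_SERVER_LINE = re.compile(r"^[ \t]*server:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def server_urls(kubeconfig_text: str) -> list[str]:
    """Return every API server URL listed in a kubeconfig, with quotes stripped."""
    return [m.strip("\"'") for m in _SERVER_LINE.findall(kubeconfig_text)]


def check_endpoint(url: str, timeout: float = 5.0) -> str:
    """Report whether `url` fails at DNS resolution, at TCP connect, or neither."""
    parsed = urlparse(url)
    host = parsed.hostname
    if host is None:
        return f"could not parse a hostname from {url!r}"
    port = parsed.port or 443  # HTTPS default when the URL omits a port
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"DNS resolution failed for {host}: {exc}"
    try:
        # Name resolution worked; now check whether a TCP connection gets through.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        return f"{host} resolves, but TCP connect to port {port} failed: {exc}"
    return f"{host}:{port} is reachable over TCP"
```

Run check_endpoint on each URL returned by server_urls(open(path).read()), where path is your downloaded kubeconfig. A DNS failure points at the first cause in the list above; a TCP failure after successful resolution points at network restrictions or an endpoint that is not yet serving. Include the printed message in any support ticket.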

My cluster can't access services on my main cluster or registry

Instant Clusters may be isolated from other cluster networks by default. For cross-cluster communication or access to private registries (like Harbor), contact support to configure network routing or VPN connections.

I can't see my Instant Cluster on the GPU cluster page

Possible solutions:

  • Refresh the page and ensure you're logged into the correct account

  • Check if the cluster is still provisioning (this can take time)

  • Verify the cluster wasn't deleted or expired

  • Contact support if the cluster should exist but isn't visible

Common Issues & Troubleshooting

My node is taking more than an hour to schedule

Long scheduling times can occur due to:

  • High demand for specific GPU types

  • Cluster capacity constraints

  • Infrastructure issues

Contact support immediately if scheduling takes longer than expected, and include your cluster ID and the time you submitted the request.
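If the delay shows up as pods stuck in Pending while they wait for GPU nodes, the following Python sketch can collect the exact timestamps support will ask for. It assumes kubectl is installed and pointed at the cluster, and it reads the standard metadata.creationTimestamp and status.phase fields from `kubectl get pods -A -o json`. The one-hour threshold mirrors the question above and is adjustable.

```python
import json
import subprocess
from datetime import datetime, timedelta, timezone


def long_pending_pods(pods: dict, now: datetime,
                      threshold: timedelta = timedelta(hours=1)) -> list[tuple[str, str, timedelta]]:
    """Return (namespace/name, creationTimestamp, age) for pods Pending longer than `threshold`.

    `pods` is the parsed output of `kubectl get pods -A -o json`; `now` must be timezone-aware.
    """
    stuck = []
    for item in pods.get("items", []):
        if item.get("status", {}).get("phase") != "Pending":
            continue
        meta = item["metadata"]
        created_raw = meta["creationTimestamp"]  # RFC 3339, e.g. "2026-02-20T09:00:00Z"
        created = datetime.fromisoformat(created_raw.replace("Z", "+00:00"))
        age = now - created
        if age > threshold:
            stuck.append((f"{meta['namespace']}/{meta['name']}", created_raw, age))
    return stuck


def fetch_pods() -> dict:
    """Run kubectl against the current kubeconfig context and parse its JSON output."""
    out = subprocess.run(["kubectl", "get", "pods", "-A", "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def report() -> None:
    """Print each long-pending pod with its creation time, ready to paste into a ticket."""
    now = datetime.now(timezone.utc)
    for name, created, age in long_pending_pods(fetch_pods(), now):
        print(f"{name} pending since {created} ({age})")
```

Keeping the parsing separate from the kubectl call means long_pending_pods can also be run on JSON saved earlier, which is handy when attaching evidence after the fact.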

I'm getting NCCL errors that I didn't see before

NCCL errors often indicate networking or GPU communication issues:

  • Verify InfiniBand is properly configured (if required)

  • Check that all GPUs are on the bus (run nvidia-smi)

  • Ensure network drivers and NCCL versions are compatible

  • Contact support if errors persist, providing full logs
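The GPU check in the list above can be scripted with nvidia-smi's CSV query mode, and NCCL's standard NCCL_DEBUG=INFO setting produces the more detailed logs worth attaching to a ticket. A minimal Python sketch follows; the expected count of 8 GPUs per node is an assumption, so use your node type's actual GPU count.

```python
import os
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=index,pci.bus_id,name", "--format=csv,noheader"]


def parse_gpu_list(csv_text: str) -> list[dict[str, str]]:
    """Parse `nvidia-smi --query-gpu=index,pci.bus_id,name --format=csv,noheader` output."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # maxsplit=2 keeps any commas inside the GPU name in the last field
        index, bus_id, name = (field.strip() for field in line.split(",", 2))
        gpus.append({"index": index, "pci.bus_id": bus_id, "name": name})
    return gpus


def visible_gpus() -> list[dict[str, str]]:
    """List the GPUs the driver can currently see on this node."""
    out = subprocess.run(QUERY, check=True, capture_output=True, text=True).stdout
    return parse_gpu_list(out)


def missing_gpu_count(gpus: list[dict[str, str]], expected: int = 8) -> int:
    """How many GPUs are absent compared with the node's expected count."""
    return max(expected - len(gpus), 0)


def nccl_debug_env() -> dict[str, str]:
    """Copy of the current environment with NCCL's debug logging set to INFO."""
    return {**os.environ, "NCCL_DEBUG": "INFO"}
```

A nonzero missing_gpu_count(visible_gpus()) suggests a GPU has fallen off the bus, which is a hardware issue for support rather than an NCCL configuration problem. To capture verbose NCCL logs, pass env=nccl_debug_env() to subprocess.run when launching your training command.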

My cluster went down unexpectedly

If your cluster experiences unplanned downtime:

  • Check the status page for ongoing infrastructure issues

  • Review cluster expiration settings: clusters may shut down at the end of the reservation period

  • Contact support immediately for urgent production issues

We take cluster stability seriously. If downtime impacts critical workloads, escalate to your account team.

Image pulling is stuck or timing out

Common causes:

  • Large images: Large container images (such as sglang images) can take a long time to download

  • Registry authentication: Ensure image pull secrets are configured correctly

  • Network issues: Temporary connectivity problems to registries

Check pod events with kubectl describe pod <pod-name> for specific error details.
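To find which pods are affected before describing them one by one, you can scan container statuses for the kubelet's image-pull waiting reasons. A minimal Python sketch, assuming kubectl access to the cluster. A large image that is still downloading normally shows the waiting reason ContainerCreating and is therefore not flagged; only pull failures and back-offs are reported.

```python
import json
import subprocess

# Waiting reasons the kubelet reports when an image cannot be pulled.
PULL_REASONS = {"ErrImagePull", "ImagePullBackOff", "InvalidImageName"}


def image_pull_problems(pods: dict) -> list[tuple[str, str, str, str]]:
    """Return (pod, container, reason, message) for containers stuck on image pulls.

    `pods` is the parsed output of `kubectl get pods -o json`.
    """
    problems = []
    for item in pods.get("items", []):
        pod = item["metadata"]["name"]
        status = item.get("status", {})
        containers = status.get("initContainerStatuses", []) + status.get("containerStatuses", [])
        for cs in containers:
            waiting = cs.get("state", {}).get("waiting")
            if waiting and waiting.get("reason") in PULL_REASONS:
                problems.append((pod, cs["name"], waiting["reason"], waiting.get("message", "")))
    return problems


def fetch_pods(namespace: str = "default") -> dict:
    """Fetch pod objects for one namespace as parsed JSON."""
    out = subprocess.run(["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)
```

The message field usually distinguishes an authentication failure (check image pull secrets) from a network or registry availability problem; run kubectl describe pod on each flagged pod for the full event history.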

A GPU is off the bus on one of my nodes

If a GPU is not detected (off the bus), this is a hardware issue. Contact support with the node name, and we'll provision a replacement node. To minimize workload disruption, we swap the node only after you give the OK.

Advanced Configuration

Can I use InfiniBand with Instant Clusters?

Yes, InfiniBand is supported on Instant Clusters. For automatic setup with tools like SkyPilot, you may need to configure specific environment variables. Contact support for InfiniBand configuration assistance.
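Before opening an InfiniBand ticket, it helps to confirm that the node or container can see RDMA devices at all. On Linux these appear under /sys/class/infiniband, with each port's state in ports/<n>/state (for example "4: ACTIVE"). A minimal Python sketch that reads those sysfs files; the sysfs root is a parameter so the logic can be tried against a copy of the tree.

```python
from pathlib import Path


def infiniband_ports(sysfs_root: str = "/sys/class/infiniband") -> dict[str, str]:
    """Map "<device>/<port>" to its state string, e.g. {"mlx5_0/1": "4: ACTIVE"}.

    Returns an empty dict when the directory is missing, which usually means
    no RDMA devices are visible to this node or container.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return {}
    states = {}
    for device in sorted(root.iterdir()):
        ports_dir = device / "ports"
        if not ports_dir.is_dir():
            continue
        for port in sorted(ports_dir.iterdir()):
            state_file = port / "state"
            if state_file.is_file():
                states[f"{device.name}/{port.name}"] = state_file.read_text().strip()
    return states


def inactive_ports(states: dict[str, str]) -> list[str]:
    """Ports whose state is anything other than ACTIVE (e.g. "1: DOWN")."""
    return [name for name, state in states.items() if not state.endswith("ACTIVE")]
```

An empty result inside a pod often means the pod was not granted access to the RDMA devices, while ports reported as DOWN point at a link or fabric issue worth including in the support request.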

How do I install custom kernel modules (like nvidia_peermem)?

Custom kernel modules like nvidia_peermem can be installed on request. Submit a ticket with:

  • Module name and purpose

  • Cluster ID and node details

  • Use case requirements

Our infrastructure team will review and install the module if compatible.
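Once a module has been installed, you can confirm it is loaded on a node by reading /proc/modules, the same file lsmod reads, where the first field of each line is the module name. A minimal Python sketch:

```python
def loaded_modules(proc_modules_text: str) -> set[str]:
    """Module names from /proc/modules content (the first field of each line)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def is_module_loaded(name: str, proc_modules_path: str = "/proc/modules") -> bool:
    """True if kernel module `name` (e.g. "nvidia_peermem") is currently loaded."""
    with open(proc_modules_path) as f:
        return name in loaded_modules(f.read())
```

For example, is_module_loaded("nvidia_peermem") run on the node should return True after the module is in place. Kernel module names in /proc/modules use underscores, so query with underscores rather than hyphens.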

Can I deploy my own Helm charts on Instant Clusters?

Yes, you can deploy custom Helm charts. Be aware that some infrastructure components (like Traefik) may be automatically managed. Contact support if you need to customize or disable automated management for specific components.

How do I configure LoadBalancer services?

LoadBalancer services work on Instant Clusters and can be used to expose applications.

The recommended approach is to create a Traefik IngressRoute. Traefik is the default ingress controller installed on the cluster.
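As an illustration, the following Python sketch builds an IngressRoute manifest that routes one hostname to an existing Service. It writes JSON, which kubectl apply -f accepts just like YAML. Assumptions to verify against the Traefik version on your cluster: the CRD group traefik.io/v1alpha1 (Traefik v3 and v2.10+; older v2 releases used traefik.containo.us/v1alpha1) and an entry point named "websecure". The host, Service name, and port in the usage note are hypothetical.

```python
import json


def ingress_route(name: str, host: str, service: str, port: int,
                  namespace: str = "default", entry_point: str = "websecure") -> dict:
    """Build a Traefik IngressRoute sending requests for `host` to a Service.

    Assumes the traefik.io/v1alpha1 CRD group and an entry point named
    "websecure"; check both against the installed Traefik before applying.
    """
    return {
        "apiVersion": "traefik.io/v1alpha1",
        "kind": "IngressRoute",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "entryPoints": [entry_point],
            "routes": [{
                "match": f"Host(`{host}`)",  # Traefik rule syntax uses backticks
                "kind": "Rule",
                "services": [{"name": service, "port": port}],
            }],
        },
    }


def write_manifest(manifest: dict, path: str) -> None:
    """Write the manifest as JSON so it can be applied with `kubectl apply -f <path>`."""
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)
```

For example, write_manifest(ingress_route("demo", "demo.example.com", "demo-svc", 8080), "route.json") followed by kubectl apply -f route.json. Routing through the shared Traefik controller avoids allocating a separate LoadBalancer address per application.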

Integration & Automation

Can I use SkyPilot with Instant Clusters?

Yes! SkyPilot works with Together AI Instant Clusters out of the box. This allows you to orchestrate AI workloads seamlessly. Refer to SkyPilot documentation for setup instructions.

How do I authenticate the CLI for automation?

For automation and programmatic access:

  • Use API keys instead of SSO login (which requires a browser)

  • Use the API to programmatically create, scale, and manage clusters
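In scripts and CI jobs, keeping the API key in an environment variable rather than in code makes browser-free authentication straightforward. A minimal Python sketch: the variable name TOGETHER_API_KEY and sending the key as a bearer token follow Together's inference API conventions, but both are assumptions here, so confirm the exact scheme for the cluster-management API before relying on it.

```python
import os


def auth_headers(env_var: str = "TOGETHER_API_KEY") -> dict[str, str]:
    """Build HTTP headers for API-key authentication in non-interactive automation.

    Assumption: the key is sent as a bearer token. Fails fast if the variable
    is unset, so a misconfigured CI job stops before making any requests.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; export an API key before running automation")
    return {"Authorization": f"Bearer {key}"}
```

Store the key in your CI system's secret store and inject it as the environment variable at run time, so it never appears in the repository or in job logs.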

Is there an MCP server for Together AI Instant Clusters?

This is being evaluated. If you have specific requirements for MCP integration or automated cluster management, contact your account team to discuss roadmap and timelines.

Capacity & Availability

What should I do if there's a GPU shortage?

During high-demand periods:

  • Try different regions or GPU types if your workload is flexible

  • Contact support for capacity forecasts and availability estimates

  • Consider reserved clusters for guaranteed capacity

  • Join the waitlist for specific GPU types (like H200s or B200s)

Can I extend my cluster reservation?

Yes, cluster reservations can typically be extended. Contact support before your cluster expires with:

  • Cluster ID

  • Current expiration date

  • Desired extension duration
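Because extensions need to be requested before expiry, a small scheduled check can draft the request with the three details above once the deadline approaches. A minimal Python sketch; the 24-hour lead time is an assumption (this FAQ does not state a cutoff), and the cluster ID in the usage note is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def extension_request(cluster_id: str, expires_at: datetime, extend_by: timedelta,
                      now: datetime, lead_time: timedelta = timedelta(hours=24)) -> Optional[str]:
    """Draft an extension request once expiry is within `lead_time`; otherwise return None.

    `expires_at` and `now` must both be timezone-aware or both naive.
    """
    remaining = expires_at - now
    if remaining > lead_time:
        return None  # still plenty of time; nothing to send yet
    if remaining <= timedelta(0):
        raise ValueError(f"cluster {cluster_id} already expired at {expires_at.isoformat()}")
    return (
        f"Cluster ID: {cluster_id}\n"
        f"Current expiration: {expires_at.isoformat()}\n"
        f"Requested extension: {extend_by.total_seconds() / 3600:g} hours"
    )
```

For example, extension_request("c-123", expires_at, timedelta(days=2), now) returns None while more than a day remains and a ready-to-send message once inside that window.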

Security & Access Control

Can I disable remote access to my cluster nodes?

Yes, for enhanced security, all remote access (SSH, K8s API, etc.) can be disabled, with access only through LoadBalancer services. Submit a security configuration request to implement this.

Is anonymous authentication enabled on Instant Clusters?

Anonymous authentication settings may differ between cluster types. If you need specific authentication configurations (like --anonymous-auth=false), contact support to align settings across your clusters.

Getting Help

When should I contact support?

Contact support for:

  • Cluster provisioning issues or unexpected downtime

  • Configuration changes (CPU/RAM, storage, networking)

  • Hardware problems (GPUs off bus, node failures)

  • Billing questions or discrepancies

  • Access issues that troubleshooting steps don't resolve

  • Custom module or software installation requests

What information should I include in support tickets?

To help us resolve issues quickly, include:

  • Cluster ID and name

  • Region

  • Node names (if node-specific)

  • Error messages and logs (kubectl output, events, etc.)

  • Timestamp when the issue occurred

  • Steps to reproduce (if applicable)
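If you file tickets from scripts or runbooks, a small template keeps these fields consistent and records the timestamp in UTC. A minimal Python sketch; the field names and layout are illustrative, not a required ticket format, and the values in the usage note are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SupportTicket:
    """The details listed above, rendered as plain text for a support ticket."""
    cluster_id: str
    cluster_name: str
    region: str
    occurred_at: datetime  # should be timezone-aware; naive values are treated as local time
    error_output: str  # kubectl output, events, logs
    node_names: list[str] = field(default_factory=list)
    repro_steps: list[str] = field(default_factory=list)

    def render(self) -> str:
        lines = [
            f"Cluster: {self.cluster_name} ({self.cluster_id})",
            f"Region: {self.region}",
            f"Occurred at (UTC): {self.occurred_at.astimezone(timezone.utc).isoformat()}",
        ]
        if self.node_names:  # only for node-specific issues
            lines.append("Nodes: " + ", ".join(self.node_names))
        if self.repro_steps:
            lines.append("Steps to reproduce:")
            lines.extend(f"  {i}. {step}" for i, step in enumerate(self.repro_steps, 1))
        lines += ["Errors and logs:", self.error_output]
        return "\n".join(lines)
```

For example, SupportTicket("c-123", "train-cluster", "us-west", datetime.now(timezone.utc), log_text, node_names=["node-1"]).render() produces text you can paste directly into the ticket.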