GPU Cluster FAQ
Last updated: February 20, 2026
This guide covers frequently asked questions about Together AI's Instant Clusters based on common customer inquiries and issues.
Getting Started
What is an Instant Cluster?
Instant Clusters are on-demand GPU clusters that can be quickly provisioned for training and inference workloads. They offer flexible, pay-as-you-go access to GPU resources without long-term commitments.
How do I access my Instant Cluster?
After creating your cluster, you can download the kubeconfig from the Together GPU cluster page. If your cluster is not visible on the page, ensure you're logged in with the correct account and that the cluster has finished provisioning.
Can I SSH into my Instant Cluster nodes?
Yes, SSH access is available for Instant Cluster nodes. For setup instructions, see the documentation: https://docs.together.ai/docs/gpu-clusters-management#direct-ssh-access
Cluster Configuration
Can I create multi-node Instant Clusters?
Yes, Instant Clusters support multi-node configurations. The number of nodes available depends on capacity and your account settings. If you're only seeing single-node options, contact support to check capacity or account limitations.
How much ephemeral storage is available on Instant Clusters?
Each Instant Cluster node typically has at least 800 GB of ephemeral storage, usually mounted under /scratch. If you need more storage for model weights or datasets, contact support to discuss options for increased storage provisioning.
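Before downloading large model weights or datasets, it can help to confirm that the ephemeral volume has room for them. The sketch below uses only the Python standard library; the /scratch path and the 300 GiB requirement are assumptions for illustration, so substitute the actual mount point and size for your workload.

```python
import os
import shutil

GIB = 1024 ** 3


def free_gib(path: str) -> float:
    """Return the free space at `path` in GiB."""
    return shutil.disk_usage(path).free / GIB


def has_room_for(path: str, required_gib: float) -> bool:
    """True if `path` has at least `required_gib` GiB free."""
    return free_gib(path) >= required_gib


if __name__ == "__main__":
    scratch = "/scratch"  # assumed mount point; verify on your node (e.g. with `df -h`)
    needed = 300.0        # hypothetical size of the weights/dataset, in GiB
    if not os.path.isdir(scratch):
        print(f"{scratch} not found; check where ephemeral storage is mounted")
    else:
        print(f"{scratch}: {free_gib(scratch):.1f} GiB free")
        if not has_room_for(scratch, needed):
            print("Not enough ephemeral storage; contact support about increased provisioning.")
```

Note that disk_usage reports binary GiB, while the 800 GB figure above is likely decimal, so the printed number will look somewhat smaller.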
How do I scale nodes up or down in my Instant Cluster?
Node scaling can be managed through the Together UI or the CLI (tcloud). For automated scaling, authenticate with an API key instead of SSO login.
Billing & Pricing
How does billing work for Instant Clusters?
Instant Clusters are billed hourly based on the number and type of GPUs in use. You're charged for the time the cluster is active, regardless of whether workloads are running.
What happens if I delete my cluster before the reservation duration ends?
If you create a cluster with a reservation duration (e.g., --reservation-duration 1 for 1 day) but delete it early, you are typically charged only for the actual time used, not the full reservation period. However, policies may vary, so confirm with billing support before relying on this.
On-Demand vs Reserved Clusters: Which should I choose?
On-Demand (Instant Clusters): Best for short-term training jobs, experimentation, or workloads with variable duration. Pay hourly with no long-term commitment.
Reserved Clusters: Better for long-running training jobs or production workloads where you need guaranteed capacity. Typically offers better pricing for sustained usage.
Why does my bill seem higher than expected?
Common reasons include:
Cluster was running longer than anticipated
Billing is per GPU-hour, so 8 GPUs for 2 hours = 16 GPU-hours
Cluster auto-renewal if not explicitly deleted
Check your usage dashboard or contact billing support to review specific charges.
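To sanity-check a bill, total the GPU-hours yourself: multiply the number of GPUs by the hours the cluster was active, counting idle time too, since billing applies whether or not workloads are running. This is a minimal sketch; the per-GPU-hour rate is a placeholder you supply, not a published price, and the scaling scenario in the usage example is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UsageWindow:
    gpus: int      # GPUs allocated to the cluster during this window
    hours: float   # wall-clock hours the cluster was active (idle time included)


def gpu_hours(windows: list[UsageWindow]) -> float:
    """Sum GPU-hours across usage windows (GPUs x active hours)."""
    return sum(w.gpus * w.hours for w in windows)


def estimated_cost(windows: list[UsageWindow], rate_per_gpu_hour: float) -> float:
    """Estimate the bill; `rate_per_gpu_hour` is a placeholder, not a published price."""
    return gpu_hours(windows) * rate_per_gpu_hour


if __name__ == "__main__":
    # Hypothetical: 8 GPUs for 2 hours, then scaled to 16 GPUs for 1 more hour.
    usage = [UsageWindow(gpus=8, hours=2), UsageWindow(gpus=16, hours=1)]
    print(gpu_hours(usage))  # 32
```

Splitting usage into windows makes scaling events visible: adding nodes for a single hour can contribute as many GPU-hours as a longer run on fewer GPUs.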
Connectivity & Access Issues
Why can't I connect to my cluster with kubectl?
Common causes:
DNS resolution issues: The cluster API endpoint may not be resolving. Verify the endpoint in your kubeconfig.
Expired kubeconfig: Download a fresh kubeconfig from the dashboard.
Network restrictions: Check firewall rules or VPN settings that might block access.
Cluster not fully provisioned: Wait for cluster setup to complete before connecting.
If issues persist, contact support with the error message and cluster ID.
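The first two causes above can be told apart by checking whether the API server host in your kubeconfig resolves and whether a TCP connection to it succeeds. The sketch below is a standard-library diagnostic, not a Together tool; it takes the server URL as an argument, which you can read from your kubeconfig with kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'.

```python
from __future__ import annotations

import socket
import sys
from urllib.parse import urlparse


def parse_server(server_url: str) -> tuple[str, int]:
    """Split a kubeconfig `server:` URL into (host, port); HTTPS defaults to 443."""
    parsed = urlparse(server_url)
    if not parsed.hostname:
        raise ValueError(f"no host in server URL: {server_url!r}")
    return parsed.hostname, parsed.port or 443


def resolves(host: str) -> bool:
    """True if DNS (or /etc/hosts) returns at least one address for `host`."""
    try:
        return len(socket.getaddrinfo(host, None)) > 0
    except socket.gaierror:
        return False


def reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def diagnose(server_url: str) -> str:
    host, port = parse_server(server_url)
    if not resolves(host):
        return f"DNS: {host} does not resolve; re-check the endpoint or download a fresh kubeconfig"
    if not reachable(host, port):
        return f"Network: {host}:{port} resolves but is unreachable; check firewall/VPN rules"
    return f"OK: {host}:{port} is reachable; inspect kubectl's error for auth or TLS issues"


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(diagnose(sys.argv[1]))
```

If the endpoint is reachable but kubectl still fails, the remaining causes are usually an expired kubeconfig or a cluster that has not finished provisioning.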
My cluster can't access services on my main cluster or registry
Instant Clusters may be isolated from other cluster networks by default. For cross-cluster communication or access to private registries (like Harbor), contact support to configure network routing or VPN connections.
I can't see my Instant Cluster on the GPU cluster page
Possible solutions:
Refresh the page and ensure you're logged into the correct account
Check if the cluster is still provisioning (this can take time)
Verify the cluster wasn't deleted or expired
Contact support if the cluster should exist but isn't visible
Common Issues & Troubleshooting
My node is taking more than an hour to schedule
Long scheduling times can occur due to:
High demand for specific GPU types
Cluster capacity constraints
Infrastructure issues
Contact support immediately if scheduling takes longer than expected, providing your cluster ID and timestamp.
I'm getting NCCL errors that I didn't see before
NCCL errors often indicate networking or GPU communication issues:
Verify InfiniBand is properly configured (if required)
Check that all GPUs are on the bus (run nvidia-smi)
Ensure network drivers and NCCL versions are compatible
Contact support if errors persist, providing full logs
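The GPU and InfiniBand checks above can be scripted on each node. This sketch calls nvidia-smi with its CSV query options and lists the RDMA devices the kernel exposes under /sys/class/infiniband. The expected count of 8 GPUs per node is an assumption; set it to match your node type.

```python
from __future__ import annotations

import os
import shutil
import subprocess


def parse_gpu_csv(output: str) -> list[tuple[int, str]]:
    """Parse `nvidia-smi --query-gpu=index,pci.bus_id --format=csv,noheader` output."""
    gpus = []
    for line in output.strip().splitlines():
        index, bus_id = (field.strip() for field in line.split(","))
        gpus.append((int(index), bus_id))
    return gpus


def visible_gpus() -> list[tuple[int, str]]:
    """Return (index, PCI bus ID) for every GPU the driver can see."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,pci.bus_id", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


def infiniband_devices(sysfs: str = "/sys/class/infiniband") -> list[str]:
    """List RDMA devices the kernel exposes (e.g. mlx5_0); empty if none."""
    return sorted(os.listdir(sysfs)) if os.path.isdir(sysfs) else []


if __name__ == "__main__":
    EXPECTED_GPUS = 8  # assumption: set to the GPU count of your node type
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on this host")
    else:
        gpus = visible_gpus()
        print(f"{len(gpus)}/{EXPECTED_GPUS} GPUs visible")
        if len(gpus) < EXPECTED_GPUS:
            print("Possible GPU off the bus; send the node name to support")
    print("InfiniBand devices:", infiniband_devices() or "none found")
    # Rerunning the failing job with NCCL_DEBUG=INFO produces the detailed logs support needs.
```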
My cluster went down unexpectedly
If your cluster experiences unplanned downtime:
Check the status page for ongoing infrastructure issues
Review cluster expiration settings: clusters may shut down at the end of the reservation period
Contact support immediately for urgent production issues
We take cluster stability seriously. If downtime impacts critical workloads, escalate to your account team.
Image pulling is stuck or timing out
Common causes:
Large images: Some container images (such as sglang) are large and can take a long time to download
Registry authentication: Ensure image pull secrets are configured correctly
Network issues: Temporary connectivity problems to registries
Check pod events with kubectl describe pod <pod-name> for specific error details.
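To check many pods at once, you can read each container's waiting reason from kubectl get pod -o json. The reasons listed below are the ones the kubelet reports for failed pulls; a large image that is still downloading normally shows ContainerCreating instead, so for slow pulls the pod events remain the better signal. This is a minimal sketch and assumes kubectl is on your PATH and already pointed at the cluster.

```python
from __future__ import annotations

import json
import subprocess
import sys

# Waiting reasons the kubelet reports when an image pull fails or is being retried.
PULL_ERRORS = {"ErrImagePull", "ImagePullBackOff", "InvalidImageName", "ErrImageNeverPull"}


def pull_problems(pod: dict) -> list[str]:
    """Return 'container: reason: message' for containers stuck on image pulls."""
    problems = []
    status = pod.get("status", {})
    for cs in status.get("initContainerStatuses", []) + status.get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting") or {}
        reason = waiting.get("reason", "")
        if reason in PULL_ERRORS:
            problems.append(f"{cs['name']}: {reason}: {waiting.get('message', '')}")
    return problems


def get_pod(name: str, namespace: str = "default") -> dict:
    """Fetch a pod object as parsed JSON via kubectl."""
    out = subprocess.run(
        ["kubectl", "get", "pod", name, "-n", namespace, "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        for line in pull_problems(get_pod(sys.argv[1])) or ["no image-pull errors reported"]:
            print(line)
```

An ImagePullBackOff whose message mentions unauthorized or denied access usually points to the image pull secret rather than the network.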
A GPU is off the bus on one of my nodes
If a GPU is not detected (off the bus), this is a hardware issue. Contact support with the node name, and we'll provision a replacement node. To minimize workload disruption, we swap the nodes only after you give the OK.
Advanced Configuration
Can I use InfiniBand with Instant Clusters?
Yes, InfiniBand is supported on Instant Clusters. For automatic setup with tools like SkyPilot, you may need to configure specific environment variables. Contact support for InfiniBand configuration assistance.
How do I install custom kernel modules (like nvidia_peermem)?
Custom kernel modules like nvidia_peermem can be installed on request. Submit a ticket with:
Module name and purpose
Cluster ID and node details
Use case requirements
Our infrastructure team will review and install the module if compatible.
Can I deploy my own Helm charts on Instant Clusters?
Yes, you can deploy custom Helm charts. Be aware that some infrastructure components (like Traefik) may be automatically managed. Contact support if you need to customize or disable automated management for specific components.
How do I configure LoadBalancer services?
LoadBalancer services work on Instant Clusters and can be used to expose applications.
The recommended approach, however, is to create a Traefik IngressRoute, since Traefik is the default ingress controller installed on the cluster.
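An IngressRoute is a Traefik custom resource that maps a host rule to a Kubernetes Service. The sketch below builds one as a Python dict and prints it as JSON, which kubectl apply -f - accepts. It assumes the traefik.io/v1alpha1 API group used by recent Traefik releases (older installs use traefik.containo.us/v1alpha1) and the "web" entry point from Traefik's default Helm values; check both with kubectl get crd on your cluster. The names and hostname are placeholders.

```python
import json


def ingress_route(name: str, host: str, service: str, port: int,
                  entry_point: str = "web", namespace: str = "default") -> dict:
    """Build a Traefik IngressRoute that routes Host(`host`) to a Service."""
    return {
        "apiVersion": "traefik.io/v1alpha1",  # assumption: older Traefik uses traefik.containo.us
        "kind": "IngressRoute",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "entryPoints": [entry_point],
            "routes": [{
                "match": f"Host(`{host}`)",
                "kind": "Rule",
                "services": [{"name": service, "port": port}],
            }],
        },
    }


if __name__ == "__main__":
    # Placeholder names; pipe the output to `kubectl apply -f -`.
    manifest = ingress_route("my-app", "app.example.com", "my-app-svc", 8080)
    print(json.dumps(manifest, indent=2))
```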
Integration & Automation
Can I use SkyPilot with Instant Clusters?
Yes! SkyPilot works with Together AI Instant Clusters out of the box, so you can use it to orchestrate AI workloads on them. Refer to the SkyPilot documentation for setup instructions.
How do I authenticate the CLI for automation?
For automation and programmatic access:
Use API keys instead of SSO login (which requires a browser)
Use the API to programmatically create, scale, and manage clusters
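For scripts and CI jobs, the API key is typically read from the environment and sent as a Bearer token. This sketch assumes the TOGETHER_API_KEY variable used by Together's SDKs and the same Bearer scheme as Together's other APIs; the cluster endpoint URL is deliberately left as a hypothetical placeholder, so take the real paths from the API reference.

```python
from __future__ import annotations

import os
import urllib.request


def authed_request(url: str, api_key: str | None = None,
                   method: str = "GET") -> urllib.request.Request:
    """Build a request authenticated with an API key instead of an SSO browser login."""
    key = api_key or os.environ.get("TOGETHER_API_KEY")  # assumed env var name
    if not key:
        raise RuntimeError("set TOGETHER_API_KEY (or pass api_key) for non-interactive auth")
    req = urllib.request.Request(url, method=method)
    req.add_header("Authorization", f"Bearer {key}")  # assumed Bearer scheme
    req.add_header("Accept", "application/json")
    return req


if __name__ == "__main__":
    # Hypothetical endpoint; replace with the cluster path from the API reference.
    req = authed_request("https://api.together.xyz/v1/<clusters-endpoint>")
    print(req.get_method(), req.full_url)
```

Keeping the key in an environment variable or secret store, rather than in the script, lets the same code run unattended in CI.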
Is there an MCP server for Together AI Instant Clusters?
This is being evaluated. If you have specific requirements for MCP integration or automated cluster management, contact your account team to discuss roadmap and timelines.
Capacity & Availability
What should I do if there's a GPU shortage?
During high-demand periods:
Try different regions or GPU types if your workload is flexible
Contact support for capacity forecasts and availability estimates
Consider reserved clusters for guaranteed capacity
Join the waitlist for specific GPU types (like H200s or B200s)
Can I extend my cluster reservation?
Yes, cluster reservations can typically be extended. Contact support before your cluster expires with:
Cluster ID
Current expiration date
Desired extension duration
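Because extensions must be requested before the cluster expires, a scheduled check can flag clusters that are close to their expiration time. In this sketch the 24-hour lead time is an assumption for illustration, not a Together policy, and the expiration timestamp is one you supply from the dashboard.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def time_left(expires_at: datetime, now: datetime | None = None) -> timedelta:
    """Time remaining until `expires_at` (timezone-aware datetimes)."""
    return expires_at - (now or datetime.now(timezone.utc))


def should_request_extension(expires_at: datetime, now: datetime | None = None,
                             lead: timedelta = timedelta(hours=24)) -> bool:
    """True once the cluster is within `lead` of expiring (assumed 24 h buffer)."""
    return time_left(expires_at, now) <= lead


if __name__ == "__main__":
    expires = datetime.fromisoformat("2026-03-01T00:00:00+00:00")  # placeholder expiration
    if should_request_extension(expires):
        print(f"Expires in {time_left(expires)}; contact support with the cluster ID now")
```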
Security & Access Control
Can I disable remote access to my cluster nodes?
Yes. For enhanced security, all remote access (SSH, K8s API, etc.) can be disabled, leaving access only through LoadBalancer services. Submit a security configuration request to implement this.
Is anonymous authentication enabled on Instant Clusters?
Anonymous authentication settings may differ between cluster types. If you need specific authentication configurations (like --anonymous-auth=false), contact support to align settings across your clusters.
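One way to see how a cluster is configured is to call the API server's /version endpoint without credentials. With --anonymous-auth=false, unauthenticated requests are rejected with 401; with anonymous auth enabled, the request is treated as system:anonymous and returns 200 (or 403 if RBAC denies it). This is a minimal sketch; it skips TLS verification because the cluster CA is usually not in the system trust store and no credentials are sent.

```python
import ssl
import sys
import urllib.error
import urllib.request


def classify(status: int) -> str:
    """Interpret the status code of an unauthenticated GET /version."""
    if status == 401:
        return "anonymous auth appears DISABLED (request rejected as unauthenticated)"
    if status in (200, 403):
        return "anonymous auth appears ENABLED (request handled as system:anonymous)"
    return f"inconclusive (HTTP {status})"


def probe(server_url: str, timeout: float = 5.0) -> int:
    """GET <server>/version with no credentials and return the HTTP status code."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # no credentials are sent; only the status is read
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(server_url.rstrip("/") + "/version",
                                    timeout=timeout, context=ctx) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(classify(probe(sys.argv[1])))
```

Running it against each cluster's API server URL makes any difference between cluster types easy to report to support.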
Getting Help
When should I contact support?
Contact support for:
Cluster provisioning issues or unexpected downtime
Configuration changes (CPU/RAM, storage, networking)
Hardware problems (GPUs off bus, node failures)
Billing questions or discrepancies
Access issues that troubleshooting steps don't resolve
Custom module or software installation requests
What information should I include in support tickets?
To help us resolve issues quickly, include:
Cluster ID and name
Region
Node names (if node-specific)
Error messages and logs (kubectl output, events, etc.)
Timestamp when the issue occurred
Steps to reproduce (if applicable)