This chapter covers configurations and capabilities beyond the standard single-node setup.
Multi-GPU is for operators with 2+ GPUs who want to run models that don't fit on a single card, primarily 70B parameter models or larger, which require 40GB+ VRAM at Q4 quantization.
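As a rough sanity check on that figure, the sketch below estimates VRAM from parameter count and quantization width. It is a back-of-the-envelope illustration, not part of the project's tooling; the 0.5 bytes per parameter for Q4 and the 1.2x overhead factor for KV cache and runtime buffers are assumptions.

```python
# Back-of-the-envelope VRAM estimate for a dense model at a given quantization.
# Assumptions (illustrative only): Q4 weights ~0.5 bytes/param, and a 1.2x
# overhead factor covering KV cache, activations, and runtime buffers.

def estimate_vram_gb(params_billion: float,
                     bytes_per_param: float = 0.5,
                     overhead: float = 1.2) -> float:
    """Estimate serving VRAM in GB: quantized weights plus an overhead factor."""
    weights_gb = params_billion * bytes_per_param  # billions of params * bytes each = GB
    return weights_gb * overhead

if __name__ == "__main__":
    for size in (70, 120):
        print(f"{size}B @ Q4: ~{estimate_vram_gb(size):.0f} GB VRAM")
```

For a 70B model this works out to roughly 35 GB of weights alone, which lands above 40 GB once overhead is included, hence the need for multiple cards.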
Self-hosting is for organizations that need full data sovereignty or want to run a private inference network without using the public control plane.
Distributed training is for anyone who wants to follow the roadmap for the next major feature: fine-tuning jobs distributed across the network.