This chapter describes the planned distributed training feature. Nothing documented here is currently available — it reflects the design direction as of April 2026.
Infernet Protocol’s inference network leaves a resource underutilized: GPUs sit idle between inference jobs. Distributed training would put that idle capacity to work, letting anyone submit a fine-tuning job and have it executed across multiple nodes.
The economic model mirrors inference: clients pay for the compute they use, operators earn by contributing GPU time, and all coordination is cryptographically secured.
LoRA fine-tuning: The most practical starting point. LoRA (Low-Rank Adaptation) freezes the base model weights and trains small adapter matrices. The adapter is orders of magnitude smaller than the full model, which means cheap transfers, cheap storage, and training that fits on a single consumer GPU.
QLoRA: Combines LoRA with 4-bit quantization. Makes training even larger models practical on consumer GPUs.
Full fine-tuning: For cases where LoRA isn’t sufficient. Requires multi-GPU coordination and is more complex to distribute, but the demand exists.
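The size argument behind LoRA is easy to make concrete. A toy numpy sketch for a single weight matrix (the dimensions are illustrative, not tied to any particular model):

```python
import numpy as np

d, r = 4096, 16          # hidden size of one layer; LoRA rank
W = np.zeros((d, d))     # frozen base weight (never updated)

# LoRA trains two small matrices A and B; the effective weight is
# W + (alpha / r) * B @ A, but only A and B receive gradients.
A = np.random.randn(r, d) * 0.01
B = np.zeros((d, r))     # B starts at zero, so training starts from W
alpha = 32

W_effective = W + (alpha / r) * (B @ A)

full_params = W.size               # 16,777,216 for this one matrix
lora_params = A.size + B.size      # 131,072 -- 128x smaller
print(f"full: {full_params:,}  lora: {lora_params:,}  "
      f"ratio: {full_params // lora_params}x")
```

Because `B` is initialized to zero, the adapter contributes nothing at step zero and the model behaves exactly like the frozen base until training moves `B` away from zero.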
Three training backends are being evaluated:
TRL (Transformer Reinforcement Learning) by Hugging Face. Strong support for instruction tuning (SFT), RLHF, DPO, and GRPO. Python-based. The most mature ecosystem for LLM fine-tuning.
What a TRL SFT job config will look like:

```json
{
  "type": "sft",
  "backend": "trl",
  "base_model": "Qwen/Qwen2.5-7B-Instruct",
  "dataset": "ipfs://Qm...",
  "training_args": {
    "num_epochs": 3,
    "learning_rate": 2e-4,
    "per_device_batch_size": 4,
    "lora_r": 16,
    "lora_alpha": 32
  }
}
```

The `dataset` field is the IPFS hash of the training data.

Axolotl: A training framework with a simpler configuration surface than raw TRL. It supports a wider range of dataset formats and is preferred by many fine-tuning practitioners.
Unsloth: Extremely memory-efficient LoRA training. Claims 2x faster and 60% less VRAM than stock TRL via custom Triton kernels. Strong choice for consumer GPU nodes.
Training data will be handled via IPFS + Filecoin for decentralized storage: datasets are referenced by content hash in the job config, so there is no single point of control over training data, and reproducibility is built in — the same hash always resolves to the same bytes.
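As a sketch of how a node could resolve a dataset reference, here is a tiny helper that rewrites an `ipfs://` URI into an HTTP gateway URL. The gateway host and the CID below are illustrative assumptions, not part of the protocol:

```python
def ipfs_to_gateway(uri: str, gateway: str = "https://ipfs.io") -> str:
    """Rewrite an ipfs://<cid>[/path] URI into an HTTP gateway URL."""
    prefix = "ipfs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an IPFS URI: {uri}")
    cid_and_path = uri[len(prefix):]
    return f"{gateway}/ipfs/{cid_and_path}"

# Hypothetical CID, for illustration only
url = ipfs_to_gateway("ipfs://bafybeigexamplecid/dataset.jsonl")
print(url)  # https://ipfs.io/ipfs/bafybeigexamplecid/dataset.jsonl
```

A training node would fetch the dataset through its local IPFS daemon or a gateway like this, then verify the bytes against the CID before training.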
Large training jobs will distribute across multiple nodes using:
Data parallelism: Each node trains on a different shard of the dataset. Gradients are aggregated periodically. This is the simplest form of parallelism and works over commodity networking.
FSDP (Fully Sharded Data Parallelism): PyTorch’s approach for large model training. Each GPU holds a shard of the model weights, gradients, and optimizer states. More efficient than naive data parallelism for large models.
Coordination via the control plane: A training job has a coordinator node that aggregates gradients and distributes parameter updates. The control plane assigns the coordinator role and manages the job lifecycle.
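To make the data-parallel idea concrete, here is a numpy sketch showing that averaging per-shard gradients of a mean-squared-error loss reproduces the full-batch gradient (equal shard sizes assumed; the linear model and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))   # full dataset: 64 examples, 8 features
y = rng.normal(size=64)
w = rng.normal(size=8)         # current model parameters

def grad_mse(Xs, ys, w):
    """Gradient of mean squared error for a linear model on one shard."""
    return 2.0 * Xs.T @ (Xs @ w - ys) / len(ys)

# Full-batch gradient, computed on one machine
g_full = grad_mse(X, y, w)

# Data parallelism: 4 nodes each compute a gradient on their shard,
# then the coordinator averages them (an all-reduce in practice)
shards = np.split(np.arange(64), 4)
g_avg = np.mean([grad_mse(X[s], y[s], w) for s in shards], axis=0)

assert np.allclose(g_full, g_avg)  # identical, up to float error
```

The equivalence is what makes data parallelism tolerant of commodity networking: only gradients (or parameter updates) cross the wire, once per aggregation interval, not activations at every layer.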
A key challenge in distributed training is verification: how do you know the nodes actually ran the training instead of returning garbage?
The planned approach uses proof of training work (PoTW): each node periodically generates a proof (a deterministic checkpoint hash at specified intervals) that can be verified by the coordinator. Nodes that submit invalid checkpoints are slashed and the job is redistributed.
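A minimal sketch of the checkpoint-hash idea (the serialization scheme here is an assumption for illustration — the real proof format is still being designed):

```python
import hashlib
import numpy as np

def checkpoint_hash(state: dict) -> str:
    """Deterministic hash of model state: visit parameters in sorted
    name order and hash each name along with its raw bytes."""
    h = hashlib.sha256()
    for name in sorted(state):
        h.update(name.encode())
        h.update(np.ascontiguousarray(state[name]).tobytes())
    return h.hexdigest()

state = {"lora_A": np.ones((2, 4)), "lora_B": np.zeros((4, 2))}
honest = checkpoint_hash(state)

# A node that returns garbage produces a different hash, so the
# coordinator can detect it by recomputing the checkpoint
state["lora_A"][0, 0] = 999.0
assert checkpoint_hash(state) != honest
```

One reason this remains research rather than a solved problem: getting bitwise-identical checkpoints out of floating-point training across heterogeneous GPUs is itself hard, so the verification interval and tolerance have to be chosen carefully.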
This is an active research area. The initial implementation will use simpler reputation-based validation (nodes with strong inference track records get training jobs, and statistical anomaly detection flags bad actors).
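A sketch of what the statistical anomaly detection could look like: flag nodes whose reported training losses sit far from the fleet median. The metric and threshold are illustrative assumptions, not the protocol's actual rules:

```python
import statistics

def flag_anomalies(losses: dict[str, float], z_threshold: float = 3.0) -> list[str]:
    """Flag nodes whose reported loss deviates strongly from the rest.

    Uses a robust z-score built from the median and the median absolute
    deviation (MAD), so a single bad actor cannot drag the baseline
    toward itself the way a mean/stddev baseline allows.
    """
    values = list(losses.values())
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values) or 1e-9
    return [node for node, v in losses.items()
            if abs(v - med) / (1.4826 * mad) > z_threshold]

reported = {"node-a": 1.92, "node-b": 1.88, "node-c": 1.95, "node-d": 57.0}
print(flag_anomalies(reported))  # ['node-d']
```

Flagged nodes would then face the same consequences as in the PoTW design: redistribution of their shard and, eventually, slashing.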
| Milestone | Status |
|---|---|
| LoRA fine-tuning (single node) | In development |
| Job submission API for training | Design phase |
| Multi-node data parallelism | Design phase |
| Dataset IPFS integration | Design phase |
| On-chain payment for training jobs | Design phase |
| QLoRA support | Planned |
| Full fine-tuning (multi-node FSDP) | Research phase |
The single-node LoRA fine-tuning milestone will be the first user-facing training feature. Once that’s stable, multi-node coordination follows.
If you want to run fine-tuning on your node hardware today, outside of the Infernet network:
```shell
# Install TRL + Unsloth
pip install trl unsloth
```

A minimal SFT script with TRL (the dataset name is a placeholder — substitute your own):

```python
# Simple SFT with TRL
from datasets import load_dataset
from trl import SFTTrainer

# load_dataset returns a DatasetDict; pick the training split
dataset = load_dataset("your_dataset_here", split="train")

# SFTTrainer accepts a model name directly and loads the tokenizer itself
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    train_dataset=dataset,
)
trainer.train()
trainer.save_model("./my-finetuned-model")
```

When distributed training ships on the network, you’ll be able to submit these jobs via the Infernet API instead of running them locally.
Watch the Infernet GitHub for training-related issues and PRs. The IPIP (Infernet Protocol Improvement Proposals) repository tracks protocol design decisions, including the distributed training spec.