Modern AI applications depend almost entirely on a handful of centralized API providers. When OpenAI has an outage, thousands of products break. When Anthropic changes its pricing, startups scramble. When a model gets deprecated, months of prompt engineering disappear overnight.
Beyond reliability and cost, there’s a structural problem: the compute that runs these models is controlled by a small number of large cloud providers. GPUs are expensive. If you have one, there’s currently no straightforward way to monetize it by running inference for others. If you need inference, you’re stuck paying whatever the incumbent providers charge.
Infernet Protocol is built to fix both sides of this.
Infernet Protocol is a decentralized GPU compute network for LLM inference. It has two kinds of participants:
Node operators run GPU servers with inference software installed. They register their nodes on the network, keep models loaded, and accept inference jobs. They earn crypto payments per job completed.
Clients submit inference jobs through a unified API. Jobs are routed to available nodes, executed, and results are streamed back. Clients pay per job. They don’t need to know which node ran their job.
The network is coordinated through a control plane (a Next.js app backed by Supabase) that handles node registration, job routing, and payment accounting. Crucially, the control plane never holds private keys: all authentication uses Nostr-style secp256k1 keypairs, and nodes sign their own requests.
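To make that concrete, here is a minimal sketch of how a node might sign a request, assuming Nostr-style Schnorr signatures over a SHA-256 hash of the request body. The header names and library choices are illustrative assumptions, not the protocol's documented wire format.

```ts
// A minimal sketch of Nostr-style request auth, assuming the control plane
// accepts a Schnorr signature over a SHA-256 hash of the request body.
// Header names and payload shape are illustrative, not the documented format.
import { schnorr } from "@noble/curves/secp256k1";
import { sha256 } from "@noble/hashes/sha256";
import { bytesToHex, utf8ToBytes } from "@noble/hashes/utils";

function signedHeaders(privKeyHex: string, body: unknown): Record<string, string> {
  const digest = sha256(utf8ToBytes(JSON.stringify(body))); // hash the request body
  const sig = schnorr.sign(digest, privKeyHex);             // node signs with its own key
  const pubkey = schnorr.getPublicKey(privKeyHex);          // control plane verifies against this

  return {
    "content-type": "application/json",
    "x-infernet-pubkey": bytesToHex(pubkey),                // assumed header name
    "x-infernet-signature": bytesToHex(sig),                // assumed header name
  };
}
```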
A node operator installs the CLI and runs infernet setup. This generates a secp256k1 keypair, detects the GPU, installs an inference backend (Ollama by default), and registers the node on the network.
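As a rough illustration of what registration involves, here is a sketch of the kind of payload the CLI might send. The endpoint URL and field names are assumptions for illustration, not the actual Infernet API.

```ts
// Hypothetical registration payload and call. Endpoint URL and field names
// are assumptions for illustration only.
interface NodeRegistration {
  pubkey: string;                                   // secp256k1 public key from `infernet setup`
  backend: "ollama" | "vllm" | "sglang" | "max" | "llama.cpp";
  gpu: { name: string; vramGb: number };            // detected during setup
  models: string[];                                 // models the node will keep loaded
}

async function registerNode(reg: NodeRegistration): Promise<void> {
  await fetch("https://control-plane.example.com/api/v1/nodes", {  // assumed URL
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(reg),
  });
}
```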
The daemon starts with infernet start. It heartbeats to the control plane every 30 seconds, reporting loaded models, GPU utilization, and availability.
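A sketch of that heartbeat loop follows. Only the 30-second cadence and the reported facts come from the description above; the endpoint and field names are assumptions.

```ts
// Hypothetical heartbeat loop: endpoint URL and field names are assumptions.
interface Heartbeat {
  pubkey: string;
  loadedModels: string[];
  gpuUtilization: number;   // fraction between 0.0 and 1.0
  available: boolean;
}

async function heartbeatLoop(current: () => Heartbeat): Promise<void> {
  while (true) {
    await fetch("https://control-plane.example.com/api/v1/heartbeat", {  // assumed URL
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(current()),
    });
    await new Promise((resolve) => setTimeout(resolve, 30_000));  // every 30 seconds
  }
}
```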
A client submits a job via POST /api/v1/jobs. The control plane routes the job to a suitable node based on model availability, capacity, and proximity.
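A client-side submission might look roughly like this. The POST /api/v1/jobs path comes from the description above; the base URL, body fields, and response shape are assumptions.

```ts
// Hypothetical job submission against POST /api/v1/jobs.
// Base URL, body fields, and response shape are assumptions.
async function submitJob(model: string, prompt: string): Promise<string> {
  const res = await fetch("https://control-plane.example.com/api/v1/jobs", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ model, prompt }),  // assumed job fields
  });
  const { jobId } = await res.json();         // assumed response shape
  return jobId;
}
```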
The node runs inference using whichever backend is installed (Ollama, vLLM, SGLang, MAX, or llama.cpp). Results stream back to the client via Server-Sent Events.
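Consuming that stream might look roughly like the sketch below. The use of Server-Sent Events is stated above; the stream URL and event payload shape are assumptions.

```ts
// Hypothetical SSE consumer. The stream URL and event payload are assumptions;
// only the use of Server-Sent Events comes from the description above.
function streamResult(jobId: string, onToken: (token: string) => void): void {
  const url = `https://control-plane.example.com/api/v1/jobs/${jobId}/stream`; // assumed URL
  const events = new EventSource(url);   // browser API; Node needs a polyfill
  events.onmessage = (ev) => {
    const chunk = JSON.parse(ev.data);   // assumed shape: { token: string, done: boolean }
    if (chunk.done) {
      events.close();
      return;
    }
    onToken(chunk.token);
  };
}
```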
Payment is settled. The node’s wallet receives a Compute Payment Receipt (CPR) that can be redeemed on-chain. Operators run infernet payout to claim earnings.
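A rough sketch of what a CPR might carry: only the concept of a per-job, on-chain-redeemable receipt comes from the description above, and every field below is an illustrative assumption.

```ts
// Hypothetical Compute Payment Receipt shape. All fields are assumptions;
// only the idea of a per-job receipt redeemable on-chain comes from the text.
interface ComputePaymentReceipt {
  jobId: string;
  nodePubkey: string;        // the operator's node key
  amount: string;            // payment for this job, in the network's settlement unit
  issuedAt: number;          // unix timestamp
  signature: string;         // lets the operator prove the claim when redeeming on-chain
}
```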
No vendor lock-in. The API is consistent regardless of which backend is running inference. You can switch from Ollama to vLLM by changing an env var without changing your client code (see the sketch after this list).
Censorship resistance. Because any operator can join and any model can be served, the network isn’t dependent on any single company’s content policies or business decisions.
Real hardware. Jobs run on real GPUs owned by real people, not virtual machines. Operators who invest in better hardware can handle larger models and charge more.
Crypto-native payments. Payments are settled on-chain with no intermediary holding funds. Operators get paid directly into their wallets.
Open source. The CLI, control plane code, and protocol spec are all open source. You can self-host the entire stack if you want full independence.
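Returning to the vendor lock-in point above, here is a sketch of how a node daemon might pick its backend from an env var while exposing the same generate interface either way. The env var name (INFERNET_BACKEND), the adapter interface, and the stub adapters are all assumptions; real adapters would call the local Ollama or vLLM server.

```ts
// Sketch of node-side backend selection behind one interface. Env var name,
// adapter interface, and stub adapters are assumptions for illustration.
interface InferenceBackend {
  generate(model: string, prompt: string): AsyncIterable<string>;
}

const ollamaBackend: InferenceBackend = {
  async *generate(model, prompt) { yield `[ollama:${model}] ${prompt}`; },  // stub
};

const vllmBackend: InferenceBackend = {
  async *generate(model, prompt) { yield `[vllm:${model}] ${prompt}`; },    // stub
};

function selectBackend(): InferenceBackend {
  switch (process.env.INFERNET_BACKEND ?? "ollama") {   // assumed env var name
    case "vllm":
      return vllmBackend;
    case "ollama":
    default:
      return ollamaBackend;
  }
}
```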
Infernet Protocol is not a model hosting service. It doesn’t store model weights. Operators pull models onto their own machines and serve them from local storage.
It’s not a training platform (yet — see Chapter 6 for the roadmap).
It’s not a cloud provider. There are no SLAs, no guaranteed uptime per node, and no managed infrastructure. The network’s reliability comes from having many nodes, not from any individual node being highly available.