This chapter is for developers building applications on top of Infernet Protocol. You don’t need to run a node — you submit jobs to the network via a REST API and get inference results back.
From a developer’s perspective, Infernet Protocol looks like a standard LLM API — similar to OpenAI’s API — but backed by a decentralized network of GPU nodes. You submit a prompt, specify a model, and get tokens back. The routing, node selection, and payment happen under the hood.
The key differences from a centralized provider:
- Model availability varies: nodes on the network serve different models. If you request a model that no node currently has loaded, the job queues until a node with that model becomes available, or fails with `no_capacity` if none are registered.
- Streaming is the default: because responses come from distributed nodes, streaming is the most reliable way to get results. Polling a job ID also works but adds latency. A streaming variant of the quickstart request is shown below.
- Auth uses bearer tokens from the dashboard: create an API key in the Infernet Dashboard to get a bearer token. This token identifies your account for billing and rate limiting.
```bash
# Set your token
export INFERNET_BEARER_TOKEN="your_token_here"

# Submit a job
curl https://infernetprotocol.com/api/v1/jobs \
  -H "Authorization: Bearer $INFERNET_BEARER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:7b",
    "messages": [{"role": "user", "content": "What is 2 + 2?"}],
    "stream": false
  }'
```

You should get back a JSON response with the job result within a few seconds.
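
Since streaming is the most reliable way to get results from the network, you will usually want `"stream": true` instead. The exact wire format of streamed responses isn't described in this chapter; the sketch below assumes the endpoint pushes incremental chunks over the open HTTP connection, so `curl -N` is used to print them as they arrive rather than buffering the whole response.

```bash
# Same request as the quickstart, but streamed. Assumes the endpoint sends
# incremental chunks over the open connection; -N disables curl's output
# buffering so tokens appear as they arrive.
curl -N https://infernetprotocol.com/api/v1/jobs \
  -H "Authorization: Bearer $INFERNET_BEARER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:7b",
    "messages": [{"role": "user", "content": "What is 2 + 2?"}],
    "stream": true
  }'
```

As with the non-streaming call, if no registered node currently serves the requested model the job will queue or fail with `no_capacity`, so treat that case as retryable (back off, or fall back to another model) rather than as a hard error.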