All API requests go to the control plane:
```
https://infernetprotocol.com/api/v1
```
If you’re self-hosting the control plane, replace this with your own URL.
All requests require a bearer token:
```
Authorization: Bearer your_token_here
```
Get a token by creating an API key in the Infernet Dashboard under Settings → API Keys.
Tokens are scoped to your account. Rate limits and billing are tracked per token.
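Any HTTP client works; the only requirement is the `Authorization` header on every request. A minimal Python sketch using only the standard library (the `authed_request` helper is illustrative, not part of an official SDK):

```python
import urllib.request

BASE_URL = "https://infernetprotocol.com/api/v1"

def authed_request(path, token, data=None, method="GET"):
    """Build a request with the bearer token attached (not yet sent)."""
    return urllib.request.Request(
        BASE_URL + path,
        data=data,        # bytes of the JSON body for POST, None for GET
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Send the result with `urllib.request.urlopen(req)`, or use any client library you prefer; the header is all the API needs.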
Submit an inference job.
Request:
```json
{
  "model": "qwen2.5:14b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "max_tokens": 256,
  "temperature": 0.7,
  "stream": false
}
```

Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model identifier (e.g., `qwen2.5:14b`) |
| `messages` | array | Yes | Chat messages in OpenAI format |
| `max_tokens` | integer | No | Maximum output tokens (default: 512) |
| `temperature` | float | No | Sampling temperature 0.0–2.0 (default: 0.7) |
| `top_p` | float | No | Nucleus sampling (default: 1.0) |
| `stream` | boolean | No | Return an SSE stream immediately (default: false) |
| `node_id` | string | No | Pin the job to a specific node (advanced) |
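Validating these constraints client-side avoids a round trip that would end in `bad_request`. A sketch mirroring the table above (`build_job_request` is a hypothetical helper, not part of any SDK):

```python
# Build a /jobs request body; defaults and ranges follow the parameters table.
def build_job_request(model, messages, max_tokens=512, temperature=0.7,
                      top_p=1.0, stream=False, node_id=None):
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be within 0.0-2.0")
    body = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }
    if node_id is not None:
        body["node_id"] = node_id  # advanced: pin the job to one node
    return body
```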
Response (stream: false):
```json
{
  "id": "job_9a3f2c1d",
  "status": "pending",
  "model": "qwen2.5:14b",
  "created_at": "2026-04-30T14:23:41Z"
}
```

When `stream: true`, the response is an SSE stream directly. See Streaming Chat.
Example:
```bash
curl https://infernetprotocol.com/api/v1/jobs \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:14b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'
```

Get job status and result.
Response (pending):
```json
{
  "id": "job_9a3f2c1d",
  "status": "pending",
  "model": "qwen2.5:14b",
  "created_at": "2026-04-30T14:23:41Z"
}
```

Response (completed):
```json
{
  "id": "job_9a3f2c1d",
  "status": "completed",
  "model": "qwen2.5:14b",
  "node_id": "node_8f3a2c1d",
  "result": {
    "content": "Paris is the capital of France.",
    "usage": {
      "prompt_tokens": 24,
      "completion_tokens": 8,
      "total_tokens": 32
    }
  },
  "created_at": "2026-04-30T14:23:41Z",
  "completed_at": "2026-04-30T14:23:43Z",
  "latency_ms": 2041
}
```

Response (failed):
```json
{
  "id": "job_9a3f2c1d",
  "status": "failed",
  "error": "no_capacity",
  "error_message": "No nodes available with model qwen2.5:14b",
  "created_at": "2026-04-30T14:23:41Z"
}
```

Open an SSE stream to receive tokens as they’re generated. This is the recommended way to deliver results in real-time applications.
See Streaming Chat for full documentation.
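For the non-streaming flow, a client typically submits a job and then re-fetches its status until it reaches `completed` or `failed`. A transport-agnostic sketch (`poll_job` and the injected `fetch_status` callable are hypothetical helpers, not part of an official SDK; in real use `fetch_status` would perform the authenticated GET for the job):

```python
import time

TERMINAL = {"completed", "failed"}  # statuses that end the job's lifecycle

def poll_job(fetch_status, job_id, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Call fetch_status(job_id) until the job is terminal or timeout elapses."""
    waited = 0.0
    while True:
        job = fetch_status(job_id)
        if job["status"] in TERMINAL:
            return job
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        sleep(interval)
        waited += interval
```

Injecting `fetch_status` and `sleep` keeps the loop easy to test and lets you swap in any HTTP client; for real-time applications, prefer the SSE stream instead of polling.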
List models currently available on the network (at least one online node has the model loaded).
Response:
```json
{
  "models": [
    {
      "id": "qwen2.5:72b",
      "nodes": 3,
      "avg_tokens_per_second": 48
    },
    {
      "id": "qwen2.5:14b",
      "nodes": 12,
      "avg_tokens_per_second": 78
    },
    {
      "id": "qwen2.5:7b",
      "nodes": 28,
      "avg_tokens_per_second": 95
    },
    {
      "id": "llama3.2:3b",
      "nodes": 15,
      "avg_tokens_per_second": 130
    }
  ]
}
```

Use this to check availability before submitting a job, especially for large models.
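One way to act on this response client-side, e.g. to fall back to a smaller variant when a large model has no capacity (`available_models` and `fastest_available` are hypothetical helpers for illustration):

```python
def available_models(models_response, min_nodes=1):
    """Model ids currently served by at least `min_nodes` online nodes."""
    return {m["id"] for m in models_response["models"] if m["nodes"] >= min_nodes}

def fastest_available(models_response, candidates):
    """Among acceptable `candidates`, pick the highest-throughput live model."""
    live = [m for m in models_response["models"]
            if m["id"] in candidates and m["nodes"] > 0]
    if not live:
        return None  # caller should report no_capacity-style failure
    return max(live, key=lambda m: m["avg_tokens_per_second"])["id"]
```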
List your recent jobs.
Query params:
| Param | Default | Description |
|---|---|---|
| `limit` | 20 | Number of jobs to return (max 100) |
| `status` | all | Filter by status: `pending`, `processing`, `completed`, `failed` |
| `since` | — | ISO 8601 timestamp to filter from |
Example:
```bash
curl "https://infernetprotocol.com/api/v1/jobs?limit=10&status=completed" \
  -H "Authorization: Bearer $TOKEN"
```

All errors return a JSON body with `error` and `error_message`:
```json
{
  "error": "unauthorized",
  "error_message": "Invalid or expired bearer token"
}
```

Common error codes:
| Code | HTTP Status | Meaning |
|---|---|---|
| `unauthorized` | 401 | Missing or invalid token |
| `bad_request` | 400 | Missing required field or invalid parameter |
| `model_not_found` | 404 | Model is not registered on the network |
| `no_capacity` | 503 | Model exists but no nodes have capacity |
| `rate_limited` | 429 | Too many requests; back off and retry |
| `internal_error` | 500 | Something went wrong on our end |
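A client can use this table to decide whether retrying is worthwhile. A sketch; note that treating `no_capacity` and `internal_error` as retryable is a judgment call on the client's part, not something these docs mandate (`is_retryable` is a hypothetical helper):

```python
# Transient conditions (503, 429, 500) may clear on retry;
# client errors (400, 401, 404) will fail the same way every time.
RETRYABLE = {"no_capacity", "rate_limited", "internal_error"}

def is_retryable(error_body):
    """Decide from the error JSON body whether a retry could succeed."""
    return error_body.get("error") in RETRYABLE
```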
Default rate limits apply per API token. Rate limit headers are included in every response:
```http
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1746020400
```
When rate limited, you’ll receive HTTP 429. The `Retry-After` header tells you how many seconds to wait before retrying:

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 13
```
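A client should honor `Retry-After` when it is present and otherwise fall back to its own backoff. A minimal sketch (`retry_delay` is a hypothetical helper; the exponential-backoff-with-jitter fallback is a common convention, not something the API prescribes):

```python
import random

def retry_delay(headers, attempt, base=1.0, cap=60.0, rng=random.random):
    """Seconds to wait before retry `attempt` (0-based), given response headers."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return float(retry_after)          # server told us exactly how long
    # Fallback: exponential backoff (base * 2^attempt), capped, with jitter
    return min(cap, base * (2 ** attempt)) * (0.5 + rng() / 2)
```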