A job moves through these states:
```
pending → processing → completed
                     → failed
```
| State | Description |
|---|---|
| `pending` | Job submitted, waiting for a node to accept it |
| `processing` | A node has accepted the job and is running inference |
| `completed` | Inference finished successfully, result available |
| `failed` | Job failed permanently (see error codes) |
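The current state can be read with a single GET against the job status endpoint. For illustration (the fields shown match those used by the polling helpers below; `jobId` and `TOKEN` are placeholders):

```javascript
// One-off status check; jobId and TOKEN are placeholders
const job = await fetch(`https://infernetprotocol.com/api/v1/jobs/${jobId}`, {
  headers: { 'Authorization': `Bearer ${TOKEN}` },
}).then(r => r.json());

console.log(job.status); // 'pending' | 'processing' | 'completed' | 'failed'
if (job.status === 'completed') console.log(job.result.content);
if (job.status === 'failed') console.error(job.error, job.error_message);
```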
There's no `cancelled` state: jobs cannot be cancelled once submitted.
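Since there is no server-side cancellation, the most a client can do is stop waiting and ignore the eventual result. As a sketch, here is a hypothetical polling helper that accepts a standard `AbortSignal` so callers can abandon the wait; the helper name and `signal` parameter are illustrations, not part of the API:

```javascript
// Hypothetical: abandon polling client-side. The job itself keeps running
// on the network; only our waiting stops.
async function waitForJobAbortable(jobId, token, signal, pollIntervalMs = 500) {
  const url = `https://infernetprotocol.com/api/v1/jobs/${jobId}`;
  while (!signal.aborted) {
    const job = await fetch(url, {
      headers: { 'Authorization': `Bearer ${token}` },
      signal, // also aborts an in-flight status request
    }).then(r => r.json());

    if (job.status === 'completed') return job.result.content;
    if (job.status === 'failed') throw new Error(`Job failed: ${job.error}`);
    await new Promise(resolve => setTimeout(resolve, pollIntervalMs));
  }
  throw new Error('Polling abandoned by caller');
}
```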
For non-streaming use cases, poll the job status endpoint until the job is complete:
```javascript
async function waitForJob(jobId, token, pollIntervalMs = 500) {
  const url = `https://infernetprotocol.com/api/v1/jobs/${jobId}`;

  while (true) {
    const response = await fetch(url, {
      headers: { 'Authorization': `Bearer ${token}` },
    });
    const job = await response.json();

    if (job.status === 'completed') {
      return job.result.content;
    }
    if (job.status === 'failed') {
      throw new Error(`Job failed: ${job.error} - ${job.error_message}`);
    }

    // Still pending or processing
    await new Promise(resolve => setTimeout(resolve, pollIntervalMs));
  }
}

// Usage
const jobResponse = await fetch('https://infernetprotocol.com/api/v1/jobs', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${TOKEN}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'qwen2.5:14b',
    messages: [{ role: 'user', content: 'What is 2 + 2?' }],
    stream: false,
  }),
});

const { id } = await jobResponse.json();
const result = await waitForJob(id, TOKEN);
```

For Python:
```python
import time

import httpx


def wait_for_job(job_id: str, token: str, poll_interval: float = 0.5) -> str:
    url = f"https://infernetprotocol.com/api/v1/jobs/{job_id}"
    headers = {"Authorization": f"Bearer {token}"}

    with httpx.Client() as client:
        while True:
            job = client.get(url, headers=headers).json()
            if job["status"] == "completed":
                return job["result"]["content"]
            if job["status"] == "failed":
                raise RuntimeError(f"Job failed: {job['error']} - {job['error_message']}")
            time.sleep(poll_interval)
```

| Consideration | Polling | Streaming |
|---|---|---|
| Implementation complexity | Low | Medium |
| Time-to-first-byte | Full generation time | ~1 second |
| UX for end users | Spinner until done | Tokens appear live |
| Network usage | Multiple requests | Single long connection |
| Reliability under unstable connections | Better (can resume polling) | Requires reconnection |
| Good for | Batch processing, server-to-server | User-facing chat, demos |
For batch processing or backend jobs where a human isn't watching, polling is fine. For any user-facing interface, streaming is strongly preferred.
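For reference, a minimal streaming consumer might look like the sketch below. This section doesn't document the streaming API, so both `stream: true` and the assumption that the response body arrives as incrementally readable chunks are illustrative; check the streaming documentation for the actual wire format (e.g. SSE framing):

```javascript
// Sketch only: assumes `stream: true` (the inverse of the documented
// `stream: false`) causes the jobs endpoint to stream the response body.
// The real wire format (SSE, NDJSON, ...) is defined in the streaming docs.
const response = await fetch('https://infernetprotocol.com/api/v1/jobs', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${TOKEN}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'qwen2.5:14b',
    messages: [{ role: 'user', content: 'What is 2 + 2?' }],
    stream: true,
  }),
});

// Read raw chunks as they arrive (works in browsers and Node 18+)
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(decoder.decode(value, { stream: true }));
}
```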
Jobs have a server-side timeout of 5 minutes. If a job is still in `pending` or `processing` state after 5 minutes, it will transition to `failed` with `error: "timeout"`.
This covers cases where a node accepts a job but then goes offline. The control plane detects the missed heartbeat, marks the node offline, and re-queues the job on a different node, all transparently.
Set a client-side timeout somewhat longer to allow for re-queue time:
```javascript
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 6 * 60 * 1000); // 6 min

try {
  const response = await fetch('https://infernetprotocol.com/api/v1/jobs', {
    signal: controller.signal,
    // ...
  });
} finally {
  clearTimeout(timeoutId);
}
```

Not all failures are permanent. Implement exponential backoff for transient errors:
```javascript
const RETRYABLE_ERRORS = ['no_capacity', 'node_disconnected', 'internal_error'];

async function submitWithRetry(payload, token, maxRetries = 3) {
  let lastError;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    if (attempt > 0) {
      // Exponential backoff: 1s, 2s, 4s, ... capped at 30s
      const delay = Math.min(1000 * Math.pow(2, attempt - 1), 30000);
      await new Promise(resolve => setTimeout(resolve, delay));
    }

    try {
      const response = await fetch('https://infernetprotocol.com/api/v1/jobs', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${token}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify(payload),
      });

      if (response.status === 429) {
        const retryAfter = parseInt(response.headers.get('Retry-After') || '5', 10);
        await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
        continue;
      }

      const data = await response.json();
      if (response.ok) return data;

      lastError = data;
      if (!RETRYABLE_ERRORS.includes(data.error)) {
        const error = new Error(`Non-retryable error: ${data.error_message}`);
        error.nonRetryable = true;
        throw error;
      }
    } catch (err) {
      // Rethrow aborts and non-retryable errors; anything else is retried
      if (err.name === 'AbortError' || err.nonRetryable) throw err;
      lastError = err;
    }
  }

  throw new Error(`Failed after ${maxRetries} retries: ${lastError?.error_message || lastError?.message}`);
}
```

| Error | Retryable | Cause | Action |
|---|---|---|---|
| `no_capacity` | Yes | No node available with this model | Retry with backoff; consider a different model |
| `model_not_found` | No | Model not registered on network | Check model name spelling; use `/api/v1/models` |
| `unauthorized` | No | Invalid bearer token | Refresh token in dashboard |
| `rate_limited` | Yes | Too many requests | Respect `Retry-After` header |
| `bad_request` | No | Invalid parameters | Fix request payload |
| `node_disconnected` | Yes | Node went offline during streaming | Retry full job |
| `timeout` | Yes | Job exceeded 5 minute limit | Retry; consider smaller context |
| `internal_error` | Yes | Unexpected server error | Retry with backoff |
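Note that the `RETRYABLE_ERRORS` list in the retry helper above covers only `no_capacity`, `node_disconnected`, and `internal_error`; `rate_limited` is already handled by the 429 branch. Per the table, `timeout` is also retryable, so you may want to extend the list:

```javascript
// Extends the earlier list to match the table; rate_limited is still
// handled by the 429 / Retry-After branch in submitWithRetry.
const RETRYABLE_ERRORS = ['no_capacity', 'node_disconnected', 'internal_error', 'timeout'];
```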
The time a job spends in `pending` depends on network load and model availability. For popular models (`qwen2.5:7b`, `llama3.2:3b`), jobs typically start within 1–3 seconds. For rare or large models, pending time can be longer if all nodes with that model are busy.
You can check how many nodes are available for a model before submitting:
```javascript
const models = await fetch('https://infernetprotocol.com/api/v1/models', {
  headers: { 'Authorization': `Bearer ${TOKEN}` },
}).then(r => r.json());

const modelInfo = models.models.find(m => m.id === 'qwen2.5:14b');
if (modelInfo && modelInfo.nodes > 0) {
  // Nodes available, low wait time
  console.log(`${modelInfo.nodes} nodes available`);
} else {
  console.log('Model not available or no capacity');
}
```

For high-volume batch processing, submit multiple jobs in parallel and poll them concurrently:
```javascript
async function batchInference(prompts, model, token) {
  // Submit all jobs
  const jobIds = await Promise.all(
    prompts.map(prompt =>
      fetch('https://infernetprotocol.com/api/v1/jobs', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${token}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model,
          messages: [{ role: 'user', content: prompt }],
          stream: false,
        }),
      })
        .then(r => r.json())
        .then(j => j.id)
    )
  );

  // Poll all until complete
  return Promise.all(jobIds.map(id => waitForJob(id, token)));
}

const results = await batchInference(
  ['What is 2+2?', 'What is the capital of France?', 'Who wrote Hamlet?'],
  'qwen2.5:7b',
  TOKEN
);
```

Rate limits apply per API token. For large batches, use the rate limit headers to stay within bounds.
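This section doesn't name the rate limit headers, so treat the following as a sketch assuming conventional `X-RateLimit-Remaining` / `X-RateLimit-Reset` names; substitute whatever headers appear in actual responses. The idea is to pause submission when the remaining budget runs low instead of waiting for a 429:

```javascript
// Sketch: throttle submissions using assumed X-RateLimit-* header names.
async function submitThrottled(payload, token) {
  const response = await fetch('https://infernetprotocol.com/api/v1/jobs', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(payload),
  });

  // Hypothetical header names; check real responses for the actual ones.
  const remaining = parseInt(response.headers.get('X-RateLimit-Remaining') ?? '1', 10);
  const resetSec = parseInt(response.headers.get('X-RateLimit-Reset') ?? '1', 10);
  if (remaining === 0) {
    // Out of budget: wait for the window to reset before the next submit
    await new Promise(resolve => setTimeout(resolve, resetSec * 1000));
  }
  return response.json();
}
```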