Infernet Protocol

FAQ

Frequently asked questions.

What hardware works, which engines we support, how distributed inference and training fit together, and what we explicitly don't do. For deeper technical detail see /docs or the IPIPs.

Hardware support

What GPUs does Infernet support?

Anything Ollama or vLLM can drive. NVIDIA (CUDA), AMD (ROCm), Apple Silicon (Metal), and CPU-only all work. The installer auto-detects and provisions whichever engines fit your hardware: on NVIDIA boxes it installs both Ollama and vLLM; on AMD / Apple / CPU it installs Ollama (which natively handles those targets).

Does it work on Windows?

Yes, via WSL2. The installer is POSIX shell, so Windows operators run wsl --install -d Ubuntu, install the Windows-side NVIDIA driver (gives WSL CUDA without a separate Linux driver), and then run the same one-liner inside Ubuntu. Outbound paths (chat, remote model commands) work as-is; direct P2P inbound on :46337 needs a netsh portproxy rule on the Windows host (optional — only matters if you want direct peer connections).
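
A minimal sketch of the Windows-side steps, assuming default WSL2 NAT networking (the WSL IP in the portproxy rule is a placeholder you'd look up with hostname -I inside Ubuntu):

  # In an elevated PowerShell on the Windows host:
  wsl --install -d Ubuntu

  # Optional, only needed for direct inbound P2P: forward :46337 into WSL2
  netsh interface portproxy add v4tov4 listenport=46337 listenaddress=0.0.0.0 connectport=46337 connectaddress=<wsl-ip>

  # Inside Ubuntu, run the same install one-liner you'd use on native Linux.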

What about Apple Silicon?

Fully supported via Ollama on Metal. M1/M2/M3 with 16+ GB of unified memory runs 7B models comfortably; 32+ GB handles 13B-class models. vLLM is NVIDIA-only, so the installer skips it on Apple Silicon.

How much disk does the install need?

Roughly 3 GB without a deploy bearer. With a deploy bearer (auto-bootstrap pulls a model), plan on ~10 GB on non-NVIDIA hardware and ~25 GB on NVIDIA (Ollama CUDA libs + vLLM + a 7 GB model + headroom). The installer detects what it will provision and raises the free-space threshold accordingly; if you're below the bar it fails loudly up front instead of dying mid-tar-extract.

I'm on RunPod / Vast.ai / Paperspace — does the install handle the volume mount?

Yes, it's host-agnostic. install.sh scans df output, finds the biggest writable mount that isn't on $HOME's filesystem, and relocates everything (node_modules, mise data, the vLLM venv, Ollama CUDA libs, model blobs) onto it. That covers RunPod's /workspace, Vast.ai's /data, Lambda's /lambda, bare-metal /mnt/*, and so on; no per-platform config needed.
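
For illustration only (not the actual install.sh logic), the selection idea boils down to something like this, assuming GNU df:

  # Largest non-root candidate: biggest mount whose backing device differs from $HOME's
  # (the real script also checks that the mount is writable)
  home_dev=$(df --output=source "$HOME" | tail -1)
  df --output=source,target,avail -x tmpfs -x devtmpfs -x overlay 2>/dev/null |
    tail -n +2 | awk -v home="$home_dev" '$1 != home' | sort -k3 -rn | head -1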

Inference engines

Ollama or vLLM — when do I use which?

Ollama is the easy default: it works on every GPU vendor plus CPU, installs in ~30 s, and manages model downloads. Use it for a single GPU, mixed hardware, dev boxes, or if you just want things to work. vLLM is the high-throughput choice for serious NVIDIA hardware: PagedAttention, request batching, and native tensor + pipeline parallelism via Ray. Use it when you have multiple GPUs (or a cluster), care about p99 latency under load, or need OpenAI-compatible APIs in a busy production setup. The installer provisions both on NVIDIA, so you switch simply by choosing which one is running. Auto-select prefers vLLM over Ollama when both are up.

What's a Ray cluster and do I need one?

Ray is the orchestration layer vLLM uses internally for multi-GPU and multi-node serving. Single-box multi-GPU works without any Ray config — vLLM spawns it implicitly. You only configure Ray explicitly when you have multiple machines and want vLLM to span them: set INFERNET_RAY_MODE=head on one box, INFERNET_RAY_MODE=worker + INFERNET_RAY_HEAD=host:6379 on the others, then run vllm serve --tensor-parallel-size N --pipeline-parallel-size M on the head.
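
Concretely, for a multi-machine vLLM span it looks like this (the model name and parallel sizes are placeholders; pick ones that match your GPUs):

  # On the head box:
  export INFERNET_RAY_MODE=head

  # On each worker box:
  export INFERNET_RAY_MODE=worker
  export INFERNET_RAY_HEAD=head-host:6379

  # Then, on the head, launch the model across the cluster:
  vllm serve meta-llama/Llama-3.1-70B-Instruct --tensor-parallel-size 4 --pipeline-parallel-size 2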

Does it support OpenAI-compatible APIs?

Yes. vLLM serves the OpenAI chat-completions endpoint natively, and the Infernet control plane exposes /v1/chat/completions as an OpenAI-compatible gateway that routes to live providers (or falls back to NVIDIA NIM if the network is empty). Drop-in for any tool that already speaks OpenAI.
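
For example, any OpenAI-style client can hit the gateway directly (the hostname, auth header, and model name here are placeholders):

  curl https://your-control-plane/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer <api-key>' \
    -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello"}]}'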

Workload classes

What are the workload classes?

Defined in IPIP-0010:
  • A — one model fits one GPU; one request → one provider. Real-time chat.
  • B — one model fits one provider's cluster, sharded via tensor + pipeline parallelism intra-LAN. Real-time chat.
  • B.5 — one model spans multiple providers via pipeline-parallel relay over WAN (Petals). Batch only — too slow for real-time.
  • C — distributed training across providers via async delta exchange (OpenDiLoCo, Hivemind). Long-running.

Can you shard one model across random P2P nodes for live chat?

No, and don't trust anyone who says otherwise. Tensor parallelism (the only way to make per-token latency reasonable for sharded inference) needs sub-millisecond GPU-to-GPU latency and interconnect bandwidth the public internet can't provide. We support pipeline-parallel sharding across providers (Class B.5 via Petals), but it's batch-only at multi-second per-token latency. Real-time chat for one model needs that model to fit in one provider's hardware (Class A or B).

Does Infernet support distributed training?

The scaffolding is in place via @infernetprotocol/training, with backends for DeepSpeed (Class B trusted-cluster), OpenRLHF (Ray + vLLM + DeepSpeed RLHF), OpenDiLoCo (Class C cross-provider async), and Petals (Class B.5 fine-tunes). Today the stub backend emits synthetic step events for end-to-end testing; the real Python integrations will land incrementally. See IPIP-0011.

Can I submit batch jobs (embeddings, bulk classification)?

The spec lands as IPIP-0013: POST /api/v1/jobs/batch takes one logical job (embed N docs, classify N items, summarize N texts), the server splits it into independent chunks, BullMQ workers fan the chunks out to providers in parallel, and the results aggregate into a manifest. The endpoint is not yet live; the IPIP defines the shape and the implementation is in flight.
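
Once it lands, a request might look roughly like this (the field names are illustrative; IPIP-0013 is the source of truth for the final shape):

  curl -X POST https://your-control-plane/api/v1/jobs/batch \
    -H 'Content-Type: application/json' \
    -d '{"task": "embed", "model": "<embedding-model>", "inputs": ["doc 1", "doc 2", "doc 3"]}'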

Privacy & security

Who sees my prompts?

For Class A and B (single-provider) jobs: only the provider you got routed to, and only for the duration of that request. The control plane stores job metadata (timing, model, who paid) but not prompt content.

For Class B.5 (pipeline-parallel cross-provider): every provider in the chain sees the intermediate hidden states, which leak prompt content. Don't route privacy-sensitive prompts through B.5 — pin to a single trusted provider instead. The dashboard displays a "Visible to N relay peers" warning on B.5 submissions.

Does my GPU node need a public IP?

No. The control-plane-mediated paths (chat, batch jobs, remote model commands) all work outbound-only — your daemon polls and posts. Direct provider-to-provider P2P features (libp2p peering on :46337) need inbound, but if you only care about earning via routed jobs, NAT is fine. Run with --no-advertise to never publish your IP.

What credentials live on a node?

A Nostr (secp256k1 / BIP-340) keypair, generated on first run and stored at ~/.config/infernet/config.json mode 0600. That's it — no database credentials, no service-role keys. Every API call to the control plane carries an X-Infernet-Auth envelope with a Schnorr signature over method + path + timestamp + nonce + sha256(body). 60-second replay window, per-process nonce cache, pubkey must match the public_key on the target row.
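
As a rough sketch of what goes into the envelope (the path and the exact canonical encoding below are assumptions for illustration, not the wire format):

  # Hypothetical request the daemon is about to make
  method=POST
  path=/api/v1/nodes/heartbeat
  ts=$(date +%s)
  nonce=$(uuidgen)
  body='{"status":"online"}'
  body_hash=$(printf '%s' "$body" | sha256sum | awk '{print $1}')
  # The node Schnorr-signs the combination of method, path, ts, nonce and body_hash
  # with its Nostr key and ships the signature in the X-Infernet-Auth header.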

Payments

Which coins / chains are supported?

BTC, BCH, ETH, SOL, POL, BNB, XRP, ADA, DOGE; plus USDT on ETH/Polygon/Solana; plus USDC on ETH/Polygon/Solana/Base. Gateway is CoinPayPortal. Providers configure a payout address per coin with infernet payout set <COIN> <ADDRESS>; clients pay invoices in whichever coin they have. There is no Infernet native token.
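
For example, a provider who wants to accept BTC and SOL would run (the address arguments are placeholders):

  infernet payout set BTC <your-btc-address>
  infernet payout set SOL <your-sol-address>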

How does a provider get paid?

Per completed job, settled to the payout address for the coin the client paid in. Payouts batch into provider_payouts rows; you can inspect via infernet payments. There's no platform spread above market gateway fees — the protocol is the matchmaker, not a rent extractor.

Deployment

What's the fastest way to spin up a node?

Mint a 24h deploy bearer at /deploy and paste the one-liner into your provider's user-data / cloud-init / container start command. The script auto-detects the platform's volume mount, picks the right engines for your hardware, installs Node + pnpm + mise, clones the source, runs the daemon, and registers with the control plane. No SSH afterwards.

Can I run my own control plane (self-hosted)?

Yes. Clone the repo, run pnpm supabase:start + pnpm supabase:db:reset + pnpm dev, and you've got a local control plane on :3000. Point CLI nodes at it with infernet init --url http://your-host:3000. The same codebase serves the cloud at infernetprotocol.com; nothing is cloud-only.
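
From a checkout of the repo, the whole loop looks like:

  pnpm supabase:start        # local Supabase stack
  pnpm supabase:db:reset     # reset the local database
  pnpm dev                   # control plane on :3000

  # On each node you want to attach:
  infernet init --url http://your-host:3000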

How do I push a model update remotely without SSH?

Owner-issued remote commands: from the dashboard's "Push model" UI (or via POST /api/v1/user/nodes/<pubkey>/commands), you queue a { command: "model_install", args: { model: "..." } } payload for your own node. The daemon picks it up on its next outbound poll, runs ollama pull <model>, and reports completion. Auth checks pubkey ownership; only you can issue commands to your nodes.
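
The equivalent API call looks like this (the model tag is just an example, and authentication is omitted here):

  curl -X POST https://your-control-plane/api/v1/user/nodes/<pubkey>/commands \
    -H 'Content-Type: application/json' \
    -d '{"command": "model_install", "args": {"model": "llama3.1:8b"}}'
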
Didn't find your question? Email hello@infernetprotocol.com or open an issue on GitHub.