Infernet Protocol supports five inference backends. The daemon auto-detects which one to use, but understanding your options lets you pick the right tool for your hardware and workload.
At startup, the daemon probes each backend in priority order by sending a health check request. The first backend that responds is used (see the sketch after the list below).
Default probe order:

1. Ollama (`localhost:11434/api/tags`)
2. vLLM (`localhost:8000/health`)
3. SGLang (`localhost:30000/health`)
4. Modular MAX (`localhost:8080/health`)
5. llama.cpp / llama-swap (`localhost:8080/health`)
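As a rough illustration, the probe loop amounts to trying each health URL in priority order and stopping at the first success. The sketch below is not the daemon's actual implementation: the function name, the 500 ms timeout, and every backend identifier except `vllm` (which appears in the override example later) are assumptions.

```typescript
// Minimal sketch of backend auto-detection, assuming Node 18+ (global fetch).
// Probe URLs come from the list above; identifiers other than "vllm" are guesses.
const PROBE_TARGETS: Array<{ backend: string; url: string }> = [
  { backend: "ollama",   url: "http://localhost:11434/api/tags" },
  { backend: "vllm",     url: "http://localhost:8000/health" },
  { backend: "sglang",   url: "http://localhost:30000/health" },
  { backend: "max",      url: "http://localhost:8080/health" },
  { backend: "llamacpp", url: "http://localhost:8080/health" },
];

async function detectBackend(timeoutMs = 500): Promise<string | null> {
  for (const { backend, url } of PROBE_TARGETS) {
    try {
      // Abort quickly so an unreachable port doesn't stall startup.
      const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
      if (res.ok) return backend; // first healthy backend wins
    } catch {
      // Connection refused or timed out: fall through to the next backend.
    }
  }
  return null; // nothing responded; the daemon reports an error in this case
}
```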
Override the selection with an env var:

```bash
export INFERNET_BACKEND=vllm
infernet start
```

Or in your config:
```json
{
  "backend": "vllm"
}
```

All backends speak to the daemon through a common adapter interface:
- `POST /generate`: single inference request
- `POST /generate/stream`: streaming inference request
- `GET /models`: list loaded models
- `GET /health`: health check

The daemon translates between this interface and each backend's native API. You don't need to know the backend's native API unless you're doing advanced configuration.
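For a sense of how a client talks to the adapter interface, here is a small sketch against the `/generate` and `/models` endpoints listed above. The daemon's address and port, and the request/response field names (`model`, `prompt`, `text`), are assumptions for illustration; the actual schema is not documented in this section.

```typescript
// Hypothetical client for the daemon's adapter interface (Node 18+, global fetch).
// Endpoint paths are from the docs; DAEMON_URL and the payload shape are assumed.
interface GenerateRequest {
  model: string;
  prompt: string;
}

interface GenerateResponse {
  text: string;
}

const DAEMON_URL = "http://localhost:4000"; // assumed daemon address

async function generate(req: GenerateRequest): Promise<GenerateResponse> {
  const res = await fetch(`${DAEMON_URL}/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`generate failed: HTTP ${res.status}`);
  return (await res.json()) as GenerateResponse;
}

async function listModels(): Promise<unknown> {
  // Returns whatever the daemon reports as currently loaded models.
  const res = await fetch(`${DAEMON_URL}/models`);
  return res.json();
}
```

Because the daemon normalizes every backend to this one interface, a client like the sketch above works the same whether Ollama, vLLM, or any other supported backend is running underneath.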