Walk through training a custom model from a search query to a published artifact on both HuggingFace and Ollama. We'll mirror the ollama.com/rockypod/svelte-coder shape: fine-tune a coder on Svelte docs, ship as a one-line ollama pull.
```
┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│ infernet train   │ →  │ infernet train   │ →  │ infernet train   │ →  │ infernet publish │
│   data           │    │   init           │    │   run            │    │                  │
│   (crawl)        │    │   (config)       │    │   (fine-tune)    │    │   (HF + Ollama)  │
└──────────────────┘    └──────────────────┘    └──────────────────┘    └──────────────────┘
   web → JSONL           JSONL → train.yml       → trained model        → ollama.com URL
```
```sh
infernet train data \
  --query "svelte 5 framework documentation" \
  --domains svelte.dev,kit.svelte.dev,github.com \
  --num 30 \
  --out ./data/svelte5.jsonl
```

This hits ValueSerp for the top 30 Google results, optionally filtered to a domain whitelist, then fetches each URL, extracts the readable text, and chunks it into ChatML-format JSONL:
{"messages":[{"role":"system","content":"You are an expert..."},
{"role":"user","content":"svelte 5 framework documentation β section 1"},
{"role":"assistant","content":"Svelte 5 introduces runes β the primary..."}],
"meta":{"source_url":"https://svelte.dev/docs/svelte/runes","paragraph_index":0}}Set VALUESERP_API_KEY in your env (or under
integrations.valueserp.api_key in
~/.config/infernet/config.json).
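If you use the config file, that key path implies a nested shape like the sketch below; only the integrations.valueserp.api_key path is from the docs, the placeholder value is yours to fill in:

```json
{
  "integrations": {
    "valueserp": {
      "api_key": "YOUR_VALUESERP_KEY"
    }
  }
}
```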
The crawler will skip paragraphs shorter than
--min-chars (default 200) and truncate paragraphs longer
than --max-chars (default 4000) so each training sample
fits comfortably in a 4K context.
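For intuition, the filter/truncate pass amounts to something like the sketch below. This is illustrative only, not infernet's actual crawler code; the function name and prompt strings are stand-ins:

```python
# Sketch of the crawl-to-JSONL chunking pass described above.
# Illustrative only: paragraphs_to_jsonl is a hypothetical helper.
import json

MIN_CHARS, MAX_CHARS = 200, 4000   # --min-chars / --max-chars defaults

def paragraphs_to_jsonl(query, source_url, paragraphs, out_path):
    with open(out_path, "a", encoding="utf-8") as out:
        for i, para in enumerate(paragraphs):
            text = para.strip()
            if len(text) < MIN_CHARS:   # skip too-short paragraphs
                continue
            record = {
                "messages": [
                    {"role": "system", "content": "You are an expert..."},
                    {"role": "user", "content": f"{query} – section {i + 1}"},
                    # truncate long paragraphs to --max-chars
                    {"role": "assistant", "content": text[:MAX_CHARS]},
                ],
                "meta": {"source_url": source_url, "paragraph_index": i},
            }
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
```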
```sh
infernet train init --output ./run
```

Edit the generated run/infernet.train.yml:
```yaml
name: svelte5-coder
base_model: qwen2.5-coder:7b    # 7B coder is plenty for a domain-specific tune
method: qlora                   # 4-bit LoRA; trains on a single 24GB GPU
runtime: unsloth                # fastest on consumer hardware

input:
  dataset: ./data/svelte5.jsonl # what we just generated
  format: chatml
  validation_split: 0.1

training:
  epochs: 3
  learning_rate: 2.0e-4
  batch_size: 4
  gradient_accumulation_steps: 4
  max_seq_len: 4096

lora:
  rank: 16
  alpha: 32
  target_modules: [q_proj, v_proj, k_proj, o_proj]

resources:
  min_vram_gb: 24
  max_runtime_hours: 4
```

```sh
# Local training on this box (single GPU)
infernet train run --local --config ./run/infernet.train.yml

# Or submit to the P2P network (when workload_class: C1+ is set)
infernet train run --config ./run/infernet.train.yml
```

Local mode shells out to the runtime you specified
(Unsloth, TRL, Axolotl, or the bundled microgpt for tiny demos) and
writes:
```
run/
├── infernet.train.yml   (frozen copy of the config used)
├── metrics.jsonl        (one line per logging step)
├── README.md            (auto-generated: what was trained, when)
└── checkpoint-final/    (HF-shape directory: config.json + safetensors)
```
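To make the config-to-code mapping concrete, the Unsloth path boils down to roughly the standard QLoRA recipe. This is a hedged sketch, not infernet's actual runner; the HF repo name is an assumption for how the qwen2.5-coder:7b base resolves:

```python
# Hedged sketch of the QLoRA recipe behind runtime: unsloth.
# Illustrative only, not infernet's actual runner code.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-Coder-7B-Instruct",  # assumed HF resolution
    max_seq_length=4096,          # training.max_seq_len
    load_in_4bit=True,            # method: qlora
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                         # lora.rank
    lora_alpha=32,                # lora.alpha
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
)

dataset = load_dataset("json", data_files="./data/svelte5.jsonl", split="train")

SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,        # TRL handles the ChatML "messages" column
    args=TrainingArguments(
        num_train_epochs=3,                 # training.epochs
        learning_rate=2e-4,                 # training.learning_rate
        per_device_train_batch_size=4,      # training.batch_size
        gradient_accumulation_steps=4,      # effective batch = 16
        output_dir="./run/checkpoint-final",
    ),
).train()
```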
```sh
infernet publish ./run/checkpoint-final \
  --hf InfernetProtocol/svelte5-coder \
  --ollama infernet/svelte5-coder \
  --quant q4_k_m
```

What this does (a code sketch follows the list):

- Pushes the checkpoint to huggingface.co/InfernetProtocol/svelte5-coder. Needs HUGGINGFACE_TOKEN with write scope on the org.
- Converts to GGUF with convert_hf_to_gguf.py from your local llama.cpp checkout (~/llama.cpp by default; override with --llama-cpp-path). Emits model.f16.gguf, then quantizes to model.q4_k_m.gguf.
- Writes a Modelfile with the ChatML template and sane defaults (temperature=0.7, top_p=0.9).
- Runs ollama create + ollama push so the model lands at ollama.com/infernet/svelte5-coder. Run ollama signin once before the first publish.
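Conceptually that is three moves: push to HF, convert and quantize to GGUF, create and push on Ollama. A hedged sketch of the flow, with repo ids and paths from the command above; the llama-quantize binary location is an assumption about your llama.cpp build:

```python
# Illustrative sketch of the publish flow, not infernet's actual code.
import subprocess
from pathlib import Path
from huggingface_hub import HfApi

ckpt = Path("./run/checkpoint-final")
llama_cpp = Path.home() / "llama.cpp"      # --llama-cpp-path default

# 1. Push the HF-shape checkpoint (needs HUGGINGFACE_TOKEN with write scope).
api = HfApi()
api.create_repo("InfernetProtocol/svelte5-coder", exist_ok=True)
api.upload_folder(folder_path=str(ckpt),
                  repo_id="InfernetProtocol/svelte5-coder")

# 2. Convert to GGUF, then quantize f16 to q4_k_m.
subprocess.run(["python3", str(llama_cpp / "convert_hf_to_gguf.py"),
                str(ckpt), "--outfile", "model.f16.gguf"], check=True)
subprocess.run([str(llama_cpp / "build/bin/llama-quantize"),   # path assumed
                "model.f16.gguf", "model.q4_k_m.gguf", "q4_k_m"], check=True)

# 3. Build the Ollama model from the generated Modelfile and push it.
subprocess.run(["ollama", "create", "infernet/svelte5-coder",
                "-f", "Modelfile"], check=True)
subprocess.run(["ollama", "push", "infernet/svelte5-coder"], check=True)
```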
After this, anyone with Ollama can pull your model:

```sh
ollama pull infernet/svelte5-coder
ollama run infernet/svelte5-coder "How do runes work in Svelte 5?"
```

You can also run any stage on its own:

- Data only: run infernet train data --query … and stop. The JSONL is portable to any HF/Unsloth/Axolotl/TRL pipeline.
- Artifacts without publishing: infernet publish <dir> --modelfile-only generates the Modelfile + GGUF locally without pushing anywhere.
- HF only: pass --skip-ollama. Ollama only: --skip-hf.

Beyond running on your own GPU, you can post a training job to the open network: any opted-in operator anywhere claims shards and trains for pay. Your own daemon hosts the dataset directly: no S3, no HF dataset, no IPFS, no third-party storage.
```sh
infernet train run --open-market \
  --config ./run/infernet.train.yml \
  --budget 5.00 \
  --max-nodes 8
```

The wire shape:
```
submitter (you)                              control plane        operators (anywhere)
~/.infernet/training-runs/<id>/                    │                       │
  shards/shard-0.jsonl  ◀──── GET /v1/training/shards/<id>/shard-0.jsonl ──┤
  adapters/             ◀──── PUT /v1/training/adapters/<id>/0?token=t ────┤
  manifest.json (upload_token)                     │                       │
                                                   │                       │
POST /api/v1/training/jobs ──────▶                 │                       │
  { dataset_base_url:                              │                       │
      <your-daemon-url>/v1/...,                    │                       │
    upload_base_url:                               │                       │
      <your-daemon-url>/...?token=t }              │                       │
                                                   │                       │
                                                   │ ◀──── POST /shards/available ──┤
                                                   │ ◀──── POST /shards/<id>/claim ─┤
                                                   │ ◀──── POST /shards/<id>/report ┤
```
Operators opt in by setting INFERNET_ACCEPT_TRAINING=1
(or engine.acceptTraining: true in their config). Their
daemon polls the market every 60 seconds; on a successful claim it runs
the same Unsloth runner used by --local, then PUTs the LoRA
adapter back to your daemon. Adapters land in
~/.infernet/training-runs/<run_id>/adapters/.
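Sketching the operator side of that protocol makes the loop concrete. Endpoint paths follow the wire diagram; the base URL, payload fields, and the train_qlora() helper are hypothetical stand-ins:

```python
# Schematic sketch of the operator-side poll/claim/train/upload loop.
# Illustrative only: field names and train_qlora() are hypothetical.
import time
import requests

CONTROL_PLANE = "https://market.example/api/v1/training"  # assumed base URL

def train_qlora(shard_bytes):
    """Hypothetical stand-in for the same Unsloth runner used by --local."""
    raise NotImplementedError

while True:
    shards = requests.post(f"{CONTROL_PLANE}/shards/available").json()
    for shard in shards:
        claim = requests.post(f"{CONTROL_PLANE}/shards/{shard['id']}/claim")
        if claim.status_code != 200:
            continue  # another operator won the claim
        # Fetch the shard straight from the submitter's daemon.
        data = requests.get(shard["dataset_url"]).content
        adapter_path = train_qlora(data)
        # PUT the finished LoRA adapter back to the submitter's daemon.
        with open(adapter_path, "rb") as f:
            requests.put(shard["upload_url"], data=f)
        requests.post(f"{CONTROL_PLANE}/shards/{shard['id']}/report",
                      json={"status": "done"})
    time.sleep(60)  # poll interval from the docs above
```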
When all shards report, FedAvg the adapters:
```sh
python3 ./run/_fedavg.py --out ./run/checkpoint-final \
  ~/.infernet/training-runs/<run_id>/adapters/*
```

Then publish (Track 4 above). End-to-end, you've trained a model using GPUs from operators across the world without spinning up any cloud infrastructure or paying for storage.
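For reference, the averaging itself is just an element-wise mean over the adapter tensors. A minimal sketch of the idea, not the bundled _fedavg.py; equal weighting assumes shards of roughly equal size:

```python
# Minimal FedAvg sketch: element-wise mean over LoRA adapter tensors.
# Illustrative only, not the bundled _fedavg.py.
import sys
from pathlib import Path

import torch
from safetensors.torch import load_file, save_file

adapter_dirs = [Path(p) for p in sys.argv[1:]]
summed: dict[str, torch.Tensor] = {}
for d in adapter_dirs:
    # adapter_model.safetensors is PEFT's default adapter file name
    for name, tensor in load_file(str(d / "adapter_model.safetensors")).items():
        summed[name] = summed.get(name, 0) + tensor.to(torch.float32)

averaged = {name: (t / len(adapter_dirs)).to(torch.float16)
            for name, t in summed.items()}
save_file(averaged, "adapter_model.fedavg.safetensors")
```

Usage mirrors the command above: pass the adapter directories as arguments.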
If your daemon's port 8080 isn't reachable from the public internet, expose it via cloudflared:

```sh
cloudflared tunnel --url http://localhost:8080   # prints a public URL
export INFERNET_DAEMON_ENDPOINT=https://<the-cloudflared-url>
infernet train run --open-market ...
```

The control plane only sees URLs, never your dataset bytes.
You'll need:

- VALUESERP_API_KEY for crawling (the free tier covers experiments)
- HUGGINGFACE_TOKEN with write scope on your target org
- ollama signin already done
- llama.cpp cloned + built at $HOME/llama.cpp for the GGUF convert
- infernet model info hf:Qwen/Qwen2.5-Coder-7B-Instruct for fit estimates