- Python 79.9%
- Shell 20.1%
v3 fails basic factual recall (claims Gerbil uses LLVM) across all
quants (Q4_K_M, Q8_0, MLX Q6) — confirms a training problem, not a
quant problem. This adds the v4 stack:
- build_dpo_facts.py + dpo_pairs_facts.jsonl: 62 fact-contrast pairs
(LLVM/runtime/tooling/origin negation), axolotl chatml format.
- runpod_train.py: cmd_dpo_facts (stacked DPO on v3 BF16),
cmd_bench (9-variant matrix base/v3/v4 x RAG x logit-bias),
cmd_publish_v4 (BF16 + Q8/Q4 GGUFs to HF, all on pod).
- eval_on_pod.py: --logit-bias (suppress LLVM/Cranelift/MLIR at gen)
and --rag (regex-gated fact snippet, KV-cost only on factual Qs).
- eval_holdout.py: F01-F15 fact-recall tests (anti-idioms match
affirmative wrong claims, not negated mentions).
- facts/01-runtime.md: canonical runtime/compilation prose.
- pod_publish_v4.sh: BF16-first upload, then F16->Q8/Q4 quantize.
- pod_gguf_pipeline.sh, upload_hf_gguf.sh, merge_dpo_fix.py,
README_hf.md, eval_pod_{base,v3}.json: prior v3-era artifacts.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|---|---|---|
| facts | ||
| .gitignore | ||
| axolotl_gerbil_cpt.yaml | ||
| axolotl_gerbil_dpo.yaml | ||
| axolotl_gerbil_sft.yaml | ||
| build_cpt_corpus.py | ||
| build_dpo_facts.py | ||
| build_dpo_pairs.py | ||
| build_ollama_gguf.sh | ||
| build_sft.py | ||
| configure_opencode.sh | ||
| convert_to_mlx.sh | ||
| cpt_corpus_v3.jsonl | ||
| dpo_pairs_facts.jsonl | ||
| dpo_pairs_v3.jsonl | ||
| eval_holdout.py | ||
| eval_on_pod.py | ||
| eval_pod_base.json | ||
| eval_pod_v3.json | ||
| merge_dpo_fix.py | ||
| pod_gguf_pipeline.sh | ||
| pod_publish_v4.sh | ||
| push_ollama.sh | ||
| README.md | ||
| README_hf.md | ||
| runpod_train.py | ||
| sft_v3_mined.jsonl | ||
| TRAINING_PIPELINE.md | ||
| upload_hf.sh | ||
| upload_hf_gguf.sh | ||
Gerbil Scheme LoRA
A fine-tune of Qwen3-Coder-30B-A3B-Instruct that knows Gerbil Scheme — a dialect of Scheme built on Gambit, with a rich :std/* standard library, an actor system, FFI, and a pattern-matching prelude. The LoRA teaches the base model Gerbil's module syntax (:std/sort, :std/iter, :std/text/json), the standard library, the actor system, the FFI, and how Gerbil's surface differs from Racket/Clojure/Common Lisp/Python/R7-RS-style SRFI imports that a generic coder model is likely to emit by default.
Quick start
Ollama
ollama pull jaimef/gerbil-qwen3-coder
ollama run jaimef/gerbil-qwen3-coder "Show me how to import :std/text/json and parse a JSON string in Gerbil."
Tags: :latest is q8_0 (~32 GB, full fidelity). :q4_k_m is the smaller quant (~18 GB). Pick :latest if it fits.
MLX (Apple Silicon)
./convert_to_mlx.sh runpod-pipeline-final gerbil-mlx-6bit-v3
python3 -m mlx_lm.generate --model gerbil-mlx-6bit-v3 \
--prompt "Show me how to import :std/text/json and parse a JSON string in Gerbil." \
--max-tokens 256
# or as an OpenAI-compatible server:
python3 -m mlx_lm.server --model gerbil-mlx-6bit-v3 --port 8080
HF (6-bit MLX): jaimef21/gerbil-qwen3-coder-30b-mlx-6bit (uploaded by upload_hf.sh).
Pipeline
Three stages on one A100 80GB pod, ~5–6h wall clock. See TRAINING_PIPELINE.md for the design rationale.
| Stage | What it learns | Data | LR / Epochs | LoRA |
|---|---|---|---|---|
| CPT | Token distribution of raw Gerbil source | cpt_corpus_v3.jsonl (3,758 records / 22.7 MB, mined from gerbil, gambit, and gerbil-mcp) |
2.0e-5 / 2 | r=32, α=64 |
| SFT | Answer Gerbil questions in chat | sft_v3_mined.jsonl (2,391 examples / 4.1 MB, from cookbooks + error-fixes + stdlib ;;-doc-comment mining) |
1.0e-4 / 2 | r=32, α=64 |
| DPO | Suppress Racket/Clojure/CL/Python/R7-SRFI surface forms | dpo_pairs_v3.jsonl (66 pairs, programmatically generated + gxi-compile-checked) |
5.0e-6 / 3 | r=32, α=64 |
Each stage's LoRA is merged into bf16 between stages so the next stage trains on absorbed weights, not stacked adapters. Targets include the fused MoE experts (experts.gate_up_proj, experts.down_proj) in addition to attention (q/k/v/o_proj) and MLP (gate/up/down_proj).
Run the pipeline
# Provision pod + push configs/data + train + pull + tear down
python3 runpod_train.py up
python3 runpod_train.py push
python3 runpod_train.py train
python3 runpod_train.py eval # base vs trained, on-pod
python3 runpod_train.py pull # → runpod-pipeline-final/
python3 runpod_train.py down
Pod state lives in .runpod_state.json (gitignored). Each stage is idempotent: re-running train skips stages whose .done marker is OK. Killing the orchestrator does not kill the training — each stage runs in its own tmux session on the pod (stage_cpt, stage_sft, stage_dpo).
Build data
python3 build_cpt_corpus.py # → cpt_corpus_v3.jsonl
python3 build_sft.py # → sft_v3_mined.jsonl
python3 build_dpo_pairs.py --validate # → dpo_pairs_v3.jsonl (gxi-checked)
The builders read from ~/mine/gerbil (source, stdlib, tests, docs), ~/mine/gambit (the Gambit runtime that Gerbil builds on), and ~/mine/gerbil-mcp (cookbooks, error-fixes, resources, features, security rules). --validate on the DPO builder pipes each chosen_response through gxi and fails the build on any syntax or import error.
Convert + publish
# 6-bit MLX bundle for Mac
./convert_to_mlx.sh runpod-pipeline-final gerbil-mlx-6bit-v3
# GGUF for ollama (Q8_0 + Q4_K_M)
./build_ollama_gguf.sh
./push_ollama.sh jaimef # pushes :latest (q8_0); add --all to also push :q4_k_m
# HuggingFace (MLX 6-bit)
./upload_hf.sh
Evaluation
Two evals; trained must beat base on both before publish.
| Eval | What it measures | Script |
|---|---|---|
| Held-out coding | Idiom-regex hits / cross-dialect hallucination penalty / code-block presence on 14 questions never seen in training | eval_holdout.py (via Ollama) or eval_on_pod.py (transformers, on-pod) |
| Per-pair similarity | Token-Jaccard + character similarity of trained-model output vs chosen vs rejected for every DPO pair |
eval_on_pod.py --pairs-file dpo_pairs_v3.jsonl |
The similarity eval is the one to trust — it's volume-invariant and per-pair, so dataset growth doesn't game it.
Use with OpenCode
./configure_opencode.sh ollama # local Ollama
./configure_opencode.sh mlx 8080 # local mlx_lm server
./configure_opencode.sh runpod <id> # RunPod serverless endpoint
Writes ~/.config/opencode/opencode.json while preserving any existing MCP config.
Repo layout
| File | Purpose |
|---|---|
runpod_train.py |
Pod lifecycle + 3-stage orchestrator (up/push/train/eval/pull/down) |
axolotl_gerbil_{cpt,sft,dpo}.yaml |
Per-stage axolotl configs (LoRA r=32, α=64, MoE expert targets, merge_method: legacy) |
build_cpt_corpus.py |
Walks ~/mine/gerbil, ~/mine/gambit, ~/mine/gerbil-mcp → CPT corpus |
build_sft.py |
Mines cookbooks, error-fixes, resource markdown, and stdlib ;;-doc-comments → SFT chats |
build_dpo_pairs.py |
15 programmatic rule generators (Racket/Clojure/CL/Python/R7-SRFI → Gerbil), gxi-validated |
eval_holdout.py |
14-test held-out idiom eval via Ollama HTTP API |
eval_on_pod.py |
Transformers-based on-pod eval (holdout + per-pair similarity) |
convert_to_mlx.sh |
mlx_lm.convert wrapper (6-bit default) |
build_ollama_gguf.sh |
Convert merged model → GGUF Q8_0 + Q4_K_M and register with Ollama |
push_ollama.sh |
Push to ollama.com (:latest = Q8_0; --all also pushes :q4_k_m) |
upload_hf.sh |
Resume-friendly per-shard upload of the MLX 6-bit bundle to HuggingFace |
configure_opencode.sh |
Write ~/.config/opencode/opencode.json (Ollama / MLX / RunPod / both) |
Iteration loop
- Add recipes to
~/mine/gerbil-mcp/cookbooks.jsonorerror-fixes.json - Rebuild data:
build_cpt_corpus.py && build_sft.py && build_dpo_pairs.py --validate - Re-run the pipeline (
runpod_train.py up/push/train/eval/pull/down) - Eval gate: trained must beat base on both holdout and similarity
- Convert + publish:
build_ollama_gguf.sh && push_ollama.sh && upload_hf.sh