LoRA training data for Gerbil Scheme

Python 79.9%
Shell 20.1%

Find a file

Jaime Fournier e370dd7bd6 Add v4 fact-contrast DPO training + on-pod publish pipeline v3 fails basic factual recall (claims Gerbil uses LLVM) across all quants (Q4_K_M, Q8_0, MLX Q6) — confirms a training problem, not a quant problem. This adds the v4 stack: - build_dpo_facts.py + dpo_pairs_facts.jsonl: 62 fact-contrast pairs (LLVM/runtime/tooling/origin negation), axolotl chatml format. - runpod_train.py: cmd_dpo_facts (stacked DPO on v3 BF16), cmd_bench (9-variant matrix base/v3/v4 x RAG x logit-bias), cmd_publish_v4 (BF16 + Q8/Q4 GGUFs to HF, all on pod). - eval_on_pod.py: --logit-bias (suppress LLVM/Cranelift/MLIR at gen) and --rag (regex-gated fact snippet, KV-cost only on factual Qs). - eval_holdout.py: F01-F15 fact-recall tests (anti-idioms match affirmative wrong claims, not negated mentions). - facts/01-runtime.md: canonical runtime/compilation prose. - pod_publish_v4.sh: BF16-first upload, then F16->Q8/Q4 quantize. - pod_gguf_pipeline.sh, upload_hf_gguf.sh, merge_dpo_fix.py, README_hf.md, eval_pod_{base,v3}.json: prior v3-era artifacts. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>		2026-05-18 12:43:33 -06:00
facts	Add v4 fact-contrast DPO training + on-pod publish pipeline	2026-05-18 12:43:33 -06:00
.gitignore	Rewrite training pipeline for Qwen3-Coder-30B 3-stage Gerbil v3	2026-05-14 13:55:12 -06:00
axolotl_gerbil_cpt.yaml	Rewrite training pipeline for Qwen3-Coder-30B 3-stage Gerbil v3	2026-05-14 13:55:12 -06:00
axolotl_gerbil_dpo.yaml	Rewrite training pipeline for Qwen3-Coder-30B 3-stage Gerbil v3	2026-05-14 13:55:12 -06:00
axolotl_gerbil_sft.yaml	Rewrite training pipeline for Qwen3-Coder-30B 3-stage Gerbil v3	2026-05-14 13:55:12 -06:00
build_cpt_corpus.py	Rewrite training pipeline for Qwen3-Coder-30B 3-stage Gerbil v3	2026-05-14 13:55:12 -06:00
build_dpo_facts.py	Add v4 fact-contrast DPO training + on-pod publish pipeline	2026-05-18 12:43:33 -06:00
build_dpo_pairs.py	Add docs + configure_opencode.sh; gxi-validate DPO pairs	2026-05-14 14:54:58 -06:00
build_ollama_gguf.sh	Add docs + configure_opencode.sh; gxi-validate DPO pairs	2026-05-14 14:54:58 -06:00
build_sft.py	Rewrite training pipeline for Qwen3-Coder-30B 3-stage Gerbil v3	2026-05-14 13:55:12 -06:00
configure_opencode.sh	Add docs + configure_opencode.sh; gxi-validate DPO pairs	2026-05-14 14:54:58 -06:00
convert_to_mlx.sh	Add docs + configure_opencode.sh; gxi-validate DPO pairs	2026-05-14 14:54:58 -06:00
cpt_corpus_v3.jsonl	Rewrite training pipeline for Qwen3-Coder-30B 3-stage Gerbil v3	2026-05-14 13:55:12 -06:00
dpo_pairs_facts.jsonl	Add v4 fact-contrast DPO training + on-pod publish pipeline	2026-05-18 12:43:33 -06:00
dpo_pairs_v3.jsonl	Add docs + configure_opencode.sh; gxi-validate DPO pairs	2026-05-14 14:54:58 -06:00
eval_holdout.py	Add v4 fact-contrast DPO training + on-pod publish pipeline	2026-05-18 12:43:33 -06:00
eval_on_pod.py	Add v4 fact-contrast DPO training + on-pod publish pipeline	2026-05-18 12:43:33 -06:00
eval_pod_base.json	Add v4 fact-contrast DPO training + on-pod publish pipeline	2026-05-18 12:43:33 -06:00
eval_pod_v3.json	Add v4 fact-contrast DPO training + on-pod publish pipeline	2026-05-18 12:43:33 -06:00
merge_dpo_fix.py	Add v4 fact-contrast DPO training + on-pod publish pipeline	2026-05-18 12:43:33 -06:00
pod_gguf_pipeline.sh	Add v4 fact-contrast DPO training + on-pod publish pipeline	2026-05-18 12:43:33 -06:00
pod_publish_v4.sh	Add v4 fact-contrast DPO training + on-pod publish pipeline	2026-05-18 12:43:33 -06:00
push_ollama.sh	Add docs + configure_opencode.sh; gxi-validate DPO pairs	2026-05-14 14:54:58 -06:00
README.md	Add docs + configure_opencode.sh; gxi-validate DPO pairs	2026-05-14 14:54:58 -06:00
README_hf.md	Add v4 fact-contrast DPO training + on-pod publish pipeline	2026-05-18 12:43:33 -06:00
runpod_train.py	Add v4 fact-contrast DPO training + on-pod publish pipeline	2026-05-18 12:43:33 -06:00
sft_v3_mined.jsonl	Rewrite training pipeline for Qwen3-Coder-30B 3-stage Gerbil v3	2026-05-14 13:55:12 -06:00
TRAINING_PIPELINE.md	Add docs + configure_opencode.sh; gxi-validate DPO pairs	2026-05-14 14:54:58 -06:00
upload_hf.sh	Add docs + configure_opencode.sh; gxi-validate DPO pairs	2026-05-14 14:54:58 -06:00
upload_hf_gguf.sh	Add v4 fact-contrast DPO training + on-pod publish pipeline	2026-05-18 12:43:33 -06:00

README.md

Gerbil Scheme LoRA

A fine-tune of Qwen3-Coder-30B-A3B-Instruct that knows Gerbil Scheme — a dialect of Scheme built on Gambit, with a rich :std/* standard library, an actor system, FFI, and a pattern-matching prelude. The LoRA teaches the base model Gerbil's module syntax (:std/sort, :std/iter, :std/text/json), the standard library, the actor system, the FFI, and how Gerbil's surface differs from Racket/Clojure/Common Lisp/Python/R7-RS-style SRFI imports that a generic coder model is likely to emit by default.

Quick start

Ollama

ollama pull jaimef/gerbil-qwen3-coder
ollama run jaimef/gerbil-qwen3-coder "Show me how to import :std/text/json and parse a JSON string in Gerbil."

Tags: :latest is q8_0 (~32 GB, full fidelity). :q4_k_m is the smaller quant (~18 GB). Pick :latest if it fits.

MLX (Apple Silicon)

./convert_to_mlx.sh runpod-pipeline-final gerbil-mlx-6bit-v3
python3 -m mlx_lm.generate --model gerbil-mlx-6bit-v3 \
    --prompt "Show me how to import :std/text/json and parse a JSON string in Gerbil." \
    --max-tokens 256
# or as an OpenAI-compatible server:
python3 -m mlx_lm.server --model gerbil-mlx-6bit-v3 --port 8080

HF (6-bit MLX): jaimef21/gerbil-qwen3-coder-30b-mlx-6bit (uploaded by upload_hf.sh).

Pipeline

Three stages on one A100 80GB pod, ~5–6h wall clock. See TRAINING_PIPELINE.md for the design rationale.

Stage	What it learns	Data	LR / Epochs	LoRA
CPT	Token distribution of raw Gerbil source	`cpt_corpus_v3.jsonl` (3,758 records / 22.7 MB, mined from `gerbil`, `gambit`, and `gerbil-mcp`)	2.0e-5 / 2	r=32, α=64
SFT	Answer Gerbil questions in chat	`sft_v3_mined.jsonl` (2,391 examples / 4.1 MB, from cookbooks + error-fixes + stdlib `;;`-doc-comment mining)	1.0e-4 / 2	r=32, α=64
DPO	Suppress Racket/Clojure/CL/Python/R7-SRFI surface forms	`dpo_pairs_v3.jsonl` (66 pairs, programmatically generated + gxi-compile-checked)	5.0e-6 / 3	r=32, α=64

Each stage's LoRA is merged into bf16 between stages so the next stage trains on absorbed weights, not stacked adapters. Targets include the fused MoE experts (experts.gate_up_proj, experts.down_proj) in addition to attention (q/k/v/o_proj) and MLP (gate/up/down_proj).

Run the pipeline

# Provision pod + push configs/data + train + pull + tear down
python3 runpod_train.py up
python3 runpod_train.py push
python3 runpod_train.py train
python3 runpod_train.py eval     # base vs trained, on-pod
python3 runpod_train.py pull     # → runpod-pipeline-final/
python3 runpod_train.py down

Pod state lives in .runpod_state.json (gitignored). Each stage is idempotent: re-running train skips stages whose .done marker is OK. Killing the orchestrator does not kill the training — each stage runs in its own tmux session on the pod (stage_cpt, stage_sft, stage_dpo).

Build data

python3 build_cpt_corpus.py   # → cpt_corpus_v3.jsonl
python3 build_sft.py          # → sft_v3_mined.jsonl
python3 build_dpo_pairs.py --validate   # → dpo_pairs_v3.jsonl (gxi-checked)

The builders read from ~/mine/gerbil (source, stdlib, tests, docs), ~/mine/gambit (the Gambit runtime that Gerbil builds on), and ~/mine/gerbil-mcp (cookbooks, error-fixes, resources, features, security rules). --validate on the DPO builder pipes each chosen_response through gxi and fails the build on any syntax or import error.

Convert + publish

# 6-bit MLX bundle for Mac
./convert_to_mlx.sh runpod-pipeline-final gerbil-mlx-6bit-v3

# GGUF for ollama (Q8_0 + Q4_K_M)
./build_ollama_gguf.sh
./push_ollama.sh jaimef           # pushes :latest (q8_0); add --all to also push :q4_k_m

# HuggingFace (MLX 6-bit)
./upload_hf.sh

Evaluation

Two evals; trained must beat base on both before publish.

Eval	What it measures	Script
Held-out coding	Idiom-regex hits / cross-dialect hallucination penalty / code-block presence on 14 questions never seen in training	`eval_holdout.py` (via Ollama) or `eval_on_pod.py` (transformers, on-pod)
Per-pair similarity	Token-Jaccard + character similarity of trained-model output vs `chosen` vs `rejected` for every DPO pair	`eval_on_pod.py --pairs-file dpo_pairs_v3.jsonl`

The similarity eval is the one to trust — it's volume-invariant and per-pair, so dataset growth doesn't game it.

Use with OpenCode

./configure_opencode.sh ollama          # local Ollama
./configure_opencode.sh mlx 8080        # local mlx_lm server
./configure_opencode.sh runpod <id>     # RunPod serverless endpoint

Writes ~/.config/opencode/opencode.json while preserving any existing MCP config.

Repo layout

File	Purpose
`runpod_train.py`	Pod lifecycle + 3-stage orchestrator (`up`/`push`/`train`/`eval`/`pull`/`down`)
`axolotl_gerbil_{cpt,sft,dpo}.yaml`	Per-stage axolotl configs (LoRA r=32, α=64, MoE expert targets, `merge_method: legacy`)
`build_cpt_corpus.py`	Walks `~/mine/gerbil`, `~/mine/gambit`, `~/mine/gerbil-mcp` → CPT corpus
`build_sft.py`	Mines cookbooks, error-fixes, resource markdown, and stdlib `;;`-doc-comments → SFT chats
`build_dpo_pairs.py`	15 programmatic rule generators (Racket/Clojure/CL/Python/R7-SRFI → Gerbil), gxi-validated
`eval_holdout.py`	14-test held-out idiom eval via Ollama HTTP API
`eval_on_pod.py`	Transformers-based on-pod eval (holdout + per-pair similarity)
`convert_to_mlx.sh`	mlx_lm.convert wrapper (6-bit default)
`build_ollama_gguf.sh`	Convert merged model → GGUF Q8_0 + Q4_K_M and register with Ollama
`push_ollama.sh`	Push to ollama.com (`:latest` = Q8_0; `--all` also pushes `:q4_k_m`)
`upload_hf.sh`	Resume-friendly per-shard upload of the MLX 6-bit bundle to HuggingFace
`configure_opencode.sh`	Write `~/.config/opencode/opencode.json` (Ollama / MLX / RunPod / both)

Iteration loop

Add recipes to ~/mine/gerbil-mcp/cookbooks.json or error-fixes.json
Rebuild data: build_cpt_corpus.py && build_sft.py && build_dpo_pairs.py --validate
Re-run the pipeline (runpod_train.py up/push/train/eval/pull/down)
Eval gate: trained must beat base on both holdout and similarity
Convert + publish: build_ollama_gguf.sh && push_ollama.sh && upload_hf.sh

README.md Unescape Escape