LoRA training data for Gerbil Scheme
  • Python 79.9%
  • Shell 20.1%
Find a file
Jaime Fournier e370dd7bd6 Add v4 fact-contrast DPO training + on-pod publish pipeline
v3 fails basic factual recall (claims Gerbil uses LLVM) across all
quants (Q4_K_M, Q8_0, MLX Q6) — confirms a training problem, not a
quant problem. This adds the v4 stack:

- build_dpo_facts.py + dpo_pairs_facts.jsonl: 62 fact-contrast pairs
  (LLVM/runtime/tooling/origin negation), axolotl chatml format.
- runpod_train.py: cmd_dpo_facts (stacked DPO on v3 BF16),
  cmd_bench (9-variant matrix base/v3/v4 x RAG x logit-bias),
  cmd_publish_v4 (BF16 + Q8/Q4 GGUFs to HF, all on pod).
- eval_on_pod.py: --logit-bias (suppress LLVM/Cranelift/MLIR at gen)
  and --rag (regex-gated fact snippet, KV-cost only on factual Qs).
- eval_holdout.py: F01-F15 fact-recall tests (anti-idioms match
  affirmative wrong claims, not negated mentions).
- facts/01-runtime.md: canonical runtime/compilation prose.
- pod_publish_v4.sh: BF16-first upload, then F16->Q8/Q4 quantize.
- pod_gguf_pipeline.sh, upload_hf_gguf.sh, merge_dpo_fix.py,
  README_hf.md, eval_pod_{base,v3}.json: prior v3-era artifacts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 12:43:33 -06:00
facts Add v4 fact-contrast DPO training + on-pod publish pipeline 2026-05-18 12:43:33 -06:00
.gitignore Rewrite training pipeline for Qwen3-Coder-30B 3-stage Gerbil v3 2026-05-14 13:55:12 -06:00
axolotl_gerbil_cpt.yaml Rewrite training pipeline for Qwen3-Coder-30B 3-stage Gerbil v3 2026-05-14 13:55:12 -06:00
axolotl_gerbil_dpo.yaml Rewrite training pipeline for Qwen3-Coder-30B 3-stage Gerbil v3 2026-05-14 13:55:12 -06:00
axolotl_gerbil_sft.yaml Rewrite training pipeline for Qwen3-Coder-30B 3-stage Gerbil v3 2026-05-14 13:55:12 -06:00
build_cpt_corpus.py Rewrite training pipeline for Qwen3-Coder-30B 3-stage Gerbil v3 2026-05-14 13:55:12 -06:00
build_dpo_facts.py Add v4 fact-contrast DPO training + on-pod publish pipeline 2026-05-18 12:43:33 -06:00
build_dpo_pairs.py Add docs + configure_opencode.sh; gxi-validate DPO pairs 2026-05-14 14:54:58 -06:00
build_ollama_gguf.sh Add docs + configure_opencode.sh; gxi-validate DPO pairs 2026-05-14 14:54:58 -06:00
build_sft.py Rewrite training pipeline for Qwen3-Coder-30B 3-stage Gerbil v3 2026-05-14 13:55:12 -06:00
configure_opencode.sh Add docs + configure_opencode.sh; gxi-validate DPO pairs 2026-05-14 14:54:58 -06:00
convert_to_mlx.sh Add docs + configure_opencode.sh; gxi-validate DPO pairs 2026-05-14 14:54:58 -06:00
cpt_corpus_v3.jsonl Rewrite training pipeline for Qwen3-Coder-30B 3-stage Gerbil v3 2026-05-14 13:55:12 -06:00
dpo_pairs_facts.jsonl Add v4 fact-contrast DPO training + on-pod publish pipeline 2026-05-18 12:43:33 -06:00
dpo_pairs_v3.jsonl Add docs + configure_opencode.sh; gxi-validate DPO pairs 2026-05-14 14:54:58 -06:00
eval_holdout.py Add v4 fact-contrast DPO training + on-pod publish pipeline 2026-05-18 12:43:33 -06:00
eval_on_pod.py Add v4 fact-contrast DPO training + on-pod publish pipeline 2026-05-18 12:43:33 -06:00
eval_pod_base.json Add v4 fact-contrast DPO training + on-pod publish pipeline 2026-05-18 12:43:33 -06:00
eval_pod_v3.json Add v4 fact-contrast DPO training + on-pod publish pipeline 2026-05-18 12:43:33 -06:00
merge_dpo_fix.py Add v4 fact-contrast DPO training + on-pod publish pipeline 2026-05-18 12:43:33 -06:00
pod_gguf_pipeline.sh Add v4 fact-contrast DPO training + on-pod publish pipeline 2026-05-18 12:43:33 -06:00
pod_publish_v4.sh Add v4 fact-contrast DPO training + on-pod publish pipeline 2026-05-18 12:43:33 -06:00
push_ollama.sh Add docs + configure_opencode.sh; gxi-validate DPO pairs 2026-05-14 14:54:58 -06:00
README.md Add docs + configure_opencode.sh; gxi-validate DPO pairs 2026-05-14 14:54:58 -06:00
README_hf.md Add v4 fact-contrast DPO training + on-pod publish pipeline 2026-05-18 12:43:33 -06:00
runpod_train.py Add v4 fact-contrast DPO training + on-pod publish pipeline 2026-05-18 12:43:33 -06:00
sft_v3_mined.jsonl Rewrite training pipeline for Qwen3-Coder-30B 3-stage Gerbil v3 2026-05-14 13:55:12 -06:00
TRAINING_PIPELINE.md Add docs + configure_opencode.sh; gxi-validate DPO pairs 2026-05-14 14:54:58 -06:00
upload_hf.sh Add docs + configure_opencode.sh; gxi-validate DPO pairs 2026-05-14 14:54:58 -06:00
upload_hf_gguf.sh Add v4 fact-contrast DPO training + on-pod publish pipeline 2026-05-18 12:43:33 -06:00

Gerbil Scheme LoRA

A fine-tune of Qwen3-Coder-30B-A3B-Instruct that knows Gerbil Scheme — a dialect of Scheme built on Gambit, with a rich :std/* standard library, an actor system, FFI, and a pattern-matching prelude. The LoRA teaches the base model Gerbil's module syntax (:std/sort, :std/iter, :std/text/json), the standard library, the actor system, the FFI, and how Gerbil's surface differs from Racket/Clojure/Common Lisp/Python/R7-RS-style SRFI imports that a generic coder model is likely to emit by default.

Quick start

Ollama

ollama pull jaimef/gerbil-qwen3-coder
ollama run jaimef/gerbil-qwen3-coder "Show me how to import :std/text/json and parse a JSON string in Gerbil."

Tags: :latest is q8_0 (~32 GB, full fidelity). :q4_k_m is the smaller quant (~18 GB). Pick :latest if it fits.

MLX (Apple Silicon)

./convert_to_mlx.sh runpod-pipeline-final gerbil-mlx-6bit-v3
python3 -m mlx_lm.generate --model gerbil-mlx-6bit-v3 \
    --prompt "Show me how to import :std/text/json and parse a JSON string in Gerbil." \
    --max-tokens 256
# or as an OpenAI-compatible server:
python3 -m mlx_lm.server --model gerbil-mlx-6bit-v3 --port 8080

HF (6-bit MLX): jaimef21/gerbil-qwen3-coder-30b-mlx-6bit (uploaded by upload_hf.sh).

Pipeline

Three stages on one A100 80GB pod, ~56h wall clock. See TRAINING_PIPELINE.md for the design rationale.

Stage What it learns Data LR / Epochs LoRA
CPT Token distribution of raw Gerbil source cpt_corpus_v3.jsonl (3,758 records / 22.7 MB, mined from gerbil, gambit, and gerbil-mcp) 2.0e-5 / 2 r=32, α=64
SFT Answer Gerbil questions in chat sft_v3_mined.jsonl (2,391 examples / 4.1 MB, from cookbooks + error-fixes + stdlib ;;-doc-comment mining) 1.0e-4 / 2 r=32, α=64
DPO Suppress Racket/Clojure/CL/Python/R7-SRFI surface forms dpo_pairs_v3.jsonl (66 pairs, programmatically generated + gxi-compile-checked) 5.0e-6 / 3 r=32, α=64

Each stage's LoRA is merged into bf16 between stages so the next stage trains on absorbed weights, not stacked adapters. Targets include the fused MoE experts (experts.gate_up_proj, experts.down_proj) in addition to attention (q/k/v/o_proj) and MLP (gate/up/down_proj).

Run the pipeline

# Provision pod + push configs/data + train + pull + tear down
python3 runpod_train.py up
python3 runpod_train.py push
python3 runpod_train.py train
python3 runpod_train.py eval     # base vs trained, on-pod
python3 runpod_train.py pull     # → runpod-pipeline-final/
python3 runpod_train.py down

Pod state lives in .runpod_state.json (gitignored). Each stage is idempotent: re-running train skips stages whose .done marker is OK. Killing the orchestrator does not kill the training — each stage runs in its own tmux session on the pod (stage_cpt, stage_sft, stage_dpo).

Build data

python3 build_cpt_corpus.py   # → cpt_corpus_v3.jsonl
python3 build_sft.py          # → sft_v3_mined.jsonl
python3 build_dpo_pairs.py --validate   # → dpo_pairs_v3.jsonl (gxi-checked)

The builders read from ~/mine/gerbil (source, stdlib, tests, docs), ~/mine/gambit (the Gambit runtime that Gerbil builds on), and ~/mine/gerbil-mcp (cookbooks, error-fixes, resources, features, security rules). --validate on the DPO builder pipes each chosen_response through gxi and fails the build on any syntax or import error.

Convert + publish

# 6-bit MLX bundle for Mac
./convert_to_mlx.sh runpod-pipeline-final gerbil-mlx-6bit-v3

# GGUF for ollama (Q8_0 + Q4_K_M)
./build_ollama_gguf.sh
./push_ollama.sh jaimef           # pushes :latest (q8_0); add --all to also push :q4_k_m

# HuggingFace (MLX 6-bit)
./upload_hf.sh

Evaluation

Two evals; trained must beat base on both before publish.

Eval What it measures Script
Held-out coding Idiom-regex hits / cross-dialect hallucination penalty / code-block presence on 14 questions never seen in training eval_holdout.py (via Ollama) or eval_on_pod.py (transformers, on-pod)
Per-pair similarity Token-Jaccard + character similarity of trained-model output vs chosen vs rejected for every DPO pair eval_on_pod.py --pairs-file dpo_pairs_v3.jsonl

The similarity eval is the one to trust — it's volume-invariant and per-pair, so dataset growth doesn't game it.

Use with OpenCode

./configure_opencode.sh ollama          # local Ollama
./configure_opencode.sh mlx 8080        # local mlx_lm server
./configure_opencode.sh runpod <id>     # RunPod serverless endpoint

Writes ~/.config/opencode/opencode.json while preserving any existing MCP config.

Repo layout

File Purpose
runpod_train.py Pod lifecycle + 3-stage orchestrator (up/push/train/eval/pull/down)
axolotl_gerbil_{cpt,sft,dpo}.yaml Per-stage axolotl configs (LoRA r=32, α=64, MoE expert targets, merge_method: legacy)
build_cpt_corpus.py Walks ~/mine/gerbil, ~/mine/gambit, ~/mine/gerbil-mcp → CPT corpus
build_sft.py Mines cookbooks, error-fixes, resource markdown, and stdlib ;;-doc-comments → SFT chats
build_dpo_pairs.py 15 programmatic rule generators (Racket/Clojure/CL/Python/R7-SRFI → Gerbil), gxi-validated
eval_holdout.py 14-test held-out idiom eval via Ollama HTTP API
eval_on_pod.py Transformers-based on-pod eval (holdout + per-pair similarity)
convert_to_mlx.sh mlx_lm.convert wrapper (6-bit default)
build_ollama_gguf.sh Convert merged model → GGUF Q8_0 + Q4_K_M and register with Ollama
push_ollama.sh Push to ollama.com (:latest = Q8_0; --all also pushes :q4_k_m)
upload_hf.sh Resume-friendly per-shard upload of the MLX 6-bit bundle to HuggingFace
configure_opencode.sh Write ~/.config/opencode/opencode.json (Ollama / MLX / RunPod / both)

Iteration loop

  1. Add recipes to ~/mine/gerbil-mcp/cookbooks.json or error-fixes.json
  2. Rebuild data: build_cpt_corpus.py && build_sft.py && build_dpo_pairs.py --validate
  3. Re-run the pipeline (runpod_train.py up/push/train/eval/pull/down)
  4. Eval gate: trained must beat base on both holdout and similarity
  5. Convert + publish: build_ollama_gguf.sh && push_ollama.sh && upload_hf.sh