Inspired by karpathy/autoresearch, helix generalizes the idea of autonomous AI research loops beyond LLM training. Give an agent a codebase, a metric, and a fixed time budget. It experiments overnight. You wake up to results.
The git history is the research trail. experiments.tsv is the proof. Anyone can clone a helix,
run it on their hardware, and independently verify every result.
| Term | Meaning |
|---|---|
| helix | A git repo containing helix.yaml + program.md + a codebase the agent can modify |
| helix.yaml | Machine-readable spec: what to optimize, how to measure it, which files are editable |
| program.md | Human-written instructions for the agent: domain knowledge, constraints, techniques to try |
| experiments.tsv | Append-only ledger of every experiment: commit, metric, status, description |
| helix run | CLI command that launches an autonomous session on your hardware |
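Because experiments.tsv is a plain append-only TSV, finding the current best result is a one-liner over the ledger. A minimal sketch, assuming a tab-separated header row with the four fields listed above (the sample rows and commit hashes here are invented for illustration):

```python
import csv
import io

# Hypothetical ledger contents; real files are appended to by helix itself.
LEDGER = """commit\tmetric\tstatus\tdescription
a1b2c3d\t0.71\tok\tbaseline
d4e5f6a\t0.74\tok\tenable batching
b7c8d9e\t0.69\tfailed\tquantization attempt
"""

def best_experiment(tsv_text: str, maximize: bool = True) -> dict:
    """Return the row with the best metric among successful experiments."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    ok = [r for r in rows if r["status"] == "ok"]
    key = lambda r: float(r["metric"])
    return max(ok, key=key) if maximize else min(ok, key=key)
```

Here `best_experiment(LEDGER)` picks the batching run (metric 0.74), skipping the failed quantization attempt entirely.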
helix is agent-agnostic. Pick a backend or bring your own.
| Backend | Install | Requires |
|---|---|---|
| ClaudeBackend (default) | pip install 'helices[claude]' | Claude Code CLI |
| GeminiBackend | pip install helices | Gemini CLI |
| Custom | pip install helices | Implement the AgentBackend protocol |
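The AgentBackend protocol itself is not reproduced in this document, so the surface below is an assumption: a sketch of what a custom backend might look like if the protocol reduces to a single session-running method. The method name, parameters, and the `my-agent` CLI are all hypothetical, not helices' actual API:

```python
import subprocess
from typing import Protocol

class AgentBackend(Protocol):
    # Hypothetical protocol surface; the real one in helices may differ.
    def run_session(self, prompt: str, workdir: str) -> str: ...

class ShellBackend:
    """Toy backend that shells out to a local agent CLI (command name assumed)."""

    def __init__(self, command: str = "my-agent"):
        self.command = command

    def run_session(self, prompt: str, workdir: str) -> str:
        # Forward the prompt to the CLI and return whatever it prints.
        result = subprocess.run(
            [self.command, "--prompt", prompt],
            cwd=workdir, capture_output=True, text=True,
        )
        return result.stdout

backend: AgentBackend = ShellBackend()  # structural typing: no inheritance needed
```

Because `Protocol` uses structural typing, any class with a matching method satisfies it without subclassing, which is what makes a bring-your-own-agent design convenient.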
```
# from within a helix directory (one that has helix.yaml)
helix run             # start a session tagged with today's date
helix run --tag exp1  # custom tag
helix status          # show current best and recent experiments
```

helix-examples is a curated gallery of standalone helices, each in its own repo and included as a git submodule.
```
git clone --recurse-submodules git@github.com:VectorInstitute/helix-examples.git
cd helix-examples/inference-opt
uv run prepare.py   # one-time: download model + dataset
helix run
```

The first example, helix-inference-opt,
optimizes inference throughput for a causal language model on WikiText-2. The agent modifies
infer.py (batching, quantization, torch.compile, etc.) and automatically merges improvements
back to main.
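The specific edits the agent makes to infer.py are not shown here; as an illustration of the batching idea alone, the change amounts to invoking the model once per chunk instead of once per prompt, amortizing fixed per-call overhead. The model call below is a stand-in stub, not the example's real inference code:

```python
def batched(seq: list, size: int):
    """Split a list of prompts into fixed-size chunks."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def run_inference(prompts: list[str], batch_size: int = 8) -> list[str]:
    outputs = []
    for batch in batched(prompts, batch_size):
        # One "model call" per batch; a stub stands in for the forward pass.
        outputs.extend(f"out:{p}" for p in batch)
    return outputs
```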
The typical starting point is an existing research codebase. helix init drops the helix
layer on top without touching your code.
```
cd my-research-project   # your existing git repo
pip install 'helices[claude]'
helix init . --domain "AI/ML" --description "Optimize X for task Y."
```

helix init is non-destructive: it skips any file that already exists, so running it against a repo with an existing pyproject.toml or uv.lock is safe.
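The skip-if-exists behavior can be sketched as follows; the file list and template contents are illustrative, not helix's actual scaffolding:

```python
from pathlib import Path

# Illustrative templates; the real files helix init writes may differ.
TEMPLATES = {
    "helix.yaml": "name: my-helix\n",
    "program.md": "# Instructions for the agent\n",
    "experiments.tsv": "commit\tmetric\tstatus\tdescription\n",
}

def init_helix(root: str) -> list[str]:
    """Write each template only if the file is absent; return files written."""
    written = []
    for name, content in TEMPLATES.items():
        path = Path(root) / name
        if path.exists():
            continue  # non-destructive: never overwrite user files
        path.write_text(content)
        written.append(name)
    return written
```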
Then:
- Edit helix.yaml: set scope.editable to the files the agent may modify, and set evaluate.command to your evaluation script.
- Edit program.md: describe your codebase, goal, constraints, and techniques to try.
- Run helix run.
If you are starting from scratch:
```
helix init my-project --domain "AI/ML" --description "Optimize X for task Y."
cd my-project && git init
# add your codebase, fill in helix.yaml and program.md, then:
helix run
```

A minimal helix.yaml:

```yaml
name: my-helix
domain: AI/ML
description: Optimize X for task Y.
scope:
  editable: [train.py]
  readonly: [evaluate.py, program.md, helix.yaml]
metrics:
  primary:
    name: accuracy
    optimize: maximize
evaluate:
  command: python evaluate.py
  timeout_seconds: 120
  output_format: pattern
  patterns:
    primary: '^accuracy:\s+([\d.]+)'
```
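With output_format: pattern, helix reads the metric by matching the primary regex against the evaluation command's stdout, so evaluate.py only has to print one matching line. A minimal sketch in which the accuracy computation is a placeholder (a real script would score the model on a held-out set):

```python
import re

def evaluate() -> float:
    # Placeholder metric; replace with a real evaluation of your model.
    return 0.93

if __name__ == "__main__":
    line = f"accuracy: {evaluate()}"
    print(line)
    # Sanity check: the line must match helix.yaml's primary pattern.
    assert re.match(r"^accuracy:\s+([\d.]+)", line)
```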