nebula-8lang-7b

A fine-tune of Qwen/Qwen2.5-7B that translates Nebula, a universal code intermediate language, into 8 target programming languages: Python, JavaScript, TypeScript, Go, Swift, Kotlin, Rust, and C.

Part of the Nebula 1.0 release. Nebula is a token-efficient canonical form that is, on average across the 8 languages, 16% smaller than the original source code, while round-tripping cleanly back to any of them.
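As a toy illustration of the size claim, the Nebula snippet from the usage example below can be compared with its Python equivalent. This is a single tiny function measured in characters, not tokens, so it is not representative of the corpus-wide 16% average:

```python
# Toy size comparison: a Nebula snippet vs. its Python equivalent.
# Savings here are character-based and for one tiny function; the
# reported 16% figure is a token-level average over the whole corpus.
nebula = "fn add(a, b): rt a + b"
python = "def add(a, b):\n    return a + b"

savings = 1 - len(nebula) / len(python)
print(f"Nebula form is {savings:.0%} smaller than the Python form")
```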

Training

Base model          Qwen/Qwen2.5-7B
Method              LoRA (SFT)
LoRA rank / alpha   16 / 16
LoRA dropout        0.05
LoRA modules        all-linear
Epochs              3
Learning rate       1e-5
Batch size          8
Training data       electrocampbell/nebula-8lang-68k (68K pairs)
Trained on          Together AI
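The hyperparameters above, gathered into a single config mapping as a sketch only. Key names loosely follow peft/TRL conventions; the actual training script used on Together AI is not published here:

```python
# LoRA SFT hyperparameters from the table above, gathered into one dict.
# Key names loosely follow peft/TRL conventions; the actual training
# script is not published here, so treat this as a sketch.
lora_sft_config = {
    "base_model": "Qwen/Qwen2.5-7B",
    "lora_r": 16,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "target_modules": "all-linear",
    "num_train_epochs": 3,
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 8,
    "train_dataset": "electrocampbell/nebula-8lang-68k",  # 68K pairs
}
```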

Evaluation

HumanEval (164 problems, Nebula→Python, Pass@1):

Model                          Raw                    With Error Correction
nebula-8lang-1.5b              45.1%                  79.3%
nebula-8lang-7b (this model)   67.7%                  88.4%
nebula-8lang-14b               57.9% (89.0% on H100)  88.4%

MBPP (500 problems, Nebula→Python, Pass@1): 55.4%
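The "With Error Correction" column suggests that failing generations are re-prompted with the resulting error message. The exact procedure is not documented here, so the following is only a minimal sketch of such a retry loop, with all function names hypothetical:

```python
# Hypothetical error-correction loop: generate a candidate, run the
# problem's tests, and if they fail, re-prompt with the error message.
# The actual procedure behind the reported numbers is not documented.
def solve_with_correction(generate, run_tests, max_retries=1):
    """generate(feedback) -> candidate code; run_tests(code) -> None on pass, else an error string."""
    code = generate(None)       # first attempt, no feedback
    error = run_tests(code)
    for _ in range(max_retries):
        if error is None:
            break
        code = generate(error)  # re-prompt with the failure message
        error = run_tests(code)
    return error is None
```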

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("electrocampbell/nebula-8lang-7b")
model = AutoModelForCausalLM.from_pretrained("electrocampbell/nebula-8lang-7b")

system = "You are a code translator. Given code in Nebula (a universal intermediate language), produce the equivalent idiomatic Python code. Output only the Python code, no explanations."
nebula_code = '''fn add(a, b): rt a + b'''

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": nebula_code},
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
out = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))

Replace "Python" in the system prompt with any of: JavaScript, TypeScript, Go, Swift, Kotlin, Rust, or C.
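A small helper, hypothetical and not part of the model's API, that parameterizes the usage example's system prompt on the target language:

```python
# Build the translator system prompt for any supported target language.
# Hypothetical convenience helper; the model itself only sees the prompt text.
TARGETS = {"Python", "JavaScript", "TypeScript", "Go", "Swift", "Kotlin", "Rust", "C"}

def system_prompt(target: str) -> str:
    if target not in TARGETS:
        raise ValueError(f"unsupported target language: {target}")
    return (
        "You are a code translator. Given code in Nebula (a universal "
        f"intermediate language), produce the equivalent idiomatic {target} "
        f"code. Output only the {target} code, no explanations."
    )
```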

Citation

If you use this model, please cite the Nebula project: https://github.com/colinc86/nebula

License

Apache 2.0, inherited from the Qwen 2.5 base model.
