	---
	language:
	- en
	base_model: Qwen/Qwen3-8B
	library_name: transformers
	pipeline_tag: text-generation
	tags:
	- axolotl
	- reasoning
	- math
	- commonsense
	- primeintellect
	license: apache-2.0
	datasets:
	- NousResearch/Hermes-3-Dataset
	- QuixiAI/dolphin
	model-index:
	- name: Delphermes-8B
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: HellaSwag
	      type: hellaswag
	    metrics:
	    - type: accuracy
	      value: 0.88
	      name: Accuracy
	  - task:
	      type: text-generation
	      name: Mathematical Reasoning
	    dataset:
	      name: GSM8K
	      type: gsm8k
	    metrics:
	    - type: accuracy
	      value: 0.89
	      name: Accuracy
	  - task:
	      type: text-generation
	      name: Theory of Mind
	    dataset:
	      name: TheoryPlay
	      type: theoryplay
	    metrics:
	    - type: accuracy
	      value: 0.8
	      name: Accuracy
	---

	# Delphermes-8B

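	Delphermes-8B is Qwen/Qwen3-8B fine-tuned with LoRA on the Hermes-3 and Dolphin datasets (supervised fine-tuning, SFT), with the adapter merged into the base weights. Reported scores are 88% on HellaSwag, 89% on GSM8K, and 80% on TheoryPlay, covering commonsense reasoning, math word problems, and theory-of-mind tasks.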

	## Model Details

	- Base Model: Qwen/Qwen3-8B
	- Language: English (en)
	- Library: transformers
	- Training Method: LoRA fine-tuning with Axolotl
	- Infrastructure: 8x NVIDIA B200 GPUs (PrimeIntellect cluster)
	- Distributed Training: DeepSpeed ZeRO Stage 2
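
For readers unfamiliar with the term, "merged" means the low-rank update learned during LoRA training has been folded into the base weights, so no separate adapter is needed at inference. The following is a minimal pure-Python sketch of that arithmetic on a toy matrix. The rank, `alpha`, and all matrix values are hypothetical and are not taken from this model's training configuration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the standard LoRA merge."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # same shape as w
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example: 2x2 base weight with a rank-1 adapter (all values hypothetical).
w = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[0.5, -0.5]]   # shape (rank, in_features) = (1, 2)
lora_b = [[2.0], [4.0]]  # shape (out_features, rank) = (2, 1)
merged = merge_lora(w, lora_a, lora_b, alpha=2, rank=1)
print(merged)  # [[3.0, -2.0], [4.0, -3.0]]
```

After merging, the result has the same shape as the base weight, which is why the model loads like any other checkpoint with no adapter files.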

	## Performance

	| Benchmark | Score | Description |
	|-----------|-------|-------------|
	| HellaSwag | 88% | Commonsense reasoning and natural language inference |
	| GSM8K | 89% | Grade school math word problems |
	| TheoryPlay | 80% | Theory of mind and social reasoning tasks |


	## Usage

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	import torch

	model_name = "justinj92/Delphermes-8B"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype=torch.float16,
	    device_map="auto",
	)

	# Example usage for reasoning tasks (a false-belief question)
	text = "Sarah believes that her keys are in her purse, but they are actually on the kitchen table. Where will Sarah look for her keys?"
	# Move the inputs to the device chosen for the model by device_map="auto"
	inputs = tokenizer(text, return_tensors="pt").to(model.device)
	outputs = model.generate(
	    **inputs,
	    max_length=200,  # counts prompt tokens plus generated tokens
	    temperature=0.1,
	    do_sample=True,
	    pad_token_id=tokenizer.eos_token_id,
	)
	# The decoded text contains the prompt followed by the continuation
	response = tokenizer.decode(outputs[0], skip_special_tokens=True)
	print(response)
	```

	### Chat Format

	This model supports the Hermes chat format, which is ChatML (each turn is wrapped in `<|im_start|>` and `<|im_end|>` markers):

	```python
	def format_chat(messages):
	    """Render a list of {"role", "content"} messages as a ChatML prompt."""
	    formatted = ""
	    for message in messages:
	        role = message["role"]
	        content = message["content"]
	        if role == "system":
	            formatted += f"<|im_start|>system\n{content}<|im_end|>\n"
	        elif role == "user":
	            formatted += f"<|im_start|>user\n{content}<|im_end|>\n"
	        elif role == "assistant":
	            formatted += f"<|im_start|>assistant\n{content}<|im_end|>\n"
	    # Open an assistant turn so the model generates the reply
	    formatted += "<|im_start|>assistant\n"
	    return formatted

	messages = [
	    {"role": "system", "content": "You are a helpful assistant."},
	    {"role": "user", "content": "Solve this math problem: A store has 45 apples. If they sell 1/3 of them in the morning and 1/5 of the remaining apples in the afternoon, how many apples are left?"},
	]

	# Reuses `tokenizer` and `model` from the Usage example
	prompt = format_chat(messages)
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	# temperature only takes effect when sampling is enabled
	outputs = model.generate(**inputs, max_length=300, temperature=0.1, do_sample=True)
	response = tokenizer.decode(outputs[0], skip_special_tokens=True)
	print(response)
	```
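
Decoding with `skip_special_tokens=True` returns the prompt and the reply as one string. If only the reply is wanted, one option is to decode with `skip_special_tokens=False` and cut the text at the ChatML markers. The helper below is a sketch of that idea; `extract_assistant_reply` is a hypothetical name and is not part of transformers.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def extract_assistant_reply(decoded: str) -> str:
    """Return the text of the last assistant turn in a ChatML string."""
    marker = f"{IM_START}assistant\n"
    start = decoded.rfind(marker)
    if start == -1:
        return ""  # no assistant turn found
    reply = decoded[start + len(marker):]
    # Generation may stop at <|im_end|> or at the token limit,
    # so the closing marker is treated as optional.
    end = reply.find(IM_END)
    if end != -1:
        reply = reply[:end]
    return reply.strip()


decoded = (
    "<|im_start|>user\nWhat is 2 + 2?<|im_end|>\n"
    "<|im_start|>assistant\n2 + 2 = 4<|im_end|>"
)
print(extract_assistant_reply(decoded))  # prints "2 + 2 = 4"
```

With a real generation, `decoded` would come from `tokenizer.decode(outputs[0], skip_special_tokens=False)`.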

	## Training Details

	- Training Framework: Axolotl with DeepSpeed ZeRO Stage 2
	- Hardware: 8x NVIDIA B200 GPUs (PrimeIntellect cluster)
	- Base Model: Qwen/Qwen3-8B
	- Training Method: Low-Rank Adaptation (LoRA)
	- Dataset: NousResearch/Hermes-3-Dataset + QuixiAI/dolphin
	- Training Duration: 28 hours
	- Learning Rate: 0.0004
	- Batch Size: 8
	- Sequence Length: 4096
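
The card does not say whether the batch size of 8 is per device or global, or whether gradient accumulation was used. As a rough aid for reading these numbers, the sketch below computes an upper bound on tokens per optimizer step under both readings. It assumes no gradient accumulation and fully packed sequences; both assumptions are illustrative and not taken from the training config.

```python
def max_tokens_per_step(batch_size, seq_len, num_devices=1, grad_accum=1):
    """Upper bound on tokens consumed per optimizer step."""
    return batch_size * seq_len * num_devices * grad_accum


SEQ_LEN = 4096   # from the list above
BATCH_SIZE = 8   # from the list above
NUM_GPUS = 8     # 8x B200

# Reading 1: 8 is the global batch size.
global_reading = max_tokens_per_step(BATCH_SIZE, SEQ_LEN)
# Reading 2: 8 is the per-device micro-batch size on each of the 8 GPUs.
per_device_reading = max_tokens_per_step(BATCH_SIZE, SEQ_LEN, num_devices=NUM_GPUS)

print(global_reading)      # 32768
print(per_device_reading)  # 262144
```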

	## Evaluation Methodology

	Each benchmark was evaluated as follows:
	- HellaSwag: Standard validation set with 4-way multiple choice accuracy
	- GSM8K: Test set with exact match accuracy on final numerical answers
	- TheoryPlay: Validation set with accuracy on theory of mind reasoning tasks
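
The card does not include the evaluation code. As an illustration of what "exact match on final numerical answers" usually involves, here is a minimal sketch that takes the last number in a model response and compares it with the reference. GSM8K reference solutions mark the final answer with `####`; the helper names and the last-number extraction rule are assumptions, not the author's harness.

```python
import re

# An integer or decimal, optionally negative, with optional thousands separators.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def last_number(text):
    """Return the last number in text as a float, or None if there is none."""
    matches = NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def reference_answer(solution):
    """Read the answer that follows '####' in a GSM8K reference solution."""
    return last_number(solution.split("####")[-1])


def exact_match_accuracy(predictions, solutions):
    """Fraction of predictions whose final number equals the reference."""
    hits = sum(
        last_number(p) is not None and last_number(p) == reference_answer(s)
        for p, s in zip(predictions, solutions)
    )
    return hits / len(solutions)


predictions = [
    "45 - 15 = 30, then 30 - 6 = 24. The answer is 24.",
    "I think it is 10.",
]
solutions = [
    "15 sold, 30 left. 6 sold, 24 left.\n#### 24",
    "3 * 4 = 12\n#### 12",
]
print(exact_match_accuracy(predictions, solutions))  # 0.5
```

A different extraction rule (for example, requiring an explicit "The answer is" phrase) can change the score, so results are only comparable when the rule is held fixed.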

	## Limitations

	- The model may still struggle with very complex mathematical proofs
	- Performance on non-English languages may be limited
	- May occasionally generate inconsistent responses in edge cases
	- Training data cutoff affects knowledge of recent events

	## Ethical Considerations

	This model has been trained on curated datasets and should be used responsibly. Users should:
	- Verify important information from the model
	- Be aware of potential biases in training data
	- Use appropriate content filtering for production applications

	## Citation

	```bibtex
	@misc{Delphermes-8B,
	title={Delphermes-8B: A Fine-tuned Language Model for Reasoning Tasks},
	author={[Your Name]},
	year={2025},
	url={https://huggingface.co/justinj92/Delphermes-8B}
	}
	```

	## License

	This model is released under the Apache 2.0 license.