--- tags: - text-generation - causal-lm - instruction-tuning - chat - rag - code-generation - summarization - extraction - synthetic-data - generated_from_trainer license: other pipeline_tag: text-generation library_name: transformers language: - en base_model: - allenai/OLMo-2-0425-1B-Instruct - allenai/OLMo-3-7B-Instruct - allenai/OLMo-3.1-32B-Instruct --- # Bolt Instruct Models Bolt Instruct is a family of **instruction-tuned language models designed for high-quality generation, reasoning, and enterprise workflows**. These models are **fine-tuned from Allen Institute for AI OLMo instruct models** and optimized for: - General conversational AI - Structured and controllable generation - Retrieval-Augmented Generation (RAG) - Enterprise document understanding - Code generation and transformation --- # Model Overview Bolt Instruct models provide **strong instruction-following capabilities** across diverse tasks with robust long-context support. Key design goals: - Strong instruction adherence - High-quality structured outputs (JSON, extraction) - RAG-grounded responses - Long-context support (65k tokens for 7B and 32B) - Balanced chat, reasoning, and coding performance --- # Model Variants | Model | Base Model | Positioning | |------|------------|------------| | bolt-instruct-1b | allenai/OLMo-2-0425-1B-Instruct | Lightweight / low-latency | | bolt-instruct-7b | allenai/OLMo-3-7B-Instruct | Balanced | | bolt-instruct-32b | allenai/OLMo-3.1-32B-Instruct | Highest quality | --- # Model Details - **Type:** Causal LM (instruction-tuned) - **Max context:** 65,536 tokens (7B and 32B), 4,096 tokens (1B) - **Training context:** 32k (7B), 16k (32B), 4k (1B) ### Capabilities - Chat / multi-turn dialogue - Instruction following - Structured output (JSON) - Summarization & transformation - Extraction - RAG generation - Code generation --- # Training - **Method:** Supervised Fine-Tuning (SFT) - **Dataset size:** ~125k conversations - **Eval set:** ~10k examples - **Data mix:** public + synthetic + internal tasks ### Training Approach - 1B → full fine-tune - 7B / 32B → QLoRA (4-bit) ### Hardware - 1× A100 80GB GPU --- # Intended Use - Chat assistants - Enterprise copilots - RAG pipelines - Document processing - Structured extraction - Code assistance --- # Usage ```python from transformers import AutoModelForCausalLM, AutoTokenizer model_name = "aisquared/bolt-instruct-7b" tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForCausalLM.from_pretrained(model_name) ``` --- # Evaluation To evaluate these models, we ran a subset of tasks using the [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness). Below are the metrics for each model. ## Language Model Evaluation Harness ### Evaluation results for aisquared/bolt-instruct-1b: | Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr| |----------------------------------------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:| |arc_challenge | 1|none | 0|acc |↑ |0.3490|± |0.0139| | | |none | 0|acc_norm |↑ |0.3823|± |0.0142| |arc_easy | 1|none | 0|acc |↑ |0.6098|± |0.0100| | | |none | 0|acc_norm |↑ |0.5560|± |0.0102| |bbh | 3|get-answer | |exact_match|↑ |0.3081|± |0.0052| | - bbh_cot_fewshot_boolean_expressions | 4|get-answer | 3|exact_match|↑ |0.5840|± |0.0312| | - bbh_cot_fewshot_causal_judgement | 4|get-answer | 3|exact_match|↑ |0.5508|± |0.0365| | - bbh_cot_fewshot_date_understanding | 4|get-answer | 3|exact_match|↑ |0.2600|± |0.0278| | - bbh_cot_fewshot_disambiguation_qa | 4|get-answer | 3|exact_match|↑ |0.3640|± |0.0305| | - bbh_cot_fewshot_dyck_languages | 4|get-answer | 3|exact_match|↑ |0.0040|± |0.0040| | - bbh_cot_fewshot_formal_fallacies | 4|get-answer | 3|exact_match|↑ |0.5040|± |0.0317| | - bbh_cot_fewshot_geometric_shapes | 4|get-answer | 3|exact_match|↑ |0.0920|± |0.0183| | - bbh_cot_fewshot_hyperbaton | 4|get-answer | 3|exact_match|↑ |0.5240|± |0.0316| | - bbh_cot_fewshot_logical_deduction_five_objects | 4|get-answer | 3|exact_match|↑ |0.1720|± |0.0239| | - bbh_cot_fewshot_logical_deduction_seven_objects | 4|get-answer | 3|exact_match|↑ |0.1080|± |0.0197| | - bbh_cot_fewshot_logical_deduction_three_objects | 4|get-answer | 3|exact_match|↑ |0.3520|± |0.0303| | - bbh_cot_fewshot_movie_recommendation | 4|get-answer | 3|exact_match|↑ |0.5040|± |0.0317| | - bbh_cot_fewshot_multistep_arithmetic_two | 4|get-answer | 3|exact_match|↑ |0.0600|± |0.0151| | - bbh_cot_fewshot_navigate | 4|get-answer | 3|exact_match|↑ |0.5560|± |0.0315| | - bbh_cot_fewshot_object_counting | 4|get-answer | 3|exact_match|↑ |0.4360|± |0.0314| | - bbh_cot_fewshot_penguins_in_a_table | 4|get-answer | 3|exact_match|↑ |0.2123|± |0.0340| | - bbh_cot_fewshot_reasoning_about_colored_objects | 4|get-answer | 3|exact_match|↑ |0.2440|± |0.0272| | - bbh_cot_fewshot_ruin_names | 4|get-answer | 3|exact_match|↑ |0.2440|± |0.0272| | - bbh_cot_fewshot_salient_translation_error_detection | 4|get-answer | 3|exact_match|↑ |0.1920|± |0.0250| | - bbh_cot_fewshot_snarks | 4|get-answer | 3|exact_match|↑ |0.3989|± |0.0368| | - bbh_cot_fewshot_sports_understanding | 4|get-answer | 3|exact_match|↑ |0.6560|± |0.0301| | - bbh_cot_fewshot_temporal_sequences | 4|get-answer | 3|exact_match|↑ |0.2760|± |0.0283| | - bbh_cot_fewshot_tracking_shuffled_objects_five_objects | 4|get-answer | 3|exact_match|↑ |0.1920|± |0.0250| | - bbh_cot_fewshot_tracking_shuffled_objects_seven_objects| 4|get-answer | 3|exact_match|↑ |0.0360|± |0.0118| | - bbh_cot_fewshot_tracking_shuffled_objects_three_objects| 4|get-answer | 3|exact_match|↑ |0.2840|± |0.0286| | - bbh_cot_fewshot_web_of_lies | 4|get-answer | 3|exact_match|↑ |0.5240|± |0.0316| | - bbh_cot_fewshot_word_sorting | 4|get-answer | 3|exact_match|↑ |0.0360|± |0.0118| |gsm8k | 3|flexible-extract| 5|exact_match|↑ |0.5072|± |0.0138| | | |strict-match | 5|exact_match|↑ |0.4943|± |0.0138| |hellaswag | 1|none | 0|acc |↑ |0.4729|± |0.0050| | | |none | 0|acc_norm |↑ |0.6181|± |0.0048| |mmlu_pro | 2|custom-extract | |exact_match|↑ |0.1435|± |0.0032| | - biology | 3|custom-extract | 5|exact_match|↑ |0.2050|± |0.0151| | - business | 3|custom-extract | 5|exact_match|↑ |0.1369|± |0.0122| | - chemistry | 3|custom-extract | 5|exact_match|↑ |0.0848|± |0.0083| | - computer_science | 3|custom-extract | 5|exact_match|↑ |0.1415|± |0.0172| | - economics | 3|custom-extract | 5|exact_match|↑ |0.1943|± |0.0136| | - engineering | 3|custom-extract | 5|exact_match|↑ |0.0929|± |0.0093| | - health | 3|custom-extract | 5|exact_match|↑ |0.1528|± |0.0126| | - history | 3|custom-extract | 5|exact_match|↑ |0.1549|± |0.0186| | - law | 3|custom-extract | 5|exact_match|↑ |0.1081|± |0.0094| | - math | 3|custom-extract | 5|exact_match|↑ |0.1414|± |0.0095| | - other | 3|custom-extract | 5|exact_match|↑ |0.1916|± |0.0130| | - philosophy | 3|custom-extract | 5|exact_match|↑ |0.1383|± |0.0155| | - physics | 3|custom-extract | 5|exact_match|↑ |0.1186|± |0.0090| | - psychology | 3|custom-extract | 5|exact_match|↑ |0.2130|± |0.0145| |truthfulqa_mc2 | 3|none | 0|acc |↑ |0.4734|± |0.0153| |winogrande | 1|none | 0|acc |↑ |0.6156|± |0.0137| ### Evaluation results for aisquared/bolt-instruct-7b: | Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr| |----------------------------------------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:| |arc_challenge | 1|none | 0|acc |↑ |0.4778|± |0.0146| | | |none | 0|acc_norm |↑ |0.4957|± |0.0146| |arc_easy | 1|none | 0|acc |↑ |0.7534|± |0.0088| | | |none | 0|acc_norm |↑ |0.7311|± |0.0091| |bbh | 3|get-answer | |exact_match|↑ |0.3038|± |0.0047| | - bbh_cot_fewshot_boolean_expressions | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000| | - bbh_cot_fewshot_causal_judgement | 4|get-answer | 3|exact_match|↑ |0.5668|± |0.0363| | - bbh_cot_fewshot_date_understanding | 4|get-answer | 3|exact_match|↑ |0.4480|± |0.0315| | - bbh_cot_fewshot_disambiguation_qa | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000| | - bbh_cot_fewshot_dyck_languages | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000| | - bbh_cot_fewshot_formal_fallacies | 4|get-answer | 3|exact_match|↑ |0.2240|± |0.0264| | - bbh_cot_fewshot_geometric_shapes | 4|get-answer | 3|exact_match|↑ |0.2960|± |0.0289| | - bbh_cot_fewshot_hyperbaton | 4|get-answer | 3|exact_match|↑ |0.5200|± |0.0317| | - bbh_cot_fewshot_logical_deduction_five_objects | 4|get-answer | 3|exact_match|↑ |0.0200|± |0.0089| | - bbh_cot_fewshot_logical_deduction_seven_objects | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000| | - bbh_cot_fewshot_logical_deduction_three_objects | 4|get-answer | 3|exact_match|↑ |0.6720|± |0.0298| | - bbh_cot_fewshot_movie_recommendation | 4|get-answer | 3|exact_match|↑ |0.1200|± |0.0206| | - bbh_cot_fewshot_multistep_arithmetic_two | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000| | - bbh_cot_fewshot_navigate | 4|get-answer | 3|exact_match|↑ |0.5560|± |0.0315| | - bbh_cot_fewshot_object_counting | 4|get-answer | 3|exact_match|↑ |0.1520|± |0.0228| | - bbh_cot_fewshot_penguins_in_a_table | 4|get-answer | 3|exact_match|↑ |0.4110|± |0.0409| | - bbh_cot_fewshot_reasoning_about_colored_objects | 4|get-answer | 3|exact_match|↑ |0.1880|± |0.0248| | - bbh_cot_fewshot_ruin_names | 4|get-answer | 3|exact_match|↑ |0.4800|± |0.0317| | - bbh_cot_fewshot_salient_translation_error_detection | 4|get-answer | 3|exact_match|↑ |0.4760|± |0.0316| | - bbh_cot_fewshot_snarks | 4|get-answer | 3|exact_match|↑ |0.2921|± |0.0342| | - bbh_cot_fewshot_sports_understanding | 4|get-answer | 3|exact_match|↑ |0.6760|± |0.0297| | - bbh_cot_fewshot_temporal_sequences | 4|get-answer | 3|exact_match|↑ |0.5880|± |0.0312| | - bbh_cot_fewshot_tracking_shuffled_objects_five_objects | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000| | - bbh_cot_fewshot_tracking_shuffled_objects_seven_objects| 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000| | - bbh_cot_fewshot_tracking_shuffled_objects_three_objects| 4|get-answer | 3|exact_match|↑ |0.8280|± |0.0239| | - bbh_cot_fewshot_web_of_lies | 4|get-answer | 3|exact_match|↑ |0.6560|± |0.0301| | - bbh_cot_fewshot_word_sorting | 4|get-answer | 3|exact_match|↑ |0.1400|± |0.0220| |gsm8k | 3|flexible-extract| 5|exact_match|↑ |0.7998|± |0.0110| | | |strict-match | 5|exact_match|↑ |0.7392|± |0.0121| |hellaswag | 1|none | 0|acc |↑ |0.4882|± |0.0050| | | |none | 0|acc_norm |↑ |0.6165|± |0.0049| |mmlu_pro | 2|custom-extract | |exact_match|↑ |0.4978|± |0.0044| | - biology | 3|custom-extract | 5|exact_match|↑ |0.6848|± |0.0174| | - business | 3|custom-extract | 5|exact_match|↑ |0.5729|± |0.0176| | - chemistry | 3|custom-extract | 5|exact_match|↑ |0.5380|± |0.0148| | - computer_science | 3|custom-extract | 5|exact_match|↑ |0.5878|± |0.0243| | - economics | 3|custom-extract | 5|exact_match|↑ |0.5592|± |0.0171| | - engineering | 3|custom-extract | 5|exact_match|↑ |0.2405|± |0.0137| | - health | 3|custom-extract | 5|exact_match|↑ |0.4670|± |0.0175| | - history | 3|custom-extract | 5|exact_match|↑ |0.3727|± |0.0248| | - law | 3|custom-extract | 5|exact_match|↑ |0.2525|± |0.0131| | - math | 3|custom-extract | 5|exact_match|↑ |0.7158|± |0.0123| | - other | 3|custom-extract | 5|exact_match|↑ |0.4351|± |0.0163| | - philosophy | 3|custom-extract | 5|exact_match|↑ |0.4128|± |0.0221| | - physics | 3|custom-extract | 5|exact_match|↑ |0.5142|± |0.0139| | - psychology | 3|custom-extract | 5|exact_match|↑ |0.5602|± |0.0176| |truthfulqa_mc2 | 3|none | 0|acc |↑ |0.5666|± |0.0162| |winogrande | 1|none | 0|acc |↑ |0.6385|± |0.0135| ### Evaluation results for aisquared/bolt-instruct-32b: | Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr| |----------------------------------------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:| |arc_challenge | 1|none | 0|acc |↑ |0.5776|± |0.0144| | | |none | 0|acc_norm |↑ |0.6007|± |0.0143| |arc_easy | 1|none | 0|acc |↑ |0.8333|± |0.0076| | | |none | 0|acc_norm |↑ |0.8228|± |0.0078| |bbh | 3|get-answer | |exact_match|↑ |0.3087|± |0.0048| | - bbh_cot_fewshot_boolean_expressions | 4|get-answer | 3|exact_match|↑ |0.5760|± |0.0313| | - bbh_cot_fewshot_causal_judgement | 4|get-answer | 3|exact_match|↑ |0.5882|± |0.0361| | - bbh_cot_fewshot_date_understanding | 4|get-answer | 3|exact_match|↑ |0.6640|± |0.0299| | - bbh_cot_fewshot_disambiguation_qa | 4|get-answer | 3|exact_match|↑ |0.1920|± |0.0250| | - bbh_cot_fewshot_dyck_languages | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000| | - bbh_cot_fewshot_formal_fallacies | 4|get-answer | 3|exact_match|↑ |0.0480|± |0.0135| | - bbh_cot_fewshot_geometric_shapes | 4|get-answer | 3|exact_match|↑ |0.2760|± |0.0283| | - bbh_cot_fewshot_hyperbaton | 4|get-answer | 3|exact_match|↑ |0.3200|± |0.0296| | - bbh_cot_fewshot_logical_deduction_five_objects | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000| | - bbh_cot_fewshot_logical_deduction_seven_objects | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000| | - bbh_cot_fewshot_logical_deduction_three_objects | 4|get-answer | 3|exact_match|↑ |0.5400|± |0.0316| | - bbh_cot_fewshot_movie_recommendation | 4|get-answer | 3|exact_match|↑ |0.6000|± |0.0310| | - bbh_cot_fewshot_multistep_arithmetic_two | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000| | - bbh_cot_fewshot_navigate | 4|get-answer | 3|exact_match|↑ |0.0160|± |0.0080| | - bbh_cot_fewshot_object_counting | 4|get-answer | 3|exact_match|↑ |0.5120|± |0.0317| | - bbh_cot_fewshot_penguins_in_a_table | 4|get-answer | 3|exact_match|↑ |0.2945|± |0.0379| | - bbh_cot_fewshot_reasoning_about_colored_objects | 4|get-answer | 3|exact_match|↑ |0.2280|± |0.0266| | - bbh_cot_fewshot_ruin_names | 4|get-answer | 3|exact_match|↑ |0.5120|± |0.0317| | - bbh_cot_fewshot_salient_translation_error_detection | 4|get-answer | 3|exact_match|↑ |0.5440|± |0.0316| | - bbh_cot_fewshot_snarks | 4|get-answer | 3|exact_match|↑ |0.7079|± |0.0342| | - bbh_cot_fewshot_sports_understanding | 4|get-answer | 3|exact_match|↑ |0.4880|± |0.0317| | - bbh_cot_fewshot_temporal_sequences | 4|get-answer | 3|exact_match|↑ |0.3120|± |0.0294| | - bbh_cot_fewshot_tracking_shuffled_objects_five_objects | 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000| | - bbh_cot_fewshot_tracking_shuffled_objects_seven_objects| 4|get-answer | 3|exact_match|↑ |0.0000|± |0.0000| | - bbh_cot_fewshot_tracking_shuffled_objects_three_objects| 4|get-answer | 3|exact_match|↑ |0.6280|± |0.0306| | - bbh_cot_fewshot_web_of_lies | 4|get-answer | 3|exact_match|↑ |0.4400|± |0.0315| | - bbh_cot_fewshot_word_sorting | 4|get-answer | 3|exact_match|↑ |0.0280|± |0.0105| |gsm8k | 3|flexible-extract| 5|exact_match|↑ |0.8795|± |0.0090| | | |strict-match | 5|exact_match|↑ |0.7801|± |0.0114| |hellaswag | 1|none | 0|acc |↑ |0.5407|± |0.0050| | | |none | 0|acc_norm |↑ |0.6763|± |0.0047| |mmlu_pro | 2|custom-extract | |exact_match|↑ |0.6340|± |0.0042| | - biology | 3|custom-extract | 5|exact_match|↑ |0.8117|± |0.0146| | - business | 3|custom-extract | 5|exact_match|↑ |0.6907|± |0.0165| | - chemistry | 3|custom-extract | 5|exact_match|↑ |0.6431|± |0.0142| | - computer_science | 3|custom-extract | 5|exact_match|↑ |0.6951|± |0.0228| | - economics | 3|custom-extract | 5|exact_match|↑ |0.7405|± |0.0151| | - engineering | 3|custom-extract | 5|exact_match|↑ |0.3447|± |0.0153| | - health | 3|custom-extract | 5|exact_match|↑ |0.6540|± |0.0166| | - history | 3|custom-extract | 5|exact_match|↑ |0.5512|± |0.0255| | - law | 3|custom-extract | 5|exact_match|↑ |0.3860|± |0.0147| | - math | 3|custom-extract | 5|exact_match|↑ |0.7979|± |0.0109| | - other | 3|custom-extract | 5|exact_match|↑ |0.6028|± |0.0161| | - philosophy | 3|custom-extract | 5|exact_match|↑ |0.5912|± |0.0220| | - physics | 3|custom-extract | 5|exact_match|↑ |0.6551|± |0.0132| | - psychology | 3|custom-extract | 5|exact_match|↑ |0.7243|± |0.0158| |truthfulqa_mc2 | 3|none | 0|acc |↑ |0.6906|± |0.0153| |winogrande | 1|none | 0|acc |↑ |0.6630|± |0.0133| --- # Limitations - May hallucinate without grounding - Performance varies by model size - Not suitable for high-risk domains without oversight --- # License Bolt Instruct is released under the [AI Squared Community License](https://docs.squared.ai/terms-of-use).