anemll committed on
Commit
7d31b6d
·
verified ·
1 Parent(s): 8bd1ad7

Upload folder using huggingface_hub

Browse files
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,153 @@
---
license: mit
tags:
- coreml
- ANE
- LLaMA
- Qwen
- DeepSeek
- Gemma
- Apple
- Apple Neural Engine
- DeepHermes
---
# ANEMLL

**ANEMLL** (pronounced like "animal") is an open-source project focused on accelerating the porting of Large Language Models (LLMs) to tensor processors, starting with the Apple Neural Engine (ANE).

The goal is to provide a fully open-source pipeline from model conversion to inference for common LLM architectures running on ANE.

This enables seamless integration and on-device inference for low-power applications on edge devices, ensuring maximum privacy and security.

This is critical for autonomous applications, where models run directly on the device without requiring an internet connection.

For more information, visit the [ANEMLL GitHub repository](https://github.com/anemll/anemll).

---

## License

ANEMLL is licensed under the [MIT License](https://opensource.org/license/mit).
The original model may require a separate license depending on the architecture:
- LLaMA models: based on Meta's LLaMA and may require Meta's license
- Qwen models: based on Alibaba's Qwen and may require Alibaba's license
- Gemma models: based on Google's Gemma and subject to the Gemma Terms of Use
- Other models: check the respective original model licenses

This model was converted to Core ML using ANEMLL's open-source conversion pipeline, which supports multiple LLM architectures including LLaMA, Qwen, Gemma, and DeepSeek variants.

---

## Requirements

- **macOS 15 (Sequoia)** or later with an Apple Neural Engine and 8 GB of RAM or more
- **CoreML Tools 8.x+** and **HuggingFace Transformers** libraries
- **Python 3.9+**

`chat.py` provides a sample inference script.
`chat_full.py` provides a sample inference script with history and conversation management.

**Installation**

1. Download the model from Hugging Face:
```bash
# Install required tools
pip install huggingface_hub

# Install Git LFS (Large File Storage)
# macOS with Homebrew:
brew install git-lfs
# Or Ubuntu/Debian:
# sudo apt-get install git-lfs

# Initialize Git LFS
git lfs install

# Clone the repository with model files
git clone https://huggingface.co/anemll/anemll-google-gemma-3-1b-it-ctx4096-monolithic_0.3.5
```

2. Extract model files:
```bash
# Navigate to the cloned directory
cd anemll-google-gemma-3-1b-it-ctx4096-monolithic_0.3.5

# Pull LFS files (model weights)
git lfs pull

# Extract CoreML model files
find . -type f -name "*.zip" -exec unzip {} \;
```
3. Install dependencies:
```bash
pip install coremltools transformers
```

**Coremltools:**

See the coremltools installation guide at https://apple.github.io/coremltools/docs-guides/source/installing-coremltools.html

**How to Run**

1. Basic chat interface:
```bash
python chat.py --meta ./meta.yaml
```

2. Full conversation mode with history:
```bash
python chat_full.py --meta ./meta.yaml
```

> Note: The first time the model loads, macOS takes some time to compile and place it on the device.
> Subsequent loads are nearly instantaneous.
> Use Ctrl-D to exit and Ctrl-C to interrupt inference.

**More Info**
Please check the following links for updates:

* [GitHub](https://github.com/anemll)
* [Hugging Face Models](https://huggingface.co/anemll)
* [Twitter/X](https://x.com/anemll)
* [Website](https://anemll.com)

Contact: realanemll@gmail.com

# anemll-google-gemma-3-1b-it-ctx4096-monolithic_0.3.5

This is a Core ML model converted with ANEMLL for Apple Neural Engine inference.

## Available Distributions

### Standard Distribution
- Contains zipped MLMODELC files
- Suitable for macOS and development

### iOS Distribution
- Contains unzipped MLMODELC files
- Ready for iOS deployment
- Includes offline tokenizer support

## Model Information
- Context Length: 4096
- Batch Size: 64
- Number of Chunks: 1
- LUT Quantization: 6-bit

## Quick Start

### Test in iOS/macOS App
Try our sample chat-bot app on TestFlight:
1. Install TestFlight from the App Store
2. Join the beta test: [TestFlight Link](https://testflight.apple.com/join/jrQq1D1C)
3. The app includes a small demo model pre-installed
4. You can add custom models via Hugging Face URLs

> [!Note]
> - The TestFlight app works on both iOS and macOS
> - It demonstrates proper model integration and provides a reference implementation
> - iOS requires unzipped MLMODELC files and `config.json` for the offline tokenizer
> - macOS supports both zipped and unzipped model formats
chat.py ADDED
The diff for this file is too large to render. See raw diff
 
chat_full.py ADDED
The diff for this file is too large to render. See raw diff
 
config.json ADDED
@@ -0,0 +1,4 @@
{
  "tokenizer_class": "GemmaTokenizer",
  "model_type": "gemma"
}
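This small config is what the offline tokenizer path reads. A quick sanity-check sketch (the config content is inlined here for illustration; in practice you would `json.load` the `config.json` from the cloned directory):

```python
import json

config_text = """
{
  "tokenizer_class": "GemmaTokenizer",
  "model_type": "gemma"
}
"""

config = json.loads(config_text)

# Both keys must be present for the offline tokenizer to resolve correctly
for key in ("tokenizer_class", "model_type"):
    assert key in config, f"missing {key} in config.json"

print(config["tokenizer_class"])  # GemmaTokenizer
```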
gemma3_monolithic_full_lut6.mlmodelc/analytics/coremldata.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ccbbe004298fb437653080fa0c45e763b19cc7ba7d95c6fce2d7593ba890fa38
size 243
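The three-line blocks like the one above are Git LFS pointer files: the repository stores only the spec version, the SHA-256 object id, and the byte size, while the actual binary lives on the LFS server (which is why `git lfs pull` is required). A sketch of parsing one such pointer:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-lfs pointer file into {version, oid, size}."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return {
        "version": fields["version"],
        "oid": fields["oid"].removeprefix("sha256:"),
        "size": int(fields["size"]),
    }

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:ccbbe004298fb437653080fa0c45e763b19cc7ba7d95c6fce2d7593ba890fa38
size 243
"""
```

Comparing the `size` field against the file on disk is an easy way to detect an unpulled LFS file (a pointer is a few hundred bytes, the real `weight.bin` here is over 1 GB).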
gemma3_monolithic_full_lut6.mlmodelc/coremldata.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4e8d658544e93bde37ab740ad203068453595c50a1ab528f857e981c82a37a90
size 1596
gemma3_monolithic_full_lut6.mlmodelc/metadata.json ADDED
@@ -0,0 +1,590 @@
[
  {
    "metadataOutputVersion" : "3.0",
    "userDefinedMetadata" : {
      "com.github.apple.coremltools.source" : "torch==2.5.0",
      "com.github.apple.coremltools.version" : "9.0",
      "com.anemll.context_length" : "4096",
      "com.github.apple.coremltools.source_dialect" : "TorchScript",
      "com.anemll.batch_size" : "64",
      "com.anemll.info" : "Converted with Anemll v0.1.1",
      "com.anemll.lut_bits" : "6"
    },
    "availability" : {
      "macOS" : "15.0",
      "tvOS" : "18.0",
      "visionOS" : "2.0",
      "watchOS" : "11.0",
      "iOS" : "18.0",
      "macCatalyst" : "18.0"
    },
    "inputSchema" : [
      { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1 × 1)", "shortDescription" : "", "shape" : "[1, 1]", "name" : "input_ids", "type" : "MultiArray" },
      { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1)", "shortDescription" : "", "shape" : "[1]", "name" : "position_ids", "type" : "MultiArray" },
      { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Float16", "formattedType" : "MultiArray (Float16 1 × 1 × 1 × 4096)", "shortDescription" : "", "shape" : "[1, 1, 1, 4096]", "name" : "causal_mask", "type" : "MultiArray" },
      { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1)", "shortDescription" : "", "shape" : "[1]", "name" : "current_pos", "type" : "MultiArray" }
    ],
    "outputSchema" : [
      { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 16)", "shortDescription" : "", "shape" : "[16]", "name" : "argmax_idx", "type" : "MultiArray" },
      { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Float16", "formattedType" : "MultiArray (Float16 16)", "shortDescription" : "", "shape" : "[16]", "name" : "argmax_val", "type" : "MultiArray" }
    ],
    "modelParameters" : [],
    "storagePrecision" : "Mixed (Float16, Palettized (12 bits), Palettized (14 bits), Palettized (15 bits), Palettized (17 bits), Palettized (18 bits), Palettized (24 bits), UInt6, UInt8)",
    "method" : "predict",
    "functions" : [
      {
        "inputSchema" : [
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1 × 1)", "shortDescription" : "", "shape" : "[1, 1]", "name" : "input_ids", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1)", "shortDescription" : "", "shape" : "[1]", "name" : "position_ids", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Float16", "formattedType" : "MultiArray (Float16 1 × 1 × 1 × 4096)", "shortDescription" : "", "shape" : "[1, 1, 1, 4096]", "name" : "causal_mask", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1)", "shortDescription" : "", "shape" : "[1]", "name" : "current_pos", "type" : "MultiArray" }
        ],
        "computePrecision" : "Mixed (Float16, Int32, UInt16)",
        "storagePrecision" : "Mixed (Float16, Palettized (12 bits), Palettized (14 bits), Palettized (15 bits), Palettized (17 bits), Palettized (18 bits), Palettized (24 bits), UInt6, UInt8)",
        "stateSchema" : [
          { "dataType" : "Float16", "isOptional" : "0", "formattedType" : "State (Float16 44 × 1 × 512 × 256)", "shortDescription" : "", "shape" : "[44, 1, 512, 256]", "name" : "model_model_kv_cache_local", "type" : "State" },
          { "dataType" : "Float16", "isOptional" : "0", "formattedType" : "State (Float16 8 × 1 × 4096 × 256)", "shortDescription" : "", "shape" : "[8, 1, 4096, 256]", "name" : "model_model_kv_cache_global", "type" : "State" }
        ],
        "outputSchema" : [
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 16)", "shortDescription" : "", "shape" : "[16]", "name" : "argmax_idx", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Float16", "formattedType" : "MultiArray (Float16 16)", "shortDescription" : "", "shape" : "[16]", "name" : "argmax_val", "type" : "MultiArray" }
        ],
        "name" : "infer",
        "mlProgramOperationTypeHistogram" : {
          "Ios18.expandDims" : 53,
          "Ios18.softmax" : 26,
          "Ios18.mul" : 523,
          "Ios18.matmul" : 52,
          "Identity" : 1,
          "Ios18.greaterEqual" : 2,
          "Select" : 2,
          "Ios18.readState" : 54,
          "Tile" : 52,
          "Ios18.gather" : 5,
          "Ios18.add" : 133,
          "Ios18.layerNorm" : 157,
          "Ios18.sliceUpdate" : 52,
          "Ios18.writeState" : 52,
          "Ios18.reshape" : 108,
          "Ios18.reduceArgmax" : 16,
          "Ios16.reduceMax" : 16,
          "Ios18.constexprLutToDense" : 199,
          "Ios18.conv" : 198,
          "Ios18.concat" : 299,
          "Ios18.transpose" : 173,
          "Ios18.cast" : 3,
          "Ios18.gelu" : 26,
          "Ios18.sliceByIndex" : 314,
          "Ios18.squeeze" : 46
        }
      },
      {
        "inputSchema" : [
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1 × 1)", "shortDescription" : "", "shape" : "[1, 1]", "name" : "input_ids", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1)", "shortDescription" : "", "shape" : "[1]", "name" : "position_ids", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Float16", "formattedType" : "MultiArray (Float16 1 × 1 × 1 × 4096)", "shortDescription" : "", "shape" : "[1, 1, 1, 4096]", "name" : "causal_mask", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1)", "shortDescription" : "", "shape" : "[1]", "name" : "current_pos", "type" : "MultiArray" }
        ],
        "computePrecision" : "Mixed (Float16, Int32, UInt16)",
        "storagePrecision" : "Mixed (Float16, Palettized (12 bits), Palettized (14 bits), Palettized (15 bits), Palettized (17 bits), Palettized (18 bits), Palettized (24 bits), UInt6, UInt8)",
        "stateSchema" : [
          { "dataType" : "Float16", "isOptional" : "0", "formattedType" : "State (Float16 44 × 1 × 512 × 256)", "shortDescription" : "", "shape" : "[44, 1, 512, 256]", "name" : "model_model_kv_cache_local", "type" : "State" },
          { "dataType" : "Float16", "isOptional" : "0", "formattedType" : "State (Float16 8 × 1 × 4096 × 256)", "shortDescription" : "", "shape" : "[8, 1, 4096, 256]", "name" : "model_model_kv_cache_global", "type" : "State" }
        ],
        "outputSchema" : [
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 16)", "shortDescription" : "", "shape" : "[16]", "name" : "argmax_idx", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Float16", "formattedType" : "MultiArray (Float16 16)", "shortDescription" : "", "shape" : "[16]", "name" : "argmax_val", "type" : "MultiArray" }
        ],
        "name" : "infer_rotate",
        "mlProgramOperationTypeHistogram" : {
          "Ios18.expandDims" : 53,
          "Ios18.softmax" : 26,
          "Ios18.mul" : 523,
          "Ios18.matmul" : 52,
          "Identity" : 1,
          "Ios18.greaterEqual" : 2,
          "Select" : 2,
          "Ios18.readState" : 54,
          "Tile" : 52,
          "Ios18.gather" : 5,
          "Ios18.add" : 133,
          "Ios18.layerNorm" : 157,
          "Ios18.sliceUpdate" : 52,
          "Ios18.writeState" : 52,
          "Ios18.reshape" : 108,
          "Ios18.reduceArgmax" : 16,
          "Ios16.reduceMax" : 16,
          "Ios18.constexprLutToDense" : 199,
          "Ios18.conv" : 198,
          "Ios18.concat" : 271,
          "Ios18.transpose" : 173,
          "Ios18.cast" : 3,
          "Ios18.gelu" : 26,
          "Ios18.sliceByIndex" : 402,
          "Ios18.squeeze" : 46
        }
      },
      {
        "inputSchema" : [
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1 × 64)", "shortDescription" : "", "shape" : "[1, 64]", "name" : "input_ids", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 64)", "shortDescription" : "", "shape" : "[64]", "name" : "position_ids", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Float16", "formattedType" : "MultiArray (Float16 1 × 1 × 64 × 4096)", "shortDescription" : "", "shape" : "[1, 1, 64, 4096]", "name" : "causal_mask", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1)", "shortDescription" : "", "shape" : "[1]", "name" : "current_pos", "type" : "MultiArray" }
        ],
        "computePrecision" : "Mixed (Float16, Int32, UInt16)",
        "storagePrecision" : "Mixed (Float16, Palettized (12 bits), Palettized (14 bits), Palettized (15 bits), Palettized (17 bits), Palettized (24 bits), UInt6, UInt8)",
        "stateSchema" : [
          { "dataType" : "Float16", "isOptional" : "0", "formattedType" : "State (Float16 44 × 1 × 512 × 256)", "shortDescription" : "", "shape" : "[44, 1, 512, 256]", "name" : "model_model_kv_cache_local", "type" : "State" },
          { "dataType" : "Float16", "isOptional" : "0", "formattedType" : "State (Float16 8 × 1 × 4096 × 256)", "shortDescription" : "", "shape" : "[8, 1, 4096, 256]", "name" : "model_model_kv_cache_global", "type" : "State" }
        ],
        "outputSchema" : [
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Float16", "formattedType" : "MultiArray (Float16 1 × 1 × 1152)", "shortDescription" : "", "shape" : "[1, 1, 1152]", "name" : "output_hidden_states", "type" : "MultiArray" }
        ],
        "name" : "prefill",
        "mlProgramOperationTypeHistogram" : {
          "Ios18.expandDims" : 52,
          "Ios18.softmax" : 26,
          "Ios18.mul" : 523,
          "Ios18.matmul" : 52,
          "Ios18.greaterEqual" : 2,
          "Select" : 2,
          "Ios18.readState" : 54,
          "Tile" : 52,
          "Ios18.gather" : 5,
          "Ios18.add" : 133,
          "Ios18.layerNorm" : 157,
          "Ios18.sliceUpdate" : 52,
          "Ios18.writeState" : 52,
          "Ios18.reshape" : 186,
          "Ios18.constexprLutToDense" : 183,
          "Ios18.conv" : 182,
          "Ios18.concat" : 297,
          "Ios18.transpose" : 238,
          "Ios18.cast" : 1,
          "Ios18.gelu" : 26,
          "Ios18.sliceByIndex" : 315,
          "Ios18.squeeze" : 26
        }
      },
      {
        "inputSchema" : [
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1 × 64)", "shortDescription" : "", "shape" : "[1, 64]", "name" : "input_ids", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 64)", "shortDescription" : "", "shape" : "[64]", "name" : "position_ids", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Float16", "formattedType" : "MultiArray (Float16 1 × 1 × 64 × 4096)", "shortDescription" : "", "shape" : "[1, 1, 64, 4096]", "name" : "causal_mask", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1)", "shortDescription" : "", "shape" : "[1]", "name" : "current_pos", "type" : "MultiArray" }
        ],
        "computePrecision" : "Mixed (Float16, Int32, UInt16)",
        "storagePrecision" : "Mixed (Float16, Palettized (12 bits), Palettized (14 bits), Palettized (15 bits), Palettized (17 bits), Palettized (24 bits), UInt6, UInt8)",
        "stateSchema" : [
          { "dataType" : "Float16", "isOptional" : "0", "formattedType" : "State (Float16 44 × 1 × 512 × 256)", "shortDescription" : "", "shape" : "[44, 1, 512, 256]", "name" : "model_model_kv_cache_local", "type" : "State" },
          { "dataType" : "Float16", "isOptional" : "0", "formattedType" : "State (Float16 8 × 1 × 4096 × 256)", "shortDescription" : "", "shape" : "[8, 1, 4096, 256]", "name" : "model_model_kv_cache_global", "type" : "State" }
        ],
        "outputSchema" : [
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Float16", "formattedType" : "MultiArray (Float16 1 × 1 × 1152)", "shortDescription" : "", "shape" : "[1, 1, 1152]", "name" : "output_hidden_states", "type" : "MultiArray" }
        ],
        "name" : "prefill_rotate",
        "mlProgramOperationTypeHistogram" : {
          "Ios18.expandDims" : 52,
          "Ios18.softmax" : 26,
          "Ios18.mul" : 523,
          "Ios18.matmul" : 52,
          "Identity" : 1,
          "Ios18.greaterEqual" : 2,
          "Select" : 2,
          "Ios18.readState" : 54,
          "Tile" : 52,
          "Ios18.gather" : 5,
          "Ios18.add" : 132,
          "Ios18.layerNorm" : 157,
          "Ios18.sliceUpdate" : 52,
          "Ios18.writeState" : 52,
          "Ios18.reshape" : 186,
          "Ios18.constexprLutToDense" : 183,
          "Ios18.conv" : 182,
          "Ios18.concat" : 253,
          "Ios18.transpose" : 238,
          "Ios18.cast" : 1,
          "Ios18.gelu" : 26,
          "Ios18.sliceByIndex" : 403,
          "Ios18.squeeze" : 26
        }
      }
    ],
    "version" : "0.1.1",
    "isUpdatable" : "0",
    "defaultFunctionName" : "infer",
    "specificationVersion" : 9,
    "stateSchema" : [
      { "dataType" : "Float16", "isOptional" : "0", "formattedType" : "State (Float16 44 × 1 × 512 × 256)", "shortDescription" : "", "shape" : "[44, 1, 512, 256]", "name" : "model_model_kv_cache_local", "type" : "State" },
      { "dataType" : "Float16", "isOptional" : "0", "formattedType" : "State (Float16 8 × 1 × 4096 × 256)", "shortDescription" : "", "shape" : "[8, 1, 4096, 256]", "name" : "model_model_kv_cache_global", "type" : "State" }
    ],
    "computePrecision" : "Mixed (Float16, Int32, UInt16)",
    "mlProgramOperationTypeHistogram" : {
      "Ios18.expandDims" : 53,
      "Ios18.softmax" : 26,
      "Ios18.mul" : 523,
      "Ios18.matmul" : 52,
      "Identity" : 1,
      "Ios18.greaterEqual" : 2,
      "Select" : 2,
      "Ios18.readState" : 54,
      "Tile" : 52,
      "Ios18.gather" : 5,
      "Ios18.add" : 133,
      "Ios18.layerNorm" : 157,
      "Ios18.sliceUpdate" : 52,
      "Ios18.writeState" : 52,
      "Ios18.reshape" : 108,
      "Ios18.reduceArgmax" : 16,
      "Ios16.reduceMax" : 16,
      "Ios18.constexprLutToDense" : 199,
      "Ios18.conv" : 198,
      "Ios18.concat" : 299,
      "Ios18.transpose" : 173,
      "Ios18.cast" : 3,
      "Ios18.gelu" : 26,
      "Ios18.sliceByIndex" : 314,
      "Ios18.squeeze" : 46
    },
    "shortDescription" : "Anemll Model: Multifunction Combined",
    "generatedClassName" : "gemma3_monolithic_full_lut6",
    "author" : "Converted with Anemll v0.1.1",
    "modelType" : {
      "name" : "MLModelType_mlProgram"
    }
  }
]
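The `inputSchema` above fixes exact shapes and dtypes for the decode-step (`infer`) function. A sketch of assembling a matching feed dictionary in NumPy, plus the KV-cache footprint implied by the `stateSchema` (the additive zero/-inf causal-mask layout is an assumption based on the standard decoder convention; actual prediction requires the `.mlmodelc` loaded via coremltools):

```python
import numpy as np

CTX = 4096  # com.anemll.context_length

def make_infer_inputs(token_id: int, pos: int) -> dict:
    """Build one decode-step input dict matching the `infer` inputSchema."""
    mask = np.full((1, 1, 1, CTX), -np.inf, dtype=np.float16)
    mask[..., : pos + 1] = 0.0  # attend to positions 0..pos (assumed convention)
    return {
        "input_ids": np.array([[token_id]], dtype=np.int32),  # [1, 1]
        "position_ids": np.array([pos], dtype=np.int32),      # [1]
        "causal_mask": mask,                                  # [1, 1, 1, 4096]
        "current_pos": np.array([pos], dtype=np.int32),       # [1]
    }

# KV-cache footprint from the stateSchema (Float16 = 2 bytes per element):
local_bytes = 44 * 1 * 512 * 256 * 2    # sliding-window cache, ~11 MiB
global_bytes = 8 * 1 * 4096 * 256 * 2   # full-context cache, 16 MiB
```

The two state tensors reflect Gemma 3's mix of sliding-window (512) and global-attention layers; together they stay under ~30 MB, which is why the 4096-token context is feasible on 8 GB devices.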
gemma3_monolithic_full_lut6.mlmodelc/model.mil ADDED
The diff for this file is too large to render. See raw diff
 
gemma3_monolithic_full_lut6.mlmodelc/weights/weight.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:335ccb97bc9e3aa1e08dd41615c295cea927b01531e66222c11db070042cdd6d
size 1125524736
meta.yaml ADDED
@@ -0,0 +1,57 @@
model_info:
  name: anemll-google-gemma-3-1b-it-ctx4096-monolithic
  version: 0.3.5
  description: |
    Monolithic model running google-gemma-3-1b-it on Apple Neural Engine
    Context length: 4096
    Batch size: 64
    Type: Monolithic (single file with embed+FFN+lm_head)
  license: MIT
  author: Anemll
  framework: Core ML
  language: Python
  architecture: gemma3_text
  model_type: monolithic
  parameters:
    context_length: 4096
    batch_size: 64
    lut_bits: 6
    lut_per_channel: 4
    lut_embeddings: 8
    lut_embeddings_per_channel: 4
    model_prefix: gemma3
    monolithic_model: gemma3_monolithic_full_lut6.mlmodelc
    split_lm_head: 16
    argmax_in_model: true
    vocab_size: 262144
    lm_head_chunk_sizes: [16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384]
    prefill_dynamic_slice: true
    functions:
      - infer
      - prefill

# =============================================================================
# Conversion Parameters (for troubleshooting)
# =============================================================================
# Generated: 2026-02-12 19:35:18
#
# model_path: /Users/anemll/.cache/huggingface/hub/models--google--gemma-3-1b-it/snapshots/dcc83ea841ab6100d6b47a070329e1ba4cf78752
# output_dir: /Volumes/Models/ANE/gemma3_1b_mono_argmax_lut6_ctx4096
# command_line: "./anemll/utils/convert_monolith.sh --model google/gemma-3-1b-it --output /Volumes/Models/ANE/gemma3_1b_mono_argmax_lut6_ctx4096 --context 4096 --batch 64 --lut 6\\,4 --lut-embeddings 8\\,4 --lut-lmhead 6\\,4 --prefix gemma3 --argmax --restart 2c"
# context_length: 4096
# batch_size: 64
# prefix: gemma3
# architecture: gemma3_text
# argmax_in_model: true
# sliding_window: 512
# single_cache: false
# dynamic_prefill_slice: true
# monolithic: true
# anemll_version: 0.3.5
# lut_bits: 6,4
# lut_embeddings: 8,4
# lut_lmhead: 6,4
# rotate: true
# vocab_size: 262144
# lm_head_chunk_sizes: "[16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384]"
# =============================================================================
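With `argmax_in_model: true` and `split_lm_head: 16`, the model returns per-chunk winners (`argmax_idx` and `argmax_val`, both length 16, over sixteen 16384-token slices of the 262144-token vocabulary) instead of full logits. How a runner can recover the global greedy token from those two arrays is sketched below (an assumed combination scheme for illustration, not code taken from `chat.py`):

```python
import numpy as np

CHUNK = 16384             # each lm_head_chunk_sizes entry
N_CHUNKS = 16             # split_lm_head
VOCAB = CHUNK * N_CHUNKS  # 262144, matches vocab_size

def combine_split_argmax(argmax_idx: np.ndarray, argmax_val: np.ndarray) -> int:
    """Global greedy token id from per-chunk argmax outputs."""
    best_chunk = int(np.argmax(argmax_val))           # chunk holding the max logit
    return best_chunk * CHUNK + int(argmax_idx[best_chunk])

# Check the scheme against a plain argmax over synthetic full logits:
rng = np.random.default_rng(0)
logits = rng.standard_normal(VOCAB).astype(np.float16)
idx = np.array([int(np.argmax(logits[i * CHUNK : (i + 1) * CHUNK]))
                for i in range(N_CHUNKS)], dtype=np.int32)
val = np.array([logits[i * CHUNK + idx[i]] for i in range(N_CHUNKS)],
               dtype=np.float16)
assert combine_split_argmax(idx, val) == int(np.argmax(logits))
```

Doing the argmax inside the model keeps the 262144-wide logit tensor off the CPU path; only 2 × 16 values cross the CoreML boundary per decoded token.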
tokenizer.json ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
size 33384568
tokenizer.model ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
size 4689074
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff