anemll committed on
Commit
7d31b6d
·
verified ·
1 Parent(s): 8bd1ad7

Upload folder using huggingface_hub

Browse files
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,153 @@
---
license: mit
tags:
- coreml
- ANE
- LLaMA
- Qwen
- DeepSeek
- Gemma
- Apple
- Apple Neural Engine
- DeepHermes
---
# ANEMLL

**ANEMLL** (pronounced like "animal") is an open-source project focused on accelerating the porting of Large Language Models (LLMs) to tensor processors, starting with the Apple Neural Engine (ANE).

The goal is to provide a fully open-source pipeline from model conversion to inference for common LLM architectures running on ANE.

This enables seamless integration and on-device inference for low-power applications on edge devices, ensuring maximum privacy and security.

This is critical for autonomous applications, where models run directly on the device without requiring an internet connection.

For more information, visit the [ANEMLL GitHub repository](https://github.com/anemll/anemll).

---

## License

ANEMLL is licensed under the [MIT License](https://opensource.org/license/mit).
The original model may require a separate license depending on the architecture:
- LLaMA models: based on Meta's LLaMA and may require Meta's license
- Qwen models: based on Alibaba's Qwen and may require Alibaba's license
- Gemma models: based on Google's Gemma and subject to the Gemma Terms of Use
- Other models: check the respective original model licenses

This model was converted to Core ML using ANEMLL's open-source conversion pipeline, which supports multiple LLM architectures including LLaMA, Qwen, Gemma, and DeepSeek variants.

---

## Requirements

- **macOS 15 (Sequoia)** or later with an Apple Neural Engine and 8 GB of RAM or more
- **CoreML Tools 8.x+** and **HuggingFace Transformers** libraries
- **Python 3.9+**

`chat.py` provides a sample inference script.
`chat_full.py` provides a sample inference script with history and conversation management.

**Installation**

1. Download the model from Hugging Face:
```bash
# Install required tools
pip install huggingface_hub

# Install Git LFS (Large File Storage)
# macOS with Homebrew:
brew install git-lfs
# Or Ubuntu/Debian:
# sudo apt-get install git-lfs

# Initialize Git LFS
git lfs install

# Clone the repository with model files
git clone https://huggingface.co/anemll/anemll-google-gemma-3-1b-it-ctx4096-monolithic_0.3.5
```

2. Extract model files:
```bash
# Navigate to the cloned directory
cd anemll-google-gemma-3-1b-it-ctx4096-monolithic_0.3.5

# Pull LFS files (model weights)
git lfs pull

# Extract CoreML model files
find . -type f -name "*.zip" -exec unzip {} \;
```
3. Install dependencies:
```bash
pip install coremltools transformers
```

**Coremltools:**

See the coremltools installation guide at https://apple.github.io/coremltools/docs-guides/source/installing-coremltools.html

**How to Run**

1. Basic chat interface:
```bash
python chat.py --meta ./meta.yaml
```

2. Full conversation mode with history:
```bash
python chat_full.py --meta ./meta.yaml
```

> Note: The first time the model loads, macOS takes some time to compile and place it on the device.
> Subsequent loads are nearly instantaneous.
> Use Ctrl-D to exit and Ctrl-C to interrupt inference.

**More Info**
Please check the following links for updates:

* [GitHub](https://github.com/anemll)
* [Hugging Face Models](https://huggingface.co/anemll)
* [Twitter/X](https://x.com/anemll)
* [Website](https://anemll.com)

Contact: realanemll@gmail.com

# anemll-google-gemma-3-1b-it-ctx4096-monolithic_0.3.5

This is a Core ML model converted with ANEMLL for Apple Neural Engine inference.

## Available Distributions

### Standard Distribution
- Contains zipped MLMODELC files
- Suitable for macOS and development

### iOS Distribution
- Contains unzipped MLMODELC files
- Ready for iOS deployment
- Includes offline tokenizer support

## Model Information
- Context Length: 4096
- Batch Size: 64
- Number of Chunks: 1
- LUT Quantization: 6-bit

## Quick Start

### Test in iOS/macOS App
Try our sample chat-bot app on TestFlight:
1. Install TestFlight from the App Store
2. Join the beta test: [TestFlight Link](https://testflight.apple.com/join/jrQq1D1C)
3. The app includes a small demo model pre-installed
4. You can add custom models via Hugging Face URLs

> [!Note]
> - The TestFlight app works on both iOS and macOS
> - It demonstrates proper model integration and provides a reference implementation
> - iOS requires unzipped MLMODELC files and `config.json` for the offline tokenizer
> - macOS supports both zipped and unzipped model formats
chat.py ADDED
The diff for this file is too large to render. See raw diff
 
chat_full.py ADDED
The diff for this file is too large to render. See raw diff
 
config.json ADDED
@@ -0,0 +1,4 @@
{
  "tokenizer_class": "GemmaTokenizer",
  "model_type": "gemma"
}
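This small config is what the offline tokenizer path reads. A quick sanity-check sketch (the config content is inlined here for illustration; in practice you would `json.load` the `config.json` from the cloned directory):

```python
import json

config_text = """
{
  "tokenizer_class": "GemmaTokenizer",
  "model_type": "gemma"
}
"""

config = json.loads(config_text)

# Both keys must be present for the offline tokenizer to resolve correctly
for key in ("tokenizer_class", "model_type"):
    assert key in config, f"missing {key} in config.json"

print(config["tokenizer_class"])  # GemmaTokenizer
```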
gemma3_monolithic_full_lut6.mlmodelc/analytics/coremldata.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ccbbe004298fb437653080fa0c45e763b19cc7ba7d95c6fce2d7593ba890fa38
size 243
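The three-line blocks like the one above are Git LFS pointer files: the repository stores only the spec version, the SHA-256 object id, and the byte size, while the actual binary lives on the LFS server (which is why `git lfs pull` is required). A sketch of parsing one such pointer:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-lfs pointer file into {version, oid, size}."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return {
        "version": fields["version"],
        "oid": fields["oid"].removeprefix("sha256:"),
        "size": int(fields["size"]),
    }

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:ccbbe004298fb437653080fa0c45e763b19cc7ba7d95c6fce2d7593ba890fa38
size 243
"""
```

Comparing the `size` field against the file on disk is an easy way to detect an unpulled LFS file (a pointer is a few hundred bytes, the real `weight.bin` here is over 1 GB).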
gemma3_monolithic_full_lut6.mlmodelc/coremldata.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4e8d658544e93bde37ab740ad203068453595c50a1ab528f857e981c82a37a90
size 1596
gemma3_monolithic_full_lut6.mlmodelc/metadata.json ADDED
@@ -0,0 +1,590 @@
[
  {
    "metadataOutputVersion" : "3.0",
    "userDefinedMetadata" : {
      "com.github.apple.coremltools.source" : "torch==2.5.0",
      "com.github.apple.coremltools.version" : "9.0",
      "com.anemll.context_length" : "4096",
      "com.github.apple.coremltools.source_dialect" : "TorchScript",
      "com.anemll.batch_size" : "64",
      "com.anemll.info" : "Converted with Anemll v0.1.1",
      "com.anemll.lut_bits" : "6"
    },
    "availability" : {
      "macOS" : "15.0",
      "tvOS" : "18.0",
      "visionOS" : "2.0",
      "watchOS" : "11.0",
      "iOS" : "18.0",
      "macCatalyst" : "18.0"
    },
    "inputSchema" : [
      { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1 × 1)", "shortDescription" : "", "shape" : "[1, 1]", "name" : "input_ids", "type" : "MultiArray" },
      { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1)", "shortDescription" : "", "shape" : "[1]", "name" : "position_ids", "type" : "MultiArray" },
      { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Float16", "formattedType" : "MultiArray (Float16 1 × 1 × 1 × 4096)", "shortDescription" : "", "shape" : "[1, 1, 1, 4096]", "name" : "causal_mask", "type" : "MultiArray" },
      { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1)", "shortDescription" : "", "shape" : "[1]", "name" : "current_pos", "type" : "MultiArray" }
    ],
    "outputSchema" : [
      { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 16)", "shortDescription" : "", "shape" : "[16]", "name" : "argmax_idx", "type" : "MultiArray" },
      { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Float16", "formattedType" : "MultiArray (Float16 16)", "shortDescription" : "", "shape" : "[16]", "name" : "argmax_val", "type" : "MultiArray" }
    ],
    "modelParameters" : [],
    "storagePrecision" : "Mixed (Float16, Palettized (12 bits), Palettized (14 bits), Palettized (15 bits), Palettized (17 bits), Palettized (18 bits), Palettized (24 bits), UInt6, UInt8)",
    "method" : "predict",
    "functions" : [
      {
        "inputSchema" : [
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1 × 1)", "shortDescription" : "", "shape" : "[1, 1]", "name" : "input_ids", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1)", "shortDescription" : "", "shape" : "[1]", "name" : "position_ids", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Float16", "formattedType" : "MultiArray (Float16 1 × 1 × 1 × 4096)", "shortDescription" : "", "shape" : "[1, 1, 1, 4096]", "name" : "causal_mask", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1)", "shortDescription" : "", "shape" : "[1]", "name" : "current_pos", "type" : "MultiArray" }
        ],
        "computePrecision" : "Mixed (Float16, Int32, UInt16)",
        "storagePrecision" : "Mixed (Float16, Palettized (12 bits), Palettized (14 bits), Palettized (15 bits), Palettized (17 bits), Palettized (18 bits), Palettized (24 bits), UInt6, UInt8)",
        "stateSchema" : [
          { "dataType" : "Float16", "isOptional" : "0", "formattedType" : "State (Float16 44 × 1 × 512 × 256)", "shortDescription" : "", "shape" : "[44, 1, 512, 256]", "name" : "model_model_kv_cache_local", "type" : "State" },
          { "dataType" : "Float16", "isOptional" : "0", "formattedType" : "State (Float16 8 × 1 × 4096 × 256)", "shortDescription" : "", "shape" : "[8, 1, 4096, 256]", "name" : "model_model_kv_cache_global", "type" : "State" }
        ],
        "outputSchema" : [
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 16)", "shortDescription" : "", "shape" : "[16]", "name" : "argmax_idx", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Float16", "formattedType" : "MultiArray (Float16 16)", "shortDescription" : "", "shape" : "[16]", "name" : "argmax_val", "type" : "MultiArray" }
        ],
        "name" : "infer",
        "mlProgramOperationTypeHistogram" : {
          "Ios18.expandDims" : 53,
          "Ios18.softmax" : 26,
          "Ios18.mul" : 523,
          "Ios18.matmul" : 52,
          "Identity" : 1,
          "Ios18.greaterEqual" : 2,
          "Select" : 2,
          "Ios18.readState" : 54,
          "Tile" : 52,
          "Ios18.gather" : 5,
          "Ios18.add" : 133,
          "Ios18.layerNorm" : 157,
          "Ios18.sliceUpdate" : 52,
          "Ios18.writeState" : 52,
          "Ios18.reshape" : 108,
          "Ios18.reduceArgmax" : 16,
          "Ios16.reduceMax" : 16,
          "Ios18.constexprLutToDense" : 199,
          "Ios18.conv" : 198,
          "Ios18.concat" : 299,
          "Ios18.transpose" : 173,
          "Ios18.cast" : 3,
          "Ios18.gelu" : 26,
          "Ios18.sliceByIndex" : 314,
          "Ios18.squeeze" : 46
        }
      },
      {
        "inputSchema" : [
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1 × 1)", "shortDescription" : "", "shape" : "[1, 1]", "name" : "input_ids", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1)", "shortDescription" : "", "shape" : "[1]", "name" : "position_ids", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Float16", "formattedType" : "MultiArray (Float16 1 × 1 × 1 × 4096)", "shortDescription" : "", "shape" : "[1, 1, 1, 4096]", "name" : "causal_mask", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1)", "shortDescription" : "", "shape" : "[1]", "name" : "current_pos", "type" : "MultiArray" }
        ],
        "computePrecision" : "Mixed (Float16, Int32, UInt16)",
        "storagePrecision" : "Mixed (Float16, Palettized (12 bits), Palettized (14 bits), Palettized (15 bits), Palettized (17 bits), Palettized (18 bits), Palettized (24 bits), UInt6, UInt8)",
        "stateSchema" : [
          { "dataType" : "Float16", "isOptional" : "0", "formattedType" : "State (Float16 44 × 1 × 512 × 256)", "shortDescription" : "", "shape" : "[44, 1, 512, 256]", "name" : "model_model_kv_cache_local", "type" : "State" },
          { "dataType" : "Float16", "isOptional" : "0", "formattedType" : "State (Float16 8 × 1 × 4096 × 256)", "shortDescription" : "", "shape" : "[8, 1, 4096, 256]", "name" : "model_model_kv_cache_global", "type" : "State" }
        ],
        "outputSchema" : [
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 16)", "shortDescription" : "", "shape" : "[16]", "name" : "argmax_idx", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Float16", "formattedType" : "MultiArray (Float16 16)", "shortDescription" : "", "shape" : "[16]", "name" : "argmax_val", "type" : "MultiArray" }
        ],
        "name" : "infer_rotate",
        "mlProgramOperationTypeHistogram" : {
          "Ios18.expandDims" : 53,
          "Ios18.softmax" : 26,
          "Ios18.mul" : 523,
          "Ios18.matmul" : 52,
          "Identity" : 1,
          "Ios18.greaterEqual" : 2,
          "Select" : 2,
          "Ios18.readState" : 54,
          "Tile" : 52,
          "Ios18.gather" : 5,
          "Ios18.add" : 133,
          "Ios18.layerNorm" : 157,
          "Ios18.sliceUpdate" : 52,
          "Ios18.writeState" : 52,
          "Ios18.reshape" : 108,
          "Ios18.reduceArgmax" : 16,
          "Ios16.reduceMax" : 16,
          "Ios18.constexprLutToDense" : 199,
          "Ios18.conv" : 198,
          "Ios18.concat" : 271,
          "Ios18.transpose" : 173,
          "Ios18.cast" : 3,
          "Ios18.gelu" : 26,
          "Ios18.sliceByIndex" : 402,
          "Ios18.squeeze" : 46
        }
      },
      {
        "inputSchema" : [
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1 × 64)", "shortDescription" : "", "shape" : "[1, 64]", "name" : "input_ids", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 64)", "shortDescription" : "", "shape" : "[64]", "name" : "position_ids", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Float16", "formattedType" : "MultiArray (Float16 1 × 1 × 64 × 4096)", "shortDescription" : "", "shape" : "[1, 1, 64, 4096]", "name" : "causal_mask", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1)", "shortDescription" : "", "shape" : "[1]", "name" : "current_pos", "type" : "MultiArray" }
        ],
        "computePrecision" : "Mixed (Float16, Int32, UInt16)",
        "storagePrecision" : "Mixed (Float16, Palettized (12 bits), Palettized (14 bits), Palettized (15 bits), Palettized (17 bits), Palettized (24 bits), UInt6, UInt8)",
        "stateSchema" : [
          { "dataType" : "Float16", "isOptional" : "0", "formattedType" : "State (Float16 44 × 1 × 512 × 256)", "shortDescription" : "", "shape" : "[44, 1, 512, 256]", "name" : "model_model_kv_cache_local", "type" : "State" },
          { "dataType" : "Float16", "isOptional" : "0", "formattedType" : "State (Float16 8 × 1 × 4096 × 256)", "shortDescription" : "", "shape" : "[8, 1, 4096, 256]", "name" : "model_model_kv_cache_global", "type" : "State" }
        ],
        "outputSchema" : [
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Float16", "formattedType" : "MultiArray (Float16 1 × 1 × 1152)", "shortDescription" : "", "shape" : "[1, 1, 1152]", "name" : "output_hidden_states", "type" : "MultiArray" }
        ],
        "name" : "prefill",
        "mlProgramOperationTypeHistogram" : {
          "Ios18.expandDims" : 52,
          "Ios18.softmax" : 26,
          "Ios18.mul" : 523,
          "Ios18.matmul" : 52,
          "Ios18.greaterEqual" : 2,
          "Select" : 2,
          "Ios18.readState" : 54,
          "Tile" : 52,
          "Ios18.gather" : 5,
          "Ios18.add" : 133,
          "Ios18.layerNorm" : 157,
          "Ios18.sliceUpdate" : 52,
          "Ios18.writeState" : 52,
          "Ios18.reshape" : 186,
          "Ios18.constexprLutToDense" : 183,
          "Ios18.conv" : 182,
          "Ios18.concat" : 297,
          "Ios18.transpose" : 238,
          "Ios18.cast" : 1,
          "Ios18.gelu" : 26,
          "Ios18.sliceByIndex" : 315,
          "Ios18.squeeze" : 26
        }
      },
      {
        "inputSchema" : [
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1 × 64)", "shortDescription" : "", "shape" : "[1, 64]", "name" : "input_ids", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 64)", "shortDescription" : "", "shape" : "[64]", "name" : "position_ids", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Float16", "formattedType" : "MultiArray (Float16 1 × 1 × 64 × 4096)", "shortDescription" : "", "shape" : "[1, 1, 64, 4096]", "name" : "causal_mask", "type" : "MultiArray" },
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Int32", "formattedType" : "MultiArray (Int32 1)", "shortDescription" : "", "shape" : "[1]", "name" : "current_pos", "type" : "MultiArray" }
        ],
        "computePrecision" : "Mixed (Float16, Int32, UInt16)",
        "storagePrecision" : "Mixed (Float16, Palettized (12 bits), Palettized (14 bits), Palettized (15 bits), Palettized (17 bits), Palettized (24 bits), UInt6, UInt8)",
        "stateSchema" : [
          { "dataType" : "Float16", "isOptional" : "0", "formattedType" : "State (Float16 44 × 1 × 512 × 256)", "shortDescription" : "", "shape" : "[44, 1, 512, 256]", "name" : "model_model_kv_cache_local", "type" : "State" },
          { "dataType" : "Float16", "isOptional" : "0", "formattedType" : "State (Float16 8 × 1 × 4096 × 256)", "shortDescription" : "", "shape" : "[8, 1, 4096, 256]", "name" : "model_model_kv_cache_global", "type" : "State" }
        ],
        "outputSchema" : [
          { "hasShapeFlexibility" : "0", "isOptional" : "0", "dataType" : "Float16", "formattedType" : "MultiArray (Float16 1 × 1 × 1152)", "shortDescription" : "", "shape" : "[1, 1, 1152]", "name" : "output_hidden_states", "type" : "MultiArray" }
        ],
        "name" : "prefill_rotate",
        "mlProgramOperationTypeHistogram" : {
          "Ios18.expandDims" : 52,
          "Ios18.softmax" : 26,
          "Ios18.mul" : 523,
          "Ios18.matmul" : 52,
          "Identity" : 1,
          "Ios18.greaterEqual" : 2,
          "Select" : 2,
          "Ios18.readState" : 54,
          "Tile" : 52,
          "Ios18.gather" : 5,
          "Ios18.add" : 132,
          "Ios18.layerNorm" : 157,
          "Ios18.sliceUpdate" : 52,
          "Ios18.writeState" : 52,
          "Ios18.reshape" : 186,
          "Ios18.constexprLutToDense" : 183,
          "Ios18.conv" : 182,
          "Ios18.concat" : 253,
          "Ios18.transpose" : 238,
          "Ios18.cast" : 1,
          "Ios18.gelu" : 26,
          "Ios18.sliceByIndex" : 403,
          "Ios18.squeeze" : 26
        }
      }
    ],
    "version" : "0.1.1",
    "isUpdatable" : "0",
    "defaultFunctionName" : "infer",
    "specificationVersion" : 9,
    "stateSchema" : [
      { "dataType" : "Float16", "isOptional" : "0", "formattedType" : "State (Float16 44 × 1 × 512 × 256)", "shortDescription" : "", "shape" : "[44, 1, 512, 256]", "name" : "model_model_kv_cache_local", "type" : "State" },
      { "dataType" : "Float16", "isOptional" : "0", "formattedType" : "State (Float16 8 × 1 × 4096 × 256)", "shortDescription" : "", "shape" : "[8, 1, 4096, 256]", "name" : "model_model_kv_cache_global", "type" : "State" }
    ],
    "computePrecision" : "Mixed (Float16, Int32, UInt16)",
    "mlProgramOperationTypeHistogram" : {
      "Ios18.expandDims" : 53,
      "Ios18.softmax" : 26,
      "Ios18.mul" : 523,
      "Ios18.matmul" : 52,
      "Identity" : 1,
      "Ios18.greaterEqual" : 2,
      "Select" : 2,
      "Ios18.readState" : 54,
      "Tile" : 52,
      "Ios18.gather" : 5,
      "Ios18.add" : 133,
      "Ios18.layerNorm" : 157,
      "Ios18.sliceUpdate" : 52,
      "Ios18.writeState" : 52,
      "Ios18.reshape" : 108,
      "Ios18.reduceArgmax" : 16,
      "Ios16.reduceMax" : 16,
      "Ios18.constexprLutToDense" : 199,
      "Ios18.conv" : 198,
      "Ios18.concat" : 299,
      "Ios18.transpose" : 173,
      "Ios18.cast" : 3,
      "Ios18.gelu" : 26,
      "Ios18.sliceByIndex" : 314,
      "Ios18.squeeze" : 46
    },
    "shortDescription" : "Anemll Model: Multifunction Combined",
    "generatedClassName" : "gemma3_monolithic_full_lut6",
    "author" : "Converted with Anemll v0.1.1",
    "modelType" : {
      "name" : "MLModelType_mlProgram"
    }
  }
]
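The `inputSchema` above fixes exact shapes and dtypes for the decode-step (`infer`) function. A sketch of assembling a matching feed dictionary in NumPy, plus the KV-cache footprint implied by the `stateSchema` (the additive zero/-inf causal-mask layout is an assumption based on the standard decoder convention; actual prediction requires the `.mlmodelc` loaded via coremltools):

```python
import numpy as np

CTX = 4096  # com.anemll.context_length

def make_infer_inputs(token_id: int, pos: int) -> dict:
    """Build one decode-step input dict matching the `infer` inputSchema."""
    mask = np.full((1, 1, 1, CTX), -np.inf, dtype=np.float16)
    mask[..., : pos + 1] = 0.0  # attend to positions 0..pos (assumed convention)
    return {
        "input_ids": np.array([[token_id]], dtype=np.int32),  # [1, 1]
        "position_ids": np.array([pos], dtype=np.int32),      # [1]
        "causal_mask": mask,                                  # [1, 1, 1, 4096]
        "current_pos": np.array([pos], dtype=np.int32),       # [1]
    }

# KV-cache footprint from the stateSchema (Float16 = 2 bytes per element):
local_bytes = 44 * 1 * 512 * 256 * 2    # sliding-window cache, ~11 MiB
global_bytes = 8 * 1 * 4096 * 256 * 2   # full-context cache, 16 MiB
```

The two state tensors reflect Gemma 3's mix of sliding-window (512) and global-attention layers; together they stay under ~30 MB, which is why the 4096-token context is feasible on 8 GB devices.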
gemma3_monolithic_full_lut6.mlmodelc/model.mil ADDED
The diff for this file is too large to render. See raw diff
 
gemma3_monolithic_full_lut6.mlmodelc/weights/weight.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:335ccb97bc9e3aa1e08dd41615c295cea927b01531e66222c11db070042cdd6d
size 1125524736
meta.yaml ADDED
@@ -0,0 +1,57 @@
model_info:
  name: anemll-google-gemma-3-1b-it-ctx4096-monolithic
  version: 0.3.5
  description: |
    Monolithic model running google-gemma-3-1b-it on Apple Neural Engine
    Context length: 4096
    Batch size: 64
    Type: Monolithic (single file with embed+FFN+lm_head)
  license: MIT
  author: Anemll
  framework: Core ML
  language: Python
  architecture: gemma3_text
  model_type: monolithic
  parameters:
    context_length: 4096
    batch_size: 64
    lut_bits: 6
    lut_per_channel: 4
    lut_embeddings: 8
    lut_embeddings_per_channel: 4
    model_prefix: gemma3
    monolithic_model: gemma3_monolithic_full_lut6.mlmodelc
    split_lm_head: 16
    argmax_in_model: true
    vocab_size: 262144
    lm_head_chunk_sizes: [16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384]
    prefill_dynamic_slice: true
    functions:
      - infer
      - prefill

# =============================================================================
# Conversion Parameters (for troubleshooting)
# =============================================================================
# Generated: 2026-02-12 19:35:18
#
# model_path: /Users/anemll/.cache/huggingface/hub/models--google--gemma-3-1b-it/snapshots/dcc83ea841ab6100d6b47a070329e1ba4cf78752
# output_dir: /Volumes/Models/ANE/gemma3_1b_mono_argmax_lut6_ctx4096
# command_line: "./anemll/utils/convert_monolith.sh --model google/gemma-3-1b-it --output /Volumes/Models/ANE/gemma3_1b_mono_argmax_lut6_ctx4096 --context 4096 --batch 64 --lut 6\\,4 --lut-embeddings 8\\,4 --lut-lmhead 6\\,4 --prefix gemma3 --argmax --restart 2c"
# context_length: 4096
# batch_size: 64
# prefix: gemma3
# architecture: gemma3_text
# argmax_in_model: true
# sliding_window: 512
# single_cache: false
# dynamic_prefill_slice: true
# monolithic: true
# anemll_version: 0.3.5
# lut_bits: 6,4
# lut_embeddings: 8,4
# lut_lmhead: 6,4
# rotate: true
# vocab_size: 262144
# lm_head_chunk_sizes: "[16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384]"
# =============================================================================
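With `argmax_in_model: true` and `split_lm_head: 16`, the model returns per-chunk winners (`argmax_idx` and `argmax_val`, both length 16, over sixteen 16384-token slices of the 262144-token vocabulary) instead of full logits. How a runner can recover the global greedy token from those two arrays is sketched below (an assumed combination scheme for illustration, not code taken from `chat.py`):

```python
import numpy as np

CHUNK = 16384             # each lm_head_chunk_sizes entry
N_CHUNKS = 16             # split_lm_head
VOCAB = CHUNK * N_CHUNKS  # 262144, matches vocab_size

def combine_split_argmax(argmax_idx: np.ndarray, argmax_val: np.ndarray) -> int:
    """Global greedy token id from per-chunk argmax outputs."""
    best_chunk = int(np.argmax(argmax_val))           # chunk holding the max logit
    return best_chunk * CHUNK + int(argmax_idx[best_chunk])

# Check the scheme against a plain argmax over synthetic full logits:
rng = np.random.default_rng(0)
logits = rng.standard_normal(VOCAB).astype(np.float16)
idx = np.array([int(np.argmax(logits[i * CHUNK : (i + 1) * CHUNK]))
                for i in range(N_CHUNKS)], dtype=np.int32)
val = np.array([logits[i * CHUNK + idx[i]] for i in range(N_CHUNKS)],
               dtype=np.float16)
assert combine_split_argmax(idx, val) == int(np.argmax(logits))
```

Doing the argmax inside the model keeps the 262144-wide logit tensor off the CPU path; only 2 × 16 values cross the CoreML boundary per decoded token.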
tokenizer.json ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
size 33384568
tokenizer.model ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
size 4689074
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff