majentik
/

Nemotron-3-Nano-Omni-30B-A3B-Reasoning-TurboQuant-GGUF-IQ4_XS-TQ-KV

+NVIDIA Open Model License Agreement
+This NVIDIA Open Model License Agreement (the “Agreement”) is a legal agreement between the Legal Entity You represent, or if no
+entity is identified, You and NVIDIA Corporation and its Affiliates (“NVIDIA”) and governs Your use of the Models that NVIDIA
+provides to You under this Agreement. NVIDIA and You are each a “party” and collectively the “parties.”
+NVIDIA models released under this Agreement are intended to be used permissively and enable the further development of AI
+technologies. Subject to the terms of this Agreement, NVIDIA confirms that:
+•
+Models are commercially useable.
+•
+You are free to create and distribute Derivative Models.
+•
+NVIDIA does not claim ownership to any outputs generated using the Models or Model Derivatives.
+By using, reproducing, modifying, distributing, performing or displaying any portion or element of the Model or Derivative Model, or
+otherwise accepting the terms of this Agreement, you agree to be bound by this Agreement.
+1.
+Definitions. The following definitions apply to this Agreement:
+1.1.
+“Derivative Model” means all (a) modifications to the Model, (b) works based on the Model, and (c) any other derivative
+works of the Model. An output is not a Derivative Model.
+1.2.
+“Legal Entity” means the union of the acting entity and all other entities that control, are controlled by, or are under common
+control with that entity. For the purposes of this definition, “control” means (a) the power, direct or indirect, to cause the
+direction or management of such entity, whether by contract or otherwise, or (b) ownership of fifty percent (50%) or more
+of the outstanding shares, or (c) beneficial ownership of such entity.
+1.3.
+“Model” means the machine learning model, software, checkpoints, learnt weights, algorithms, parameters, configuration
+files and documentation shared under this Agreement.
+1.4.
+“You” or “Your” means an individual or Legal Entity exercising permissions granted by this Agreement.
+2.
+Conditions for Use, License Grant, AI Ethics and IP Ownership.
+2.1.
+Conditions for Use. The Model and any Derivative Model are subject to additional terms as described in Section 2 and
+Section 3 of this Agreement and govern Your use. If You institute copyright or patent litigation against any entity (including a crossclaim or counterclaim in a lawsuit) alleging that the Model or a Derivative Model constitutes direct or contributory copyright or
+patent infringement, then any licenses granted to You under this Agreement for that Model or Derivative Model will terminate as of
+the date such litigation is filed. NVIDIA may update this Agreement to comply with legal and regulatory requirements at any time
+and You agree to either comply with any updated license or cease Your copying, use, and distribution of the Model and any
+Derivative Model.
+2.2.
+License Grant. The rights granted herein are explicitly conditioned on Your full compliance with the terms of this
+Agreement. Subject to the terms and conditions of this Agreement, NVIDIA hereby grants to You a perpetual, worldwide, nonexclusive, no-charge, royalty-free, revocable (as stated in Section 2.1) license to publicly perform, publicly display, reproduce, use,
+create derivative works of, make, have made, sell, offer for sale, distribute (through multiple tiers of distribution) and import the
+Model.
+2.3.
+AI Ethics. NVIDIA is committed to safety, trust and transparency in AI development. NVIDIA encourages You to (a) ensure
+that the product or service You develop, use, offer as a service or distributes meets the legal and ethical requirements of the
+relevant industry or use case, (b) take reasonable measures to address unintended bias and to mitigate harm to others, including
+underrepresented or vulnerable groups, and (c) inform users of the nature and limitations of the product or service. NVIDIA
+expressly prohibits the use of its products or services for any purpose in violation of applicable law or regulation, including but not
+limited to (a) illegal surveillance, (b) illegal collection or processing of biometric information without the consent of the subject
+where required under applicable law, or (c) illegal harassment, abuse, threatening or bullying of individuals or groups of individuals
+or intentionally misleading or deceiving others.
+2.4.
+NVIDIA owns the Model and any Model Derivatives created by NVIDIA. Subject to NVIDIA’s underlying ownership rights in
+the Model or its Model Derivatives, You are and will be the owner of Your Model Derivatives. NVIDIA claims no ownership rights in
+outputs. You are responsible for outputs and their subsequent uses. Except as expressly granted in this Agreement, (a) NVIDIA
+reserves all rights, interests and remedies in connection with the Model and (b) no other license or right is granted to you by
+implication, estoppel or otherwise.
+3.
+Redistribution. You may reproduce and distribute copies of the Model or Derivative Models thereof in any medium, with or
+without modifications, provided that You meet the following conditions:
+3.1.
+If you distribute the Model, You must give any other recipients of the Model a copy of this Agreement and include the
+following attribution notice within a “Notice” text file with such copies: “Licensed by NVIDIA Corporation under the NVIDIA Open
+Model License”; and
+3.2.
+You may add Your own copyright statement to Your modifications and may provide additional or different license terms
+and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Models as a whole, provided
+Your use, reproduction, and distribution of the Model otherwise complies with the conditions stated in this Agreement.
+4.
+Trademarks. This Agreement does not grant permission to use the trade names, trademarks, service marks, or product
+names of NVIDIA, except as required for reasonable and customary use in describing the origin of the Model and reproducing the
+content of the “Notice” text file.
+5.
+Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, NVIDIA provides the Model on an “AS
+IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any
+warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are
+solely responsible for determining the appropriateness of using or redistributing the Model, Derivative Models and outputs and
+assume any risks associated with Your exercise of permissions under this Agreement.
+6.
+Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or
+otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, will NVIDIA be
+liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a
+result of this Agreement or out of the use or inability to use the Model, Derivative Models or outputs (including but not limited to
+damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or
+losses), even if NVIDIA has been advised of the possibility of such damages.
+7.
+Indemnity. You will indemnify and hold harmless NVIDIA from and against any claim by any third party arising out of or
+related to your use or distribution of the Model, Model Derivatives or outputs.
+8.
+You.
+Feedback. NVIDIA appreciates your feedback, and You agree that NVIDIA may use it without restriction or compensation to
+9.
+Governing Law. This Agreement will be governed in all respects by the laws of the United States and the laws of the State
+of Delaware, without regard to conflict of laws principles or the United Nations Convention on Contracts for the International Sale of
+Goods. The state and federal courts residing in Santa Clara County, California will have exclusive jurisdiction over any dispute or
+claim arising out of or related to this Agreement, and the parties irrevocably consent to personal jurisdiction and venue in those
+courts; except that, either party may apply for injunctive remedies or an equivalent type of urgent legal relief in any jurisdiction.
+10.
+Trade and Compliance. You agree to comply with all applicable export, import, trade and economic sanctions laws and
+regulations, as amended, including without limitation U.S. Export Administration Regulations and Office of Foreign Assets Control
+regulations. These laws include restrictions on destinations, end-users and end-use.
+Version Release Date: June 14, 2024

README.md ADDED Viewed

	@@ -0,0 +1,52 @@

+---
+license: other
+license_name: nvidia-open-model-license
+license_link: https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
+base_model: nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16
+tags: [nemotron, multimodal, turboquant, kv-cache, gguf, combo-card]
+---
+# Nemotron-3-Nano-Omni-30B-A3B-Reasoning - TurboQuant GGUF IQ4_XS + TurboQuant KV-Cache (matched stack)
+Documentation card for the matched TurboQuant weight + TurboQuant KV-cache stack
+of `Nemotron-3-Nano-Omni-30B-A3B-Reasoning` at GGUF IQ4_XS.
+**No new weights are published here.** This card describes a runtime configuration:
+load the weights from [`majentik/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-TurboQuant-GGUF-IQ4_XS`](https://huggingface.co/majentik/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-TurboQuant-GGUF-IQ4_XS)
+(forthcoming in Phase 2.2 of the publication plan) and apply the KV-cache modifier
+documented in [`majentik/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-TurboQuant`](https://huggingface.co/majentik/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-TurboQuant).
+## Modality matrix
+| Modality | Encoder | Quantization in this variant |
+|---|---|---|
+| Text | LLM backbone (Mamba-2 + Transformer hybrid Sparse MoE) | per the variant suffix |
+| Image | CRADIO v4-H | **BF16** (kept full-precision in every non-GGUF variant; GGUF uses mmproj-F16 split file) |
+| Audio | Parakeet-TDT-0.6B-v2 | **BF16** (same rationale) |
+| Video | Parakeet-TDT-0.6B-v2 + frame sampler | **BF16** (≤ 2 min, 256 frames @ 2 FPS) |
+NVIDIA's official FP8 / NVFP4 recipe keeps both encoders + the cross-modal
+MLP projectors in BF16 to preserve multimodal accuracy. We follow that
+convention in every quantized variant we ship.
+## Runtime quirks
+### llama.cpp
+Use `llama-mtmd-cli` for multimodal inference; pass `--mmproj mmproj-F16.gguf`
+(see `majentik/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-mmproj-F16`).
+**Do NOT use CUDA 13.2** — produces gibberish. Pin CUDA 12.x or
+use the Metal/CPU paths.
+### Ollama
+Text-only; multimodal is blocked because Ollama doesn't yet support
+the mmproj split-file pattern.
+### Reasoning mode
+`enable_thinking` defaults to `True`. To disable extended reasoning
+(e.g., for latency-sensitive cases), pass `enable_thinking=False`
+to the chat template / generate call. No separate "no-think"
+variant card exists — this is a runtime flag, not a model variant.