AI & ML interests

Local LLMs

Recent Activity

Ujjwal-Tyagi posted an update about 15 hours ago
This is the best set of AI and ML books and a full guide to learning machine learning from the ground up. It is the study material I used myself, so I thought it would be helpful to share it with others. Like, share, and add it to your collection at Ujjwal-Tyagi/ai-ml-foundations-book-collection.
prithivMLmods posted an update about 24 hours ago
A collection of various compression schemes for Qwen3.6, along with abliterated version 1 of the dense models, is now available on the Hub. Check it out via the links below. 👇

🔗 Qwen3.6-MoE: https://huggingface.co/collections/prithivMLmods/qwen36-35b-a3b-compressions
🔗 Qwen3.6-27B Compressions: https://huggingface.co/collections/prithivMLmods/qwen36-27b-compressions

🤗 > To learn more, visit the app page or the respective model pages.
Ujjwal-Tyagi posted an update 3 days ago
We are hiring at Shirova AI. We need AI researchers and engineers for our research lab. Shirova AI is a research lab based in India; we can help researchers relocate to a nearby workspace, or they can work fully from home without ever coming to the lab. We're building our founding team, so the pay will be good and there is plenty of room to learn. Don't hesitate to email us at careers@shirova.com
prithivMLmods posted an update 6 days ago
HY-World-2.0 (A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds) is now available on Spaces, and it works both as native Gradio components and in Gradio server mode.

> HY-World-2.0-Demo: prithivMLmods/HY-World-2.0-Demo
> HY-World-2.0 [Server Mode]: prithivMLmods/HY-World-2.0-Demo
> Featuring 3D reconstruction and Gaussian splats with the Rerun viewer, along with camera poses, depth maps, and surface normals.
> In Server Mode, Gradio is served via FastAPI, with FastAPI remaining the top-level server (a minimal sketch of this layout follows after this list).
> Model: tencent/HY-World-2.0
> GitHub: https://github.com/PRITHIVSAKTHIUR/HY-World-2.0-Demo
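
As a rough illustration of that layout (this is not the Space's actual code; the demo function, health route, and mount path below are made up), Gradio can be mounted inside a FastAPI application so that FastAPI stays the top-level server:

```python
# Hypothetical sketch: FastAPI owns the top-level server and routes,
# and the Gradio UI is mounted beneath it. Names and paths are illustrative.
import gradio as gr
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
def health():
    # A plain FastAPI route, served alongside the mounted Gradio app.
    return {"status": "ok"}

def echo(text: str) -> str:
    return text

demo = gr.Interface(fn=echo, inputs="text", outputs="text")

# Mount the Gradio interface under the FastAPI app at /ui.
app = gr.mount_gradio_app(app, demo, path="/ui")

# Run with: uvicorn app:app --host 0.0.0.0 --port 7860
```

The exact wiring used by Gradio's newer Server Mode may differ from this classic mount pattern.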

🤗 To learn more, visit the app page or the respective model pages.
Parveshiiii posted an update 10 days ago
🚀 Sonic: a lightweight Python audio processing library with tempo matching, BPM detection, time-stretching, resampling & track blending, now with GPU (CUDA) acceleration for 10x speed!

Perfect for quick remixes, batch edits or syncing tracks.
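
Sonic's own API isn't shown in the post, so purely as an illustration of the same operations, here is a rough BPM-detection and tempo-matching sketch using librosa (file names are placeholders):

```python
# Illustrative only: not Sonic's actual API. This uses librosa to detect BPM
# for two tracks and time-stretch the source so its tempo matches the target.
import librosa
import soundfile as sf

def match_tempo(source_path: str, target_path: str, out_path: str) -> None:
    # Load both tracks (mono, native sample rate).
    y_src, sr_src = librosa.load(source_path, sr=None, mono=True)
    y_tgt, sr_tgt = librosa.load(target_path, sr=None, mono=True)

    # Estimate BPM for each track.
    bpm_src, _ = librosa.beat.beat_track(y=y_src, sr=sr_src)
    bpm_tgt, _ = librosa.beat.beat_track(y=y_tgt, sr=sr_tgt)

    # Time-stretch the source so its tempo matches the target's.
    rate = float(bpm_tgt) / float(bpm_src)
    y_matched = librosa.effects.time_stretch(y_src, rate=rate)

    sf.write(out_path, y_matched, sr_src)

match_tempo("loop.wav", "reference.wav", "loop_matched.wav")
```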

👉 https://github.com/Parveshiiii/Sonic

#Python #AudioProcessing #OpenSource #PyTorch
Aurelien-Morgan posted an update 11 days ago
Launching a workweek of @retrain-pipelines wheels.

Day #1: Compose
  • 4 replies
prithivMLmods posted an update 11 days ago
A new comparator on Spaces showcases Standard FLUX.2 Decoder vs. FLUX.2 Small Decoder. The Small Decoder is ~1.4× faster, uses ~1.4× less VRAM, and maintains near-identical image quality. It has ~28M parameters with narrower channels [96, 192, 384, 384] vs. [128, 256, 512, 512], and the demo supports sequence generation by running both decoders simultaneously and comparing the results side by side.
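
As a toy illustration of why the narrower channel widths cut the parameter count (this is not the actual FLUX.2 decoder architecture, just stacked 3x3 convolutions with the quoted widths):

```python
# Toy illustration only: NOT the FLUX.2 decoder, just a stack of 3x3
# convolutions using the quoted channel widths, showing how narrower
# channels shrink the parameter count.
import torch.nn as nn

def toy_stack(widths):
    layers, in_ch = [], widths[0]
    for out_ch in widths[1:]:
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.SiLU()]
        in_ch = out_ch
    return nn.Sequential(*layers)

def n_params(module):
    return sum(p.numel() for p in module.parameters())

small = toy_stack([96, 192, 384, 384])       # narrower channels
standard = toy_stack([128, 256, 512, 512])   # standard channels

print(f"small stack:    {n_params(small):,} parameters")
print(f"standard stack: {n_params(standard):,} parameters")
```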

🤗 Comparator: prithivMLmods/Flux.2-4B-Decoder-Comparator
🔗 FLUX.2-small-decoder: black-forest-labs/FLUX.2-small-decoder
🔗 GitHub: https://github.com/PRITHIVSAKTHIUR/Flux.2-4B-Encoder-Comparator
๐Ÿš Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

🤗 > App built on the Gradio SDK. To learn more, visit the app page or the respective model pages.
prithivMLmods posted an update 12 days ago
A collection of various compression schemes for Gemma 4, along with abliterated version 1 of the dense models, is now available on the Hub. Check it out via the links below. 👇

🔗 Gemma 4 Compression(s): https://huggingface.co/collections/prithivMLmods/gemma-4-compressions
🔗 Gemma 4 Uncensored [MAX] + Compression(s) [β]: https://huggingface.co/collections/prithivMLmods/gemma-4-uncensored-max-compressions
🔗 Gemma 4 Compression(s) - MoE: https://huggingface.co/collections/prithivMLmods/gemma-4-compressions-moe
🔗 Gemma-4 F32 GGUF: https://huggingface.co/collections/prithivMLmods/gemma-4-f32-gguf

🤗 > To learn more, visit the app page or the respective model pages.
prithivMLmods posted an update 15 days ago
The demo for image detection based on SAM3 and Gemma-4 (*Filter) is now available on Spaces, using full-fledged Transformers inference with multimodal reasoning over the processed images. It also supports video segmentation (mask), video segmentation (annotation), and image click segmentation.

🤗 Demo Space: prithivMLmods/SAM3-Gemma4-CUDA
🥽 SAM3: facebook/sam3
🔗 gemma-4-E2B-it: google/gemma-4-E2B-it

To learn more, visit the app page or the respective model pages.
  • 1 reply
Parveshiiii posted an update 17 days ago
Excited to announce my latest open-source release on Hugging Face: Parveshiiii/breast-cancer-detector.

This model has been trained and validated on external datasets to support medical research workflows. It is designed to provide reproducible benchmarks and serve as a foundation for further exploration in healthcare AI.

Key highlights:
- Built for medical research and diagnostic study contexts
- Validated against external datasets for reliability
- Openly available to empower the community in building stronger, more effective solutions

This release is part of my ongoing effort to make impactful AI research accessible through **Modotte**. A detailed blog post explaining the methodology, dataset handling, and validation process will be published soon.

You can explore the model here: Parveshiiii/breast-cancer-detector

#AI #MedicalResearch #DeepLearning #Healthcare #OpenSource #HuggingFace

prithivMLmods posted an update 18 days ago
The demo for image detection (*Filter) based on SAM3 and Qwen-3.5 is now available on Hugging Face Spaces, using Transformers inference with multimodal reasoning over the processed images. It also supports video segmentation (mask), video segmentation (annotation), and image click segmentation.

🤗 Demo Space: prithivMLmods/SAM3-Plus-Qwen3.5
🥽 SAM3: facebook/sam3
🔗 Qwen-3.5: Qwen/Qwen3.5-2B

To learn more, visit the app page or the respective model pages.
  • 5 replies
MaziyarPanahi posted an update 24 days ago
Training mRNA Language Models Across 25 Species for $165

We built an end-to-end protein AI pipeline covering structure prediction, sequence design, and codon optimization. After comparing multiple transformer architectures for codon-level language modeling, CodonRoBERTa-large-v2 emerged as the clear winner with a perplexity of 4.10 and a Spearman CAI correlation of 0.40, significantly outperforming ModernBERT. We then scaled to 25 species, trained 4 production models in 55 GPU-hours, and built a species-conditioned system that no other open-source project offers. Complete results, architectural decisions, and runnable code below.
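
As a small illustration of the CAI evaluation mentioned above (the numbers below are made-up placeholders, not the blog's data), the Spearman correlation between model scores and reference CAI values can be computed like this:

```python
# A minimal sketch of rank-correlating per-sequence model scores against
# reference Codon Adaptation Index (CAI) values with Spearman's rho.
# The arrays here are hypothetical placeholders.
from scipy.stats import spearmanr

reference_cai = [0.82, 0.65, 0.91, 0.54, 0.77]   # CAI from codon usage tables
model_scores  = [0.70, 0.52, 0.88, 0.61, 0.69]   # e.g. model likelihood per sequence

rho, p_value = spearmanr(reference_cai, model_scores)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```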

https://huggingface.co/blog/OpenMed/training-mrna-models-25-species
OzTianlu posted an update 24 days ago
https://github.com/lizixi-0x2F/March
I just released March, an open-source high-performance KV cache sharing library for LLM inference that uses Trie-based prefix deduplication.
When you run LLM services, you often see thousands of requests sharing the same system prompt and conversation history. But traditional KV cache systems store each sequence separately, duplicating the exact same data over and over again. Pure waste.
March uses a Trie structure to automatically detect and reuse identical token prefixes. Instead of storing [system_prompt + history] 1000 times, it's stored once. Everyone shares it (a minimal sketch of the idea follows after the list below).
- 80-97% memory reduction in prefix-heavy workloads (tested on SmolLM2-135M with 500 multi-turn conversations)
- Zero-copy queries: returns direct pointers into the memory pool, no expensive memcpy on the hot path
- Predictable memory usage: fixed-size page pool with O(L) complexity
- Trade-off: slightly slower than dict O(1) lookup, but the memory savings are worth it in production
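
As promised above, here is a minimal sketch of the prefix-sharing idea (not March's actual implementation, which also manages a fixed-size page pool for the cached tensors):

```python
# A trie over token IDs: sequences that share a prefix share the same nodes,
# so the shared prefix's KV entries are stored once instead of per request.
class TrieNode:
    __slots__ = ("children", "kv")

    def __init__(self):
        self.children = {}   # token id -> TrieNode
        self.kv = None       # placeholder for this position's cached KV entry

class PrefixCache:
    def __init__(self):
        self.root = TrieNode()
        self.nodes = 0

    def insert(self, tokens):
        node = self.root
        for tok in tokens:
            if tok not in node.children:
                node.children[tok] = TrieNode()
                self.nodes += 1   # only new (unshared) positions allocate storage
            node = node.children[tok]

cache = PrefixCache()
system_prompt = list(range(100))                    # 100 shared prefix tokens
for user in range(1000):
    cache.insert(system_prompt + [10_000 + user])   # each request adds one unique token

# Naive per-sequence storage would hold 1000 * 101 = 101,000 positions;
# the trie holds the shared prefix once plus one node per unique suffix token.
print(cache.nodes)  # 1100
```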
  • 1 reply
Ujjwal-Tyagi posted an update 25 days ago
I am sharing my study material for AI & ML. These books are really a "bible" and give a very strong foundation. I have also included guidance, an introduction, and my master notes in the dataset repo card! I hope you find them helpful; if you have any queries, just start a discussion and I am always there to help you out!
Ujjwal-Tyagi/ai-ml-foundations-book-collection
  • 4 replies
prithivMLmods posted an update 28 days ago
The Flux-Klein-KV-Edit-Consistency demo is now available on Spaces. It preserves character identity and delivers high-quality, realistic results after edits. No special prompts are needed: just upload the image, type your prompt, and get the resulting image blazing fast.

🔥 Demo Space: prithivMLmods/flux-klein-kv-edit-consistency
🤗 Model: black-forest-labs/FLUX.2-klein-9b-kv
🤗 Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
🔗 Gradio Server Mode: https://www.gradio.app/main/guides/server-mode

➔ Built with Headless Gradio, an alternative to using gr.Blocks for creating the frontend and triggering events, powered by FastAPI + Gradio. You can now design the frontend however you want, with continued support for APIs, MCP, and ZeroGPU.

➔ Gradio Server Mode is now available from gradio@v6.10.0.

To learn more, visit the app page or the respective model pages.
Parveshiiii posted an update 29 days ago
Just did something I've been meaning to try for ages.

In only 3 hours, on 10 billion+ tokens, I trained a custom BPE + tiktoken-style tokenizer using my new library microtok, and it hits the same token efficiency as Qwen3.

Tokenizers have always felt like black magic to me. We drop them into every LLM project, but actually training one from scratch? That always seemed way too complicated.

Turns out it doesn't have to be.

microtok makes the whole process stupidly simple: literally just 3 lines of code. No heavy setup, no GPU required. I built it on top of the Hugging Face tokenizers library so it stays clean, fast, and actually understandable.
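
microtok's own API isn't shown here, but as a rough picture of what happens under the hood, training a byte-level BPE tokenizer directly with the Hugging Face tokenizers library looks roughly like this (corpus path, vocab size, and special token are placeholders):

```python
# Sketch using the Hugging Face `tokenizers` library directly; microtok wraps
# a workflow like this behind a smaller interface.
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(files=["corpus.txt"], vocab_size=50_000, min_frequency=2,
                special_tokens=["<|endoftext|>"])
tokenizer.save("my_tokenizer.json")

# Sanity check: encode a sample and inspect token efficiency (chars per token).
text = "Tokenizers don't have to be black magic."
enc = tokenizer.encode(text)
print(len(text) / len(enc.ids), enc.tokens)
```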

If you've ever wanted to look under the hood and build your own optimized vocabulary instead of just copying someone else's, this is the entry point you've been waiting for.

I wrote up the full story, threw in a ready-to-run Colab template, and dropped the trained tokenizer on Hugging Face.

Blog → https://parveshiiii.github.io/blogs/microtok/
Trained tokenizer → https://huggingface.co/Parveshiiii/microtok
GitHub repo → https://github.com/Parveshiiii/microtok
Severian posted an update 30 days ago
I've been working on a new mathematical approach to real-time video compositing and background removal, and I wanted to share a live demo.

Traditionally, real-time keyers either use 3D color-space bounding boxes (which struggle with semi-transparent hair and motion blur) or heavy Machine Learning models (which require massive GPU compute and often suffer from temporal "jitter" on the edges).

I wanted to see if I could solve this using purely deterministic math so it could run client-side in a standard browser.

The engine uses a custom mathematical framework I call CMT SRL SEFA. Instead of looking at raw color values or guessing semantics like an AI model, it treats the video feed as complex-encoded sequences. It uses harmonic frequencies to map phase geometry and applies a "Stability Cost Function" to find the global stability minimum. In short: it isolates the foreground from the background by measuring signal complexity and structural contradictions.

Give it a try with your own messy plates. As I am not a VFX artist, I am curious to hear your thoughts on what should be improved.

https://severian-cmt-sefa-realtime-vfx-keyer.hf.space/
  • 2 replies
MaziyarPanahi posted an update about 1 month ago
We annotated 119K medical images with two frontier VLMs (Qwen 3.5, Kimi K2.5), cross-validated at 93% agreement, and produced 110K training records, all for under $500. Fine-tuning 3 small models (2-3B params) improved all benchmarks: the best model reaches +15.0% average exact match.
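
A minimal sketch of the agreement-filtering step described above (the image IDs and labels here are invented placeholders, not the project's data):

```python
# Keep only records where the two VLM annotators produced the same label,
# and report the agreement rate over the annotated set.
annotations_a = {"img_001": "pneumonia", "img_002": "normal", "img_003": "effusion"}
annotations_b = {"img_001": "pneumonia", "img_002": "normal", "img_003": "nodule"}

agreed = {k: v for k, v in annotations_a.items() if annotations_b.get(k) == v}
agreement = len(agreed) / len(annotations_a)

print(f"agreement: {agreement:.0%}, kept {len(agreed)} of {len(annotations_a)} records")
```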

Everything is open-sourced: datasets, adapters, and code.

https://huggingface.co/blog/OpenMed/synthvision
  • 2 replies
prithivMLmods posted an update about 1 month ago
The Map-Anything v1 (Universal Feed-Forward Metric 3D Reconstruction) demo is now available on Hugging Face Spaces. Built with Gradio and integrated with Rerun, it performs multi-image and video-based 3D reconstruction, with depth maps, normal maps, and interactive measurements.

🤗 Demo: prithivMLmods/Map-Anything-v1
🤗 Model: facebook/map-anything-v1
🤗 HF Papers: MapAnything: Universal Feed-Forward Metric 3D Reconstruction (2509.13414)