Devlog 1/7 - AI Without Big Data Centers: Notes from a frontier built on a single consumer GPU
There is a quiet assumption underneath most frontier AI work in 2026: that the next leap in intelligence will arrive from somebody else's data center. A bigger cluster, a wider pipe, a deeper budget. The model you actually use will be a thin client over an API you do not own.
We do not think that is the only future. We do not even think it is the most interesting one.
This is the first in a series of devlogs from an ongoing research effort to build a frontier-grade decision system on hardware that fits under a desk. The constraint is not nostalgia. It is design pressure. When you cannot brute-force your way out of a problem, you have to think about it more clearly.
The constraint shapes the architecture
The working envelope is a single consumer GPU (currently a 5070 Ti), and the design rule is that nothing in the system gets to assume otherwise. Quantized weights, sparse activation paths, compressed memory, and retrieval-as-cognition are not optimizations bolted on at deployment time. They are first-class architectural commitments from the very first commit.
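To make the envelope concrete, here is a minimal Python sketch of how weight precision alone decides whether a model fits in VRAM. The 16 GB budget and the 9-billion-parameter count are illustrative assumptions, not figures from this project, and the estimate covers weights only: activations, KV cache, and runtime overhead all come on top.

```python
# Back-of-envelope weight-memory estimate.
# Every number below is an illustrative assumption, not a measurement.

def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Storage needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

VRAM_GB = 16.0    # assumed VRAM budget; adjust to your own card
NUM_PARAMS = 9e9  # assumed parameter count, chosen only for illustration

for bits in (16, 8, 4, 1.58):
    gb = weight_memory_gb(NUM_PARAMS, bits)
    verdict = "fits" if gb < VRAM_GB else "does not fit"
    print(f"{bits:>5} bits/weight: {gb:6.2f} GB -> {verdict} in {VRAM_GB:.0f} GB (weights only)")
```

Under these assumptions the 16-bit weights already overflow the card before a single token is processed, which is why precision has to be decided at design time rather than at deployment.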
The phrase we keep returning to internally is intelligence per watt. Not parameters. Not tokens. Not FLOPs. The unit that matters is how much useful, audited, decision-grade cognition you can extract from a fixed envelope of stored bits, active compute, and electricity.
That metric reframes almost every choice. Suddenly, doubling the parameter count is not progress unless it doubles intelligence per watt. Suddenly, a 30B model that requires a rented cluster forever is not a competitor to a 1B model that runs on your machine; it is a different product category.
Compact cognition is not "small AI"
The temptation in the small-model world is to behave apologetically: position yourself as the budget option, the offline option, the fallback. We reject that frame.
Compact cognition is its own discipline. It asks different questions:
- How much structure can a single stored bit carry before it collapses?
- How much of "reasoning" was always a routing problem, not a parameter problem?
- How many redundant tokens does a frontier model actually need, versus how many it has simply been given because nobody made it count them?
- What happens when the model is forced to treat its own weights as scarce?
The answers, so far, are surprising. There is a great deal of slack in the standard recipe. Most of it shows up only when you press against the envelope hard enough that the system has to start telling the truth.
Local sovereign AI
The other half of this thesis is political, not technical, and we will not pretend otherwise.
A model you can run, inspect, retrain on your own data, and refuse to share belongs to you in a way an API endpoint never will. For workloads where the data is sensitive, the cadence is fast, or the operator is small (private telemetry, personal finance, regulated workflows, edge deployments), that ownership is not a luxury. It is the actual product.
We use the phrase local sovereign AI internally, and we mean it literally: the deployed system should run on hardware the operator owns, learn from a corpus the operator controls, and emit an audit trail the operator can read end to end. No hidden dependency on a frontier API. No silent phone-home. No assumption that the operator will always have a working internet connection or a credit card on file with a hyperscaler.
This is harder than it sounds. Almost every modern training recipe quietly assumes the opposite. Removing those assumptions one at a time is most of the work.
Why this matters
Three reasons, in increasing order of seriousness.
Cost. A frontier model you rent is a frontier model you stop using the moment the price changes. A frontier model you own keeps working.
Latency and reliability. Decisions that have to happen in under a second do not benefit from a round trip to a data center. Decisions that have to happen at all do not benefit from someone else's outage.
Sovereignty. The next decade of AI will determine whether intelligence is a service rented from a small number of vendors or a capability that ordinary operators can own outright. Both will exist. We are interested in making sure the second one is real.
What comes next
This series will walk through a set of architectural and training discoveries we have been making while building toward this thesis. Each entry stands alone. Together, they describe a coherent attempt to build frontier-grade cognition inside consumer-hardware constraints.
Upcoming entries:
- How we replaced redundant text replay with crystallized state-action memory, and what that did to our training economics.
- The discovery that the shell around the model turned out to be alpha-bearing in its own right, and what that means for verifier-governed inference.
- Adaptive compute architectures: letting the model decide how hard to think, on a per-decision basis, rather than burning the same FLOPs on every input.
- Compact frontier architectures: ternary FFNs, routed low-rank experts, and the parameter systems we believe will define local AI.
- The EVO20 training genome: a staged, verifier-first curriculum for growing cognition rather than dumping tokens at it.
- The execution reality layer: why gross alpha is fake, and what it takes to publish numbers under realistic market friction.
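As a taste of what "ternary" means in the compact-architectures entry above, here is a minimal Python sketch of one commonly described scheme: scale the weights by their mean absolute value, then round and clip each one to -1, 0, or +1. This is an assumption chosen for illustration, not necessarily the scheme used in this work, and the function names are hypothetical.

```python
from typing import List, Tuple

def ternarize(weights: List[float], eps: float = 1e-8) -> Tuple[List[int], float]:
    """Quantize weights to {-1, 0, +1} using a single absmean scale.

    Illustrative only: a real layer would typically work on tensors and
    handle training-time gradients, which this sketch ignores.
    """
    # One shared scale: the mean absolute weight (eps avoids division by zero).
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round to the nearest integer, then clip into the ternary range.
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale

def dequantize(ternary: List[int], scale: float) -> List[float]:
    """Approximate reconstruction of the original weights."""
    return [t * scale for t in ternary]

weights = [0.9, -0.05, 0.4, -1.2, 0.02, -0.6]  # made-up example values
q, s = ternarize(weights)
print("ternary:", q)
print("scale:  ", round(s, 4))
print("approx: ", [round(v, 4) for v in dequantize(q, s)])
```

Each weight then costs under two bits to store plus one shared scale, at the price of a coarse reconstruction; whether that trade pays off is exactly the kind of question the later entry is meant to address.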
None of this is solved. All of it is in motion. We are publishing these notes because the constraint is real, the discoveries are interesting, and the future where ordinary people can own their own frontier is worth working out loud for.
Part 1 of an ongoing devlog series. The work is in progress and the claims are provisional. Follow along.
Live discussion + the deployed Q-Chat router:
- Discord community (builders training their own trading/finance models): https://discord.gg/PtuHZDv5ju
- Public research devlog: https://github.com/thron-j/qovaryx-ai-research
- All published models: https://huggingface.co/tjarvis91
- Support the next training run: https://ko-fi.com/tjarvis91
Type /qchat ask <question> in the Discord server to send a query through our compact intent-router (a live demo of the published thesis, running on a free Hugging Face CPU instance).
No signals. No financial advice. Engineering only.