BaGua Architecture — 0.5B Base Model (八卦架构基座模型)

A novel non-Transformer neural network architecture inspired by the Eight Trigrams (易经八卦)

始于AI,不止于AI。/ Born from AI. Not limited to AI.


What is BaGua Architecture?

BaGua Architecture is a ground-up redesign of neural network architecture — no Transformer, no fixed attention mechanism.

Core idea: Eight trigram partitions dynamically determine information flow impedance through real-time polarity vectors. Every forward pass, the network topology is completely rebuilt from scratch.

This is fundamentally different from Transformer-based models (GPT, LLaMA, Claude, Gemini), whose network topology is fixed after training: attention weights still vary with the input, but always within the same static connectivity.
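The polarity-clash idea can be sketched in a few lines: each partition emits a polarity vector, pairwise polarity sets an impedance, and the resulting conductances route information between partitions. Everything below (the function names, the tanh polarity map, the (1 - cos)/2 conductance formula) is an illustrative assumption, not the repository's code:

```python
import torch

def bagua_mix(x, polarity_head):
    # x: (batch, 8, dim) -- hidden state split into 8 trigram partitions
    p = torch.tanh(polarity_head(x))                 # per-partition polarity vectors
    cos = torch.nn.functional.cosine_similarity(
        p.unsqueeze(2), p.unsqueeze(1), dim=-1)      # (batch, 8, 8) pairwise polarity
    conduct = (1.0 - cos) / 2.0                      # opposite polarity -> low impedance
    conduct = conduct / conduct.sum(-1, keepdim=True).clamp_min(1e-6)
    # The routing matrix is recomputed from the current activations,
    # so the effective topology is rebuilt on every forward pass.
    return x + conduct @ x

x = torch.randn(2, 8, 64)
head = torch.nn.Linear(64, 64)
y = bagua_mix(x, head)
print(y.shape)  # torch.Size([2, 8, 64])
```

Because the conductances depend on the activations rather than on learned attention parameters alone, two different inputs flow through two different effective graphs.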


Model Card

| Item | Detail |
|---|---|
| Architecture | BaGua Architecture (non-Transformer) |
| Parameters | 504M (0.5B) |
| Dimensions | dim=1024, 24 layers |
| Vocabulary | 119,547 (bert-base-multilingual-cased) |
| Languages | Chinese + English |
| Training Data | OpenWebText (40 GB English) + Chinese Wikipedia (1.7 GB) |
| Training Steps | ~170,000 |
| Best Val PPL | ~106 |
| Stage | Early base model, not instruction-tuned |

Nine Core Modules

| Module | Function |
|---|---|
| 动态八卦阵 Dynamic BaGua Field | 8 trigram partitions; an impedance matrix controls information flow |
| 卦象对冲 Polarity Clash Engine | Real-time polarity vectors; opposite polarity means low impedance |
| 淘汰审核 Elimination Auditor | Quality scoring per trigram head |
| 淘汰低效机制 Low-Efficiency Eliminator | Zeroes out low-value pathways |
| 算力缓冲区 Compute Buffer | Smooths signal jumps caused by pruning |
| 去中心化自运算 Decentralized Self-Evolution | Global survival tracking; automatic pressure regulation |
| 左耳进右耳出 In-and-Out Memory | Intra-sequence memory; inter-sequence reset |
| 九州编码 Nine Zones Encoding | 3-level hierarchical position encoding; zero memory cost |
| 任务自我感知 Task Self-Awareness | First-token task detection; soft-weight switching across 23 scenes |
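A 3-level hierarchical position code can be computed arithmetically on the fly, which is how such a scheme avoids any stored embedding table. The base-9 decomposition below is a guess at what "zero memory cost" could look like, not the actual 九州 (Nine Zones) encoding:

```python
def nine_zones(pos: int, base: int = 9):
    """Decompose an absolute position into 3 hierarchical indices.

    Purely arithmetic, so no position-embedding table is stored
    (the "zero memory cost" property). The base-9 split is an
    illustrative assumption, not the released scheme.
    """
    zone, rem = divmod(pos, base * base)  # coarse level
    sub, off = divmod(rem, base)          # middle and fine levels
    return zone, sub, off

print(nine_zones(0))    # (0, 0, 0)
print(nine_zones(80))   # (0, 8, 8)
print(nine_zones(81))   # (1, 0, 0)
```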

Honest Assessment

This is an early-stage base model. Current limitations:

  • PPL ≈ 106 (fluent conversation typically requires PPL < 50)
  • Not instruction-tuned — responds with continuation, not answers
  • Chinese output quality lower than English (data imbalance)
  • Output may be incoherent at this training stage

What it demonstrates:

  • A working implementation of a novel non-Transformer architecture
  • Dynamic impedance mechanism functions correctly
  • Anti-overfitting properties verified in classification experiments (28% lower loss than BERT-like)
  • Foundation for further community training and research

Quick Start

```shell
# Install dependencies
pip install torch transformers
```

```python
# Download the inference script from GitHub:
# https://github.com/123456yy384/bagua-Architecture
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer

# Load the tokenizer shipped with the model
tokenizer = AutoTokenizer.from_pretrained("./tokenizer")

# Load the model (requires the architecture definition from GitHub)
# See bagua_chat_1b_v2.py for the full inference code
```

Training Results

Classification Experiments

| Task | BaGua | Baseline | Note |
|---|---|---|---|
| AG News (20 epochs) | 89.54% / loss 0.31 | 91.75% / loss 0.43 | BaGua loss 28% lower; no overfitting |
| SST-2 | 73.4% / stable | 79.7% / severe overfitting | BERT-like baseline overfits from epoch 8 |
| Random sequences | 0.468M params / loss 6.3201 | 0.784M params / loss 6.3304 | 40% fewer parameters, lower loss |

LLM Pretraining

  • Started from PPL ~116,000 (random init)
  • Reached PPL ~106 after ~170,000 steps
  • Training on dual RTX 4090D (cloud) + Tesla V100-SXM2-32GB (local)
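The starting PPL is consistent with a quick sanity check: perplexity is the exponential of the mean per-token cross-entropy, so a randomly initialized model that spreads probability roughly uniformly over the 119,547-token vocabulary starts near PPL = exp(ln 119547) ≈ 119,547, close to the reported ~116,000:

```python
import math

def perplexity(mean_nll: float) -> float:
    # Perplexity is the exponential of the mean per-token
    # negative log-likelihood (cross-entropy in nats).
    return math.exp(mean_nll)

vocab = 119_547
# A uniform model assigns probability 1/vocab to every token,
# so its mean NLL is ln(vocab) and its PPL equals the vocab size.
print(round(perplexity(math.log(vocab))))  # 119547
```

By the same identity, the reported best validation PPL of ~106 corresponds to a mean cross-entropy of ln(106) ≈ 4.66 nats per token.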

Architecture Comparison

| Feature | Transformer | BaGua Architecture |
|---|---|---|
| Network topology | Fixed after training | Rebuilt every forward pass |
| Attention | Independent parallel heads | Polarity-driven interaction |
| Overfitting defense | External (dropout) | Built-in dynamic sparsity |
| Task adaptation | Different model per task | Auto-switching across 23 scenes |
| Memory | KV cache accumulates | Reset between sequences |
| Position encoding | Unified external encoding | Neuron-level 3-tier encoding |

Citation

@misc{yang2026bagua,
  title  = {BaGua Architecture: Polarity-Driven Dynamic Impedance Neural Network},
  author = {Yang, Enshuo},
  year   = {2026},
  url    = {https://github.com/123456yy384/bagua-Architecture}
}

About the Author

Yang Enshuo (阳恩硕) — 17 years old, vocational school student, independent researcher.

No institution. No supervisor. No research funding.

Hardware: RTX 4060 laptop + second-hand Tesla V100 server + rented cloud GPU.

Contact: Oyes13619690046@outlook.com
GitHub: https://github.com/123456yy384/bagua-Architecture
CSDN: https://blog.csdn.net/2504_93363461/article/details/159346941


"Innovation has no age limit. Creativity has no institutional boundary."
