Original Reddit post

The Bit-Mass determines the information capacity and thus the model accuracy, not the chosen computation format. The Bit-Mass Theory presented here reorders neural networks by considering the total number of weight bits as the central quantity. Float32 matrix multiplication and BV32 with XNOR-plus-Popcount achieve exactly comparable results on MNIST with an identical Bit-Mass of 203264 bits. Comparison of three trainers (architecture 784→8→10, three epochs):

  • AdamW with Momentum and adaptive learning rate: 81.3 %
  • Vanilla-SGD (Float32): 76.0 %
  • BV32-Hebbian (binary): 76.4 % Further central findings:
  • Float32 and binary containers deliver nearly identical accuracy at the same Bit-Mass.
  • The remaining distance to AdamW is based solely on Momentum and adaptive learning rates.
  • Pure change of the arithmetic does not improve the result. Each neuron functions as a container for 32 binary decisions. The classical neuron perspective therefore leads to systematic misjudgments: eight Float neurons correspond informationally to 256 binary neurons. This insight is supported by three equivalent descriptions of the same weight matrix (neuron, bits, and data view). It is critical to note that this is a previously non-peer-reviewed single study with a future date. An independent reproduction by multiple laboratories remains essential. Nevertheless, the theory provides a consistent explanation for why Hebbian updates without backpropagation achieve the same performance as classical SGD. Historically, the Hebbian rule was long considered unstable. The present work shows that a simple error in the update formula was responsible for a performance loss of over 65 percentage points. After correction, the binary method converges exactly at the level of Vanilla-SGD. From an architectural theoretical perspective, a clear consequence emerges: Performance increases require either more bits through wider layers or a more efficient use of existing bits through Momentum and adaptive methods. The computation format itself is secondary. The experimental control is high: all trainers use identical data (50,000 MNIST examples), identical number of epochs, and identical architecture. Only the update rule varies. This allows effects to be clearly isolated. Long-term implications for research: The Bit-Mass Theory enables hardware-independent comparability of models. A wide Float network with 64 hidden neurons has the same Bit-Mass as a binary network with 2048 neurons. This opens new paths to model compression and the development of specialized accelerators. In summary, the work provides a fact-based contribution to the debate on efficient neural networks. The results are documented in a reproducible manner, but require further external validation before one can speak of a generally valid paradigm shift. 📎 Source 1: https://forward-prop.nhi1.de/ submitted by /u/aotto1968_2

Originally posted by u/aotto1968_2 on r/ArtificialInteligence