Who Needs Attention? Spiking Language Modeling via Synaptogenic Adaptive Processing Units
zenodo.org

A spiking neural network generates coherent multi-turn conversation from pure next-token prediction, without attention, without RLHF, and without filtering, running on a $290 used GPU.

We introduce the Synaptogenic Adaptive Processing Unit Language Model (SAPU-LM), a multi-timescale spiking reservoir architecture that replaces attention entirely with trained recurrent dynamics in leaky integrate-and-fire neurons. The chatbot "Nemo" emerges from freezing the learned spiking topology and retraining only 8.5% of parameters on conversational data, achieving 38.05 test perplexity on DailyDialog. The architecture spans a lineage from a frozen Echo State Network (~19,500 perplexity) to 84.15 perplexity (M-SAPU-LM) on a 10M-token WikiText-103 subsample, an ~80× improvement from training reservoir weights via surrogate gradients.

A Tiling Parallel SAPU (TPSAPU) shares a single 512×512 recurrent weight matrix across three timescales and recovers to 84.67 perplexity after L1 pruning, suggesting that the membrane time constant τ alone creates functional differentiation. Ternary quantization compresses the learned recurrent core to ~45 KB at 93.6% sparsity. L1 pruning reveals timescale-dependent topology emergence: fast reservoirs maintain distributed connectivity while slow reservoirs self-organize into diagonal self-excitatory memory cells, a structure discovered by the network rather than imposed by design. The trained ternary spiking core maps directly to analog resistor-capacitor-comparator circuits; a proof-of-concept hardware exporter has been developed.

To our knowledge, this is the first demonstration of open-ended next-token prediction using a trained spiking reservoir with no attention mechanism. Code and checkpoints: https://gitlab.com/AntonioGCGonzalez/synaptogenic-adaptive-processing-unit-language-models

This is a preliminary technical report.
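To make the central claim concrete, here is a minimal sketch of leaky integrate-and-fire dynamics in which several "reservoirs" share one recurrent weight matrix and differ only in their membrane time constant τ, the mechanism the TPSAPU result points to. All names, parameter values, and the Euler-step formulation are illustrative assumptions, not the report's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 512                                    # reservoir size (matching the 512x512 matrix)
W = rng.normal(0, 1 / np.sqrt(N), (N, N))  # one shared recurrent weight matrix

def lif_step(v, spikes, x, tau, dt=1.0, v_th=1.0):
    """One Euler step of LIF dynamics: leak toward zero, integrate
    recurrent spikes plus external input x, fire and reset at v_th."""
    v = v + (dt / tau) * (-v + W @ spikes + x)
    new_spikes = (v >= v_th).astype(float)
    v = np.where(new_spikes > 0, 0.0, v)   # hard reset after a spike
    return v, new_spikes

# Three timescales (fast / medium / slow) sharing the same W.
taus = [2.0, 8.0, 32.0]
x = rng.normal(0, 0.5, N)                  # fixed input current for the demo

rates = []
for tau in taus:
    v, s = np.zeros(N), np.zeros(N)
    total = 0.0
    for _ in range(100):
        v, s = lif_step(v, s, x, tau)
        total += s.sum()
    rates.append(total / (100 * N))        # mean firing rate per neuron per step

# Identical W, different tau: the timescales integrate the same input
# at different speeds and settle into different firing-rate regimes.
print({f"tau={t}": round(r, 4) for t, r in zip(taus, rates)})
```

In this toy setting the only difference between the three populations is τ, which is the sense in which τ alone could create functional differentiation.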
Several configurations are ongoing; results will be updated in subsequent revisions.
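The ~45 KB figure can be sanity-checked with a sketch of ternary weight quantization: each recurrent weight is mapped to {-1, 0, +1} with a shared scale, so a dense 512×512 core costs about 64 KB at 2 bits/weight, and the report's 93.6% sparsity (from L1 pruning) would shrink a sparse encoding further. The threshold rule below (0.7 × mean |W|, as in common ternary-weight schemes) is an assumption, not necessarily the report's method.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(0, 0.05, (512, 512))       # stand-in for a trained recurrent core

def ternarize(w, thresh_factor=0.7):
    """Quantize weights to {-1, 0, +1} with one scalar scale.
    Weights below the magnitude threshold are zeroed."""
    t = thresh_factor * np.abs(w).mean()
    q = np.zeros_like(w, dtype=np.int8)
    q[w > t] = 1
    q[w < -t] = -1
    scale = np.abs(w[q != 0]).mean() if np.any(q) else 0.0
    return q, scale

q, scale = ternarize(W)
sparsity = (q == 0).mean()
kb = q.size * 2 / 8 / 1024                # dense storage at 2 bits per ternary weight
print(f"sparsity={sparsity:.3f}, scale={scale:.4f}, ~{kb:.0f} KB dense")
```

The reconstructed matrix is `scale * q`; the high sparsity reported in the paper comes from pruning rather than from the quantization threshold itself.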
Originally posted by u/killerjag on r/ArtificialInteligence
