| Dataset | Model | Acc | F1 | Δ vs Log (pp) | Δ vs Static (pp) | Params (avg) | Steps | Infer (ms) | Size (rel.) |
|---|---|---|---|---|---|---|---|---|---|
| Banking77-20 | Logistic TF-IDF | 92.37% | 0.9230 | +0.00 | +0.76 | 64,940 | 0.00M | 0.473 | 1.00x |
| | Static Seed | 91.61% | 0.9164 | -0.76 | +0.00 | 52,052 | 94.56M | 0.264 | 0.80x |
| | Dynamic Seed (Distill) | 93.53% | 0.9357 | +1.17 | +1.92 | 12,648 | 70.46M | 0.232 | 0.20x |
| CLINC150 | Logistic TF-IDF | 97.00% | 0.9701 | +0.00 | +1.78 | 41,020 | 0.00M | — | 1.00x |
| | Static Seed | 95.22% | 0.9521 | -1.78 | +0.00 | 52,052 | 66.80M | 0.302 | 1.27x |
| | Dynamic Seed | 94.78% | 0.9485 | -2.22 | -0.44 | 10,092 | 28.41M | 0.324 | 0.25x |
| | Dynamic Seed (Distill) | 95.44% | 0.9544 | -1.56 | +0.22 | 9,956 | 32.69M | 0.255 | 0.24x |
| HWU64 | Logistic TF-IDF | 87.94% | 0.8725 | +0.00 | +0.81 | 42,260 | 0.00M | — | 1.00x |
| | Static Seed | 87.13% | 0.8674 | -0.81 | +0.00 | 52,052 | 146.61M | 0.300 | 1.23x |
| | Dynamic Seed | 86.63% | 0.8595 | -1.31 | -0.50 | 12,573 | 62.54M | 0.334 | 0.30x |
| | Dynamic Seed (Distill) | 87.23% | 0.8686 | -0.71 | +0.10 | 13,117 | 62.86M | 0.340 | 0.31x |
| MASSIVE-20 | Logistic TF-IDF | 86.06% | 0.7324 | +0.00 | -1.92 | 74,760 | 0.00M | — | 1.00x |
| | Static Seed | 87.98% | 0.8411 | +1.92 | +0.00 | 52,052 | 129.26M | 0.247 | 0.70x |
| | Dynamic Seed | 86.94% | 0.7364 | +0.88 | -1.04 | 11,595 | 47.62M | 0.257 | 0.16x |
| | Dynamic Seed (Distill) | 86.45% | 0.7380 | +0.39 | -1.53 | 11,851 | 51.90M | 0.442 | 0.16x |
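Reading the table: the two Δ columns are accuracy differences in percentage points against the logistic and static baselines, and the size column is the parameter count relative to the TF-IDF baseline. A quick sanity check on the Banking77-20 rows (small rounding differences are expected, since the table likely computes deltas from unrounded accuracies):

```python
# Sanity-check the derived columns using the Banking77-20 rows above.
# Accuracies are in percent; deltas are percentage points (pp).

def delta_pp(acc: float, baseline_acc: float) -> float:
    """Accuracy difference in percentage points."""
    return round(acc - baseline_acc, 2)

logistic_acc = 92.37   # Logistic TF-IDF
static_acc = 91.61     # Static Seed
distill_acc = 93.53    # Dynamic Seed (Distill)

print(delta_pp(static_acc, logistic_acc))   # Static Seed's Δ vs Log column
print(delta_pp(distill_acc, static_acc))    # Distill's Δ vs Static column

# Relative size: Distill params / baseline params.
ratio = 12_648 / 64_940  # ≈0.195, shown as 0.20x in the table
```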
I set out to build a memory-first AI system and accidentally ended up building two.

Magnus → a memory-first system that organizes knowledge

Seed → an architecture discovery system that finds the smallest model that still wins

I ran Seed across multiple real intent datasets. What stood out:

- Banking77 → better accuracy with a ~5x smaller model
- MASSIVE → consistent accuracy wins
- CLINC150 / HWU64 → not always higher accuracy, but ~4–5x smaller models

The pattern is clear:

👉 smaller, structured models can compete with, and sometimes beat, larger baselines

Traditional approach: scale model size → hope for gains

Seed: search for structure → compress intelligently

This isn't about bigger models.

👉 it's about finding the smallest model that still wins

Not AGI. Not "we solved NLU." But a real signal that:

👉 structure > scale
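The post doesn't describe how Seed's search actually works, so the following is purely a hypothetical sketch of the "smallest model that still wins" idea: evaluate a set of candidate model sizes and keep the smallest one whose accuracy stays within a tolerance of the best. The `smallest_that_wins` helper and the toy accuracy numbers are invented for illustration, not taken from Seed.

```python
# Hypothetical sketch (not Seed's actual algorithm): pick the smallest
# candidate whose accuracy is within `tolerance_pp` of the best seen.

def smallest_that_wins(candidates, evaluate, tolerance_pp=0.5):
    """candidates: iterable of (param_count, config) pairs.
    evaluate: config -> accuracy in percent.
    Returns (config, params, acc) for the smallest acceptable model."""
    scored = [(params, cfg, evaluate(cfg)) for params, cfg in candidates]
    best_acc = max(acc for _, _, acc in scored)
    for params, cfg, acc in sorted(scored):  # smallest params first
        if acc >= best_acc - tolerance_pp:
            return cfg, params, acc

# Toy stand-in evaluator: accuracy shows diminishing returns with size.
toy_accs = {1_000: 90.1, 5_000: 92.0, 20_000: 92.3, 80_000: 92.4}
cfg, params, acc = smallest_that_wins(
    [(p, p) for p in toy_accs], evaluate=toy_accs.get, tolerance_pp=0.5
)
# With these made-up numbers, the 5,000-parameter model is kept: it is
# within 0.5 pp of the 80,000-parameter model at 1/16 the size.
```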
Originally posted by u/califalcon on r/ArtificialInteligence
