Original Reddit post

hey guys, been messing around with 4-bit PTQ lately and standard uniform quantization is just completely lobotomizing my models. i had an idea and wanted to see if anyone thinks this would actually work or if i'm missing something obvious.

the main issue is those massive outlier weights, right? with a normal uniform grid you either clip the outliers (brain damage) or stretch the scale so far that 95% of the normal weights get crushed to zero. so i was thinking of 3 things to try and fix it:

1. elastic grid. instead of evenly spaced buckets, keep the middle buckets uniform to preserve the dense weights, but let the outer buckets grow exponentially to catch the huge outliers. the outer levels could be mapped with a simple bit-shift hack (<<) so there's zero floating-point overhead on the GPU. no idea if standard kernels would hate this, though.

2. hessian targeting. since the grid would be non-uniform, plain round-to-nearest with division isn't enough. what if we run some calibration text through the model first to collect mean activations, and use those as a proxy for the (diagonal of the) hessian? then we force the "load-bearing" weights into the best possible bucket, even if that means sacrificing accuracy on the unimportant weights nearby.

3. bias error compensation (the part i'm most unsure about, tbh). whenever you round weights into buckets you get a small error matrix, and in a big model that error snowballs across ~80 layers until the output starts hallucinating. what if we compute the expected drift (error × mean activations) and fold the exact opposite value into the layer's bias? kinda like noise-canceling headphones, but for math. we'd probably have to clamp the correction so weird tokens don't blow up the activation functions, but in theory it should zero out the cascade effect.

anyway, calling it hero-quant in my head for now. let me know if there's a glaring math flaw here, or if this might actually preserve a model's logic at 4 bits. does this make sense to anyone else?
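for concreteness, here's a rough numpy sketch of what the elastic grid could look like: 16 levels, uniform inner buckets plus outer buckets that double per step (doubling is the bit-shift part — on integer hardware each outer level is just the edge value shifted left). the names (`elastic_codebook`, `quantize`) and the 5-inner/3-outer split are made up for illustration; this is host-side reference code, not a fused GPU kernel.

```python
import numpy as np

def elastic_codebook(step=0.01, inner_per_side=5, outer_per_side=3):
    """Symmetric 16-level (4-bit) codebook: uniform inner buckets of
    width `step`, then outer buckets that double each step (equivalent
    to left-shifting the edge value: edge<<1, edge<<2, ...)."""
    inner = np.arange(1, inner_per_side + 1) * step            # 0.01 .. 0.05
    edge = inner[-1]
    outer = edge * (2.0 ** np.arange(1, outer_per_side + 1))   # 0.1, 0.2, 0.4
    pos = np.concatenate([inner, outer])                       # 8 positive levels
    return np.concatenate([-pos[::-1], pos])                   # 16 levels (no zero)

def quantize(w, codebook):
    """Round each weight to the nearest codebook level; returns the
    dequantized values and the 4-bit indices."""
    idx = np.abs(w[..., None] - codebook).argmin(axis=-1)
    return codebook[idx], idx
```

with `step=0.01` a dense weight like 0.012 lands in a fine inner bucket (0.01) while an outlier like 0.31 still gets caught by the coarse 0.4 bucket instead of being clipped.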
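one minimal way to sketch the hessian-targeting idea: use mean *squared* activations per input channel as the diagonal-hessian proxy (the usual stand-in in GPTQ-style methods — slightly stronger than raw means), then pick the grid scale that minimizes the importance-weighted error, so the load-bearing weights land in good buckets even if an unimportant outlier gets clipped. `calibrate_importance` and `best_scale` are hypothetical names, and a simple uniform grid is used here just to keep the search readable.

```python
import numpy as np

def calibrate_importance(acts):
    """Diagonal-Hessian proxy from calibration data:
    mean squared activation per input channel."""
    return (acts ** 2).mean(axis=0)

def best_scale(w, importance, n_levels=16, candidates=50):
    """Pick the grid scale minimizing importance-weighted quantization
    error: high-importance weights dominate, so the scale shrinks to fit
    them exactly, sacrificing unimportant outliers."""
    half = n_levels // 2
    best, best_err = None, np.inf
    for frac in np.linspace(0.3, 1.0, candidates):
        s = frac * np.abs(w).max() / half
        q = np.clip(np.round(w / s), -half, half - 1) * s
        err = (importance * (w - q) ** 2).sum()
        if err < best_err:
            best, best_err = s, err
    return best
```

e.g. with an important weight of 0.05 (importance 100) and a useless outlier of 1.0 (importance 0.001), the search settles on a much smaller scale than the naive max-based one, hitting 0.05 exactly and letting the outlier clip.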
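the bias-compensation part can be written down directly for a linear layer: the expected output drift is `(W_q - W) @ E[x]`, so folding the negative of that into the bias cancels the rounding error exactly when the input equals the calibration mean (and approximately otherwise). `fold_bias_correction` is an illustrative name; the optional `clamp` is the "don't blow up on weird tokens" guard from the post.

```python
import numpy as np

def fold_bias_correction(W, W_q, bias, mean_x, clamp=None):
    """Noise-canceling for rounding error: subtract the expected drift
    (W_q - W) @ E[x] from the bias. `clamp` optionally caps the
    correction so unusual inputs can't blow up the activations."""
    drift = (W_q - W) @ mean_x      # expected error at the mean input
    corr = -drift
    if clamp is not None:
        corr = np.clip(corr, -clamp, clamp)
    return bias + corr
```

a quick sanity check: at the mean input, the quantized layer with the corrected bias reproduces the original layer's output exactly.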

Originally posted by u/CryOrganic8886 on r/ArtificialInteligence