Original Reddit post

Good day, fellow nerds. I'm spitballing a new concept for embedding retrieval and was hoping for some industry input. The way it works:

Embeddings are generated, and PCA projects each vector down to 3 dimensions, giving it an auditable position in space that we can visualize with our feeble brains. When we perform a retrieval, the input is vectorized and projected to 3 dimensions the same way, but the actual comparison happens on the original high-dimensional vectors, taking whatever KNN we determine.

On a separate thread, an SLM runs in RAM and consolidates similar embeddings and text into higher-quality embeddings that better explain topics, etc. This forms a human-memory-style REM cycle of embedding management and quality control: the model gains the ability to break down subjects it's learning and internalize its thoughts, and it keeps the vector database manageable as it grows in size.

Where GLiNER comes into the mix: it extracts key concepts, terms, actions, and entities, and uses them to cluster embeddings by their situational context, so I can chain together concepts that had no relation on the surface but are part of the same action, person, etc.

Is this being done already? Can I just download it? Or do I have to make this myself? Please give me your thoughts on this idea. submitted by /u/Educational-Luck1286
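For what it's worth, the core retrieval loop described in the post (PCA down to 3-D purely for visualization/auditing, with KNN still computed on the full-dimensional vectors) can be sketched in a few lines of NumPy. This is a minimal illustration, not the poster's actual system: the corpus here is random stand-in data, the 384-dim size and `k=5` are arbitrary assumptions, and PCA is done via SVD rather than a library call.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in corpus embeddings (e.g. 384-dim sentence embeddings); in a real
# system these would come from an embedding model.
corpus = rng.normal(size=(100, 384))

# --- PCA to 3 dimensions (SVD on mean-centered data) ---
mean = corpus.mean(axis=0)
centered = corpus - mean
_, _, components = np.linalg.svd(centered, full_matrices=False)
pca3 = components[:3]                # top-3 principal axes, shape (3, 384)
positions_3d = centered @ pca3.T     # auditable 3-D positions for plotting

# --- Retrieval: cosine KNN on the ORIGINAL high-dim vectors ---
def knn(query, vectors, k=5):
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q                     # cosine similarity to every corpus vector
    top = np.argsort(-sims)[:k]      # indices of the k most similar
    return top, sims[top]

query = rng.normal(size=384)
idx, scores = knn(query, corpus, k=5)
query_3d = (query - mean) @ pca3.T   # project the query only for visualization
```

Note the design point this makes concrete: the 3-D projection never feeds the similarity computation, so retrieval quality is unaffected by the lossy projection; the 3-D coordinates exist only so a human can inspect where items (and queries) land relative to each other.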

Originally posted by u/Educational-Luck1286 on r/ArtificialInteligence