Hey everybody! I’m building a new type of graph-based AI model, and to prove to the world how fast the model generation is, I need somebody to pick something for me to encode so I can record how long the task takes.

The way this works is: I start with piles of somewhat related content, then take all of the piles and merge them together into a composite model.

You don’t need to have any interest in my project at all. Just pick one of these, which proves that I didn’t run it ahead of time and fake the timing: https://huggingface.co/collections/common-pile/common-pile-v01-filtered-data

I don’t want to do the arXiv ones right now, because this is the ASCII version and arXiv is going to need UTF-8 or Unicode. The rest seem okay, except the coding ones, because I’m going to use a different encoding scheme for those.

If you have some other set of training material that you think would be more useful, let me know and I’ll run that instead. I don’t actually care what I train on, because I need to work through the process and figure out the issues.

Thanks for your time.
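For anyone wondering how the timing could be recorded, here is a minimal sketch of the build-then-merge flow with a stopwatch around each stage. It is not the actual model: `build_pile_model`, `merge_models`, `timed`, and the tiny `piles` dictionary are all hypothetical stand-ins, and the "graph" here is just a count of adjacent word pairs.

```python
import time
from collections import Counter


def build_pile_model(docs):
    """Hypothetical stand-in: count adjacent-word pairs (edges) in one pile."""
    edges = Counter()
    for doc in docs:
        words = doc.split()
        edges.update(zip(words, words[1:]))
    return edges


def merge_models(models):
    """Merge per-pile edge counts into one composite by summing them."""
    composite = Counter()
    for model in models:
        composite.update(model)
    return composite


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Toy piles, made up purely for illustration.
piles = {
    "pile_a": ["the cat sat", "the cat ran"],
    "pile_b": ["the dog sat"],
}

models, build_seconds = timed(lambda: [build_pile_model(d) for d in piles.values()])
composite, merge_seconds = timed(merge_models, models)
print(f"build: {build_seconds:.6f}s, merge: {merge_seconds:.6f}s, edges: {len(composite)}")
```

`time.perf_counter()` is used because it is a monotonic, high-resolution clock meant for measuring elapsed time, so the numbers are not affected by system clock adjustments.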
Originally posted by u/Actual__Wizard on r/ArtificialInteligence
