A lot of people assume that as more countries adopt AI tools (like the millions of users in India), the resulting interaction data will naturally become the next wave of training data. On the other hand, most user interactions are noisy, repetitive, or filtered out entirely. Training pipelines at companies like OpenAI or Google care far more about quality than raw data volume. Curious what people here think: is the next AI leap going to come from more human data, or from better synthetic pipelines? And what are the most likely sources of future training data?
submitted by /u/Own-Internet6442
Originally posted by u/Own-Internet6442 on r/ArtificialInteligence
