I went through an EE program in the 80s. By that point the steepest descent algorithm was known. We even built small AI classifiers to get a feel for the innards, but the profs always told us that we knew how to do AI and the problem was computing power: there wouldn’t be enough for decades.

Here is my question, then: how do they use multiple processors to run an AI query or do training? The descent algorithm sets the weights for every node at the same time by searching for a minimum of a huge function. It baffles me how you spread that across multiple CPUs, each with its own memory. So training would seem to be a very slow task that might not break up well.

Same question for queries. If your goal is as simple as finding the next most probable word, I don’t see how to slice that up, send it out for parallel processing, and reaggregate. How is this scaling made to work?
Originally posted by u/Recent-Day3062 on r/ArtificialInteligence
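
For anyone who wants to poke at the question, here is a minimal sketch of one common way training is split up, usually called data parallelism. It relies on the fact that when the loss is a sum over training examples, its gradient is the sum of the per-example gradients. Each processor can therefore hold a full copy of the weights, compute the gradient on its own slice of the data, and only the summed gradient has to be exchanged. Everything here is an assumption made for illustration: the toy model `y = w*x + b`, the made-up data, the helper names `shard_gradient`, `split` and `parallel_step`, and the "workers", which are simulated by ordinary function calls rather than real processes.

```python
# Sketch only: data-parallel gradient descent on a toy linear model y = w*x + b.
# Each "worker" is simulated by a plain function call on its own data shard.

def shard_gradient(w, b, shard):
    """Sum of per-example squared-error gradients over one shard."""
    gw = gb = 0.0
    for x, y in shard:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw, gb

def split(data, n_workers):
    """Deal examples out round-robin, one shard per simulated worker."""
    return [data[i::n_workers] for i in range(n_workers)]

def parallel_step(w, b, shards, lr):
    """One descent step: local gradients first, then a single sum across workers."""
    grads = [shard_gradient(w, b, s) for s in shards]  # would run concurrently
    n = sum(len(s) for s in shards)
    gw = sum(g[0] for g in grads) / n  # the only cross-worker communication
    gb = sum(g[1] for g in grads) / n
    return w - lr * gw, b - lr * gb

# Made-up data drawn exactly from y = 3x + 1 (hypothetical example).
data = [(float(x), 3.0 * x + 1.0) for x in range(-5, 6)]

w_one, b_one = 0.0, 0.0    # everything on a single worker
w_four, b_four = 0.0, 0.0  # same data dealt out to four workers
for _ in range(500):
    w_one, b_one = parallel_step(w_one, b_one, split(data, 1), lr=0.05)
    w_four, b_four = parallel_step(w_four, b_four, split(data, 4), lr=0.05)

print(w_one, b_one)
print(w_four, b_four)
```

Both runs follow the same descent path up to floating-point rounding, because summing four shard gradients gives the same total as one pass over all the data. This covers only the training half of the question, and only one way of splitting it. For queries, the analogous trick is that a matrix-vector product can be cut by rows, so each processor computes part of the output vector and the pieces are concatenated.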
