Original Reddit post

One of the main goals in designing benchmarks is to probe the weaknesses of current models. In doing so, we also unintentionally create a high-quality training dataset/playground for improving models on those weaknesses. An analogy: a good test that gauges students’ ability well can also serve as excellent teaching material for improving that ability. That is why I also believe good benchmarks can be used to train and test humans on the things that models might not yet be capable of doing. Curious to know what your thoughts are.

Originally posted by u/LeadershipBoring2464 on r/ArtificialInteligence