Original Reddit post

I'm going to preface this with: these models have been extremely useful, and it is extremely impressive how well they can work on software codebases. I'm trying to understand what the frontier technology is and how these models can perform so well. Please correct me if I'm wrong, but this is how I understand the technology:

  • Pre-training on a general corpus with lots of clean data.
The gains are found in post-training:
  • Predominantly fine-tuning and reinforcement learning with humans in the loop.
  • These humans train a constitutional AI,… and everything from the inception of the AI company gets reused in training the constitutional AI. Essentially, they are brute-forcing and fuzzing every possible edge case in reasoning traces. This constitutional AI can then very accurately understand the end goals of users and set benchmark criteria for the end model.
  • We then generate validators for pure logic, which are used for RL on synthetic data and further refine the solutions the end model trains toward.
  • The constitutional model then trains the end model, refining its reasoning traces.
Harness, tool, and skill integration (huge gains here):
  • The end model doesn't deal with raw data; instead it uses things like tree-sitter and other programs to find relevant files or items within a file and strip out noise, so the model can better understand the problem. It may also use algebraic solvers (for math),… and other tools to reconstruct structures like abstract syntax trees, then design a solution that is transformed into the final generated code.
  • The harness guides the solution in a very organized manner, calling on the main model for directions on what to do next, as well as to plan, execute, etc.

So what I see is that a significant reason the models keep getting better is that these frontier AI companies spend billions on human experts to fill in the gaps for constitutional training. This builds on itself, with more and more being automated. The model architecture changes are mainly optimizations here and there; the majority of model improvement comes from post-training and filling in the gaps left by edge cases. Is this currently the right understanding of frontier performance and model generations?
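As a rough illustration of the "validators for pure logic" step in the list above, here is a minimal sketch assuming the simplest possible case: synthetic arithmetic problems whose answers a program can check exactly. The names `make_problem`, `verify`, and `filter_samples` are hypothetical; this only shows the general idea of a programmatic (verifiable) reward, not how any particular lab actually does it.

```python
import random
from fractions import Fraction


def make_problem(rng: random.Random) -> tuple[str, Fraction]:
    """Generate a synthetic arithmetic problem together with its exact answer."""
    a, b, c = rng.randint(1, 20), rng.randint(1, 20), rng.randint(1, 20)
    prompt = f"What is ({a} + {b}) / {c}?"
    return prompt, Fraction(a + b, c)


def verify(candidate: str, truth: Fraction) -> float:
    """Hypothetical validator: reward 1.0 if the answer parses to the exact value, else 0.0."""
    try:
        return 1.0 if Fraction(candidate.strip()) == truth else 0.0
    except (ValueError, ZeroDivisionError):
        # Unparseable or undefined answers get zero reward instead of crashing training.
        return 0.0


def filter_samples(samples: list[str], truth: Fraction) -> list[str]:
    """Keep only sampled answers that pass the validator (e.g. to reuse as training data)."""
    return [s for s in samples if verify(s, truth) == 1.0]


if __name__ == "__main__":
    rng = random.Random(0)
    prompt, truth = make_problem(rng)
    print(prompt)
    print(filter_samples([str(truth), "not a number", "1/0"], truth))
```

The point of the design is that the reward comes from an exact check rather than from a human or a learned judge, which is why this kind of synthetic data can be generated and scored in bulk.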
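To make the "find relevant things in the file and take out noise" idea concrete, here is a small sketch that uses Python's built-in `ast` module as a stand-in for tree-sitter (tree-sitter is a separate multi-language parsing library with its own API, which is not used here). Given a file and a target function, it keeps only that function plus the top-level functions it calls directly and drops everything else. `relevant_functions` and the sample source are hypothetical; real coding harnesses may do this very differently.

```python
import ast

SOURCE = '''
import os

def load(path):
    with open(path) as f:
        return f.read()

def count_words(path):
    text = load(path)
    return len(text.split())

def unrelated():
    return os.getcwd()
'''


def relevant_functions(source: str, target: str) -> dict[str, str]:
    """Return the source of `target` and of any top-level functions it calls by name."""
    tree = ast.parse(source)
    defs = {node.name: node for node in tree.body if isinstance(node, ast.FunctionDef)}
    if target not in defs:
        return {}
    # Collect plain-name calls inside the target (method calls like text.split are skipped).
    called = {
        node.func.id
        for node in ast.walk(defs[target])
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    wanted = [target] + sorted(n for n in called if n in defs and n != target)
    return {name: ast.get_source_segment(source, defs[name]) for name in wanted}


if __name__ == "__main__":
    for name, code in relevant_functions(SOURCE, "count_words").items():
        print(code, end="\n\n")
```

Here `unrelated` and the `import os` line never reach the model's context, which is the "take out noise" effect in miniature.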
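The harness behavior described in the list above (the harness repeatedly asks the main model what to do next, then executes it) can be sketched as a simple loop. Everything here is hypothetical: `stub_model` is a scripted stand-in for a real model, and `TOOLS` is a toy tool registry; it illustrates the plan/execute pattern, not any product's actual harness.

```python
from typing import Callable

# Hypothetical tools the harness runs on the model's behalf.
TOOLS: dict[str, Callable[[str], str]] = {
    "read_file": lambda path: f"<contents of {path}>",
    "run_tests": lambda _arg: "2 passed, 0 failed",
}


def stub_model(history: list[str]) -> str:
    """Stand-in for the main model: returns the next action as 'tool arg' or 'done'."""
    script = ["read_file src/app.py", "run_tests -", "done"]
    steps_taken = sum(1 for h in history if h.startswith("ACTION"))
    return script[min(steps_taken, len(script) - 1)]


def run_harness(task: str, model: Callable[[list[str]], str], max_steps: int = 10) -> list[str]:
    """Ask the model for an action, execute it, feed the result back; stop on 'done' or step cap."""
    history = [f"TASK {task}"]
    for _ in range(max_steps):
        action = model(history)
        if action == "done":
            history.append("DONE")
            break
        name, _, arg = action.partition(" ")
        tool = TOOLS.get(name)
        result = tool(arg) if tool else f"unknown tool: {name}"
        history.append(f"ACTION {action}")
        history.append(f"RESULT {result}")
    return history


if __name__ == "__main__":
    for line in run_harness("fix failing test", stub_model):
        print(line)
```

The step cap (`max_steps`) is one simple way a harness keeps a model that never says "done" from looping forever.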

Originally posted by u/DopeyDonkeyUser on r/ArtificialInteligence