Original Reddit post

It seems pretty well established that Claude is head and shoulders above its immediate competition. I was wondering two things:

  • Why?
  • Where does the training data actually come from? I would think the bulk of trainable code would come directly from GitHub. A very basic high-level process would probably be GitHub code -> base model -> RLHF for the instruct model. A sensible opinion would be ‘maybe Claude has stronger RLHF processes’ or something. But I am wondering if Anthropic actually uses different base corpora from other models. Is anyone more savvy than me able to comment on this?

Originally posted by u/MullingMulianto on r/ClaudeCode