Out of curiosity about how large language models like ChatGPT, Gemini, and Claude actually work internally, I walked through the whole process from first principles.

I took 4 training sentences as an example:

The boat floated down to the bank.
The investor walked into the bank to open new account.
The fisherman walked along the bank to cast his net.
The bank has a vault.

And one query sentence:

Query: The investor walked to the bank to lock his money in … (what can we put here?)

We first proceed by building a language model (LM) head. Wait, what's that? It is the final layer that scores every token in the model's vocabulary, the dictionary of tokens that LLMs like ChatGPT and Gemini are trained on. For real models that vocabulary is built from huge amounts of text, much of it from the internet. We build our LM head with only the tokens taken from the 4 training sentences above.

After that we go through:

- Tokenisation of the query
- Creating embeddings
- Positional encoding
- Attention
- Feed-forward networks
- LM head layer

At the end, it will be very exciting to see how, for our query sentence "The investor walked to the bank to lock his money in", we should predict the next token as “vault” instead of any other token.

https://www.youtube.com/watch?v=YTV5qUCpu2c

submitted by /u/abhishekkumar333
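For anyone who wants to poke at the steps listed above before watching, here is a rough sketch of them in plain Python. It is not the code from the video. Several things in it are my own assumptions: word-level tokenisation, an `<unk>` token for query words that never appear in the training sentences, an embedding width `D = 8`, sinusoidal positional encoding, a single causal attention head, no layer normalisation, and random untrained weights. The helper names (`tokenize`, `next_token_probs`, and so on) are made up for illustration.

```python
import math
import random
import re

TRAIN = [
    "The boat floated down to the bank.",
    "The investor walked into the bank to open new account.",
    "the fisherman walked along the bank to cast his net.",
    "the bank has a vault.",
]
QUERY = "The investor walked to the bank to lock his money in"
D = 8  # embedding width, chosen arbitrarily for this sketch


def tokenize(text):
    # Word-level tokenisation (an assumption): lowercase, keep alphabetic runs.
    return re.findall(r"[a-z]+", text.lower())


# Vocabulary: every distinct training word, plus <unk> for unseen query words.
vocab = ["<unk>"] + sorted({w for s in TRAIN for w in tokenize(s)})
tok2id = {w: i for i, w in enumerate(vocab)}


def rand_matrix(rows, cols, rng):
    # Small random weights; a trained model would have learned these.
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    # Element-wise sum of two equally shaped matrices (used for residuals).
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(a, b)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def positional_encoding(n, d):
    # Sinusoidal encoding: sine on even dimensions, cosine on odd ones.
    pe = []
    for pos in range(n):
        row = []
        for i in range(d):
            angle = pos / (10000 ** (2 * (i // 2) / d))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def next_token_probs(text, seed=0):
    rng = random.Random(seed)
    ids = [tok2id.get(w, 0) for w in tokenize(text)]  # 0 is <unk>

    emb = rand_matrix(len(vocab), D, rng)             # embedding table
    wq, wk, wv = (rand_matrix(D, D, rng) for _ in range(3))
    w1, w2 = rand_matrix(D, 4 * D, rng), rand_matrix(4 * D, D, rng)
    w_head = rand_matrix(D, len(vocab), rng)          # LM head: hidden -> logits

    # Token embeddings plus positional encoding.
    x = add([emb[t] for t in ids], positional_encoding(len(ids), D))

    # Single-head causal self-attention: position i attends to positions 0..i.
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    attended = []
    for i in range(len(ids)):
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(D)
                  for j in range(i + 1)]
        weights = softmax(scores)
        attended.append([sum(w * v[j][c] for j, w in enumerate(weights))
                         for c in range(D)])
    x = add(x, attended)

    # Feed-forward network with ReLU, applied to each position.
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]
    x = add(x, matmul(hidden, w2))

    # LM head: score every vocabulary token using the last position only.
    logits = matmul([x[-1]], w_head)[0]
    return ids, softmax(logits)


ids, probs = next_token_probs(QUERY)
best = max(range(len(vocab)), key=probs.__getitem__)
print(vocab[best], round(probs[best], 3))
```

Because the weights here are random, the printed token is arbitrary. Getting “vault” to come out on top would need the weights to be trained on the four sentences, which this sketch does not do. Note also that "lock", "money" and "in" are not in the training sentences, so under the `<unk>` assumption they all map to the same id.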
Originally posted by u/abhishekkumar333 on r/ArtificialInteligence
