Hi internet friends, I recorded a workshop about building your own LLM without any math / ML prerequisites. It covers machine learning fundamentals, deep neural networks, transformer architecture, and pre-/post-training. The only prerequisite is being comfortable with learning through code & Excel examples.

- Sampling Large Language Models
- Reverse Engineering Large Language Model
- Perceptrons: wx+b
- Activation Functions: ReLU, GELU, SwiGLU
- GPU Coding: PyTorch, torch.compile(), fused kernels, CUDA, Triton
- MLPs/FFNs: Multi-input, Multi-Layer Perceptrons, Feed-Forward Networks
- Loss Functions: Residual errors, RMSE, Cross Entropy, Loss Landscapes
- Backpropagation: Training loops, Optimizers, Learning Rate, Batch Size
- Saving & Loading Models
- Initialization: Kaiming, Glorot
- Residuals: Addition, Scaling, Gated, Concatenation
- Normalization: Pre-norm vs. Post-norm, RMSNorm, BatchNorm, LayerNorm
- Regularization: Dropout, Gradient Clipping, Weight Decay
- Softmax
- Tokenizers: By Character, By Word, BPE, SentencePiece
- Embeddings: Absolute vs. Learned, Sinusoidal vs. RoPE
- Attention: MHA, GQA, MQA, MLA
- Transformers
- Pre-training: Data Sources, Datasets, HTML Cleaning, Quality Filtering, Sharding
- Evaluation: Leaderboards, Benchmarks, Verifiers vs. LLM-as-Judge
- Instruction Tuning: Alpaca & Other Formats, Self Instruct, Capabilities
- Reinforcement Learning: Policy Optimization, SimPO
- What We Didn’t Cover: Scaling

Each section has slides teaching the concepts, followed by Excel-by-hand exercises to develop intuition for the math, and then coding examples. The goal is to be able to grok all parts of modern LLM development.

We did this workshop in person in San Francisco last month, and hopefully the spaciousness of watching online works for everyone. If you don’t like watching videos, you can get the slides and exercises and work self-paced.

submitted by /u/JustinAngel
Originally posted by u/JustinAngel on r/ArtificialInteligence
