Original Reddit post

Wanted to understand how the core transformer papers actually connect at the concept level: not just "Paper B cites Paper A," but which specific methods, systems, and ideas flow between them. I ran 12 foundational papers (Attention Is All You Need, BERT, GPT-2, GPT-3, Scaling Laws, ViT, LoRA, Chain-of-Thought, FlashAttention, InstructGPT, LLaMA, DPO) through https://github.com/juanceresa/sift-kg, an open-source CLI: point it at a folder of documents plus any LLM and you get a knowledge graph. The result was a 435-entity knowledge graph with 593 relationships for ~$0.72 in API calls (GPT-4o-mini). Interactive graph, runs in the browser: https://juanceresa.github.io/sift-kg/transformers/graph.html. Some interesting structural patterns:

  • GPT-2 is the most connected node, the hub everything flows through: BERT extends it, FlashAttention speeds it up, LoRA compresses it, InstructGPT fine-tunes it with RLHF
  • The graph splits into 9 natural communities. “Human Feedback and Reinforcement Learning” is the largest (24 entities), which tracks with how much of recent progress is RLHF-shaped
  • Chain-of-Thought Prompting bridges the reasoning cluster to the few-shot learning cluster — it’s structurally a connector between two different research threads
  • Common Crawl and BooksCorpus show up as shared infrastructure nodes connecting multiple model lineages

Fully explorable: focus view on any node highlights its connections, and you can traverse with the arrow keys. Enter selects the next node to start a trail!
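The "most connected node" and "shared infrastructure" observations above are standard degree-based graph measures. A minimal sketch of that analysis in plain Python, using a tiny adjacency dict with a few illustrative edges taken from the bullet points (not the actual 435-entity sift-kg output):

```python
# Toy version of the structural analysis: the edge list below is an
# illustrative stand-in for the real graph, not sift-kg's actual output.
from collections import defaultdict

edges = [
    ("GPT-2", "BERT"), ("GPT-2", "FlashAttention"),
    ("GPT-2", "LoRA"), ("GPT-2", "InstructGPT"),
    ("GPT-2", "Common Crawl"), ("GPT-3", "Common Crawl"),
    ("GPT-3", "BooksCorpus"),
    ("Chain-of-Thought", "GPT-3"), ("Chain-of-Thought", "InstructGPT"),
]

# Build an undirected adjacency map: node -> set of neighbors.
adjacency = defaultdict(set)
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

# "Most connected node" = highest degree (number of neighbors).
hub = max(adjacency, key=lambda n: len(adjacency[n]))
print(hub, len(adjacency[hub]))  # GPT-2 5
```

On this toy graph, GPT-2 comes out as the hub, mirroring the first bullet; the community split and bridge detection in the real graph would additionally need modularity-based clustering and betweenness centrality (e.g. via networkx), which the sift-kg output appears to expose directly in the browser view.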

Originally posted by u/garagebandj on r/ArtificialInteligence