eifachposte

Genuine curiosity question. When you navigate from one page or topic to another online (by clicking links, searching, or just drifting), there’s an intuitive sense that you’ve “gone far” from where you started. But I keep getting stuck trying to think about what that actually means in a measurable way.

A few candidates I’ve considered:

- Hop count (links or search steps between origin and current): simple, but coarse; one hop can take you across an enormous topic gap.
- Embedding cosine distance (sentence transformers, BERT-style): captures semantic drift, but feels fuzzy and threshold-dependent.
- Knowledge graph distance (Wikipedia link graph, ConceptNet): clean when both endpoints exist in the graph, breaks down otherwise.
- KL divergence between topic distributions (LDA-style): theoretically elegant but compute-heavy.
- Information gain / surprise (how unexpected the current content is given the start): same trade-off, clean in theory, expensive in practice.

Each captures something different: semantic relatedness, structural connectedness, surprise/novelty, raw effort. None feels like THE answer.

Is there established literature that’s thought about this carefully? Or do practitioners just pick whichever proxy fits the use case (recsys uses embeddings, search engines use something else)?

Would love to hear how folks in IR, graph theory, recsys, or web crawling actually approach this in practice.
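For anyone who wants to poke at these candidates side by side, here is a minimal, standard-library-only sketch of three of them: hop count, cosine distance, and KL divergence. Everything in it is an assumption for illustration: the function names, the toy link graph, and the hand-made vectors and topic distributions are hypothetical stand-ins, not output from a real crawler, embedding model, or LDA run.

```python
import math
from collections import deque


def hop_count(graph, start, goal):
    """Fewest link hops from start to goal (breadth-first search), or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against log(0). Note that it is asymmetric."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy data, made up for this example.
links = {
    "coffee": ["caffeine"],
    "caffeine": ["adenosine"],
    "adenosine": ["sleep"],
    "sleep": [],
}
start_vec, current_vec = [0.9, 0.1, 0.0], [0.1, 0.2, 0.9]      # stand-in "embeddings"
start_topics, current_topics = [0.9, 0.1], [0.5, 0.5]          # stand-in topic mixes

print("hops:", hop_count(links, "coffee", "sleep"))
print("cosine distance:", round(cosine_distance(start_vec, current_vec), 3))
print("KL(start || current):", round(kl_divergence(start_topics, current_topics), 3))
print("KL(current || start):", round(kl_divergence(current_topics, start_topics), 3))
```

One thing the sketch makes visible: the three numbers live on different scales and only hop count and cosine distance are symmetric, so combining them into a single "distance" already requires a design decision about weighting and direction.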

Originally posted by u/retarded_770 on r/ArtificialInteligence

How would you actually measure "distance" between two pieces of content on the web?
