Hey everyone, I’m working on an intent classification pipeline for a specialized domain assistant and running into challenges with semantic overlap between categories. I’d love to get input from folks who’ve tackled similar problems using lightweight or classical NLP approaches.

**The Setup:**

- ~20+ functional tasks mapped to broader intent categories
- Very limited labeled data per task (around 3–8 examples each)
- Rich, detailed task descriptions (including what each task should *not* handle)

**The Core Problem:** There’s a mismatch between surface-level signals (keywords) and functional intent. Standard semantic similarity approaches tend to over-prioritize shared vocabulary, leading to misclassification when different intents use overlapping terminology.

**What I’ve Tried So Far:**

- SetFit-style approaches: good for general patterns, but they struggle with niche terminology
- Semantic anchoring: breaking descriptions into smaller units and using max-similarity scoring
- NLI-based reranking: as a secondary check for logical consistency

These have helped somewhat, but high-frequency, low-precision terms still dominate over more meaningful functional cues.

**Constraints:** I’m trying to avoid large LLMs due to latency, cost, and explainability concerns. I prefer solutions that are more deterministic and interpretable.

**Looking For:**

- Techniques for building a signal hierarchy (e.g., prioritizing verbs/functional cues over generic terms)
- Ways to incorporate negative constraints (explicit signals that should rule out a class) without relying on brittle rules
- Recommendations for discriminative embeddings or representations suited for low-data, domain-specific settings
- Any architectures that handle shared vocabulary across intents more robustly

If you’ve worked on similar problems or have pointers to relevant methods, I’d really appreciate your insights! Thanks in advance 🙏
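For context, here’s roughly what my semantic anchoring step looks like, boiled down to a toy sketch. The intent names and anchor strings are made up, and I’ve swapped the real embedding model for a trivial bag-of-words vector so the snippet runs standalone (in the real pipeline this would be a sentence-encoder call):

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; stand-in for a real sentence encoder.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Each intent's long description is broken into small "anchor" units.
ANCHORS = {
    "create_report": ["generate a summary report", "export results to a document"],
    "schedule_job":  ["run a task on a schedule", "set up a recurring job"],
}

def classify(query):
    # Score each intent by its best-matching anchor (max-similarity),
    # so one strong functional match beats many weak keyword overlaps.
    q = embed(query)
    scores = {
        intent: max(cosine(q, embed(a)) for a in anchors)
        for intent, anchors in ANCHORS.items()
    }
    return max(scores, key=scores.get), scores

intent, _ = classify("please set up a recurring job for me")
print(intent)  # schedule_job
```

The max (rather than mean) over anchors is deliberate: it lets a single precise functional match win even when the other anchors are irrelevant.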
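To make the "signal hierarchy" ask concrete, this is the kind of thing I’m imagining: functional verbs outweigh generic domain nouns, so the shared noun stops dominating the match. All weights, intents, and keywords here are invented for illustration:

```python
# Toy token weights: functional verbs get high weight, generic
# domain nouns low weight (all values are made up for illustration).
WEIGHTS = {"cancel": 3.0, "schedule": 3.0, "export": 3.0,
           "invoice": 0.5, "report": 0.5, "account": 0.5}
DEFAULT_WEIGHT = 1.0

INTENT_KEYWORDS = {
    "cancel_invoice": {"cancel", "invoice"},
    "export_invoice": {"export", "invoice"},
}

def score(query, keywords):
    # Weighted keyword overlap: the shared generic term "invoice"
    # contributes little, while the verb decides the match.
    tokens = set(query.lower().split())
    return sum(WEIGHTS.get(t, DEFAULT_WEIGHT) for t in tokens & keywords)

query = "please cancel this invoice"
ranked = sorted(INTENT_KEYWORDS,
                key=lambda i: score(query, INTENT_KEYWORDS[i]),
                reverse=True)
print(ranked[0])  # cancel_invoice
```

Ideally the weights would come from something principled (POS tags, or per-class TF-IDF contrast) rather than a hand-maintained dict, which is part of what I’m asking about.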
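And by "negative constraints without brittle rules" I mean something like soft penalties: anti-anchors drawn from the "should not handle" part of each description subtract from the score instead of hard-vetoing the class. Again a hypothetical sketch with made-up intent and keyword sets:

```python
# Soft negative constraints: a match against an anti-anchor
# (from the "should NOT handle" part of the description) subtracts
# from the score rather than ruling the class out outright.
POS = {"reset_password": {"reset", "password", "login"}}
NEG = {"reset_password": {"create", "new", "account"}}
PENALTY = 2.0

def overlap(query_tokens, keywords):
    return len(query_tokens & keywords)

def score(query, intent):
    tokens = set(query.lower().split())
    return overlap(tokens, POS[intent]) - PENALTY * overlap(tokens, NEG[intent])

print(score("reset my password", "reset_password"))           # 2.0
print(score("create a new account login", "reset_password"))  # -5.0
```

Because the penalty is graded rather than binary, one spurious negative keyword can’t single-handedly kill an otherwise strong match, which is what I mean by avoiding brittleness.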
Originally posted by u/Formal-Author-2755 on r/ArtificialInteligence
