I've been grinding on a project called oransim and I'm starting to think the whole approach might be a dead end. I'm trying to build a counterfactual engine for marketing: a simulator where you can ask "what if I move 30% of my budget from creator A to creator B?" before you actually spend the money. Most people do this retrospectively, but I wanted a forward-looking predictive engine.

Here's the strawman architecture I built:

- The SCM backbone: structural equations plus do-calculus to keep the dependencies clean (creative → platform → user). There's a toy sketch of this chain after the lists.
- The timing (Hawkes processes): I used Hawkes instead of Poisson because viral cascades need self-excitation, i.e. every event raises the intensity of future events. (Minimal simulator sketch below.)
- The agents: LLMs represent user archetypes that "react" to content via an embedding bus.

Here's why I think this is falling apart (please tear it apart):

- The SCM vs. LLM boundary: SCMs need clean structural equations, while LLMs are black boxes. Right now I treat the agent outputs as a noisy observation layer that feeds into the SCM. Is that even theoretically defensible, or am I just mixing oil and water? (See the observation-layer sketch below.)
- Identifiability: once an LLM mediates a causal node, do prompt-level interventions actually map to do()-operators on latent user states, or am I just hand-waving and calling it "science"?
- The sim-to-real gap: fitting Hawkes parameters on agent-generated data gives me marginals that look okay, but the covariance structure is dogshit compared to real-world logs. Has anyone actually solved this for point processes? (Diagnostic sketch at the end.)

Honestly, I'm not here to hype the repo; I'm here because I'm skeptical of my own factoring of SCMs and agents. If you work in causal inference or agent-based modeling, tell me why this architecture is a dead end. I'd rather know now than sink another six months into it.

Repo: https://github.com/OranAi-Ltd/oransim
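To make the SCM backbone concrete, here's a minimal toy version of the creative → platform → user chain with a do()-style clamp. Every equation, coefficient, and function name here is made up for illustration; none of it is the repo's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_platform(creative_quality, noise):
    # Platform reach as a function of creative quality plus exogenous noise.
    return 0.6 * creative_quality + noise

def f_user(platform_reach, budget_share, noise):
    # User response depends on reach and the creator's share of budget.
    return 0.4 * platform_reach + 0.5 * budget_share + noise

def sample(budget_share, do_platform=None, n=10_000):
    """Sample user response; do_platform clamps the platform node (a do())."""
    creative = rng.normal(0.0, 1.0, n)
    platform = f_platform(creative, rng.normal(0.0, 0.3, n))
    if do_platform is not None:
        # Intervention: sever incoming edges and set platform := do_platform.
        platform = np.full(n, do_platform)
    return f_user(platform, budget_share, rng.normal(0.0, 0.3, n))

# Counterfactual budget shift: "what if more budget moves to this creator?"
baseline = sample(budget_share=0.5).mean()
shifted = sample(budget_share=0.8).mean()
print(f"estimated lift from reallocation: {shifted - baseline:.3f}")

# do() on an intermediate node: clamp reach regardless of creative quality.
clamped = sample(budget_share=0.5, do_platform=1.0).mean()
print(f"response under do(platform := 1.0): {clamped:.3f}")
```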
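And here's roughly what I mean by self-excitation: a minimal Hawkes simulator using Ogata's thinning with an exponential kernel. The parameter names (mu, alpha, beta) are generic textbook notation, not oransim's actual config.

```python
import numpy as np

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Events with intensity lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i))."""
    rng = np.random.default_rng(seed)
    t, events = 0.0, []
    while t < horizon:
        # The current intensity bounds lambda on [t, next event) because the
        # exponential kernel only decays between events.
        lam_bar = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
        t += rng.exponential(1.0 / lam_bar)
        if t >= horizon:
            break
        lam_t = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
        if rng.uniform() <= lam_t / lam_bar:  # accept with prob lambda(t)/lam_bar
            events.append(t)                  # each accepted event excites the future
    return np.array(events)

ts = simulate_hawkes(mu=0.2, alpha=0.8, beta=1.5, horizon=200.0)
print(len(ts), "events; a plain Poisson(mu) baseline would average", 0.2 * 200.0)
```

With branching ratio alpha/beta < 1 this stays stationary but still clusters into cascades, which is the behavior a homogeneous Poisson process can't produce.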
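For the noisy-observation-layer question, this is the factoring I'm trying to defend: the latent user state U stays a proper SCM node, and the agent's embedding output is only a measurement of U with its own error model, calibrated on simulated pairs. The linear link and dimensions below are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

n, d = 5_000, 8
U = rng.normal(size=(n, 1))                # latent user state (the SCM node)
W = rng.normal(size=(1, d))                # unknown link: latent -> embedding
Y = U @ W + 0.5 * rng.normal(size=(n, d))  # agent embedding = noisy view of U

# Calibrate a decoder Y -> U_hat by least squares; downstream structural
# equations then consume U_hat, never the raw LLM output directly.
B, *_ = np.linalg.lstsq(Y, U, rcond=None)
U_hat = Y @ B
print("R^2 of recovered latent:", 1 - np.var(U - U_hat) / np.var(U))
```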
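Finally, the sim-to-real check that's failing: marginal rates match, second-order structure doesn't. A sketch of the diagnostic, with uniform placeholder event times standing in for the real and agent-generated logs:

```python
import numpy as np

def binned_counts(times, horizon, width):
    # Bin event times into counts per window of the given width.
    edges = np.arange(0.0, horizon + width, width)
    return np.histogram(times, bins=edges)[0]

def count_autocov(counts, max_lag=10):
    # Autocovariance of the binned counts at lags 0..max_lag.
    c = counts - counts.mean()
    n = len(c)
    return np.array([np.dot(c[: n - k], c[k:]) / (n - k) for k in range(max_lag + 1)])

horizon, width = 200.0, 1.0
rng = np.random.default_rng(1)
# Placeholders: swap in the real event log and the agent-generated log.
real_times = np.sort(rng.uniform(0.0, horizon, 120))
sim_times = np.sort(rng.uniform(0.0, horizon, 120))

real_counts = binned_counts(real_times, horizon, width)
sim_counts = binned_counts(sim_times, horizon, width)

# The first moment can agree while the lag structure diverges badly:
print("mean rate real/sim:", real_counts.mean(), sim_counts.mean())
print("autocov real:", count_autocov(real_counts))
print("autocov sim :", count_autocov(sim_counts))
```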
Originally posted by u/NoYoung7229 on r/ArtificialInteligence
