Bullshit Benchmark - A benchmark for testing whether models identify and push back on nonsensical prompts instead of confidently answering them
www.reddit.com
eifachposteMB to AI (Reddit RSS) · English · 7 days ago