Original Reddit post

built a research agent last week that scrapes competitor landing pages and summarizes changes. felt pretty clean honestly. except i didn’t account for one thing: half the sites it was hitting had started serving bot detection pages instead of real content.

my agent didn’t know the difference. it just kept “summarizing” cloudflare challenges and empty divs like they were real content. 6 hours. hundreds of API calls to my LLM. all on garbage HTML. the actual useful data i got back? maybe 12 pages out of 200.

i’m not managing my own scraping infrastructure for AI agents anymore. what are you guys using that actually returns clean content and fails gracefully when it hits a wall? tired of babysitting this stuff.
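the failure mode described above (feeding challenge pages and empty divs to the LLM as if they were content) can be caught cheaply before any API call. here is a minimal sketch of that kind of pre-filter; the marker strings, the length threshold, and the function names are my own assumptions for illustration, not an exhaustive bot-detection check:

```python
import re

# Strings that commonly appear on interstitial/challenge pages.
# Illustrative list only -- real challenge pages vary.
CHALLENGE_MARKERS = (
    "just a moment",             # Cloudflare interstitial title
    "checking your browser",     # classic Cloudflare challenge copy
    "enable javascript and cookies",
)

def looks_like_challenge(html: str, min_text_chars: int = 200) -> bool:
    """Return True if the page is probably not real content."""
    lowered = html.lower()
    if any(marker in lowered for marker in CHALLENGE_MARKERS):
        return True
    # Crudely strip tags; a near-empty visible body is another red flag.
    text = re.sub(r"<[^>]+>", " ", html)
    text = re.sub(r"\s+", " ", text).strip()
    return len(text) < min_text_chars

def pages_worth_summarizing(pages: dict[str, str]) -> dict[str, str]:
    """Keep only pages that pass the filter; skip the rest, don't summarize them."""
    return {url: html for url, html in pages.items()
            if not looks_like_challenge(html)}
```

gating LLM calls on a check like this would have turned "6 hours and hundreds of API calls on garbage HTML" into a log line listing the ~188 pages that got skipped.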

Originally posted by u/LxM420 on r/ArtificialInteligence

  • Treczoks@lemmy.world · 7 hours ago

    Anyone who wastes other people’s resources to scrape the net deserves any bill that comes for it. You are the problem that needs to be fixed ASAP.