eifachposte

So I’ve been messing around with KDE Plasma 6 on Wayland, trying to get Claude to actually navigate my desktop without burning through tokens like crazy. Every existing MCP desktop tool I found does the same thing: take a screenshot, send it to the model, let it figure out where to click. That’s like 3000 tokens per screenshot and it’s slow as hell.

I figured there had to be a better way. KDE has DBus for literally everything. KWin exposes window geometry, Konsole gives you terminal text, Klipper has clipboard access. So I started pulling all of that into an MCP server, and Claude can just read structured text instead of processing images.

But the real breakthrough was the browser. I forked KDE’s Plasma Browser Integration extension and added a PageContent plugin to it: a C++ DBus interface on the host side, a content script that extracts the DOM on the extension side.

Now Claude can:

- Read any page in my live browser: fully rendered, JavaScript executed, logged-in sessions and everything
- Click elements by text (`text:Sign In`), CSS selector, or index (`link:3`)
- Type into inputs, scroll, navigate, go back/forward, manage tabs
- Search Google and get structured results back

All through DBus. All structured text. The page content comes back as title, headings, links, buttons, inputs: maybe 200-500 tokens instead of a 3000 token screenshot. And because it’s my actual browser session, there’s no bot detection, no CAPTCHAs, no auth issues. It just reads what I see.

The kicker is this basically makes WebFetch obsolete for most things. WebFetch can’t render JavaScript, gets blocked by Cloudflare, has no session state. This reads the fully rendered page from your authenticated browser. Same data, way fewer tokens, and you can actually interact with it.

It’s not just browser stuff though.
The MCP server also does:

- Window geometry and stacking order directly from the KWin compositor
- Terminal text from every Konsole tab via DBus
- Clipboard contents via Klipper
- Hit maps: every clickable element with exact screen coordinates
- Mouse/keyboard control via ydotool

Everything is KDE-native. No Electron wrappers, no screenshot OCR, no hacky workarounds. Just DBus all the way down.

Still early but it works. Built it in one session with Claude lol. The irony of using Claude to build tools that make Claude cheaper to use is not lost on me.

Repo: https://github.com/Niek-Kamer/waytrash

TL;DR: MCP server for KDE Wayland that lets Claude read and control your browser through structured text instead of screenshots. ~15-30x token reduction. Uses a modified Plasma Browser Integration extension for full DOM access via DBus. Bypass any robots.txt? ;)
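
For anyone wondering how the click targets from the browser part (`text:Sign In`, a CSS selector, or `link:3`) could be told apart, here is a rough Python sketch. It is not the code from the repo. The `Target` class, the `parse_target` function, and the `button` and `input` index prefixes are made up for illustration; only `text:` and `link:` appear in the post.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: only "link" is confirmed by the post; "button" and "input"
# are hypothetical extra kinds added for illustration.
INDEX_KINDS = {"link", "button", "input"}


@dataclass(frozen=True)
class Target:
    kind: str                    # "text", "index", or "css"
    value: str                   # visible text, element kind, or CSS selector
    index: Optional[int] = None  # only set when kind == "index"


def parse_target(spec: str) -> Target:
    """Classify a click target string into text, index, or CSS selector."""
    prefix, sep, rest = spec.partition(":")
    rest = rest.strip()
    if sep and prefix == "text":
        # text:Sign In -> match an element by its visible text
        return Target("text", rest)
    if sep and prefix in INDEX_KINDS and rest.isdigit():
        # link:3 -> the element of that kind at the given index
        return Target("index", prefix, int(rest))
    # Anything else is passed through as a CSS selector, which also keeps
    # pseudo-classes such as input:checked working.
    return Target("css", spec.strip())


if __name__ == "__main__":
    for spec in ("text:Sign In", "link:3", "#login > button.primary"):
        print(spec, "->", parse_target(spec))
```

Falling back to CSS for anything unrecognized means a selector containing a colon is not mistaken for a prefix, at the cost of typos in a prefix being silently treated as selectors.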

Originally posted by u/Diligent_Comb5668 on r/ClaudeCode

I made Claude control my entire browser through DBus — no screenshots, 15-30x fewer tokens
