eifachposte


I have been working with web agents to automate some tasks and realized that they are not very effective at the moment, because they have to figure out how to use each web application on their own. The UI is built for humans and carries little metadata that an AI agent could use to learn the interface. So I wrote a small specification for adding that metadata, and the related functionality, to web applications so that AI agents can use them more effectively. I also created SDKs that make it easy to add the metadata to a web application, and an MCP server that AI agents can use to interact with any web app that supports SID.

This is all very experimental at the moment. I have not yet benchmarked agent performance on applications that support this metadata against ones that don’t, but I plan to do that soon.

SID (Semantic Interaction Description)

SID (Semantic Interaction Description) is an open accessibility standard designed specifically for AI agents. It enables AI agents to navigate and interact with web applications effectively by providing structured metadata that explicitly describes what each element does, how to interact with it, and how to track the results.

🚀 Why SID?

Current approaches for AI agents to interact with the web face significant challenges:

- DOM Parsing: Text is arranged for humans, making semantic meaning and relationships unclear for machines.
- Accessibility Attributes (ARIA): Designed for assistive technologies, not AI agents; they don’t describe interaction outcomes.
- Vision/Screenshots: Slow, expensive in tokens, and prone to errors when layouts change.

SID addresses this by allowing web applications to expose their interactive capabilities as structured metadata, accessible via a standard JavaScript API and HTML attributes.

🛠️ Subprojects

The SID ecosystem consists of two primary components:

SID SDK

The SID SDK is a lightweight library for web developers to easily implement SID in their applications. It provides:

- Automatic Discovery: Tools to scan and register interactive elements.
- JavaScript API: A standard window.SID interface for agents to query.
- Operation Tracking: Built-in support for tracking asynchronous operations, so agents know exactly when an action (like a form submission) has completed.
- Human Input Handling: A secure way to handle sensitive data (like passwords or credit card info) by requesting human intervention via JSON Schema.

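
To make the operation-tracking idea concrete, here is a minimal sketch of how a page could record an asynchronous action and let a caller wait for it to settle. It is only an illustration: `OperationTracker`, `start`, `status`, and `waitFor` are hypothetical names chosen for this example and are not taken from the SID SDK.

```javascript
// Hypothetical sketch of operation tracking; none of these names come from the SID SDK.
class OperationTracker {
  constructor() {
    this.operations = new Map(); // operation id -> record
    this.nextId = 1;
  }

  // Start tracking an async task and return its operation id immediately.
  start(task) {
    const id = `op-${this.nextId++}`;
    const record = { status: 'pending', result: undefined, error: undefined };
    // The task runs on a later microtask, so the status is still "pending"
    // when start() returns.
    record.done = Promise.resolve()
      .then(task)
      .then(
        (result) => { record.status = 'success'; record.result = result; },
        (error) => { record.status = 'error'; record.error = error; }
      );
    this.operations.set(id, record);
    return id;
  }

  // Synchronous snapshot of the current status.
  status(id) {
    const record = this.operations.get(id);
    if (!record) throw new Error(`Unknown operation: ${id}`);
    return record.status;
  }

  // Resolve once the operation has settled, whether it succeeded or failed.
  async waitFor(id) {
    const record = this.operations.get(id);
    if (!record) throw new Error(`Unknown operation: ${id}`);
    await record.done;
    return { status: record.status, result: record.result, error: record.error };
  }
}

// Usage: a timer stands in for a form submission.
const tracker = new OperationTracker();
const id = tracker.start(async () => {
  await new Promise((resolve) => setTimeout(resolve, 10));
  return { saved: true };
});
console.log(tracker.status(id)); // prints "pending"
tracker.waitFor(id).then((outcome) => console.log(outcome.status)); // prints "success"
```

The point of the design is that the agent never has to guess from the DOM whether a save finished; it waits on an explicit operation record instead.
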
SID MCP Server

The SID MCP Server implements the Model Context Protocol to bridge the gap between AI models and SID-enabled websites. It allows any MCP-compatible agent (like Claude Desktop or Cursor) to:

- Discover Elements: Automatically list all interactive SID elements on a page.
- Perform Actions: Execute clicks, form fills, and selections using high-level semantic commands.
- Reliable Automation: Leverage SID’s operation tracking to ensure tasks are completed before moving to the next step.

📖 How it Works

SID uses a combination of HTML attributes and a JavaScript API.

HTML Attributes

Developers add data-sid-* attributes to their interactive elements:

    <button
      data-sid="btn-save"
      data-sid-desc="Saves the current document"
      data-sid-action="click"
    >
      Save
    </button>

JavaScript API

AI agents (or the SID MCP Server) query the window.SID object:

    // Discover all interactive elements
    const elements = window.SID.getElements();

    // Trigger an interaction and wait for completion
    const result = await window.SID.interact('btn-save', { type: 'click' });

🌟 Key Benefits

- 🔌 Universal: Any agent that understands SID can interact with any SID-enabled application.
- ⚡ Fast: No vision processing or complex DOM parsing required.
- 💰 Efficient: Minimal token usage compared to screenshot-based approaches.
- 🎯 Reliable: Explicit operation tracking tells agents exactly when actions complete.

📖 Documentation & Agent Specs

We provide comprehensive documentation and self-contained specifications designed to be fed directly to AI agents.

- Main Documentation: https://sid-standard.github.io/
- Coding Agent Specification: For AI agents that build websites. Includes HTML attribute reference, JavaScript API implementation guidance, and examples for common UI patterns.
- Browser Agent Specification: For AI agents that interact with websites. Includes discovery procedures, interaction execution, operation tracking, and integration guidance.

These specs are optimized for LLM context windows and aim to give an agent everything it needs to implement or use SID without extra fluff. For more details, visit the SID Documentation.
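
As a rough illustration of the agent side, the sketch below runs the discover-then-interact flow from the post's `window.SID.getElements()` and `window.SID.interact(...)` calls against a small in-memory stand-in, so it can be tried outside a browser. The element shape (`id`, `description`, `action`), the `{ success, error }` result shape, and the helpers `createMockSID` and `interactByDescription` are all assumptions made for this example; the actual spec may define these differently.

```javascript
// Stand-in for window.SID so the sketch runs outside a browser.
// The element and result shapes are assumptions, not taken from the SID spec.
function createMockSID(elements) {
  const log = []; // records every interaction that was accepted
  return {
    log,
    getElements: () => elements.map((el) => ({ ...el })),
    interact: async (id, action) => {
      const target = elements.find((el) => el.id === id);
      if (!target) return { success: false, error: `No element with id ${id}` };
      if (target.action !== action.type) {
        return { success: false, error: `${id} does not support ${action.type}` };
      }
      log.push({ id, type: action.type });
      return { success: true };
    },
  };
}

// Hypothetical agent-side helper: pick the first element whose description
// contains the keyword, then trigger the action that element declares.
async function interactByDescription(sid, keyword) {
  const needle = keyword.toLowerCase();
  const match = sid
    .getElements()
    .find((el) => el.description.toLowerCase().includes(needle));
  if (!match) return { success: false, error: `Nothing matches "${keyword}"` };
  return sid.interact(match.id, { type: match.action });
}

// Usage: the element mirrors the Save button from the post.
const sid = createMockSID([
  { id: 'btn-save', description: 'Saves the current document', action: 'click' },
]);
interactByDescription(sid, 'saves').then((result) => console.log(result));
// prints { success: true }
```

On a SID-enabled page, an agent would presumably pass `window.SID` instead of the mock, provided the real return values match the shapes assumed here.
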

Originally posted by u/Vaibhav_Sinha on r/ArtificialInteligence

Semantic Interaction Description - Enable AI agents to navigate and interact with web applications effectively
