Reverse engineering "Imagine with Claude"
By Reid Barber
Anthropic recently released a temporary research preview of Imagine with Claude, an experimental web application that lets Claude models generate and manage a complete GUI application in real time, directly in the user's browser. While the application is not open-source, its client-side code is delivered to the browser, which makes an in-depth analysis of its architecture possible. This post breaks down how Imagine with Claude works under the hood. It is the third installment in my Reverse engineering Claude series; see Reverse engineering Claude Artifacts and Reverse engineering Claude Code for similar analyses of those applications.
Overview
Imagine with Claude operates on a client-server architecture where the "client" is a large language model and the "server" runs entirely within the user's browser. This browser-based server exposes a set of tools to the model through the Model Context Protocol (MCP). Communication happens over a persistent WebSocket connection, enabling real-time two-way interaction between the model and the UI.
The model can request tools to perform a wide range of actions, from direct DOM manipulation (creating, updating, and removing UI elements) to integrating services like Google Maps and Chart.js and accessing device hardware (e.g., the camera and microphone). A permission system ensures the user stays in control of sensitive actions like hardware access. This framework lets the model not just respond with text, but autonomously build, display, and manage a dynamic, multi-modal UI from scratch.
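To make the wire format concrete, here is roughly what a tool-use request looks like as a JSON-RPC message. The envelope follows MCP's standard `tools/call` shape; the tool arguments (`selector`, `html`) are illustrative guesses, not values captured from the app.

```ts
// Illustrative tools/call request in MCP's standard JSON-RPC shape.
// The argument names below are assumptions, not captured traffic.
const toolCallRequest = {
  jsonrpc: "2.0",
  id: 42,
  method: "tools/call",
  params: {
    name: "dom_replace_html",
    arguments: {
      selector: "#main-window .content", // assumed argument name
      html: "<h1>Hello from Claude</h1>", // assumed argument name
    },
  },
};

// The browser-side server answers with a matching JSON-RPC result.
const toolCallResult = {
  jsonrpc: "2.0",
  id: 42,
  result: {
    content: [{ type: "text", text: "OK" }],
  },
};
```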
Main Components
- UI Layer: A dynamic interface that is constructed and manipulated by the large language model. Unlike traditional applications with predefined interfaces, the UI is a "canvas" that the model paints on using specialized DOM tools. It leverages libraries like morphdom for efficient DOM patching and updating without full page reloads.
- Core Logic (MCP Server): The central orchestrator running in the browser. Implemented as an `McpServer` class, it listens for JSON-RPC messages from the model, parses tool-use requests, checks permissions, and executes the corresponding tool functions (see the sketch after this list).
- Communication Layer (`WebSocketTransport`): This component establishes and maintains the persistent WebSocket connection to the backend model. It handles message serialization, heartbeats for connection stability, and automatic reconnection attempts.
- Tools: A suite of functions the model can invoke. These range from fundamental UI controls (`window_new`, `dom_replace_html`) to integrations with rich services (`chart:render`, `google_map:show_map`) and browser APIs (`camera:take_photo`, `speech_synthesis:speak`).
- Permission System: A `PermissionManager` class that gates access to sensitive browser features like the camera, microphone, and geolocation. It requests user consent and remembers choices using session storage and cookies.
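The snippet below is a minimal sketch of how these pieces could fit together: a browser-side server that registers tool handlers and dispatches incoming `tools/call` messages from a WebSocket. The class and method names mirror the general shape described above, but the implementation details (heartbeats, reconnection, the other JSON-RPC methods) are omitted and the exact signatures are assumptions.

```ts
// Minimal sketch of a browser-side MCP server over a WebSocket.
// Names like ToolHandler and registerTool are illustrative, not the app's exact API.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

class McpServer {
  private tools = new Map<string, ToolHandler>();

  constructor(private socket: WebSocket) {
    socket.addEventListener("message", (event) => this.handleMessage(String(event.data)));
  }

  // Register a tool the model is allowed to call.
  registerTool(name: string, handler: ToolHandler) {
    this.tools.set(name, handler);
  }

  private async handleMessage(raw: string) {
    const msg = JSON.parse(raw);
    if (msg.method !== "tools/call") return; // other JSON-RPC methods omitted

    const handler = this.tools.get(msg.params?.name);
    let response;
    try {
      if (!handler) throw new Error(`Unknown tool: ${msg.params?.name}`);
      const result = await handler(msg.params.arguments ?? {});
      response = {
        jsonrpc: "2.0",
        id: msg.id,
        result: { content: [{ type: "text", text: String(result) }] },
      };
    } catch (err) {
      response = { jsonrpc: "2.0", id: msg.id, error: { code: -32000, message: String(err) } };
    }
    this.socket.send(JSON.stringify(response));
  }
}
```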
Data Flow
- User Input: The user enters a prompt into a chat interface. The application's `InputTracker` captures this and other interactions like clicks or form submissions.
- Model Query: The input is sent to the backend Claude model over the WebSocket connection.
- Tool Use Request: The model processes the request and decides which tool(s) to use. It formats a JSON-RPC request, such as `tools/call`, specifying the tool name and arguments, and sends it back to the browser.
- MCP Server Processing: The `McpServer` in the browser receives the request.
- Permission Check: If the requested tool requires special access (e.g., `init_camera`), the `PermissionManager` checks whether permission has already been granted.
- User Prompt (if needed): If permission has not been granted, a dialog is displayed to the user requesting approval. Denial prevents the tool from running and sends an error back to the model.
- Tool Execution: If permitted, the corresponding tool function is executed.
  - For DOM manipulation tools like `dom_replace_html`, the provided HTML string is first sanitized using DOMPurify to prevent XSS attacks, then efficiently patched into the live DOM using morphdom (see the sketch after this list).
  - For content generation tools like `screenshot`, html2canvas is used.
- Tool Result: The tool returns a result (e.g., "OK" or a data payload), which is sent back to the model.
- Continuous Interaction: The model receives the tool result and decides on the next step: generating a text response, calling another tool, or waiting for further user input. This loop continues until the context window is exhausted.
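As a rough illustration of the sanitize-then-patch step, here is how a `dom_replace_html` handler could be wired with DOMPurify and morphdom. The handler signature and argument names are assumptions; only the two library calls reflect the libraries' real public APIs.

```ts
import DOMPurify from "dompurify";
import morphdom from "morphdom";

// Hypothetical handler for a dom_replace_html-style tool.
// `selector` and `html` are assumed argument names.
async function domReplaceHtml(args: { selector: string; html: string }) {
  const target = document.querySelector(args.selector);
  if (!target) throw new Error(`No element matches selector: ${args.selector}`);

  // 1. Strip scripts, event handlers, and other dangerous markup from the
  //    model-provided HTML before it ever touches the live DOM.
  const safeHtml = DOMPurify.sanitize(args.html);

  // 2. Build a detached copy of the target with the new content, then let
  //    morphdom diff-and-patch the live subtree instead of replacing it
  //    wholesale, preserving focus and input state where possible.
  const replacement = target.cloneNode(false) as Element;
  replacement.innerHTML = safeHtml;
  morphdom(target, replacement);

  return "OK";
}
```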
Permission System
Imagine with Claude includes a PermissionManager to safeguard user privacy and security. This system governs access to sensitive browser APIs that could compromise user data if used maliciously. It applies to tools that require access to:
- Camera (`camera:*` tools)
- Microphone (`speech_recognition:*` tools)
- Geolocation (`init_geolocation`)
Permission Flow:
- When a tool requiring special permissions is first called, the `PermissionManager` checks its records.
- If permission has not been previously granted or denied, it displays a user-friendly dialog explaining what is being requested and why.
- The user can choose to "Allow once" (for the current session) or "Always allow" (which stores the preference in a cookie for future sessions).
- If the user denies the request, the tool execution is blocked, and a notification is sent to the model.
- Granting permission triggers the corresponding browser permission prompt (e.g., the browser's native "Allow this site to use your camera" dialog).
Note that Claude Code has a similar permission system.
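Here is a minimal sketch of how such a gate could work, assuming "Allow once" is kept in `sessionStorage` and "Always allow" in a cookie as described above; the class shape, method names, and storage keys are mine, not the app's.

```ts
type PermissionState = "granted" | "denied" | "unset";

// Illustrative permission gate; method names and storage keys are assumptions.
class PermissionManager {
  private key(feature: string) {
    return `permission_${feature}`;
  }

  check(feature: string): PermissionState {
    // "Allow once" lives in sessionStorage, "Always allow" in a cookie.
    const session = sessionStorage.getItem(this.key(feature));
    if (session) return session as PermissionState;
    const cookie = document.cookie
      .split("; ")
      .find((entry) => entry.startsWith(`${this.key(feature)}=`));
    return cookie ? (cookie.split("=")[1] as PermissionState) : "unset";
  }

  grant(feature: string, always: boolean) {
    if (always) {
      document.cookie = `${this.key(feature)}=granted; max-age=31536000; path=/`;
    } else {
      sessionStorage.setItem(this.key(feature), "granted");
    }
  }

  deny(feature: string) {
    sessionStorage.setItem(this.key(feature), "denied");
  }
}
```

A tool wrapper would call `check()` before executing and only surface the consent dialog when the state is `"unset"`.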
Tools
The tools are the core of what makes Imagine with Claude powerful. They are grouped below by their primary function.
UI Manipulation
- `window_new` / `window_close` / `window_change_title`: Create, destroy, and manage floating windows on the screen, forming the basic building blocks of the GUI.
- `dom_replace_html` / `dom_append_html`: The primary tools for building the UI. They take a CSS selector and an HTML string, using `DOMPurify` for security and `morphdom` for efficient rendering.
- `dom_classes_replace` / `dom_set_attr` / `dom_remove`: Allow for fine-grained, targeted modifications to existing DOM elements, enabling dynamic updates, style changes, and interactions without full redraws.
- `private_streamable_*`: Allows the model to stream HTML content chunk by chunk, enabling the UI to be built progressively in real-time.
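To give a sense of how lightweight these primitives can be, here is a hypothetical `window_new` handler that creates a floating, titled container for the other DOM tools to target. The markup, class names, and styling are invented for illustration.

```ts
// Hypothetical window_new handler: creates a floating container that later
// DOM tools (dom_replace_html, dom_set_attr, ...) can target by id.
function windowNew(args: { id: string; title: string }) {
  const win = document.createElement("div");
  win.id = args.id;
  win.className = "imagine-window"; // invented class name
  win.style.cssText = "position:absolute; top:80px; left:80px; min-width:320px;";

  const titleBar = document.createElement("div");
  titleBar.className = "imagine-window-title";
  titleBar.textContent = args.title;

  const content = document.createElement("div");
  content.className = "imagine-window-content";

  win.append(titleBar, content);
  document.body.appendChild(win);
  return "OK";
}
```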
User Interaction
- `InputTrackerService`: A background service that listens for all user interactions (clicks, form submissions, key presses) and reports them to the model as context for its next action.
- `private_loading_start` / `private_loading_end`: Controls a global loading bar, giving the user visual feedback that the model is working.
- `private_thinking_start` / `delta` / `end`: Manages a "thinking bubble" UI element, allowing the model to stream its thought process or current task to the user.
- `private_error` / `private_context_limit_reached`: Tools for displaying system-level modal dialogs to the user for errors or session limits.
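A stripped-down version of what such a tracker might do, assuming interactions are forwarded to the model as JSON-RPC notifications over the same socket; the `ui/event` method name and payload shape are guesses, not the app's actual protocol.

```ts
// Sketch of an input tracker that reports user interactions back to the model.
// The "ui/event" method and payload fields are assumptions.
function startInputTracker(socket: WebSocket) {
  const report = (type: string, detail: Record<string, unknown>) => {
    socket.send(JSON.stringify({ jsonrpc: "2.0", method: "ui/event", params: { type, ...detail } }));
  };

  document.addEventListener("click", (e) => {
    const el = e.target as HTMLElement;
    report("click", { selector: el.id ? `#${el.id}` : el.tagName.toLowerCase() });
  });

  document.addEventListener("submit", (e) => {
    const form = e.target as HTMLFormElement;
    e.preventDefault(); // in this sketch, suppress native navigation so the model drives the response
    const fields = Object.fromEntries(
      [...new FormData(form).entries()].map(([name, value]) => [name, String(value)])
    );
    report("form_submit", { fields });
  });
}
```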
Content Generation
- `chart:render` / `chart:update_data`: Integrates with Chart.js to render a wide variety of data visualizations inside a specified canvas element.
- `google_map:*`: A full suite of tools for embedding and controlling Google Maps, including showing a map, adding/updating markers, and fitting bounds.
- `screenshot`: Uses html2canvas to capture a screenshot of a specified DOM element and returns it to the model as a base64 image.
- `qr_code_render`: Generates and displays a QR code from a given text string.
- `image_editing:*`: Provides tools for image manipulation, such as compositing and extracting regions.
- `init_pdf_builder`: Initializes tools that use jsPDF to convert HTML content from a preview element into a downloadable PDF.
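The `screenshot` tool maps almost directly onto html2canvas's public API. The handler below is a guess at the surrounding plumbing, but `html2canvas(element)` and `canvas.toDataURL` are the real library and browser calls.

```ts
import html2canvas from "html2canvas";

// Hypothetical screenshot handler: captures an element and returns it as a
// base64 PNG data URL that can be sent back to the model as an image.
async function screenshot(args: { selector: string }) {
  const element = document.querySelector<HTMLElement>(args.selector);
  if (!element) throw new Error(`No element matches selector: ${args.selector}`);

  const canvas = await html2canvas(element); // renders the subtree to a <canvas>
  return canvas.toDataURL("image/png"); // base64-encoded PNG
}
```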
Device and Browser APIs
- `init_geolocation`: Requests permission and enables tools to access the user's geographical location.
- `camera:*`: A set of tools to request camera access, display the camera feed in a video element, and capture photos.
- `speech_recognition:*`: Tools to access the microphone and perform speech-to-text.
- `speech_synthesis:*`: Tools that use the browser's text-to-speech engine to speak text aloud.
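These tools are thin wrappers over standard browser APIs. Assuming that, handlers in the following spirit would be enough; the function names and return shapes are mine.

```ts
// Text-to-speech via the standard Web Speech API.
function speak(args: { text: string }) {
  speechSynthesis.speak(new SpeechSynthesisUtterance(args.text));
  return "OK";
}

// Geolocation via the standard Geolocation API, wrapped in a promise so the
// MCP server can await a result to send back to the model.
function getLocation(): Promise<{ latitude: number; longitude: number }> {
  return new Promise((resolve, reject) => {
    navigator.geolocation.getCurrentPosition(
      (pos) => resolve({ latitude: pos.coords.latitude, longitude: pos.coords.longitude }),
      (err) => reject(new Error(err.message))
    );
  });
}
```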
Zod is used to define tools and validate every message exchanged over the WebSocket transport.
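For example, a tool's input schema and the incoming message envelope might be validated along these lines; the exact schemas are assumptions, but this is standard Zod usage.

```ts
import { z } from "zod";

// Hypothetical input schema for a dom_replace_html-style tool.
const DomReplaceHtmlInput = z.object({
  selector: z.string().min(1),
  html: z.string(),
});

// Hypothetical envelope for incoming tools/call messages.
const ToolCallMessage = z.object({
  jsonrpc: z.literal("2.0"),
  id: z.union([z.string(), z.number()]),
  method: z.literal("tools/call"),
  params: z.object({
    name: z.string(),
    arguments: z.record(z.unknown()).optional(),
  }),
});

// Throws (or use .safeParse) if the model sends a malformed request.
function validateIncoming(raw: string) {
  return ToolCallMessage.parse(JSON.parse(raw));
}
```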
Conclusion
Imagine with Claude is an early preview of what on-demand, AI-generated UIs could look like. There is still plenty of room for improvement in capabilities, latency, and accessibility, but it's great to see the concept working in a real product.