
How the Agent Works

Frontman is an AI agent that sits between your browser and your source code. You describe a change in natural language, and it executes that change by looking at your running app, reading relevant files, and editing them — all without you leaving the browser.

This page explains what happens under the hood so you can work with the agent more effectively.

Frontman has three main components:

  1. The browser client — a chat interface that sits alongside a live preview of your app. It also runs browser-side tools (screenshots, DOM inspection, clicking elements).

  2. The Frontman server — receives your prompts, calls the LLM (Claude, GPT, Gemini, etc.), and orchestrates the agent loop.

  3. Your dev server plugin — a framework integration (Astro, Next.js, or Vite) that gives the agent access to your project files and component structure.

┌─────────────────────────────────────────────┐
│                Your Browser                 │
│  ┌────────────┐  ┌────────────────────┐     │
│  │ Chat UI    │  │ Live Preview       │     │
│  │            │  │ (your running app) │     │
│  └──────┬─────┘  └──────────┬─────────┘     │
└─────────┼───────────────────┼───────────────┘
          │                   │
          ▼                   ▼
┌──────────────────┐  ┌──────────────────┐
│ Frontman Server  │  │ Your Dev Server  │
│ (agent loop,     │  │ (file tools,     │
│  LLM calls)      │  │  project info)   │
└────────┬─────────┘  └──────────────────┘
         │
         ▼
┌──────────────────┐
│ LLM Provider     │
│ (Claude, GPT,    │
│  Gemini, etc.)   │
└──────────────────┘

When you type a message and hit send, the client packages it — text, images, and any annotations you’ve added — and sends it to the Frontman server over a WebSocket connection.

The server resolves which AI model and API key to use, checking in this order:

  1. OAuth connection — if you’ve linked your Anthropic or OpenAI account directly
  2. Your API key — a key you’ve saved in Frontman settings
  3. Environment key — a key from your project’s .env file
  4. Free tier — Frontman’s built-in key (limited to 10 runs/day)

See API Keys & Providers for setup details.
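The lookup order above amounts to a first-match search over a priority list. The following is a minimal sketch of that idea, not Frontman's actual implementation: the type names, field names, and `resolveCredential` function are all hypothetical.

```typescript
// Hypothetical shapes for illustration; these names are not Frontman's real API.
type CredentialSource = "oauth" | "user-key" | "env-key" | "free-tier";

interface Credential {
  source: CredentialSource;
  apiKey: string;
}

interface CredentialInputs {
  oauthToken?: string; // linked Anthropic or OpenAI account
  savedKey?: string;   // key saved in Frontman settings
  envKey?: string;     // key read from the project's .env file
  freeTierKey: string; // built-in fallback key
}

// Walk the sources in priority order and return the first one that is set.
function resolveCredential(inputs: CredentialInputs): Credential {
  const candidates: Array<[CredentialSource, string | undefined]> = [
    ["oauth", inputs.oauthToken],
    ["user-key", inputs.savedKey],
    ["env-key", inputs.envKey],
  ];
  for (const [source, apiKey] of candidates) {
    if (apiKey) return { source, apiKey };
  }
  // Nothing else is configured, so fall back to the free tier.
  return { source: "free-tier", apiKey: inputs.freeTierKey };
}
```

With this ordering, a key saved in settings wins over a key in `.env`, and the free tier is used only when nothing else is configured.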

The server builds a context package — system prompt, available tools, conversation history — and submits it to the LLM. This begins the agent loop: a back-and-forth between the LLM and your browser that continues until the task is done.

On each turn, the LLM either:

  • Returns text — streamed to your chat in real time as it’s generated
  • Calls tools — requests actions like “take a screenshot” or “read this file”

Tools run in different places depending on what they do:

| Tool type | Where it runs | Examples |
| --- | --- | --- |
| Browser tools | In your browser, against the live preview | Screenshot, DOM inspection, clicking elements, navigating |
| Dev server tools | On your dev server, via the framework plugin | Reading files, editing code, discovering project structure |
| Server tools | On the Frontman server | Todo list management, plan tracking |

The results are sent back to the LLM, which uses them to decide its next action.
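One way to picture this routing is a lookup from tool name to execution location. The sketch below assumes hypothetical tool names (`take_screenshot`, `read_file`, and so on); the real identifiers are listed in the tool reference and may differ.

```typescript
type ToolLocation = "browser" | "dev-server" | "frontman-server";

// Hypothetical tool names, grouped by where the tool would run.
const TOOL_LOCATIONS = new Map<string, ToolLocation>([
  ["take_screenshot", "browser"],
  ["inspect_dom", "browser"],
  ["read_file", "dev-server"],
  ["edit_file", "dev-server"],
  ["update_todos", "frontman-server"],
]);

// Decide where a requested tool call should be executed.
function locateTool(name: string): ToolLocation {
  const location = TOOL_LOCATIONS.get(name);
  if (location === undefined) {
    throw new Error(`Unknown tool: ${name}`);
  }
  return location;
}
```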

The LLM's turn and the tool execution that follows repeat until the LLM determines the task is complete. A typical flow looks like this:

  1. Take a screenshot to see the current state
  2. Read the DOM to understand the page structure
  3. Read the relevant source file
  4. Edit the file
  5. Take another screenshot to verify the change
  6. Report back to you

The agent typically loops 3 to 15 times, depending on complexity: a simple text change might take 3 steps, while a multi-component layout rework might take 15.
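The loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not Frontman's code: it assumes a text-only reply ends the run, adds a `maxTurns` safety cap that the documentation does not mention, and keeps everything synchronous for brevity (real model and tool calls would be asynchronous). All names are hypothetical.

```typescript
// A model turn is either final text or a batch of tool calls.
type ModelTurn =
  | { kind: "text"; text: string }
  | { kind: "tool_calls"; calls: { name: string; args: unknown }[] };

type HistoryEntry =
  | { role: "user" | "assistant"; text: string }
  | { role: "tool"; name: string; result: string };

function runAgentLoop(
  prompt: string,
  callModel: (history: HistoryEntry[]) => ModelTurn,
  runTool: (name: string, args: unknown) => string,
  maxTurns = 15, // assumed safety cap, not a documented limit
): HistoryEntry[] {
  const history: HistoryEntry[] = [{ role: "user", text: prompt }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = callModel(history);
    if (reply.kind === "text") {
      // Assumption: a text-only reply means the task is complete.
      history.push({ role: "assistant", text: reply.text });
      return history;
    }
    // Run each requested tool and feed its result back into the history.
    for (const call of reply.calls) {
      history.push({ role: "tool", name: call.name, result: runTool(call.name, call.args) });
    }
  }
  throw new Error(`Agent did not finish within ${maxTurns} turns`);
}
```

Passing `callModel` and `runTool` in as parameters keeps the loop independent of any particular LLM provider or tool transport.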

The agent’s core workflow is a perception-action loop:

  1. See — take a screenshot of the live preview to understand the visual state
  2. Understand — inspect the DOM, find interactive elements, or search for text to map what’s visible to underlying structure
  3. Locate — identify the source file and line responsible for what needs to change
  4. Edit — modify the code with a targeted diff
  5. Verify — take another screenshot to confirm the change looks right

This is why Frontman can make precise visual changes that AI coding tools without a view of the rendered page struggle with: it has the same feedback loop a human developer uses. Look at the page, find the code, change it, check the result.

When the LLM requests a tool that runs in the browser (like a screenshot), the server sends the request to your browser over the WebSocket. The browser executes it against the live preview iframe and returns the result.

For tools that need your dev server (like editing a file), the browser acts as a bridge — it receives the request from the server, forwards it to your dev server’s Frontman plugin over HTTP, and returns the result.

Agent → Server → Browser → Dev Server → Browser → Server → Agent

This relay architecture means the agent can access your files without the Frontman server needing direct access to your filesystem. Your code stays on your machine.
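A rough sketch of the browser's bridging role follows. The request and result shapes, the `relayToDevServer` function, and the `httpTransport` helper are all hypothetical; the actual wire format and plugin endpoint are not documented here, so the endpoint is passed in by the caller.

```typescript
interface ToolRequest { id: string; tool: string; args: unknown }
interface ToolResult { id: string; ok: boolean; output: string }

// How the browser reaches the dev server plugin; injected so the relay stays testable.
type DevServerTransport = (request: ToolRequest) => Promise<ToolResult>;

// Example transport over HTTP using the standard fetch API.
function httpTransport(endpoint: string): DevServerTransport {
  return async (request) => {
    const response = await fetch(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(request),
    });
    if (!response.ok) throw new Error(`Dev server responded with ${response.status}`);
    return (await response.json()) as ToolResult;
  };
}

// Forward a request received from the Frontman server and hand back the result.
// Failures are converted into a result so the agent always gets an answer.
async function relayToDevServer(
  request: ToolRequest,
  forward: DevServerTransport,
): Promise<ToolResult> {
  try {
    const result = await forward(request);
    return { ...result, id: request.id };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { id: request.id, ok: false, output: message };
  }
}
```

Because only the browser talks to the dev server, the Frontman server sees tool results but never needs a route to your filesystem.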

The agent has access to a rich set of tools. Here’s a summary — see Tool Capabilities for the full reference.

| Capability | What the agent gets |
| --- | --- |
| Screenshots | A pixel-accurate capture of your running app |
| DOM tree | A structured representation of the page with CSS selectors, component names, and text content |
| Interactive elements | All buttons, links, inputs, and other clickable elements with their ARIA roles and names |
| Text search | Find any visible text on the page |
| File reading | Read source files with line numbers |
| File editing | Make targeted edits using fuzzy text matching |
| Navigation | Change the URL in the preview |
| Device emulation | Switch between desktop, tablet, and mobile viewports |
| Questions | Pause and ask you for clarification when it’s unsure |
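The documentation does not specify how the fuzzy matching in file editing works. As one plausible interpretation only, the sketch below matches the search text line by line while ignoring differences in indentation and runs of whitespace. The `applyFuzzyEdit` function is hypothetical and is not Frontman's algorithm.

```typescript
// Collapse indentation and repeated whitespace so formatting differences do not block a match.
const normalize = (line: string): string => line.trim().replace(/\s+/g, " ");

// Replace the first block of lines that matches `search` after normalization.
function applyFuzzyEdit(source: string, search: string, replacement: string): string {
  const sourceLines = source.split("\n");
  const wanted = search.split("\n").map(normalize);
  for (let start = 0; start + wanted.length <= sourceLines.length; start++) {
    const candidate = sourceLines.slice(start, start + wanted.length).map(normalize);
    if (candidate.every((line, i) => line === wanted[i])) {
      return [
        ...sourceLines.slice(0, start),
        replacement,
        ...sourceLines.slice(start + wanted.length),
      ].join("\n");
    }
  }
  throw new Error("Search text not found");
}
```

A whitespace-tolerant match like this lets an edit succeed even when the model reproduces a line with slightly different indentation than the file on disk.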

Sometimes the agent needs more information before proceeding. When this happens, it uses the Question tool to pause the loop and show you a UI drawer with the question and suggested options.

While a question is pending, the agent loop is fully paused: no LLM calls happen until you respond. Once you answer, your response is fed back to the LLM and the loop continues.

See The Question Flow for more detail.
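Pausing a loop until the user answers maps naturally onto awaiting a promise that is resolved from the UI. The following is a minimal sketch under that assumption; `askQuestion`, `PendingQuestion`, and `continueAfterAnswer` are hypothetical names.

```typescript
interface PendingQuestion {
  prompt: string;
  options: string[];
  answer: Promise<string>;      // settles once the user responds
  respond(value: string): void; // called by the UI drawer
}

function askQuestion(prompt: string, options: string[]): PendingQuestion {
  let respond: (value: string) => void = () => {};
  // The executor runs synchronously, so `respond` is set before we return.
  const answer = new Promise<string>((resolve) => {
    respond = resolve;
  });
  return { prompt, options, answer, respond };
}

// The loop stops at the await; nothing after it runs until respond() is called.
async function continueAfterAnswer(question: PendingQuestion, log: string[]): Promise<void> {
  log.push("paused");
  const reply = await question.answer;
  log.push(`resumed with: ${reply}`);
}
```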

For complex tasks, the agent creates a structured plan — a list of steps with statuses (pending, in progress, completed). This plan is visible in the chat UI so you can track progress.

The agent updates the plan as it works, marking items complete and adding new ones as it discovers subtasks. See Plans & Todo Lists.
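A plan of this kind can be modeled as a list of steps, each carrying one of the three statuses. The sketch below is illustrative only; the `PlanStep` shape and the helper functions are assumptions, not the format Frontman stores.

```typescript
type StepStatus = "pending" | "in_progress" | "completed";

interface PlanStep {
  id: number;
  title: string;
  status: StepStatus;
}

// Return a new plan with one step's status changed.
function setStatus(plan: PlanStep[], id: number, status: StepStatus): PlanStep[] {
  return plan.map((step) => (step.id === id ? { ...step, status } : step));
}

// Append a newly discovered subtask as a pending step.
function addStep(plan: PlanStep[], title: string): PlanStep[] {
  const nextId = plan.reduce((max, step) => Math.max(max, step.id), 0) + 1;
  return [...plan, { id: nextId, title, status: "pending" }];
}
```

Returning a new array on every update, rather than mutating in place, makes it easy for a UI to detect that the plan changed and re-render it.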