How the Agent Works
Frontman is an AI agent that sits between your browser and your source code. You describe a change in natural language, and it executes that change by looking at your running app, reading relevant files, and editing them — all without you leaving the browser.
This page explains what happens under the hood so you can work with the agent more effectively.
The three-part system
Section titled “The three-part system”Frontman has three main components:
-
The browser client — a chat interface that sits alongside a live preview of your app. It also runs browser-side tools (screenshots, DOM inspection, clicking elements).
-
The Frontman server — receives your prompts, calls the LLM (Claude, GPT, Gemini, etc.), and orchestrates the agent loop.
-
Your dev server plugin — a framework integration (Astro, Next.js, or Vite) that gives the agent access to your project files and component structure.
┌─────────────────────────────────────────────┐│ Your Browser ││ ┌────────────┐ ┌────────────────────┐ ││ │ Chat UI │ │ Live Preview │ ││ │ │ │ (your running app)│ ││ └──────┬─────┘ └──────────┬─────────┘ │└─────────┼───────────────────┼───────────────┘ │ │ ▼ ▼┌──────────────────┐ ┌──────────────────┐│ Frontman Server │ │ Your Dev Server ││ (agent loop, │ │ (file tools, ││ LLM calls) │ │ project info) │└────────┬─────────┘ └──────────────────┘ │ ▼┌──────────────────┐│ LLM Provider ││ (Claude, GPT, ││ Gemini, etc.) │└──────────────────┘What happens when you send a prompt
Section titled “What happens when you send a prompt”1. Your message reaches the server
Section titled “1. Your message reaches the server”When you type a message and hit send, the client packages it — text, images, and any annotations you’ve added — and sends it to the Frontman server over a WebSocket connection.
2. The server picks an LLM
Section titled “2. The server picks an LLM”The server resolves which AI model and API key to use, checking in this order:
- OAuth connection — if you’ve linked your Anthropic or OpenAI account directly
- Your API key — a key you’ve saved in Frontman settings
- Environment key — a key from your project’s
.envfile - Free tier — Frontman’s built-in key (limited to 10 runs/day)
See API Keys & Providers for setup details.
3. The agent loop starts
Section titled “3. The agent loop starts”The server builds a context package — system prompt, available tools, conversation history — and submits it to the LLM. This begins the agent loop: a back-and-forth between the LLM and your browser that continues until the task is done.
4. The LLM decides what to do
Section titled “4. The LLM decides what to do”On each turn, the LLM either:
- Returns text — streamed to your chat in real time as it’s generated
- Calls tools — requests actions like “take a screenshot” or “read this file”
5. Tools execute where they need to
Section titled “5. Tools execute where they need to”Tools run in different places depending on what they do:
| Tool type | Where it runs | Examples |
|---|---|---|
| Browser tools | In your browser, against the live preview | Screenshot, DOM inspection, clicking elements, navigating |
| Dev server tools | On your dev server, via the framework plugin | Reading files, editing code, discovering project structure |
| Server tools | On the Frontman server | Todo list management, plan tracking |
The results are sent back to the LLM, which uses them to decide its next action.
6. The loop repeats until done
Section titled “6. The loop repeats until done”Steps 4–5 repeat until the LLM determines the task is complete. A typical flow looks like this:
- Take a screenshot to see the current state
- Read the DOM to understand the page structure
- Read the relevant source file
- Edit the file
- Take another screenshot to verify the change
- Report back to you
The agent might loop 3–15 times depending on complexity. Simple text changes might take 3 steps. A multi-component layout rework might take 15.
The screenshot → read → edit cycle
Section titled “The screenshot → read → edit cycle”The agent’s core workflow is a perception-action loop:
- See — take a screenshot of the live preview to understand the visual state
- Understand — inspect the DOM, find interactive elements, or search for text to map what’s visible to underlying structure
- Locate — identify the source file and line responsible for what needs to change
- Edit — modify the code with a targeted diff
- Verify — take another screenshot to confirm the change looks right
This is why Frontman can make precise visual changes that other AI coding tools struggle with — it has the same feedback loop a human developer uses: look at the page, find the code, change it, check the result.
How tools get routed
Section titled “How tools get routed”When the LLM requests a tool that runs in the browser (like a screenshot), the server sends the request to your browser over the WebSocket. The browser executes it against the live preview iframe and returns the result.
For tools that need your dev server (like editing a file), the browser acts as a bridge — it receives the request from the server, forwards it to your dev server’s Frontman plugin over HTTP, and returns the result.
Agent → Server → Browser → Dev Server → Browser → Server → AgentThis relay architecture means the agent can access your files without the Frontman server needing direct access to your filesystem. Your code stays on your machine.
What the agent can see
Section titled “What the agent can see”The agent has access to a rich set of tools. Here’s a summary — see Tool Capabilities for the full reference.
| Capability | What the agent gets |
|---|---|
| Screenshots | A pixel-accurate capture of your running app |
| DOM tree | A structured representation of the page with CSS selectors, component names, and text content |
| Interactive elements | All buttons, links, inputs, and other clickable elements with their ARIA roles and names |
| Text search | Find any visible text on the page |
| File reading | Read source files with line numbers |
| File editing | Make targeted edits using fuzzy text matching |
| Navigation | Change the URL in the preview |
| Device emulation | Switch between desktop, tablet, and mobile viewports |
| Questions | Pause and ask you for clarification when it’s unsure |
The Question flow
Section titled “The Question flow”Sometimes the agent needs more information before proceeding. When this happens, it uses the Question tool to pause the loop and show you a UI drawer with the question and suggested options.
The agent loop is literally paused — no LLM calls happen until you respond. Once you answer, your response is fed back to the LLM and the loop continues.
See The Question Flow for more detail.
Plans and todo lists
Section titled “Plans and todo lists”For complex tasks, the agent creates a structured plan — a list of steps with statuses (pending, in progress, completed). This plan is visible in the chat UI so you can track progress.
The agent updates the plan as it works, marking items complete and adding new ones as it discovers subtasks. See Plans & Todo Lists.
Next steps
Section titled “Next steps”- Sending Prompts — how to write prompts that get good results
- Annotations — point at elements instead of describing them
- Tool Capabilities — full reference for every tool
- Architecture Overview — the full technical deep-dive