
Visual Regression Patch Agent
An agentic tool using Node.js, TypeScript, LangChain, and GPT-4o vision to diagnose frontend UI regressions from source code and screenshots.
Timeline
2 Weeks
Role
Full Stack Engineer
Team
Solo
Status
Completed
Technology Stack
Key Challenges
- Handling complex multipart directory uploads
- Optimizing the context window for the LLM
- Prompt engineering for multimodal vision tasks
Key Learnings
- Integrating LangChain with vision models
- Parsing and filtering file streams with Multer
- Structuring robust LLM JSON outputs
Situation
UI visual regressions are notoriously frustrating to debug. A small CSS tweak can break layouts across an entire application, and tracing the root cause from a broken screenshot back to the exact line of source code is tedious and time-consuming. Developers needed an automated assistant that could "see" the visual bug and cross-reference it against the source code context to quickly pinpoint the issue.
Task
The objective was to build an agentic tool capable of diagnosing visual UI regressions. It needed to:
- Accept a user-uploaded frontend project structure alongside a screenshot of the broken UI.
- Filter and process relevant source files while ignoring build artifacts.
- Use a vision-language model to analyze the discrepancy between what the code is intended to render and the actual visual output.
- Output a structured root-cause explanation along with the corrected code.
Action
I built a full-stack solution, using Next.js for the frontend workbench and Node.js/Express for the backend upload API.
1. Context Collection & Filtering
- Built a multipart upload system using Multer to handle both deeply nested directory uploads and image files.
- Implemented file filtering logic to extract only relevant frontend source code (e.g., .tsx, .css) while stripping out unnecessary dependencies like node_modules or build folders to optimize the LLM context window.
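The filtering step above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the `UploadedFile` shape, the extension allowlist beyond .tsx and .css, the ignored directory names, and the character budget are all assumptions made for the example.

```typescript
// Hypothetical shape for one uploaded file, however the upload handler records it.
interface UploadedFile {
  relativePath: string;
  content: string;
}

// Assumed allowlist: the write-up only names .tsx and .css explicitly.
const SOURCE_EXTENSIONS = [".tsx", ".ts", ".jsx", ".js", ".css"];
// Assumed denylist of dependency and build folders.
const IGNORED_DIRS = ["node_modules", ".next", "dist", "build", ".git"];

// True when the path is a frontend source file outside any ignored directory.
function isRelevantSource(relativePath: string): boolean {
  const segments = relativePath.replace(/\\/g, "/").split("/");
  const fileName = segments[segments.length - 1];
  const inIgnoredDir = segments
    .slice(0, -1)
    .some((segment) => IGNORED_DIRS.includes(segment));
  const hasSourceExt = SOURCE_EXTENSIONS.some((ext) => fileName.endsWith(ext));
  return !inIgnoredDir && hasSourceExt;
}

// Keeps relevant files in upload order, skipping any file that would push
// the running total past a rough character budget for the LLM context.
function collectContext(files: UploadedFile[], maxChars: number): UploadedFile[] {
  const kept: UploadedFile[] = [];
  let used = 0;
  for (const file of files) {
    if (!isRelevantSource(file.relativePath)) continue;
    if (used + file.content.length > maxChars) continue;
    kept.push(file);
    used += file.content.length;
  }
  return kept;
}
```

A character count is only a crude stand-in for tokens; a real budget would likely use a tokenizer that matches the model.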
2. Vision-Agent Integration
- Integrated LangChain and OpenAI GPT-4o Vision to construct an agent capable of interpreting both textual code context and image-based visual state simultaneously.
- Crafted a specialized prompt chain that guides the LLM to act as a frontend debugging expert, cross-referencing the uploaded screenshot against the provided source code.
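One way to picture the multimodal input is as a list of content parts: a text part carrying the instructions and source files, followed by an image part carrying the screenshot as a base64 data URL. The sketch below builds that list as plain objects. The prompt wording, the `SourceFile` type, and the `buildVisionContent` helper are hypothetical; in a LangChain setup, an array of this shape would typically be passed as the content of a human message to a GPT-4o chat model.

```typescript
// Content parts in the text + image_url shape used for multimodal chat input.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// Hypothetical shape for a filtered source file.
interface SourceFile {
  relativePath: string;
  content: string;
}

// Assumed instruction text; the project's real prompt is not shown in the write-up.
const DEBUG_INSTRUCTIONS =
  "You are a frontend debugging expert. Compare the screenshot of the broken UI " +
  "with the source files below, identify the root cause, and propose a fix.";

// Builds the message content: instructions plus labelled source files as one
// text part, then the screenshot as a base64 data URL image part.
function buildVisionContent(
  files: SourceFile[],
  screenshotBase64: string,
  mimeType: string = "image/png"
): ContentPart[] {
  const codeContext = files
    .map((file) => `--- ${file.relativePath} ---\n${file.content}`)
    .join("\n\n");
  return [
    { type: "text", text: `${DEBUG_INSTRUCTIONS}\n\n${codeContext}` },
    {
      type: "image_url",
      image_url: { url: `data:${mimeType};base64,${screenshotBase64}` },
    },
  ];
}
```

Labelling each file with its relative path gives the model something concrete to point to when it names the file it believes is at fault.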
3. Structured Analysis & Output
- Engineered the LLM response to return a structured JSON payload containing a detailed explanation (root-cause analysis) and the fixedCode (the patched source file).
- Designed a sleek Next.js frontend with a side-by-side analysis workbench to seamlessly present the AI's findings to the user.
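Model output is not guaranteed to be valid JSON, so a structured payload like this usually needs a validation step before it reaches the UI. The sketch below shows one possible approach, using the `explanation` and `fixedCode` field names from the description above. The `PatchResult` type, the `parsePatchResult` helper, and the handling of a surrounding markdown fence are assumptions for illustration, not the project's confirmed implementation.

```typescript
// The two fields described in the write-up.
interface PatchResult {
  explanation: string;
  fixedCode: string;
}

// Removes a surrounding markdown code fence, which models sometimes add
// around JSON even when asked for raw output.
function stripFence(text: string): string {
  const fence = "`".repeat(3);
  const trimmed = text.trim();
  if (!trimmed.startsWith(fence) || !trimmed.endsWith(fence)) return trimmed;
  const firstNewline = trimmed.indexOf("\n");
  if (firstNewline === -1) return trimmed;
  return trimmed.slice(firstNewline + 1, trimmed.length - fence.length).trim();
}

// Parses the raw model output and checks that both fields are strings.
// Throws on malformed JSON or a missing field so the caller can retry or report.
function parsePatchResult(raw: string): PatchResult {
  const parsed: unknown = JSON.parse(stripFence(raw));
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Model output is not a JSON object");
  }
  const { explanation, fixedCode } = parsed as Record<string, unknown>;
  if (typeof explanation !== "string" || typeof fixedCode !== "string") {
    throw new Error("Model output is missing explanation or fixedCode");
  }
  return { explanation, fixedCode };
}
```

Failing loudly on a malformed payload keeps the workbench from rendering a half-formed result next to the original code.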
Result
The Visual Regression Patch Agent shortens the UI debugging workflow.
- Faster Debugging: By pointing to the problematic code and proposing a fix, it reduces the time spent manually inspecting DOM elements and CSS rules.
- Agentic Workflow: Showcases how multimodal LLMs can act as highly specialized assistants when provided with the correct, filtered context.
- Clean Architecture: The separation of the Express API and Next.js frontend ensures that heavy multipart parsing and LangChain processing don't block or slow down the UI.
