Visual Regression Patch Agent

An agentic tool using Node.js, TypeScript, LangChain, and GPT-4o vision to diagnose frontend UI regressions from source code and screenshots.

Timeline

2 Weeks

Role

Full Stack Engineer

Team

Solo

Status

Completed

Technology Stack

Node.js

TypeScript

Express.js

LangChain

GPT-4o Vision

Next.js

Key Challenges

Handling complex multipart directory uploads
Optimizing the context window for the LLM
Prompt engineering for multimodal vision tasks

Key Learnings

Integrating LangChain with vision models
Parsing and filtering file streams with Multer
Structuring robust LLM JSON outputs

Situation

Visual UI regressions are notoriously frustrating to debug. A small CSS tweak can break layouts across an entire application, and tracing the root cause from a broken screenshot back to the exact line of source code is tedious. Developers needed an automated assistant that could "see" the visual bug and cross-reference it against the source code to pinpoint the issue quickly.

Task

The objective was to build an agentic tool capable of diagnosing visual UI regressions. It needed to:

Accept a user-uploaded frontend project structure alongside a screenshot of the broken UI.
Filter and process relevant source files while ignoring build artifacts.
Use a vision-language model to analyze the discrepancy between what the code intends and what the screenshot actually shows.
Output a structured root-cause explanation along with the corrected code.

Action

I developed a full-stack solution using Next.js for the frontend workbench and Node.js/Express for the backend upload API.

1. Context Collection & Filtering

Built a multipart upload system using Multer to handle both deeply nested directory uploads and image files.
Implemented file filtering that keeps only relevant frontend source files (e.g., .tsx, .css) and discards dependency and build directories such as node_modules and build output, so the LLM context window is spent on code that matters.
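The filtering and context-budgeting steps above can be sketched in dependency-free TypeScript. This is a minimal illustration, not the project's code: the extension allow-list, the ignored directory names, the `--- path ---` file header, and the helper names `isRelevantSourceFile` and `buildCodeContext` are all hypothetical, and a character count stands in for a real token budget.

```typescript
// Hypothetical allow-list and ignore-list; the project's real rules are not published.
const ALLOWED_EXTENSIONS = new Set([".ts", ".tsx", ".js", ".jsx", ".css", ".scss", ".html"]);
const IGNORED_DIRS = new Set(["node_modules", ".next", "build", "dist", ".git", "coverage"]);

interface SourceFile {
  path: string; // relative path as sent by the browser, e.g. "src/App.tsx"
  content: string; // UTF-8 file contents
}

// Returns true when a relative upload path looks like hand-written frontend source.
function isRelevantSourceFile(relativePath: string): boolean {
  // Normalize Windows separators so src\App.tsx and src/App.tsx behave the same.
  const segments = relativePath.replace(/\\/g, "/").split("/");
  const fileName = segments[segments.length - 1];
  // Reject anything nested inside an ignored directory, at any depth.
  if (segments.slice(0, -1).some((dir) => IGNORED_DIRS.has(dir))) return false;
  const dot = fileName.lastIndexOf(".");
  // No extension, or a dotfile such as ".env": not source we want to send.
  if (dot <= 0) return false;
  return ALLOWED_EXTENSIONS.has(fileName.slice(dot).toLowerCase());
}

// Concatenates relevant files into one prompt section, skipping any file that
// would push the total past maxChars (a rough proxy for a token budget).
function buildCodeContext(files: SourceFile[], maxChars: number): string {
  let context = "";
  for (const file of files) {
    if (!isRelevantSourceFile(file.path)) continue;
    const block = `--- ${file.path} ---\n${file.content}\n`;
    // Skip oversized files; smaller files later in the list may still fit.
    if (context.length + block.length > maxChars) continue;
    context += block;
  }
  return context;
}
```

In an Express route, a check like this could run inside Multer's `fileFilter` callback (for example `cb(null, isRelevantSourceFile(file.originalname))`), assuming the browser sends each file's relative path as its filename, Multer's `preservePath` option is enabled so the directory part survives, and the screenshot arrives under a separate field that bypasses the filter.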

2. Vision-Agent Integration

Integrated LangChain with OpenAI's GPT-4o vision model to build an agent that interprets the textual code context and the image-based visual state together.
Crafted a specialized prompt chain that guides the LLM to act as a frontend debugging expert, cross-referencing the uploaded screenshot against the provided source code.
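Combining code and screenshot in one request can be sketched as a message builder. The shape below follows the OpenAI Chat Completions format for image input (content parts of type `text` and `image_url`, with the image inlined as a base64 data URL); LangChain's message classes accept equivalent content arrays. The prompt wording and the `buildVisionMessages` helper are hypothetical, since the project's actual prompt chain is not published.

```typescript
// Hypothetical system prompt; the project's actual prompt chain is not published.
const SYSTEM_PROMPT =
  "You are an expert frontend debugger. Compare the screenshot of the broken UI " +
  "with the provided source files, identify the root cause, and respond with JSON " +
  'of the form {"explanation": string, "fixedCode": string}.';

type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type ChatMessage =
  | { role: "system"; content: string }
  | { role: "user"; content: Array<TextPart | ImagePart> };

// Builds one system message plus one user message that mixes text and an
// inline base64 image (a data URL), so the model sees code and screenshot together.
function buildVisionMessages(
  codeContext: string,
  screenshotBase64: string,
  mimeType = "image/png",
): ChatMessage[] {
  return [
    { role: "system", content: SYSTEM_PROMPT },
    {
      role: "user",
      content: [
        { type: "text", text: `Relevant source files:\n${codeContext}` },
        { type: "text", text: "The attached screenshot shows the regression. Which code causes it, and what is the fix?" },
        { type: "image_url", image_url: { url: `data:${mimeType};base64,${screenshotBase64}` } },
      ],
    },
  ];
}
```

If the screenshot is received with Multer's memory storage, its base64 string could come from `file.buffer.toString("base64")`.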

3. Structured Analysis & Output

Constrained the LLM to return a structured JSON payload with two fields: explanation (the root-cause analysis) and fixedCode (the patched source file).
Designed a Next.js frontend with a side-by-side analysis workbench that presents the AI's findings to the user.
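The write-up does not say whether OpenAI's JSON mode, LangChain's structured-output helpers, or manual parsing enforced the explanation/fixedCode format, so the following is only a defensive, dependency-free sketch of validating such a reply. The `PatchResult` type and `parsePatchResult` helper are hypothetical names; the fallback for a Markdown-fenced reply reflects a common model habit rather than documented project behavior.

```typescript
interface PatchResult {
  explanation: string; // root-cause analysis
  fixedCode: string; // patched source file
}

// Parses the model's reply into a PatchResult, tolerating a Markdown code fence
// around the JSON and rejecting replies that lack either field.
function parsePatchResult(raw: string): PatchResult {
  const trimmed = raw.trim();
  let data: unknown;
  try {
    data = JSON.parse(trimmed);
  } catch {
    // Fall back to stripping a surrounding fence (three backticks, optionally tagged json).
    const fenced = trimmed.match(/^`{3}(?:json)?\s*([\s\S]*?)\s*`{3}$/);
    if (!fenced) throw new Error("Model reply is not valid JSON");
    data = JSON.parse(fenced[1]);
  }
  if (typeof data !== "object" || data === null) {
    throw new Error("Model reply is not a JSON object");
  }
  const { explanation, fixedCode } = data as Record<string, unknown>;
  if (typeof explanation !== "string" || typeof fixedCode !== "string") {
    throw new Error('Model reply must contain string fields "explanation" and "fixedCode"');
  }
  return { explanation, fixedCode };
}
```

Trying a plain `JSON.parse` first means a fixedCode value that itself contains backticks is never mangled by the fence-stripping fallback.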

Result

The Visual Regression Patch Agent speeds up the UI debugging workflow.

Faster Debugging: By highlighting the problematic code and proposing a fix, it cuts the time spent manually inspecting DOM elements and CSS rules.
Agentic Workflow: Demonstrates that a multimodal LLM can act as a specialized debugging assistant when it is given correct, filtered context.
Clean Architecture: Separating the Express API from the Next.js frontend keeps heavy multipart parsing and LangChain processing in the backend, so they don't block or slow down the UI.
