Visual Regression Patch Agent

An agentic tool using Node.js, TypeScript, LangChain, and GPT-4o vision to diagnose frontend UI regressions from source code and screenshots.

Timeline

2 Weeks

Role

Full Stack Engineer

Team

Solo

Status
Completed

Technology Stack

Node.js
TypeScript
Express.js
LangChain
GPT-4o Vision
Next.js

Key Challenges

  • Handling complex multipart directory uploads
  • Optimizing the context window for the LLM
  • Prompt engineering for multimodal vision tasks

Key Learnings

  • Integrating LangChain with vision models
  • Parsing and filtering file streams with Multer
  • Structuring robust LLM JSON outputs

Situation

UI visual regressions are notoriously frustrating to debug. A small CSS tweak can break layouts across an entire application, and tracing the root cause from a broken screenshot back to the exact line of source code is tedious and time-consuming. Developers needed an automated assistant that could "see" the visual bug and cross-reference it against the source code context to quickly pinpoint the issue.

Task

The objective was to build an agentic tool capable of diagnosing visual UI regressions. It needed to:

  • Accept a user-uploaded frontend project structure alongside a screenshot of the broken UI.
  • Filter and process relevant source files while ignoring build artifacts.
  • Use a vision-language model to compare what the source code is meant to render against what the screenshot actually shows.
  • Output a structured root-cause explanation along with the corrected code.

Action

I built a full-stack solution with Next.js for the frontend workbench and Node.js/Express for the backend upload API.

1. Context Collection & Filtering

  • Built a multipart upload system using Multer to handle both deeply nested directory uploads and image files.
  • Implemented file filtering that keeps only relevant frontend source files (e.g., .tsx, .css) and skips dependency and build directories such as node_modules, so less of the LLM context window is spent on irrelevant files.
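
The filtering step can be sketched as a pure path check applied to each uploaded file. This is a minimal illustration, not the project's actual code: the extension allow-list, the ignored directory names, and the function name isRelevantSourceFile are all assumptions, since the original lists are not given beyond .tsx, .css, node_modules, and build folders.

```typescript
// Hypothetical allow-list and ignore-list (assumptions for illustration only).
const SOURCE_EXTENSIONS = new Set([".tsx", ".ts", ".jsx", ".js", ".css", ".scss", ".html"]);
const IGNORED_DIRS = new Set(["node_modules", ".next", "dist", "build", ".git"]);

// Returns true when an uploaded file's relative path points at frontend
// source code that is worth sending to the model.
function isRelevantSourceFile(relativePath: string): boolean {
  // Normalize Windows separators so nested directory uploads behave the same everywhere.
  const segments = relativePath.replace(/\\/g, "/").split("/");
  const fileName = segments.pop() ?? "";

  // Reject anything that lives under an ignored directory at any depth.
  if (segments.some((dir) => IGNORED_DIRS.has(dir))) return false;

  // Reject files without an extension and dotfiles such as ".env".
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) return false;

  return SOURCE_EXTENSIONS.has(fileName.slice(dot).toLowerCase());
}
```

A check like this could run inside a Multer fileFilter callback or over the parsed file list afterwards; which of the two the project uses is not stated.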

2. Vision-Agent Integration

  • Integrated LangChain and OpenAI GPT-4o Vision to construct an agent capable of interpreting both textual code context and image-based visual state simultaneously.
  • Crafted a specialized prompt chain that guides the LLM to act as a frontend debugging expert, cross-referencing the uploaded screenshot against the provided source code.
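
One way to combine code and screenshot in a single request is a content array with a text part and an image part, which is the shape OpenAI's chat API uses for image input. The sketch below only builds that array. The type names, the buildVisionContent helper, and the prompt wording are hypothetical, and passing the result as the content of a LangChain HumanMessage is an assumption about how the agent is wired.

```typescript
// Local types mirroring the text and image content parts of a multimodal chat message.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };

interface SourceFile {
  path: string;
  content: string;
}

// Builds one user message: filtered source files as labelled text,
// followed by the screenshot as a base64 data URL.
function buildVisionContent(
  files: SourceFile[],
  screenshotBase64: string,
  mimeType: string = "image/png",
): [TextPart, ImagePart] {
  // Label each file with its path so the model can name the file it patches.
  const codeContext = files
    .map((file) => `--- ${file.path} ---\n${file.content}`)
    .join("\n\n");

  // Hypothetical prompt wording; the project's actual prompt is not published.
  const instructions =
    "You are a frontend debugging expert. The screenshot shows a broken UI. " +
    "Cross-reference it against the source files below and identify the root cause.";

  return [
    { type: "text", text: `${instructions}\n\n${codeContext}` },
    { type: "image_url", image_url: { url: `data:${mimeType};base64,${screenshotBase64}` } },
  ];
}
```

Keeping the builder free of any SDK import makes it easy to unit test and to swap the model client later.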

3. Structured Analysis & Output

  • Constrained the LLM to return a structured JSON payload with two fields: explanation (the root-cause analysis) and fixedCode (the patched source file).
  • Built a Next.js frontend with a side-by-side analysis workbench that presents the AI's findings to the user.
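
Model output that is supposed to be JSON still benefits from validation before it reaches the UI. The following is a minimal sketch of such a check, assuming the payload has exactly the explanation and fixedCode string fields named above. The PatchResult interface and parsePatchResult function are hypothetical names, and the brace-slicing step is one simple way to tolerate extra text around the object, not necessarily what the project does.

```typescript
interface PatchResult {
  explanation: string;
  fixedCode: string;
}

// Extracts and validates the JSON object from a raw model reply.
// Throws if no object is found, the JSON is invalid, or a field is missing.
function parsePatchResult(raw: string): PatchResult {
  // Models sometimes wrap JSON in prose or code fences, so keep only the
  // span from the first "{" to the last "}".
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model output");
  }

  const data: unknown = JSON.parse(raw.slice(start, end + 1));
  if (typeof data !== "object" || data === null) {
    throw new Error("Model output is not a JSON object");
  }

  const { explanation, fixedCode } = data as Record<string, unknown>;
  if (typeof explanation !== "string" || typeof fixedCode !== "string") {
    throw new Error("Model output must contain string fields 'explanation' and 'fixedCode'");
  }

  return { explanation, fixedCode };
}
```

Failing loudly here lets the API return a clear error instead of rendering a half-formed result in the workbench.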

Result

The Visual Regression Patch Agent shortens the UI debugging workflow.

  • Faster Debugging: By pointing to the problematic code and proposing a fix, it reduces the time spent manually inspecting DOM elements and CSS rules.
  • Agentic Workflow: Shows how a multimodal LLM can act as a specialized assistant when given the right, filtered context.
  • Clean Architecture: Separating the Express API from the Next.js frontend keeps heavy multipart parsing and LangChain processing from blocking or slowing down the UI.

Designed & Developed by saikatD
© 2026. All rights reserved.