Production RAG System with Evaluation Layer

A hybrid retrieval architecture that combines HNSW-indexed vector similarity search in pgvector with BM25 keyword ranking for high-precision document retrieval.

Timeline: 1 Month

Role: Full Stack Engineer

Team: Solo

Status: Completed

Technology Stack

Node.js

TypeScript

PostgreSQL

Supabase

OpenAI

Next.js

Turborepo

Key Challenges

Optimizing hybrid search with pgvector
Streaming answers in real-time
Evaluating LLM faithfulness
Monorepo configuration

Key Learnings

Vector database design with pgvector
LLM-as-a-Judge evaluation techniques
Turborepo workspace management
Next.js App Router streaming APIs

Situation

As AI applications scale, hallucination and retrieval inaccuracy become critical bottlenecks. Pure semantic search often misses exact keyword matches, and without an automated way to evaluate generated answers, a system can degrade silently over time. The project called for a Retrieval-Augmented Generation (RAG) architecture that both retrieves accurate context and continuously monitors its own performance.

Task

The goal was to design and implement a complete RAG pipeline from scratch. It needed to:

Ingest large documents and store them efficiently.
Retrieve context using a hybrid approach (combining semantic and keyword search).
Provide a streaming web interface for real-time interaction.
Include an automated evaluation layer to score the faithfulness of the LLM's responses against a golden dataset.

Action

I engineered the solution as a Turborepo monorepo to cleanly separate the core RAG engine, utility scripts, and the web application.

1. Document Ingestion & Storage

Built a data ingestion pipeline to chunk large documents into smaller pieces.
Generated vector embeddings using OpenAI's text-embedding-3-small.
Stored chunks and embeddings in Supabase using PostgreSQL + pgvector, managed by Drizzle ORM for type-safe database operations.
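
As a rough illustration of the chunking step (not the project's actual code), the sketch below splits text into fixed-size character windows with overlap. The write-up does not state the chunking strategy, so the function name `chunkText`, the `Chunk` shape, and the default `chunkSize` and `overlap` values are all assumptions.

```typescript
// Hypothetical helper: the actual chunking strategy is not described in the write-up.
interface Chunk {
  index: number; // position of the chunk within the document
  start: number; // character offset where the chunk begins
  text: string;
}

function chunkText(text: string, chunkSize = 1000, overlap = 200): Chunk[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be in [0, chunkSize)");
  }
  const chunks: Chunk[] = [];
  const step = chunkSize - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push({
      index: chunks.length,
      start,
      text: text.slice(start, start + chunkSize),
    });
    // Stop once a window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of storing and embedding some text twice.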

2. Hybrid Retrieval Engine

Implemented hybrid search that combines dense semantic search (pgvector cosine similarity over an HNSW index) with sparse keyword matching (BM25, a ranking function from the TF-IDF family), so that queries benefit from both meaning-based and exact-term retrieval.
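
The write-up does not say how the semantic and keyword result lists are merged. One common option is Reciprocal Rank Fusion (RRF), sketched below purely as an assumption; the function name `reciprocalRankFusion` and the constant `k = 60` are illustrative and not taken from the project.

```typescript
// Assumption: RRF is shown for illustration only; the project's actual
// fusion method is not stated in the write-up.
type RankedIds = string[]; // chunk ids, best match first

function reciprocalRankFusion(
  lists: RankedIds[],
  k = 60,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, rank) => {
      // rank is 0-based, so rank + 1 is the 1-based position in the list.
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Example: ids returned by the vector query and by the keyword query.
const fused = reciprocalRankFusion([
  ["a", "b", "c"], // semantic ranking
  ["b", "d", "a"], // keyword ranking
]);
console.log(fused.map((r) => r.id)); // [ 'b', 'a', 'd', 'c' ]
```

RRF works on ranks rather than raw scores, which avoids having to normalize cosine similarities and BM25 scores onto a shared scale.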

3. Generation & Streaming API

Retrieved top relevant chunks and prompted GPT-4o-mini to generate context-grounded answers.
Built a Next.js App Router endpoint to stream answers back to the client in real-time, prepending citation metadata so users know exactly where the information came from.
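
A minimal sketch of how citation metadata might be prepended to a streamed answer, using only the web-standard `ReadableStream` and `Response` objects that App Router route handlers can return. The wire format (one JSON line of citations followed by raw answer text) and the names `Citation`, `citationHeader`, and `streamAnswer` are assumptions, not the project's actual protocol.

```typescript
// Hypothetical wire format: one JSON line of citations, then the answer text.
interface Citation {
  chunkId: string;
  source: string;
}

function citationHeader(citations: Citation[]): string {
  // The trailing newline lets the client split metadata from the answer body.
  return JSON.stringify({ citations }) + "\n";
}

function streamAnswer(
  citations: Citation[],
  tokens: AsyncIterable<string>, // e.g. text deltas from the model
): Response {
  const encoder = new TextEncoder();
  const body = new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        // Send the citations first so the UI can render sources immediately.
        controller.enqueue(encoder.encode(citationHeader(citations)));
        for await (const token of tokens) {
          controller.enqueue(encoder.encode(token));
        }
        controller.close();
      } catch (err) {
        controller.error(err);
      }
    },
  });
  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```

In a route file, an exported `POST` handler could return the result of `streamAnswer` directly. The client would read up to the first newline, parse it as JSON, and treat everything after it as answer text.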

4. Automated Evaluation Pipeline

Created an LLM-as-a-Judge pipeline that runs automated evaluations on the system's outputs.
Scored answer faithfulness on a scale of 0 to 1 against a curated golden Q&A dataset, persisting these metrics back to Supabase to actively monitor response drift over time.
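
The evaluation loop can be sketched roughly as follows. The dataset schema, the judge interface, and every name here (`GoldenItem`, `Judge`, `evaluate`, `clampScore`, `meanFaithfulness`) are hypothetical; the judge is passed in as a function so the sketch makes no claim about the prompt or model call the project actually uses.

```typescript
// Hypothetical shapes: the write-up does not describe the dataset schema
// or the judge prompt.
interface GoldenItem {
  question: string;
  expectedAnswer: string;
}

interface EvalResult {
  question: string;
  faithfulness: number; // always within [0, 1]
}

// A judge compares a generated answer with the golden item and returns a raw score.
type Judge = (item: GoldenItem, generated: string) => Promise<number>;

function clampScore(raw: number): number {
  if (Number.isNaN(raw)) return 0; // treat unparseable judge output as unfaithful
  return Math.min(1, Math.max(0, raw));
}

async function evaluate(
  dataset: GoldenItem[],
  answer: (question: string) => Promise<string>, // the RAG pipeline under test
  judge: Judge,
): Promise<EvalResult[]> {
  const results: EvalResult[] = [];
  for (const item of dataset) {
    const generated = await answer(item.question);
    const raw = await judge(item, generated);
    results.push({ question: item.question, faithfulness: clampScore(raw) });
  }
  return results;
}

function meanFaithfulness(results: EvalResult[]): number {
  if (results.length === 0) return 0;
  return results.reduce((sum, r) => sum + r.faithfulness, 0) / results.length;
}
```

Storing the per-question results alongside the mean makes it possible to see which questions regressed between runs, not only that the average moved.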

Result

The outcome is a highly modular, production-ready RAG system that is transparent about its performance.

Enhanced Reliability: Hybrid search improved retrieval precision over pure semantic search, particularly for queries that hinge on exact keyword matches.
Seamless User Experience: The Next.js streaming API delivers the answer to the web UI as it is generated, so responses start appearing immediately, much as they do in ChatGPT.
Clean Maintainability: The monorepo structure allows the core RAG logic (packages/rag-core) to be swapped, tested, or upgraded without touching the front-end web application.
Built-in Quality Assurance: The dedicated evaluation layer scores answer faithfulness on every evaluation run, turning a manual debugging step into a check that can run as part of continuous integration.