Production RAG System with Evaluation Layer

A hybrid retrieval architecture that combines dense vector similarity search (pgvector with an HNSW index) and BM25 keyword ranking for high-precision document retrieval.

Timeline

1 Month

Role

Full Stack Engineer

Team

Solo

Status
Completed

Technology Stack

Node.js
TypeScript
PostgreSQL
Supabase
OpenAI
Next.js
Turborepo

Key Challenges

  • Optimizing hybrid search with pgvector
  • Streaming answers in real-time
  • Evaluating LLM faithfulness
  • Monorepo configuration

Key Learnings

  • Vector database design with pgvector
  • LLM-as-a-Judge evaluation techniques
  • Turborepo workspace management
  • Next.js App Router streaming APIs

Situation

As AI applications scale, hallucination and retrieval inaccuracy become critical bottlenecks. Relying purely on semantic search often misses exact keyword matches, and without an automated way to evaluate the generated answers, systems can silently degrade over time. The project needed a robust Retrieval-Augmented Generation (RAG) architecture that not only retrieved accurate context but also continuously monitored its own performance.

Task

The goal was to design and implement a complete RAG pipeline from scratch. It needed to:

  • Ingest large documents and store them efficiently.
  • Retrieve context using a hybrid approach (combining semantic and keyword search).
  • Provide a streaming web interface for real-time interaction.
  • Include an automated evaluation layer to score the faithfulness of the LLM's responses against a golden dataset.

Action

I engineered the solution as a Turborepo monorepo to cleanly separate the core RAG engine, utility scripts, and the web application.

1. Document Ingestion & Storage

  • Built a data ingestion pipeline to chunk large documents into smaller pieces.
  • Generated vector embeddings using OpenAI's text-embedding-3-small.
  • Stored chunks and embeddings in Supabase using PostgreSQL + pgvector, managed by Drizzle ORM for type-safe database operations.
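
The chunking step above can be sketched as follows. This is a minimal illustration, assuming fixed-size character windows with overlap; the source does not state which chunking strategy the project uses, and `chunkText`, `chunkSize`, and `overlap` are hypothetical names and defaults.

```typescript
// Hypothetical sketch: fixed-size character chunking with overlap.
// The chunking strategy actually used in the project is not stated.
interface Chunk {
  index: number; // position of the chunk within the document
  start: number; // character offset where the chunk begins
  content: string;
}

function chunkText(text: string, chunkSize = 1000, overlap = 200): Chunk[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be in [0, chunkSize)");
  }
  const chunks: Chunk[] = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push({
      index: chunks.length,
      start,
      content: text.slice(start, start + chunkSize),
    });
    // Stop once a chunk reaches the end of the text, so no trailing
    // chunk is emitted that contains only already-covered overlap.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Each chunk's `content` would then be embedded and stored alongside its vector. Overlap is a common way to avoid splitting a sentence's context across a chunk boundary, at the cost of some duplicated storage.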

2. Hybrid Retrieval Engine

  • Implemented hybrid search that combines dense semantic search (pgvector cosine similarity over an HNSW index) with sparse keyword matching (BM25, a TF-IDF-family ranking function), so queries benefit from both semantic similarity and exact term matches.
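
The source does not say how the dense and sparse result lists are merged. One common option is Reciprocal Rank Fusion (RRF), sketched below under that assumption; `reciprocalRankFusion` and the constant `k = 60` are illustrative choices, not the project's confirmed method.

```typescript
// Hypothetical sketch: merging ranked result lists with Reciprocal
// Rank Fusion (RRF). The project's actual fusion method is not stated.
function reciprocalRankFusion(
  rankedLists: string[][], // each list holds chunk ids, best match first
  k = 60, // smoothing constant; 60 is a conventional default
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, position) => {
      // Ranks are 1-based, so the top hit contributes 1 / (k + 1).
      const contribution = 1 / (k + position + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Because RRF uses only rank positions, it avoids having to normalize cosine similarities and BM25 scores onto a shared scale. Chunks that appear in both lists accumulate two contributions and rise above chunks found by only one retriever.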

3. Generation & Streaming API

  • Retrieved top relevant chunks and prompted GPT-4o-mini to generate context-grounded answers.
  • Built a Next.js App Router endpoint to stream answers back to the client in real-time, prepending citation metadata so users know exactly where the information came from.
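
A minimal sketch of the "citations first, then answer text" idea follows. The wire format (one JSON line followed by raw text) and the names `Citation`, `createCitationWriter`, and `splitCitationHeader` are assumptions; the source only states that citation metadata is prepended to the stream.

```typescript
// Hypothetical sketch: the wire format and all names are assumptions.
interface Citation {
  chunkId: string;
  source: string;
}

const HEADER_DELIMITER = "\n";

// Server side: write one JSON line of citations, then return a function
// that forwards answer tokens as they arrive from the model.
// In a route handler, `write` would typically wrap the enqueue call of
// a ReadableStream whose Response is returned to the client.
function createCitationWriter(
  citations: Citation[],
  write: (chunk: string) => void,
): (token: string) => void {
  // JSON.stringify escapes newlines inside strings, so the first raw
  // newline in the stream always marks the end of the header.
  write(JSON.stringify({ citations }) + HEADER_DELIMITER);
  return (token: string) => write(token);
}

// Client side: split the text received so far into header and answer.
function splitCitationHeader(
  received: string,
): { citations: Citation[]; answer: string } | null {
  const end = received.indexOf(HEADER_DELIMITER);
  if (end === -1) return null; // header line not fully received yet
  const header = JSON.parse(received.slice(0, end)) as { citations: Citation[] };
  return {
    citations: header.citations,
    answer: received.slice(end + HEADER_DELIMITER.length),
  };
}
```

Sending the citations before the first token lets the client render source links immediately, while the answer text continues to fill in.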

4. Automated Evaluation Pipeline

  • Created an LLM-as-a-Judge pipeline that runs automated evaluations on the system's outputs.
  • Scored answer faithfulness on a scale of 0 to 1 against a curated golden Q&A dataset and persisted the scores to Supabase, so that drift in response quality can be tracked over time.
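
The scoring step can be illustrated with the sketch below. It assumes the judge model is prompted to reply with a JSON object containing a numeric `score` field; that output format and the names `parseJudgeScore`, `meanFaithfulness`, and `EvalResult` are hypothetical, since the source does not describe the judge prompt or its response schema.

```typescript
// Hypothetical sketch: the judge's output format (a JSON object with a
// numeric "score" field) is an assumption, as are all names here.
interface EvalResult {
  question: string;
  faithfulness: number; // 0 = unsupported by the context, 1 = fully supported
}

// Turn the judge's raw reply into a score in [0, 1], or null if the
// reply cannot be interpreted.
function parseJudgeScore(rawJudgeOutput: string): number | null {
  try {
    const parsed: unknown = JSON.parse(rawJudgeOutput);
    if (typeof parsed !== "object" || parsed === null) return null;
    const score = (parsed as { score?: unknown }).score;
    if (typeof score !== "number" || !isFinite(score)) return null;
    return Math.min(1, Math.max(0, score)); // clamp into [0, 1]
  } catch {
    return null; // the judge returned something that is not valid JSON
  }
}

// Average faithfulness over one evaluation run; null for an empty run.
function meanFaithfulness(results: EvalResult[]): number | null {
  if (results.length === 0) return null;
  const total = results.reduce((sum, r) => sum + r.faithfulness, 0);
  return total / results.length;
}
```

Returning `null` for unparseable judge output, rather than defaulting to 0, keeps judge failures from being recorded as unfaithful answers in the stored metrics.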

Result

The outcome is a highly modular, production-ready RAG system that is transparent about its performance.

  • Enhanced Reliability: Hybrid search improved retrieval precision over pure semantic search, particularly for queries that hinge on exact keyword matches.
  • Responsive User Experience: The Next.js streaming API delivers answer text as it is generated, so the web UI responds immediately instead of waiting for the full completion.
  • Clean Maintainability: The monorepo structure allows the core RAG logic (packages/rag-core) to be swapped, tested, or upgraded without touching the front-end web application.
  • Built-in Quality Assurance: The dedicated evaluation layer measures answer faithfulness on every evaluation run, turning a manual debugging step into a check that can run in continuous integration.
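
As an illustration of how faithfulness scores could back a continuous integration check, the sketch below gates a run on two thresholds. The function name `faithfulnessGate` and the threshold values are assumptions; the source does not state what pass criteria the project applies.

```typescript
// Hypothetical sketch: threshold values and all names are assumptions.
interface GateReport {
  passed: boolean;
  mean: number;
  failingQuestions: string[];
}

function faithfulnessGate(
  scores: { question: string; faithfulness: number }[],
  minMean = 0.8, // minimum acceptable average across the golden set
  minPerCase = 0.5, // minimum acceptable score for any single question
): GateReport {
  if (scores.length === 0) throw new Error("no evaluation scores to gate on");
  const mean =
    scores.reduce((sum, s) => sum + s.faithfulness, 0) / scores.length;
  const failingQuestions = scores
    .filter((s) => s.faithfulness < minPerCase)
    .map((s) => s.question);
  return {
    passed: mean >= minMean && failingQuestions.length === 0,
    mean,
    failingQuestions,
  };
}
```

A CI script could call this after an evaluation run and exit with a non-zero status when `passed` is false, listing `failingQuestions` in the log so regressions are traceable to specific golden-set items.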

Designed & Developed by saikatD
© 2026. All rights reserved.