TRIM-QA

Table QA retrieval pipeline that improved recall from 84.67% to 96.67%

Technologies Used

Python, BM25, SBERT, NLP, Information Retrieval

Project Overview

TRIM-QA is a table question-answering retrieval pipeline built to improve how language models handle structured data. The system combines BM25 retrieval, table pruning, and semantic reranking so models can focus on the most relevant rows and columns instead of processing noisy table context.
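As an illustration of the first stage, the following is a minimal sketch of Okapi BM25 scoring over flattened table text. It is not the project's code: the `tokenize` scheme, the `k1` and `b` values, and the sample tables are all assumptions chosen to keep the example self-contained.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumed scheme: lowercase, split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization penalizes long documents.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical tables, flattened to "title | column names".
tables = [
    "Olympic medal table 2016 | country gold silver bronze",
    "List of rivers by length | river length_km continent",
    "Population of European capitals | city country population",
]
scores = bm25_scores("which country won the most gold medals", tables)
best = max(range(len(tables)), key=scores.__getitem__)
```

Here `best` points at the medal table, since it matches both "country" and "gold". Note that "medals" does not match "medal" under this simple tokenizer, which is the kind of gap that stronger tokenization (for example stemming) is meant to close.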

Challenges

Traditional retrieval pipelines often return large tables with too much irrelevant information, which can distract downstream models and weaken answer quality on structured-data questions.

Solution

The project uses stronger tokenization for BM25, hierarchical row and column chunking, semantic pruning with SBERT, and reranking to minimize irrelevant context before question answering.
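The chunking and pruning steps can be sketched as follows. This is an assumption-heavy illustration rather than the project's implementation: `embed` is a toy bag-of-words stand-in for an SBERT encoder so the example runs without a model download, and `prune_table`, `keep_rows`, and `keep_cols` are hypothetical names.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for an SBERT encoder: a sparse bag-of-words vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def prune_table(question, header, rows, keep_rows=1, keep_cols=2):
    """Keep the rows and columns whose text is most similar to the question."""
    q = embed(question)
    # Column chunks: the header name plus every cell value in that column.
    col_chunks = [" ".join([h] + [r[i] for r in rows]) for i, h in enumerate(header)]
    # Row chunks: "header: value" pairs, so each row carries its column context.
    row_chunks = [" ".join(f"{h}: {c}" for h, c in zip(header, r)) for r in rows]

    def top(chunks, k):
        ranked = sorted(range(len(chunks)),
                        key=lambda i: cosine(q, embed(chunks[i])), reverse=True)
        return sorted(ranked[:k])  # restore original table order

    cols = top(col_chunks, keep_cols)
    kept = top(row_chunks, keep_rows)
    return [header[i] for i in cols], [[rows[r][i] for i in cols] for r in kept]

# Hypothetical table and question.
header = ["country", "gold", "silver", "capital"]
rows = [
    ["USA", "46", "37", "Washington"],
    ["France", "10", "18", "Paris"],
    ["Japan", "12", "8", "Tokyo"],
]
pruned_header, pruned_rows = prune_table("how many gold medals did Japan win", header, rows)
```

With these inputs the table shrinks to the `country` and `gold` columns and the Japan row. Swapping `embed` for a real sentence encoder would let the same scoring catch paraphrases that share no tokens with the question.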

Impact & Results

Improved retrieval recall from 84.67% to 96.67%, a gain of 12 percentage points, and strengthened top-ranked results on the NQ-Tables benchmark.
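For reference, retrieval recall of this kind is commonly computed as recall@k: the share of questions whose gold table appears among the top k retrieved tables. The sketch below assumes that definition. The source does not state which k was used, and the table ids here are made up for illustration.

```python
def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of questions whose gold table appears in the top-k retrieved tables."""
    hits = sum(1 for ranked, gold in zip(ranked_ids, gold_ids) if gold in ranked[:k])
    return hits / len(gold_ids)

# Toy data: retrieved table ids per question, best first (hypothetical ids).
ranked_ids = [
    ["t3", "t1", "t7"],
    ["t2", "t9", "t4"],
    ["t5", "t6", "t8"],
    ["t1", "t2", "t3"],
]
gold_ids = ["t1", "t4", "t0", "t1"]

r1 = recall_at_k(ranked_ids, gold_ids, k=1)  # only the last question hits at rank 1
r3 = recall_at_k(ranked_ids, gold_ids, k=3)  # three of four gold tables are in the top 3
```

Comparing recall at small k against larger k separates two effects: whether the right table is retrieved at all, and whether reranking moves it to the top.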