Show HN: Ranked Search for Semi-Structured Data

📰 Hacker News · alrudolph

We’ve been working on a search problem that requires querying both text and numbers simultaneously. For example, in a dataset of clothing items with descriptions and prices, a search for “slim pants for $20” should prioritize skinny jeans for $25 over slim pants for $50 because they are semantically similar and the price is closer. I’ve found that standard embedding models struggle with numerical ordering, while text-to-SQL methods rely on exact matches and often filter out too many results. To solve this, we built a system designed specifically for structured datasets like CSVs or tables. Here’s a demo link where you can upload a small CSV to try out (no login required): https://demo.tryvoker.com . Unlike most RAG approaches, we process each column independently, handling text with embeddings and numbers with custom scoring. When a user submits a query, we parse it into relevant fields—for instance, extracting “slim pants” as the description and “20” as the price. We then co

Published 27 Feb 2025
Read full article → ← Back to Reads