OCR vs. Image Embeddings for PDF RAG: Which One is Better?
My colleagues at Weaviate released IRPAPERS, a benchmark comparing image-based and text-based retrieval over 3,230 pages from 166 scientific papers.
The setup: Take the same PDFs and process them two ways. For text, run OCR with GPT-4.1 and embed with Arctic 2.0 + BM25 hybrid search. For images, embed raw page images with ColModernVBERT multi-vector embeddings. Test both on 180 needle-in-the-haystack questions.
The results:
Text edges out images at the top rank: 46% vs 43% Recall@1
But images match or exceed text at deeper recall: 93% vs 91% Recall@20
But text- and image-based methods actually fail on different queries.
At Recall@1:
• 22 queries succeed with text but fail with images
• 18 queries succeed with images but fail with text
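To make the Recall@k numbers and the "succeeds with one modality, fails with the other" counts concrete, here is a minimal Python sketch. The query IDs, page IDs, and rankings are made-up toy data, and the function names are hypothetical; this is not the benchmark's evaluation code, only an illustration of how such counts can be computed for needle-in-the-haystack questions with one gold page each.

```python
from typing import Dict, List, Set

def hits_at_k(rankings: Dict[str, List[str]], gold: Dict[str, str], k: int) -> Set[str]:
    """Return the queries whose single gold page appears in the top-k results."""
    return {q for q, ranked in rankings.items() if gold[q] in ranked[:k]}

def recall_at_k(rankings: Dict[str, List[str]], gold: Dict[str, str], k: int) -> float:
    """Fraction of queries whose gold page appears in the top-k results."""
    return len(hits_at_k(rankings, gold, k)) / len(rankings)

# Toy data (assumed for illustration): one gold page per query.
gold = {"q1": "p1", "q2": "p2", "q3": "p3", "q4": "p4"}
text_rankings = {
    "q1": ["p1", "p9"], "q2": ["p2", "p8"],
    "q3": ["p7", "p3"], "q4": ["p6", "p5"],
}
image_rankings = {
    "q1": ["p1", "p9"], "q2": ["p8", "p2"],
    "q3": ["p3", "p7"], "q4": ["p5", "p6"],
}

text_hits = hits_at_k(text_rankings, gold, k=1)
image_hits = hits_at_k(image_rankings, gold, k=1)

# Set differences give the complementary queries at Recall@1.
text_only = text_hits - image_hits    # succeed with text, fail with images
image_only = image_hits - text_hits   # succeed with images, fail with text

print(recall_at_k(text_rankings, gold, 1), recall_at_k(image_rankings, gold, 1))
print(sorted(text_only), sorted(image_only))
```

In this toy example both modalities reach the same Recall@1, yet each answers a query the other misses, which is the pattern the 22 vs. 18 split describes.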
This complementarity is what makes Multimodal Hybrid Search work. By fusing scores from both text and image retrieval, they achieved:
• 49% Recall@1 (beating either modality alone)
• 81% Recall@5
• 95% Recall@20
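As a rough illustration of what "fusing scores" can look like, the sketch below min-max normalizes each modality's scores and takes a weighted sum. This is an assumption for illustration only: the post does not say which fusion method the paper uses, and the function names, the `alpha` weight, and the scores are all hypothetical.

```python
from typing import Dict, List

def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two modalities are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {page: 1.0 for page in scores}
    return {page: (s - lo) / (hi - lo) for page, s in scores.items()}

def fuse(text_scores: Dict[str, float],
         image_scores: Dict[str, float],
         alpha: float = 0.5) -> List[str]:
    """Rank pages by alpha * text + (1 - alpha) * image (normalized scores).

    A page missing from one modality's results contributes 0 for that modality.
    """
    t, i = min_max(text_scores), min_max(image_scores)
    fused = {
        page: alpha * t.get(page, 0.0) + (1 - alpha) * i.get(page, 0.0)
        for page in set(t) | set(i)
    }
    return sorted(fused, key=fused.get, reverse=True)

# Toy scores (assumed): the two retrievers use different score scales.
text_scores = {"p1": 0.9, "p2": 0.8, "p3": 0.1}
image_scores = {"p2": 12.0, "p3": 11.0, "p4": 2.0}

ranking = fuse(text_scores, image_scores)
print(ranking)
```

Here text alone ranks p1 first, while the fused ranking promotes p2 because both modalities score it highly. Rank-based alternatives such as reciprocal rank fusion avoid the normalization step entirely.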
00:00 - Intro
00:08 - Visual- vs Text-based methods
01:04 - The IRPapers dataset
01:59 - The 6 different search strategies
03:43 - The results
04:30 - The paper's most interesting finding...
05:11 - Conclusion