Retrieval Optimization: Tokenization to Vector Quantization

Free to audit on Coursera

Coursera · Advanced · RAG & Vector Search · 3mo ago

Skills: RAG Basics (90%)

Key Takeaways

Optimizing retrieval with tokenization and vector quantization for RAG applications

Original Description

In Retrieval Optimization: Tokenization to Vector Quantization, taught by Kacper Łukawski, Developer Relations Lead at Qdrant, you’ll learn how tokenization works and how to optimize vector search in large-scale, customer-facing RAG applications. You’ll explore the technical details of how vector search works and how to tune it for better performance.

This course focuses on optimizing the first step of a RAG pipeline: retrieving search results. You’ll see how tokenization techniques such as Byte-Pair Encoding, WordPiece, and Unigram work and how they affect search relevancy. You’ll also learn how to address common challenges such as terminology mismatches and truncated chunks in embedding models.

To optimize your search, you need to be able to measure its quality, so the course covers several quality metrics. Most vector databases use Hierarchical Navigable Small World (HNSW) graphs for approximate nearest-neighbor search. You’ll see how to balance the HNSW parameters for higher speed and maximum relevance. Finally, you’ll use different vector quantization techniques to reduce memory usage and improve search speed.

What you’ll do, in detail:

1. Learn about the internal workings of the embedding model and how your text is turned into vectors.
2. Understand how several tokenizers, such as Byte-Pair Encoding, WordPiece, Unigram, and SentencePiece, are trained.
3. Explore common challenges with tokenizers, such as unknown tokens, domain-specific identifiers, and numerical values, that can negatively affect your vector search.
4. Understand how to measure the quality of your search across several quality metrics.
5. Understand how the main parameters of the HNSW algorithm affect the relevance and speed of vector search, and how to adjust these parameters.
6. Experiment with the three major quantization methods (product, scalar, and binary) and learn how they impact memory requirements, search quality, and speed.

By the end of this course, you’l
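The description names scalar and binary quantization without showing what they do to a vector. The following is a minimal illustrative sketch in plain Python, not the course's or Qdrant's implementation. The function names, the sample vector, and the assumed value range of [-1.0, 1.0] are all hypothetical. Product quantization is left out because it needs a trained codebook.

```python
from array import array


def scalar_quantize(vector, lo, hi):
    """Map each float in [lo, hi] to an integer code 0..255 (one byte per dimension)."""
    scale = (hi - lo) / 255.0
    return bytes(min(255, max(0, round((x - lo) / scale))) for x in vector)


def scalar_dequantize(codes, lo, hi):
    """Approximate the original floats from the one-byte codes."""
    scale = (hi - lo) / 255.0
    return [lo + c * scale for c in codes]


def binary_quantize(vector):
    """Keep only the sign of each dimension, packed 8 dimensions per byte."""
    packed = bytearray((len(vector) + 7) // 8)
    for i, x in enumerate(vector):
        if x > 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def hamming_distance(a, b):
    """Count differing bits between two equally long binary codes."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Hypothetical 8-dimensional embedding; real embeddings have hundreds of dimensions.
vector = [0.12, -0.40, 0.33, 0.05, -0.91, 0.77, -0.08, 0.64]
LO, HI = -1.0, 1.0  # assumed value range, normally estimated from the data

float32_bytes = len(vector) * array("f").itemsize  # storage as 32-bit floats
scalar_codes = scalar_quantize(vector, LO, HI)
binary_codes = binary_quantize(vector)
restored = scalar_dequantize(scalar_codes, LO, HI)

print("bytes (float32, scalar, binary):",
      float32_bytes, len(scalar_codes), len(binary_codes))
print("max scalar reconstruction error:",
      max(abs(a - b) for a, b in zip(vector, restored)))
```

Scalar quantization here stores one byte per dimension instead of four, and binary quantization stores one bit per dimension. The trade-off is precision: scalar codes reconstruct each value only to within half a quantization step, and binary codes keep only the sign, so candidates are compared by Hamming distance and usually rescored against the original vectors.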

More on: RAG Basics

High Performance (Realtime) RAG Chains: From Basic to Advanced

Coding the Ultimate RAG Engine from Zero

Building Agentic RAG From Scratch in Pure Python

Build an LLM and RAG-based Chat Application using AlloyDB and LangChain

I Built a RAG App to Decode Airline Bureaucracy (So You Don't Have To)

Akamai Developers

RAG Demo for Beginners: Full Hands-On Tutorial in Tamil | Build Your Own RAG AI | Karthik's Show

Related Reads

A RAG evaluator that admits what it can't judge

Learn how to build a reliable RAG evaluator that acknowledges its limitations, a crucial aspect of AI safety and robustness

Dev.to · Melissa D. Ellison

RAG on Google Cloud in Regulated Environments: A Lifecycle Playbook from Inception to…

Learn to implement RAG on Google Cloud in regulated environments with a lifecycle playbook

Medium · Machine Learning

Solving One of the Hardest Problems in Code RAG: Context Retrieval

Learn to solve context retrieval in code RAG systems, a crucial challenge in automated code generation, and improve your skills in RAG and code analysis.

Practical RAG, Part 1: The Simplest RAG That Actually Works

Learn to build a simple Retrieval-Augmented Generation pipeline from scratch in Python and understand its limitations

Dev.to · Suman Nath

RRF vs DBSF with Qdrant: Hybrid Retrieval Fusion for RAG in Python

Professor Py: AI Engineering