The Small Model Infrastructure Nobody Built (So We Did) — Filip Makraduli, Superlinked

AI Engineer · Intermediate · RAG & Vector Search · 2w ago
Most embedding infrastructure assumes you know exactly which model you want ahead of time. This talk starts where that assumption breaks. Filip Makraduli walks through the real profiling mistakes, infrastructure gaps, and production constraints that led to building an embedding inference engine designed for dynamic model loading, hot-swapping, and memory-aware eviction instead of brittle one-model-per-container deployments. If you're working on small-model inference, embeddings, or GPU infrastructure, this is a practical look at what breaks in the real world and how to design around it.

Speaker info: https://www.linkedin.com/in/filipmakraduli/
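
To make "dynamic model loading" and "memory-aware eviction" concrete, here is a minimal sketch of one way such a cache could work: models are loaded on first request and the least recently used ones are evicted when a memory budget would be exceeded. This is not the engine from the talk. The class `ModelCache`, its parameters, the model names, and the sizes in megabytes are all hypothetical, and the `loader` callables stand in for whatever actually puts a model on the GPU.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple


class ModelCache:
    """Hypothetical memory-aware LRU cache for loaded models (illustration only)."""

    def __init__(self, budget_mb: int) -> None:
        self.budget_mb = budget_mb  # assumed total memory budget
        self.used_mb = 0
        # Maps model name -> (model object, size in MB), oldest use first.
        self._models: "OrderedDict[str, Tuple[object, int]]" = OrderedDict()

    def get(self, name: str, loader: Callable[[], object], size_mb: int) -> object:
        """Return a loaded model, loading it (and evicting others) if needed."""
        if name in self._models:
            # Cache hit: mark as most recently used.
            self._models.move_to_end(name)
            return self._models[name][0]
        if size_mb > self.budget_mb:
            raise MemoryError(f"{name} ({size_mb} MB) exceeds the budget")
        # Evict least recently used models until the new one fits.
        while self.used_mb + size_mb > self.budget_mb:
            _, (_, freed_mb) = self._models.popitem(last=False)
            self.used_mb -= freed_mb
        model = loader()
        self._models[name] = (model, size_mb)
        self.used_mb += size_mb
        return model

    def loaded(self) -> List[str]:
        """Names of currently loaded models, least recently used first."""
        return list(self._models)


# Usage with placeholder loaders and made-up sizes.
cache = ModelCache(budget_mb=1000)
cache.get("model-a", lambda: "A", 400)
cache.get("model-b", lambda: "B", 400)
cache.get("model-a", lambda: "A", 400)  # hit: model-a becomes most recent
cache.get("model-c", lambda: "C", 400)  # does not fit: model-b is evicted
```

In this sketch the caller declares each model's size up front. A real system would more likely measure memory use after loading and would also need to handle requests that are still in flight against a model chosen for eviction.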
