Scaling Meta's Multi-Agent Systems to a Billion Videos

MLOps.community · Advanced ·🤖 AI Agents & Automation ·1w ago
Aditya Gautam (Generative AI, Meta) breaks down how Meta tackles two of the hardest problems in short-form video at the scale of hundreds of millions to a billion reads per day: modality misalignment and original content theft. He's spent years on Facebook Reels integrity and recommendations, and in this talk he gets practical about why a single giant LLM is the wrong tool for the job, and what to build instead. This is the application-layer view of multi-agent systems. Not infra, not orchestration theory, just what actually works when your input is messy user-generated video and your budget is a fraction of a cent per inference. What you'll learn: - The two problems Meta is solving: Modality misalignment (text says one thing, video shows another, sometimes only for a few frames) and content theft in the copycat creator economy. - Why a single big LLM falls over: Cost at 100M+ daily inferences, modality bias in VLMs, and the inability to bring in external context like the rest of the video corpus. - The 3-agent architecture: Perceiver (signal acquisition, scene boundaries, VLM embeddings, OCR), Retriever (KNN over vector DBs, similarity matrices, creator-to-creator graphs), and Reasoner (chain-of-thought, confidence scores, can request more context). - Why small specialized models win: 3B to 11B fine-tuned LLMs per agent beat a 200B generalist on both quality and cost, and let each agent ship on its own CI/CD like a microservice. - The evaluation stack: Precision/recall/F1, isolated retrieval evaluation, reasoning quality via LLM-as-judge plus human-in-the-loop, hallucination rate, and full system efficiency logging at every hop. - Four real optimizations that drop cost 10x: Spatial and temporal frame merging (don't send 300 similar cloud frames to a VLM), semantic hashing of viral content, dynamic routing that skips reasoning for obvious copies, and metadata pruning based on creator reputation and topic safety. - The honest hard parts: Inference latency stacks a
Watch on YouTube ↗ (saves to browser)
Sign in to unlock AI tutor explanation · ⚡30

Related AI Lessons

Indian Small Businesses Don’t Want AI.
Indian small businesses are hesitant to adopt AI, despite being active on WhatsApp Business, highlighting a gap in understanding their needs
Medium · AI
AWS DevOps Agent + CDK Python: Building a Real Agentic SRE From Scratch (No Splunk, No 3 Accounts)
Learn to build a real agentic SRE from scratch using AWS DevOps Agent and CDK Python, automating incident response without Splunk or multiple accounts
Medium · Python
Building the Future of Fintech & AI with BotSathi
Learn how BotSathi is revolutionizing fintech with AI, transforming the financial landscape in India
Medium · Startup
AiFinPay: The AiFinPay SDK offers a seamless and efficient w
Integrate AI-powered payment processing into your application with AiFinPay SDK for secure and reliable transactions
Dev.to AI
Up next
Your Agent Is an Infinite Canvas — RL Nabors, Dressed for Space
AI Engineer
Watch →