Show HN: Route your prompts to the best LLM

📰 Hacker News · danlenton

Hey HN, we've just finished building a dynamic router for LLMs, which takes each prompt and sends it to the most appropriate model and provider. We'd love to know what you think!

Here is a quick(ish) screen recording explaining how it works: https://youtu.be/ZpY6SIkBosE

Best results come from training a custom router on your own prompt data: https://youtu.be/9JYqNbIEac0

The router balances user preferences for quality, speed and cost. The end result is higher-quality and faster LLM responses at lower cost.

The quality of each candidate LLM is predicted ahead of time using a neural scoring function: a BERT-like architecture conditioned on the prompt and a latent representation of the LLM being scored. The different LLMs are queried across the batch dimension, with the scoring architecture taking a single latent representation of the LLM as input per forward pass. This makes the scoring function very modular to query for different LLM candidates.
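To make the batched-scoring idea concrete, here is a minimal sketch in NumPy. Everything here is illustrative: the model names, latent vectors, toy scorer weights, and the cost/speed numbers are all made up, and the real scorer is a BERT-like network rather than this single linear layer. The point is the shape of the computation: one latent per candidate LLM, all candidates stacked along the batch dimension and scored in a single forward pass, then a preference-weighted objective picks the winner.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy embedding size (assumption, not the real architecture's)

# One latent vector per candidate LLM. New candidates just add a row,
# which is what makes the scorer modular.
llm_latents = {
    "model-a": rng.normal(size=D),
    "model-b": rng.normal(size=D),
    "model-c": rng.normal(size=D),
}

W = rng.normal(size=(2 * D,))  # stand-in for the learned scorer weights

def predict_quality(prompt_emb, latents):
    """Score every candidate LLM in one batched forward pass."""
    names = list(latents)
    # Stack [prompt; latent] pairs along the batch dimension.
    batch = np.stack([np.concatenate([prompt_emb, latents[n]]) for n in names])
    logits = batch @ W                       # shape: (num_candidates,)
    scores = 1.0 / (1.0 + np.exp(-logits))   # squash to (0, 1)
    return dict(zip(names, scores))

def route(prompt_emb, latents, cost, speed,
          w_quality=1.0, w_cost=0.5, w_speed=0.5):
    """Pick the LLM maximising a quality/cost/speed trade-off."""
    quality = predict_quality(prompt_emb, latents)
    def objective(name):
        return (w_quality * quality[name]
                - w_cost * cost[name]
                + w_speed * speed[name])
    return max(quality, key=objective)

prompt_emb = rng.normal(size=D)
cost = {"model-a": 0.9, "model-b": 0.2, "model-c": 0.5}   # fabricated, normalised
speed = {"model-a": 0.3, "model-b": 0.8, "model-c": 0.6}  # fabricated, normalised
print(route(prompt_emb, llm_latents, cost, speed))
```

Adjusting the `w_*` weights shifts the router's preference between quality, cost and speed; the predicted-quality term is the only part that needs a learned model.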

Published 22 May 2024