Shuo Li Liu - Coherence in RLHF Preference Data

Cohere · Advanced · 📄 Research Papers Explained · 1w ago
RLHF usually learns from pairwise comparisons, often through Bradley-Terry-style models. I will discuss what coherence requirements, such as Weak Stochastic Transitivity and the Weak Axiom of Revealed Preference, mean for preference-trained AI systems.

Shuo Li Liu is a PhD student in Economics at Princeton University. His work connects axiomatic decision theory and AI alignment, with current projects on stochastic choice, preference learning, and the foundations of RLHF evaluation.

This session is brought to you by the Cohere Labs Open Science Community - a space where ML researchers, engineers, linguists, social scientists, and lifelong learners connect and collaborate with each other. We'd like to extend a special thank you to Katrina Lawrence and Neel Ghoshal, Leads of our ML Math group, for their dedication in organizing this event.

If you’re interested in sharing your work, we welcome you to join us! Simply fill out the form at https://forms.gle/ALND9i6KouEEpCnz6 to express your interest in becoming a speaker. Join the Cohere Labs Open Science Community to see a full list of upcoming events (https://tinyurl.com/CohereLabsCommunityApp).
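For readers new to the terms in the abstract, the sketch below may help make two of them concrete. It is not taken from the talk: it uses the standard textbook definitions of a Bradley-Terry choice probability and of Weak Stochastic Transitivity (if a is chosen over b at least half the time, and b over c at least half the time, then a is chosen over c at least half the time). The item names, latent scores, and cyclic choice frequencies are all hypothetical, and the Weak Axiom of Revealed Preference is not covered here.

```python
import math
from itertools import permutations


def bt_prob(score_a, score_b):
    """Bradley-Terry probability that a is preferred to b, given latent scores."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))


def wst_violations(items, p):
    """Ordered triples (a, b, c) that break Weak Stochastic Transitivity:
    p(a, b) >= 1/2 and p(b, c) >= 1/2, yet p(a, c) < 1/2."""
    return [
        (a, b, c)
        for a, b, c in permutations(items, 3)
        if p(a, b) >= 0.5 and p(b, c) >= 0.5 and p(a, c) < 0.5
    ]


# Hypothetical latent scores for three responses (illustration only).
scores = {"x": 1.2, "y": 0.3, "z": -0.5}


def bt(a, b):
    return bt_prob(scores[a], scores[b])


# A Bradley-Terry model orders items by a single score, so no triple can fail.
bt_violations = wst_violations(list(scores), bt)

# Hypothetical cyclic choice frequencies: x beats y, y beats z, z beats x.
cyclic = {("x", "y"): 0.7, ("y", "z"): 0.7, ("z", "x"): 0.7}


def cyc(a, b):
    # Complete the table with p(b, a) = 1 - p(a, b).
    return cyclic[(a, b)] if (a, b) in cyclic else 1.0 - cyclic[(b, a)]


cyclic_violations = wst_violations(["x", "y", "z"], cyc)

print("Bradley-Terry violations:", bt_violations)
print("Cyclic data violations:", cyclic_violations)
```

The contrast is the point of the sketch: preference data with a majority cycle cannot be reproduced exactly by any single-score Bradley-Terry fit, which is one reason coherence conditions of this kind are relevant to preference-trained systems.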
