RLHF for finer alignment with Gemma 3

How do we best align a model for human interaction? In RLHF we first learn a proxy for human preferences: the reward model (RM), which is later used to align a language policy. Yet the RM is an imperfect approximation of human preferences, and prolonged optimization against it inevitably leads to reward hacking. In this presentation we'll review techniques developed for Gemma to mitigate this hacking and allow for longer training.
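
As a rough illustration only (not Gemma's actual recipe, which the talk covers), one standard way to mitigate reward hacking is to regularize the policy toward a frozen reference model with a KL penalty, so the policy cannot drift far into regions where the RM is a poor proxy. The sketch below uses hypothetical names and values to show how such a penalty combines with a reward-model score.

# Illustrative sketch of a KL-regularized RLHF reward; all names and
# numbers are hypothetical, not taken from Gemma's training code.
import numpy as np

def kl_regularized_reward(rm_score, logprob_policy, logprob_ref, beta=0.1):
    """Combine a reward-model score with a KL penalty toward a reference policy.

    The KL term discourages the policy from exploiting blind spots of the
    reward model, one common mitigation for reward hacking.
    """
    # Per-sequence KL estimate between the current policy and the frozen reference.
    kl_penalty = np.sum(logprob_policy - logprob_ref)
    return rm_score - beta * kl_penalty

# Example: one completion scored by the RM, with per-token log-probs from
# the trained policy and from the frozen reference (e.g. SFT) model.
rm_score = 1.8                                  # hypothetical RM score
logprob_policy = np.array([-0.5, -1.2, -0.8])   # log pi_theta(y_t | x, y_<t)
logprob_ref = np.array([-0.7, -1.0, -1.1])      # log pi_ref(y_t | x, y_<t)

print(kl_regularized_reward(rm_score, logprob_policy, logprob_ref))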

Subscribe to Google for Developers → https://goo.gle/developers

Speakers: Louis Rouillard
Products Mentioned: Gemma
Category: Project
Tags: Google, developers, pr_pr: Gemma