Direct Preference Optimization - Your Language Model is Secretly a Reward Model